The language detector stage uses OpenNLP (https://opennlp.apache.org/) and its language detector model to identify the language of a text block.
Operates On: Lexical Items with the TEXT_BLOCK flag.
Library: saga-lang-detector-stage
It can detect 103 languages and outputs ISO 639-3 language codes (see https://opennlp.apache.org/news/model-langdetect-183.html).
The model works best on longer texts of at least two sentences. Configure this stage early in the pipeline, before the text is tokenized.
For a text block containing an English sentence followed by a Spanish sentence, the first sentence is tagged with "LANG_ENG" and the second with "LANG_SPA".
This works because a sentence breaker stage was configured before the language detector stage, so language identification occurs at the sentence level rather than on the whole text block.
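The sketch below illustrates why sentence-level flagging depends on an upstream sentence breaker: each sentence is passed to the detector separately, so a mixed-language block yields one flag per sentence. It is not the Saga implementation. The class `SentenceLangTagger` and method `flagSentences` are hypothetical, and the detector is modeled as a plain `Function<String, String>` returning an ISO 639-3 code. In practice such a function could wrap OpenNLP's `LanguageDetectorME.predictLanguage(text).getLang()`, but how Saga wires the model is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch (not the Saga stage): sentence-level language flagging.
final class SentenceLangTagger {
    private SentenceLangTagger() {}

    /**
     * Assigns a LANG_??? flag to each sentence.
     *
     * @param sentences output of an upstream sentence breaker (assumed already split)
     * @param detector  stand-in for the language model; returns an ISO 639-3 code such as "eng"
     * @return one flag per sentence, in the same order as the input
     */
    static List<String> flagSentences(List<String> sentences, Function<String, String> detector) {
        List<String> flags = new ArrayList<>(sentences.size());
        for (String sentence : sentences) {
            // Each sentence is detected independently, so mixed-language text gets mixed flags.
            String code = detector.apply(sentence);
            flags.add("LANG_" + code.toUpperCase(Locale.ROOT));
        }
        return flags;
    }
}
```

With a stub detector such as `s -> s.startsWith("Hola") ? "spa" : "eng"`, the input `["This is an English sentence.", "Hola, buenos dias."]` is flagged `LANG_ENG` and `LANG_SPA`. Had the two sentences been passed as a single block, the detector would return only one code for both.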
LANG_??? - Flags all text blocks where a language was identified.
The '???' at the end of the flag is replaced by the ISO 639-3 three-letter language code.
For example, if Spanish is detected, the three-letter code is SPA and the flag is "LANG_SPA".
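A minimal sketch of the naming rule, useful for downstream stages that need to build or parse these flags. The `LangFlag` class and its methods are hypothetical helpers, not part of saga-lang-detector-stage. Two further assumptions: a valid code is exactly three ASCII letters, and the detector reports codes in lowercase (as OpenNLP's model does, e.g. "spa"), so `toCode` returns lowercase.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for the LANG_??? naming convention (not a Saga API).
final class LangFlag {
    // Assumption: ISO 639-3 codes are exactly three ASCII letters.
    private static final Pattern ISO_639_3 = Pattern.compile("[A-Za-z]{3}");
    private static final String PREFIX = "LANG_";

    private LangFlag() {}

    /** Builds the flag name from a language code, e.g. "spa" -> "LANG_SPA". */
    static String fromCode(String iso639_3) {
        if (iso639_3 == null || !ISO_639_3.matcher(iso639_3).matches()) {
            throw new IllegalArgumentException("Not a three-letter code: " + iso639_3);
        }
        return PREFIX + iso639_3.toUpperCase(Locale.ROOT);
    }

    /** Recovers the lowercase language code from a flag, e.g. "LANG_SPA" -> "spa". */
    static String toCode(String flag) {
        if (flag == null || !flag.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a LANG_ flag: " + flag);
        }
        String code = flag.substring(PREFIX.length());
        if (!ISO_639_3.matcher(code).matches()) {
            throw new IllegalArgumentException("Not a LANG_ flag: " + flag);
        }
        return code.toLowerCase(Locale.ROOT);
    }
}
```

Rejecting two-letter inputs such as "es" guards against mixing ISO 639-1 codes into flags that are expected to carry ISO 639-3 codes.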
Vertex Flags
No vertices are created in this stage.