Excerpt |
---|
The language detector stage uses Apache OpenNLP (https://opennlp.apache.org/) and its pretrained language detector model to identify the language of each text block. |
Operates On: Lexical Items with the TEXT_BLOCK flag.
Tip |
---|
It can detect 103 languages and outputs ISO 639-3 language codes (https://opennlp.apache.org/news/model-langdetect-183.html). |
Note |
---|
The model works best on longer texts of at least two sentences, so configure this stage early in the pipeline, before the text is tokenized
...
. |
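Because detection is less reliable on short inputs, it can help to know which text blocks fall below the two-sentence guideline from the note above. The following is a minimal sketch, assuming you want such a pre-check in your own code: the class `SentenceCountCheck`, its methods, and the `MIN_SENTENCES` constant are hypothetical and are not part of saga-lang-detector-stage or OpenNLP. It counts sentences with the JDK's `java.text.BreakIterator`.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class SentenceCountCheck {

    // Hypothetical threshold taken from the note: the model works best
    // on texts with at least two sentences.
    static final int MIN_SENTENCES = 2;

    // Counts sentences using the JDK sentence BreakIterator.
    // Locale.ROOT is used because the language is not known yet at this point.
    static int countSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Ignore segments that contain only whitespace.
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // True when the block meets the two-sentence guideline.
    static boolean isLongEnoughForDetection(String text) {
        return countSentences(text) >= MIN_SENTENCES;
    }

    public static void main(String[] args) {
        String shortBlock = "Hola.";
        String longBlock = "The stage runs before tokenization. It sees the whole text block.";
        System.out.println(isLongEnoughForDetection(shortBlock)); // prints false
        System.out.println(isLongEnoughForDetection(longBlock));  // prints true
    }
}
```

A check like this only flags blocks whose detected language deserves less trust; it does not change how the stage itself runs. Running the language detector before tokenization keeps each text block whole, which is what gives the model the longer context it needs.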
Library: saga-lang-detector-stage
...