...
It is important to note that the model works better with longer texts that have at least 2 sentences. So it is important to configure this stage earlier in the pipeline and before tokenizing the text.
...
As you can see the first sentence is tagged with "LANG_ENG" and the second sentence with "LANG_SPA". For these this case a sentence breaker stage was configured before the language detector stage. This way language identification could happen at sentence level.
...
...