Sentence Breaker

This processor splits text blocks into sentences. It uses the Apache OpenNLP Sentence Detector to decide which punctuation marks end a sentence.
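The core task is to decide which candidate punctuation marks actually end a sentence. The sketch below is not the processor's implementation. It swaps OpenNLP's trained model for a naive regular-expression rule so the input and output shape (one text block in, a list of sentences out) is easy to see. The function name `split_sentences` is hypothetical.

```python
import re

# Naive stand-in rule (NOT OpenNLP): ".", "!" or "?" followed by whitespace
# and an uppercase letter is treated as a sentence end. OpenNLP instead uses
# a trained model to judge each candidate punctuation mark in context.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split one text block into sentences at candidate end-of-sentence marks."""
    text = text.strip()
    if not text:
        return []
    # re.split consumes the whitespace between sentences; the punctuation
    # stays attached to the preceding sentence because it sits in a lookbehind.
    return [s for s in _BOUNDARY.split(text) if s]


if __name__ == "__main__":
    print(split_sentences("The model is loaded. It splits text! Does it work? Yes."))
    # The rule wrongly splits after the abbreviation "Dr.", which is the kind
    # of case a trained sentence model is meant to handle.
    print(split_sentences("Dr. Smith arrived."))
```

The second call shows why a purely rule-based splitter is not enough: a period after an abbreviation looks identical to a sentence-final period unless the surrounding context is taken into account.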

This is a plugin processor. It uses the Sentence Breaker Stage.

The processor includes four pre-trained models, one for each of the following languages:

English model
Dutch model
German model
Portuguese model

Configuration

Language - ISO code of the language whose model is used. In the UI it is shown as the language name and selected from a drop-down list.
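The relationship between the stored ISO code and the name shown in the drop-down can be sketched as a simple lookup. This assumes the stored value is a two-letter ISO 639-1 code and that only the four bundled models are selectable; `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names, not part of the product's API.

```python
# Hypothetical mapping: assumes ISO 639-1 codes, one entry per bundled model.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "nl": "Dutch",
    "de": "German",
    "pt": "Portuguese",
}


def language_name(iso_code: str) -> str:
    """Return the UI label for a configured ISO code, or raise if no model exists."""
    code = iso_code.strip().lower()
    try:
        return SUPPORTED_LANGUAGES[code]
    except KeyError:
        raise ValueError(
            f"no pre-trained sentence model for language {iso_code!r}"
        ) from None


if __name__ == "__main__":
    print(language_name("de"))
```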

General Settings

The general settings can be accessed by clicking on

Enable - Enables the processor to be used in the pipeline.
Skip Flags (optional) - Flags of lexical items to be ignored by this processor.
Boundary Flags (optional) - List of vertex flags that indicate the beginning and end of a text block.
Required Flags (optional) - Lexical item flags that every token must have to be processed.
At Least One Flags (optional) - List of lexical item flags of which at least one must be present for a token to be processed.
Don't Process Flags (optional) - List of lexical item flags that are not processed. Unlike "Skip Flags", which skips the token and continues along the same path, this drops the path in the Saga graph.
Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value of every match.
- 0.0 to < 1.0 decreases the confidence value
- 1.0 leaves the confidence value unchanged
- > 1.0 to 2.0 increases the confidence value
Debug - Enable debug logging.
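The per-token flag settings and the confidence factor above can be sketched as follows. Several details are assumptions, not documented behavior: the order in which the checks run, treating a missing required or at-least-one flag as a skip, and modeling the adjustment as a plain multiplication with no upper cap. Boundary flags are not modeled. `Action`, `decide` and `adjust_confidence` are hypothetical names.

```python
from collections.abc import Iterable
from enum import Enum


class Action(Enum):
    PROCESS = "process"  # run sentence breaking on the token
    SKIP = "skip"        # ignore the token, continue along the same path
    DROP = "drop"        # do not process this path any further


def decide(
    token_flags: Iterable[str],
    skip: Iterable[str] = (),
    required: Iterable[str] = (),
    at_least_one: Iterable[str] = (),
    dont_process: Iterable[str] = (),
) -> Action:
    """Assumed evaluation order: Don't Process, Skip, Required, At Least One."""
    flags = set(token_flags)
    if flags & set(dont_process):
        return Action.DROP
    if flags & set(skip):
        return Action.SKIP
    if not set(required) <= flags:  # every required flag must be present
        return Action.SKIP
    at_least_one = set(at_least_one)
    if at_least_one and not flags & at_least_one:  # need one of them, if any configured
        return Action.SKIP
    return Action.PROCESS


def adjust_confidence(confidence: float, factor: float) -> float:
    """Scale a match confidence by a factor in [0.0, 2.0] (assumed: no cap)."""
    if not 0.0 <= factor <= 2.0:
        raise ValueError("factor must be between 0.0 and 2.0")
    return confidence * factor


if __name__ == "__main__":
    print(decide({"TOKEN", "STOP"}, dont_process={"STOP"}))
    print(adjust_confidence(0.5, 1.5))
```

Returning an explicit action keeps the difference between "Skip Flags" and "Don't Process Flags" visible: a skipped token leaves the path intact, while a dropped path ends there.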
