Stages can be linked together into language processing pipelines that process text and create interpretation graphs. See Pipelines Introduction and Pipeline Configuration for more details.
These stages are contained inside the Saga Core library and are available at all times.
Readers read text streams and create text blocks to process.
Tokenizers read text blocks and divide them up into individual tokens to be processed.
Splitters split up tokens into multiple smaller tokens as an alternative interpretation.
CharacterSplitter - Tokens are split wherever any character from a specified set (typically punctuation) is encountered.
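The splitting behavior described above can be illustrated with a minimal Python sketch. This is not the Saga API: the function names `split_on_characters` and `interpretations` are hypothetical, and the representation of interpretations as plain lists is an assumption made for illustration only.

```python
from typing import List


def split_on_characters(token: str, split_chars: str) -> List[str]:
    """Split a token at every character found in split_chars, dropping empty pieces."""
    pieces = []
    current = []
    for ch in token:
        if ch in split_chars:
            # A split character ends the current piece (if there is one).
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces


def interpretations(token: str, split_chars: str) -> List[List[str]]:
    """Return the original token plus, when splitting changes it, the split alternative."""
    pieces = split_on_characters(token, split_chars)
    if not pieces or pieces == [token]:
        # Nothing useful to add: keep only the original interpretation.
        return [[token]]
    return [[token], pieces]
```

For example, `interpretations("state-of-the-art", "-")` keeps the original token and adds the four smaller tokens as a second interpretation, which mirrors the idea that splitters add alternatives rather than replace the original.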
Normalizers create alternative normalized interpretations of tokens from original tokens.
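As a rough illustration of what a normalizer does, the Python sketch below produces lowercased and accent-stripped alternatives for a token. Which normalizations Saga actually applies is not stated here, so both rules and the function name `normalized_alternatives` are assumptions.

```python
import unicodedata
from typing import List


def normalized_alternatives(token: str) -> List[str]:
    """Return normalized forms of a token that differ from the original."""
    alternatives = []
    lowered = token.lower()
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for candidate in (lowered, stripped):
        # Only keep forms that are new; the original token stays in the graph as-is.
        if candidate != token and candidate not in alternatives:
            alternatives.append(candidate)
    return alternatives
```

A token that is already in normalized form yields an empty list, so no redundant alternative would be added.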
Recognizers identify and flag tokens based on their character patterns.
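Pattern-based recognition of this kind can be sketched with regular expressions. The example below is only an illustration in Python: the flag names (`NUMBER`, `ALL_CAPS`, `CAPITALIZED`), the patterns, and the `recognize` function are hypothetical and are not taken from Saga.

```python
import re
from typing import List

# Hypothetical flags paired with the character pattern that triggers each one.
PATTERNS = [
    ("NUMBER", re.compile(r"\d+(\.\d+)?")),
    ("ALL_CAPS", re.compile(r"[A-Z]{2,}")),
    ("CAPITALIZED", re.compile(r"[A-Z][a-z]+")),
]


def recognize(token: str) -> List[str]:
    """Return every flag whose pattern matches the whole token."""
    return [flag for flag, pattern in PATTERNS if pattern.fullmatch(token)]
```

Using `fullmatch` means a flag is applied only when the entire token fits the pattern, so a token such as `abc123` receives no flag in this sketch.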
Taggers create semantic tags which are added to the interpretation graph as alternative interpretations.
These stages are libraries external to the Saga Core library and need to be added as dependencies of your application.
Spell checkers process specific tokens, identifying misspellings and adding alternatives to the interpretation graph.
SpellingAlternatives -