Stages can be linked together into language processing pipelines which process text and create interpretation graphs. See Pipelines Introduction for more details.
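As a conceptual illustration of how linked stages build up an interpretation graph, here is a minimal Python sketch. It is not the Saga API: the names `Edge`, `whitespace_stage`, `lowercase_stage`, and `run_pipeline` are hypothetical, and the graph is reduced to a flat list of edges over character offsets.

```python
import re
from typing import Callable, List, NamedTuple

class Edge(NamedTuple):
    """One interpretation of the text between two character offsets (hypothetical type)."""
    start: int
    end: int
    text: str
    kind: str

# A stage takes the original text plus the edges so far and returns the extended edge list.
Stage = Callable[[str, List[Edge]], List[Edge]]

def whitespace_stage(text: str, edges: List[Edge]) -> List[Edge]:
    # Add one "token" edge for every run of non-whitespace characters.
    new = [Edge(m.start(), m.end(), m.group(), "token")
           for m in re.finditer(r"\S+", text)]
    return edges + new

def lowercase_stage(text: str, edges: List[Edge]) -> List[Edge]:
    # Add a parallel "normalized" edge for each token that is not already lower case.
    alts = [Edge(e.start, e.end, e.text.lower(), "normalized")
            for e in edges
            if e.kind == "token" and e.text != e.text.lower()]
    return edges + alts

def run_pipeline(text: str, stages: List[Stage]) -> List[Edge]:
    # Each stage sees the output of the previous one, so later stages can build on earlier edges.
    edges: List[Edge] = []
    for stage in stages:
        edges = stage(text, edges)
    return edges

graph = run_pipeline("Hello world", [whitespace_stage, lowercase_stage])
```

The original token and its normalized form cover the same offsets, which is what makes them alternative interpretations rather than replacements.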

Built-in Stages

These stages are included in the Saga Core library and are always available.

Text Block Readers

Readers read text streams and create text blocks to process.

SimpleReader

Tokenizers

Tokenizers read text blocks and divide them up into individual tokens to be processed.

WhitespaceTokenizer - Divides text blocks into tokens based on white space.
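The idea of whitespace tokenization can be sketched in a few lines of Python. This is an illustration only, not the WhitespaceTokenizer implementation; the function name `whitespace_tokenize` and the choice to return character offsets are assumptions.

```python
import re

def whitespace_tokenize(text):
    """Return (token, start, end) triples, one per run of non-whitespace characters."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

# Consecutive spaces do not produce empty tokens.
tokens = whitespace_tokenize("Saga reads  text")
```

Keeping the offsets lets later stages attach alternative interpretations to the same span of the original text block.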

Splitters

Splitters split up tokens into multiple smaller tokens as an alternative interpretation.

CharacterSplitter - Tokens are split when any in a specified set of characters (typically punctuation) is encountered.
CharChangeSplitter - Tokens are split wherever a change between adjacent characters is encountered.
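The two splitting strategies can be sketched as follows. This is a hedged illustration, not the Saga implementation: the function names are hypothetical, the character splitter is assumed to drop the split characters, and the change splitter is assumed to compare broad character classes (letter, digit, other).

```python
def character_split(token, split_chars=",.;:!?-"):
    """Split a token at any character in split_chars (assumed to be discarded)."""
    parts, current = [], []
    for ch in token:
        if ch in split_chars:
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts

def char_class(ch):
    """Classify a character; the three classes are an assumption for this sketch."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

def char_change_split(token):
    """Split a token wherever the character class changes between neighbours."""
    parts, start = [], 0
    for i in range(1, len(token)):
        if char_class(token[i]) != char_class(token[i - 1]):
            parts.append(token[start:i])
            start = i
    if token:
        parts.append(token[start:])
    return parts
```

For example, `character_split("state-of-the-art")` yields the four words, and `char_change_split("abc123def")` separates the letters from the digits.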

Normalizers

Normalizers create alternative normalized interpretations of tokens from original tokens.

CaseAnalysis - Analyzes and flags the case of tokens and then (optionally) normalizes the token to lower case.
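A rough sketch of case analysis with optional normalization is shown below. The flag names (`ALL_UPPER`, `ALL_LOWER`, `TITLE_CASE`, `MIXED_CASE`) and the function `analyze_case` are hypothetical and chosen for illustration; the actual flags used by CaseAnalysis may differ.

```python
def analyze_case(token, normalize=True):
    """Return (case_flag, normalized_token_or_None) for a single token."""
    if token.isupper():
        flag = "ALL_UPPER"
    elif token.islower():
        flag = "ALL_LOWER"
    elif token[:1].isupper() and token[1:].islower():
        flag = "TITLE_CASE"
    else:
        flag = "MIXED_CASE"
    # The normalized form is only produced when requested and when it differs.
    lowered = token.lower()
    normalized = lowered if normalize and lowered != token else None
    return flag, normalized
```

The original token is kept, and the lower-case form is offered alongside it, so downstream stages can match on either.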

Recognizers

Recognizers identify and flag tokens based on their character patterns.

NumberRecognizer - Identifies tokens which look like numbers and flags them with the "NUMBER" flag.
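Pattern-based flagging can be illustrated with a regular expression. This is a sketch under assumptions: the pattern below (optional sign, digits, optional decimal part) is only one plausible definition of "looks like a number", and `recognize_numbers` is a hypothetical helper, not part of Saga.

```python
import re

# Assumed pattern: optional sign, one or more digits, optional decimal fraction.
NUMBER_PATTERN = re.compile(r"[+-]?\d+(?:\.\d+)?")

def recognize_numbers(tokens):
    """Pair each token with a set of flags; number-like tokens get "NUMBER"."""
    return [(t, {"NUMBER"} if NUMBER_PATTERN.fullmatch(t) else set())
            for t in tokens]
```

Using `fullmatch` means a token such as `4x` is not flagged, because only part of it is numeric.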

Taggers

Taggers create semantic tags which are added to the interpretation graph as alternative interpretations.

DictionaryTagger - Looks up all combinations of tokens in a dictionary and tags any that are found.
AdvancedPattern - Matches recursive pattern combinations of tokens and semantic tags.
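Looking up all combinations of tokens in a dictionary can be sketched as a scan over every contiguous token span. The function `dictionary_tag`, the `max_len` limit, and the sample dictionary are assumptions for illustration and do not reflect the DictionaryTagger configuration.

```python
def dictionary_tag(tokens, dictionary, max_len=3):
    """Return (start, end, tag) for every contiguous token span found in the dictionary.

    start and end are token indices, with end exclusive.
    """
    matches = []
    for start in range(len(tokens)):
        last = min(start + max_len, len(tokens))
        for end in range(start + 1, last + 1):
            phrase = " ".join(tokens[start:end])
            tag = dictionary.get(phrase)
            if tag is not None:
                matches.append((start, end, tag))
    return matches

# Hypothetical dictionary mapping phrases to semantic tags.
sample_dictionary = {
    "new york": "place",
    "new york city": "place",
    "city hall": "building",
}
tags = dictionary_tag(["new", "york", "city", "hall"], sample_dictionary)
```

Overlapping matches are all kept, which fits the description of tags being added as alternative interpretations rather than a single best reading.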

Add-on Stages

These stages are provided in libraries external to the Saga Core library and need to be added as dependencies of your application.

Spell Checkers

Spell checkers process specific tokens, identifying misspellings and adding alternatives to the interpretation graph.

SpellingAlternatives -
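The general idea of proposing spelling alternatives can be sketched with Python's standard `difflib` module. This is not how the SpellingAlternatives stage works internally, which this page does not describe; the function `suggest_alternatives` and the sample vocabulary are hypothetical.

```python
from difflib import get_close_matches

def suggest_alternatives(token, vocabulary, limit=3):
    """Return close vocabulary matches for a token that is not itself in the vocabulary."""
    if token in vocabulary:
        return []  # Known word: no alternatives needed.
    return get_close_matches(token, vocabulary, n=limit, cutoff=0.6)

# Hypothetical vocabulary of known words.
vocabulary = ["tokenizer", "splitter", "tagger"]
alternatives = suggest_alternatives("tokenzier", vocabulary)
```

Each suggestion would be added next to the original token instead of replacing it, so later stages can still work with what the user typed.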
