
These stages are contained in the Saga Core library and are available at all times.

Text Block Readers

Readers read text streams and create text blocks to process.

Tokenizers

Tokenizers read text blocks and divide them up into individual tokens to be processed.
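As a rough illustration of what a tokenizer stage does, the sketch below divides a text block on whitespace and keeps each token's character offsets. It is not Saga's implementation or API: the function name `whitespace_tokenize` and the tuple layout are assumptions made for this example.

```python
import re

def whitespace_tokenize(text_block):
    """Divide a text block into (token, start, end) tuples on whitespace.

    Hypothetical helper for illustration only; not part of Saga Core.
    """
    tokens = []
    for match in re.finditer(r"\S+", text_block):
        # Keep offsets so later stages can map tokens back to the text.
        tokens.append((match.group(), match.start(), match.end()))
    return tokens

print(whitespace_tokenize("Saga reads  text"))
```

Keeping the offsets alongside each token is one way to let later stages attach alternative interpretations to the same span of the original text.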

Splitters

Splitters split up tokens into multiple smaller tokens as an alternative interpretation.

  • CharacterSplitter - Tokens are split when any in a specified set of characters (typically punctuation) is encountered.

  • CharChangeSplitter - Tokens are split when any difference between characters is encountered.
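The two splitting strategies above can be sketched roughly as follows. This is an illustration, not Saga's code: the function names are hypothetical, and the reading of "difference between characters" as a change of character class (letter, digit, other) is an assumption, since the page does not define it.

```python
import re

def character_split(token, split_chars):
    """Split a token wherever any character in split_chars occurs.

    Hypothetical helper; split_chars is a string such as "-_.".
    """
    if not split_chars:
        return [token]
    pattern = "[" + re.escape(split_chars) + "]+"
    return [part for part in re.split(pattern, token) if part]

def _char_class(ch):
    # Assumed character classes; Saga's actual rules may differ.
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

def char_change_split(token):
    """Split a token wherever the character class changes."""
    parts = []
    current = ""
    for ch in token:
        if current and _char_class(ch) != _char_class(current[-1]):
            parts.append(current)
            current = ""
        current += ch
    if current:
        parts.append(current)
    return parts

print(character_split("state-of-the-art", "-"))
print(char_change_split("abc123def"))
```

In both cases the smaller tokens would sit alongside the original token as an alternative interpretation rather than replacing it.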

Normalizers

Normalizers create alternative normalized interpretations of tokens from original tokens.

  • CaseAnalysis - Analyzes and flags the case of tokens and then (optionally) normalizes the token to lower case.
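A minimal sketch of the case analysis idea follows. The flag names (`ALL_UPPER`, `ALL_LOWER`, `TITLE_CASE`, `MIXED_CASE`, `NO_CASE`) and the function `analyze_case` are invented for this example and may not match the flags Saga actually emits.

```python
def analyze_case(token, lowercase=True):
    """Return (normalized_token, case_flag) for a token.

    Hypothetical helper; flag names are assumptions for illustration.
    """
    if not any(ch.isalpha() for ch in token):
        flag = "NO_CASE"
    elif token.isupper():
        flag = "ALL_UPPER"
    elif token.islower():
        flag = "ALL_LOWER"
    elif token[:1].isupper() and token[1:].islower():
        flag = "TITLE_CASE"
    else:
        flag = "MIXED_CASE"
    # Lower-casing is optional, mirroring the "(optionally)" in the description.
    normalized = token.lower() if lowercase else token
    return normalized, flag

print(analyze_case("NASA"))
print(analyze_case("Hello", lowercase=False))
```

Recording the flag before lower-casing preserves the case information that normalization would otherwise discard.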

Recognizers

Recognizers identify and flag tokens based on their character patterns.

  • NumberRecognizer - Identifies tokens which look like numbers and flags them with the "NUMBER" flag.
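Pattern-based recognition of this kind could look roughly like the sketch below. The regular expression is an assumption about what "looks like a number" (optional sign, digits, optional decimal part); the real NumberRecognizer may accept other formats. Only the "NUMBER" flag name comes from the description above.

```python
import re

# Assumed pattern: optional sign, then digits with an optional decimal part.
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d+)?|\.\d+)")

def recognize_numbers(tokens):
    """Pair each token with a set of flags; number-like tokens get "NUMBER".

    Hypothetical helper for illustration only.
    """
    flagged = []
    for token in tokens:
        flags = {"NUMBER"} if NUMBER_PATTERN.fullmatch(token) else set()
        flagged.append((token, flags))
    return flagged

print(recognize_numbers(["42", "3.14", "abc", "-7", "1a"]))
```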

Taggers

Taggers create semantic tags which are added to the interpretation graph as alternative interpretations.

  • DictionaryTagger - Looks up all combinations of tokens in a dictionary and tags any that are found.
  • AdvancedPattern
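The dictionary lookup over token combinations can be sketched as below. This assumes "combinations" means contiguous token sequences up to some maximum length; the function name `dictionary_tag`, the `max_len` parameter, and the sample dictionary are all hypothetical and not taken from Saga.

```python
def dictionary_tag(tokens, dictionary, max_len=3):
    """Return (start, end, tag) for every contiguous token span in the dictionary.

    Hypothetical helper; `dictionary` maps lower-cased phrases to tags,
    and `end` is exclusive.
    """
    matches = []
    for start in range(len(tokens)):
        longest_end = min(start + max_len, len(tokens))
        for end in range(start + 1, longest_end + 1):
            phrase = " ".join(tokens[start:end]).lower()
            if phrase in dictionary:
                matches.append((start, end, dictionary[phrase]))
    return matches

sample_dictionary = {
    "new york": "place",
    "new york city": "city",
    "york": "place",
}
print(dictionary_tag(["visit", "new", "york", "city"], sample_dictionary))
```

Overlapping matches are all kept, which fits the idea of adding each tag to the interpretation graph as an alternative interpretation instead of choosing a single winner at this stage.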
