...

Readers read text streams and create text blocks to process.

  • SimpleReader

Tokenizers

Tokenizers read text blocks and divide them up into individual tokens to be processed.

  • WhitespaceTokenizer - Divides text blocks into tokens based on whitespace.
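
To make the tokenizing step concrete, here is a minimal Python sketch of whitespace-based tokenization. It is an illustration only, not the WhitespaceTokenizer stage itself: the function name `whitespace_tokenize` and the (text, start, end) tuple layout are assumptions made for this example.

```python
import re


def whitespace_tokenize(text_block):
    """Return (token, start, end) for every run of non-whitespace characters.

    Hypothetical helper for illustration; offsets are character positions
    in the original text block, with `end` exclusive.
    """
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", text_block)]


tokens = whitespace_tokenize("Readers create  text blocks.")
for token in tokens:
    print(token)
```

Keeping the character offsets alongside each token is a design choice of this sketch; it lets later steps refer back to the original text block even after tokens have been split or normalized.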

Splitters

Splitters split up tokens into multiple smaller tokens as an alternative interpretation.

  • CharacterSplitter - Tokens are split when any of a specified set of characters (typically punctuation) is encountered.
  • CharChangeSplitter - Tokens are split when a difference between adjacent characters is encountered.
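
The two splitting strategies can be sketched in Python as follows. This is a hedged illustration rather than the stages' actual code: the function names, the default punctuation set, and the three character classes (letter, digit, other) used to decide what counts as a "difference" are all assumptions.

```python
import re


def character_split(token, split_chars=".,;:!?-"):
    """Split a token at any character in split_chars, dropping empty pieces."""
    pattern = "[" + re.escape(split_chars) + "]"
    return [piece for piece in re.split(pattern, token) if piece]


def char_class(ch):
    """Assumed classification: letter, digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def char_change_split(token):
    """Split a token wherever two adjacent characters fall in different classes."""
    pieces = []
    current = token[:1]
    for prev, ch in zip(token, token[1:]):
        if char_class(prev) != char_class(ch):
            pieces.append(current)
            current = ch
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


print(character_split("state-of-the-art."))
print(char_change_split("abc123xyz"))
```

In both cases the smaller tokens would be offered alongside the original token as an alternative interpretation, not as a replacement for it; the sketch only returns the pieces.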

Normalizers

Normalizers create alternative normalized interpretations of tokens from original tokens.

  • CaseAnalysis - Analyzes and flags the case of each token and then (optionally) normalizes the token to lowercase.
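
As a rough illustration of case analysis, the Python sketch below flags a token's case and optionally lowercases it. The flag names (`ALL_UPPER`, `ALL_LOWER`, `TITLE_CASE`, `MIXED_CASE`, `NO_CASE`) and the function signature are invented for this example and are not taken from the CaseAnalysis stage.

```python
def case_analysis(token, to_lower=True):
    """Return (flag, normalized_token) for a single token.

    Flag names are hypothetical placeholders chosen for this sketch.
    """
    if not any(ch.isalpha() for ch in token):
        flag = "NO_CASE"
    elif token.isupper():
        flag = "ALL_UPPER"
    elif token.islower():
        flag = "ALL_LOWER"
    elif token[:1].isupper() and token[1:].islower():
        flag = "TITLE_CASE"
    else:
        flag = "MIXED_CASE"
    normalized = token.lower() if to_lower else token
    return flag, normalized


for word in ["NASA", "Hello", "iPhone", "2024"]:
    print(word, case_analysis(word))
```

Recording the flag before lowercasing means the original casing remains available to later steps even when they match against the normalized form.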

Recognizers

Recognizers identify and flag tokens based on their character patterns.

  • NumberRecognizer - Identifies tokens that look like numbers and flags them with the "NUMBER" flag.
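
A pattern-based recognizer of this kind might look like the Python sketch below. Only the "NUMBER" flag name comes from the description above; the regular expression (optional sign, digits, optional decimal part) and the function name are assumptions, and the real stage may accept other formats such as thousands separators.

```python
import re

# Assumed number shape: optional sign, then digits with an optional
# decimal part, or a leading decimal point followed by digits.
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def recognize_number(token):
    """Return the set of flags for a token; contains "NUMBER" if it looks numeric."""
    flags = set()
    if NUMBER_PATTERN.fullmatch(token):
        flags.add("NUMBER")
    return flags


for tok in ["42", "3.14", "-7", "abc", "1,000"]:
    print(tok, recognize_number(tok))
```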

Taggers

Taggers create semantic tags which are added to the interpretation graph as alternative interpretations.

  • RegexPattern
  • DictionaryTagger - Looks up all combinations of tokens in a dictionary and tags any that are found.
  • AdvancedPattern
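
The dictionary lookup over token combinations can be sketched in Python as below. This is an assumption-laden illustration, not the DictionaryTagger stage: "combinations" is interpreted here as contiguous runs of tokens, and the function name, the `max_len` limit, the lowercase matching, and the sample dictionary with its `{city}` tag are all made up for the example.

```python
def dictionary_tag(tokens, dictionary, max_len=3):
    """Look up every contiguous run of up to max_len tokens in the dictionary.

    Returns (start, end, tag) triples, where tokens[start:end] matched an entry.
    """
    tags = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            phrase = " ".join(tokens[start:end]).lower()
            if phrase in dictionary:
                tags.append((start, end, dictionary[phrase]))
    return tags


# Hypothetical dictionary mapping phrases to semantic tags.
sample_dictionary = {
    "new york": "{city}",
    "new york city": "{city}",
    "york": "{city}",
}
sample_tokens = ["flights", "to", "New", "York", "City"]
print(dictionary_tag(sample_tokens, sample_dictionary))
```

Overlapping matches such as "New York" and "New York City" are all reported. That fits the idea of tags being added to the interpretation graph as alternative interpretations, leaving the choice between them to a later step.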