These stages are part of the Saga Core library and are always available.

Text Block Readers

Readers read text streams and create text blocks to process.

  • SimpleReader
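As a rough illustration of what a reader does, the sketch below turns a text stream into text blocks. It is not Saga's API: the function name `read_text_blocks` and the rule of one block per non-empty line are assumptions made for the example.

```python
import io
from typing import Iterator, TextIO


def read_text_blocks(stream: TextIO) -> Iterator[str]:
    """Yield one text block per non-empty line (assumed blocking rule)."""
    for line in stream:
        block = line.strip()
        if block:  # skip blank lines
            yield block


# Any file-like text stream works; StringIO stands in for a real source.
sample = io.StringIO("first line\n\nsecond line\n")
blocks = list(read_text_blocks(sample))
```

Each yielded string would then be handed to the next stage in the pipeline.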

Text Block Breakers

Breakers read text blocks and break them into smaller text blocks.

  • QuotationBreaker - Breaks TEXT_BLOCK tokens into smaller TEXT_BLOCK tokens, separating the non-quoted text from the quoted text. This breaker respects the grammatical rules of quotes.
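The idea of separating quoted from non-quoted text can be sketched as follows. This is a simplified illustration, not the QuotationBreaker implementation: it handles only straight double quotes and ignores the grammatical rules the real stage is said to respect. The function name `break_on_quotes` is hypothetical.

```python
import re


def break_on_quotes(text_block: str) -> list[str]:
    """Split a text block into quoted and non-quoted segments."""
    # The capturing group makes re.split keep the quoted spans in the result.
    parts = re.split(r'("[^"]*")', text_block)
    # Trim each piece and drop the ones that are empty after trimming.
    return [p.strip() for p in parts if p.strip()]


segments = break_on_quotes('He said "hello there" and left.')
```

Here `segments` contains three blocks: the text before the quote, the quoted span (quotes included), and the text after it.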

Tokenizers

Tokenizers read text blocks and divide them up into individual tokens to be processed.

  • WhitespaceTokenizer
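A minimal sketch of whitespace tokenization, assuming that a token is any run of non-whitespace characters and that character offsets are worth keeping. The `Token` class and `whitespace_tokenize` function are hypothetical names for this example and do not reflect Saga's internal data structures.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    start: int  # offset of the first character in the text block
    end: int    # offset one past the last character


def whitespace_tokenize(text_block: str) -> list[Token]:
    """Return one Token per run of non-whitespace characters."""
    return [Token(m.group(), m.start(), m.end())
            for m in re.finditer(r"\S+", text_block)]


tokens = whitespace_tokenize("Saga reads  text")
```

Keeping the offsets lets later stages map a token back to its position in the original text block, even when several spaces separate two tokens.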

Splitters

Splitters split up tokens into multiple smaller tokens as an alternative interpretation.

  • CharacterSplitter
  • CharChangeSplitter
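The sketch below illustrates how a split can be offered as an alternative interpretation rather than a replacement. It assumes, from the stage name only, that a character-change split breaks a token wherever the character class changes (letters, digits, other). The function names and the list-of-lists representation are hypothetical.

```python
import re


def split_on_char_change(token: str) -> list[str]:
    """Split a token into runs of letters, digits, or other characters."""
    return re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+", token)


def interpretations(token: str) -> list[list[str]]:
    """Keep the original token and add the split form when it differs."""
    pieces = split_on_char_change(token)
    if len(pieces) <= 1:
        return [[token]]
    return [[token], pieces]


alternatives = interpretations("abc123")
```

For "abc123" there are two interpretations: the original single token, and the pair "abc" and "123". A token with no character-class change keeps only its original form.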

Normalizers

Normalizers create alternative normalized interpretations of tokens from original tokens.

  • CaseAnalysis
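As an illustration of a normalized alternative, the sketch below adds a lower-cased form next to the original token. Lower-casing is an assumed normalization chosen for the example; the function name `normalize_case` is hypothetical and the actual behavior of CaseAnalysis may differ.

```python
def normalize_case(token: str) -> list[str]:
    """Return the original token plus a lower-cased alternative if different."""
    lowered = token.lower()
    if lowered == token:
        return [token]  # already normalized, nothing to add
    return [token, lowered]


forms = normalize_case("Paris")
```

The original token is never discarded, so later stages can match against either form.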

Recognizers

Recognizers identify and flag tokens based on their character patterns.

  • NumberRecognizer
  • StopWords
  • Lemmatize - Matches tokens to words in a dictionary, then creates new LexItems for the token lemma and/or synonyms, if configured.
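Flagging tokens by character pattern or by list membership can be sketched as below. The flag names `NUMBER` and `STOP_WORD`, the number pattern, and the stop-word list are all assumptions for this example, not values taken from Saga.

```python
import re

# Assumed pattern: optional sign, digits, optional decimal part.
NUMBER_PATTERN = re.compile(r"[+-]?\d+(\.\d+)?")

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "of"}


def flag_token(token: str) -> set[str]:
    """Return the set of flags that apply to a token."""
    flags = set()
    if NUMBER_PATTERN.fullmatch(token):
        flags.add("NUMBER")
    if token.lower() in STOP_WORDS:
        flags.add("STOP_WORD")
    return flags


number_flags = flag_token("-3.5")
stop_flags = flag_token("The")
```

A token that matches nothing gets an empty set, so downstream stages can treat flags as optional hints.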

Taggers

Taggers create semantic tags which are added to the interpretation graph as alternative interpretations.

  • RegexPattern - Looks up matches to regular expressions in a dictionary across multiple tokens, then tags each match with one or more semantic tags as an alternative representation. For a simple regular expression where a match only needs to occur against a single token, the Simple Regex Stage is recommended.
  • DictionaryTagger
  • AdvancedPattern
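Matching a regular expression across several tokens and tagging the matched span can be sketched as follows. This is an illustration under stated assumptions, not the RegexPattern implementation: tokens are joined with single spaces, only matches aligned to token boundaries are kept, and the tag name `phone-number` and the `PATTERNS` dictionary are hypothetical.

```python
import re

# Hypothetical dictionary mapping a semantic tag to a regular expression.
PATTERNS = {"phone-number": re.compile(r"\d{3} \d{4}")}


def tag_matches(tokens: list[str]) -> list[tuple[str, int, int]]:
    """Return (tag, first_token_index, last_token_index) for each match."""
    start_at, end_at = {}, {}
    pos = 0
    for i, tok in enumerate(tokens):
        start_at[pos] = i      # character offset where token i starts
        pos += len(tok)
        end_at[pos] = i        # character offset where token i ends
        pos += 1               # the joining space
    text = " ".join(tokens)
    tags = []
    for tag, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # Keep only matches that begin and end on token boundaries.
            if m.start() in start_at and m.end() in end_at:
                tags.append((tag, start_at[m.start()], end_at[m.end()]))
    return tags


found = tag_matches(["call", "555", "1234", "now"])
```

The result records which tokens the tag spans, so the tag can be added alongside the original tokens instead of replacing them.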
