Creates n-grams from lexical items flagged TOKEN, with sizes from min to max tokens. N-gram building breaks at any token that carries one of the flags listed in splitFlags.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Generic Configuration Parameters

Configuration Parameters

  • min (integer, default: 2) - Minimum number of tokens in an n-gram.
  • max (integer, default: 2) - Maximum number of tokens in an n-gram.
  • splitFlags (string array) - Flags that split an n-gram: if the next token carries any of these flags, n-gram building stops at that token.
  • ignoreOnBoundaryFlags (string array, default: cheese) - Flags identifying tokens that will not be added to an n-gram.
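The interaction of min, max, and splitFlags can be sketched as follows. This is a minimal illustration and not the stage's actual implementation: the Token class and the build_ngrams function are hypothetical names, and the sketch assumes that a token carrying a split flag is itself excluded from the n-gram it interrupts. ignoreOnBoundaryFlags is left out because its exact semantics are not spelled out here.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical stand-in for a lexical item flagged TOKEN."""
    text: str
    flags: set = field(default_factory=set)


def build_ngrams(tokens, min_size=2, max_size=2, split_flags=()):
    """Collect n-grams of min_size..max_size tokens starting at every position.

    Assumption: a token that carries any split flag stops the n-gram
    being built and is not included in it.
    """
    split = set(split_flags)
    ngrams = []
    for start in range(len(tokens)):
        window = []
        for tok in tokens[start:start + max_size]:
            if tok.flags & split:
                break  # split flag found: stop building from this start
            window.append(tok.text)
            if len(window) >= min_size:
                ngrams.append(" ".join(window))
    return ngrams


sentence = [Token(w) for w in "abraham lincoln likes macaroni and cheese".split()]
all_ngrams = build_ngrams(sentence, min_size=2, max_size=3)

# Same sentence, but "likes" now carries a hypothetical STOP flag.
flagged = [Token(w, {"STOP"} if w == "likes" else set())
           for w in "abraham lincoln likes macaroni and cheese".split()]
split_ngrams = build_ngrams(flagged, min_size=2, max_size=3, split_flags=["STOP"])
```

Under these assumptions, all_ngrams holds the five bigrams and four trigrams of the sentence, while split_ngrams contains no n-gram that includes or crosses "likes".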


Example configuration (boundary flags: text block split):
"min": 2,
"max": 3,
"splitFlags": [],
"ignoreOnBoundaryFlags": [] 

Example Output

V--------------[abraham lincoln likes macaroni and cheese]--------------------V 
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^ 
^---[abraham              lincoln]---^---[likes macaroni]---^---{place}[and cheese]---^ 
            ^---[lincoln likes]---^-{food}---[macaroni and]---^            
^---{food}---^
^[abraham lincoln likes]-----^------{person}-[macaroni and cheese]--------^^ 
                        ^-----[likes macaroni and]-----^            
            ^-------{food}---[lincoln likes macaroni]-----------^^  
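One way to read the graph above is as a set of edges spanning the existing token boundaries. The sketch below enumerates those spans for min 2 and max 3. The vertex numbering (vertex i sits just before token i) and the function name ngram_edges are assumptions made for illustration; they do not describe Saga's internal graph representation.

```python
def ngram_edges(tokens, min_size, max_size):
    """Return (start_vertex, end_vertex, text) triples for every n-gram.

    Assumption: vertex i sits just before token i, so a sentence of
    k tokens has vertices 0..k and each n-gram reuses two of them.
    """
    edges = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            text = " ".join(tokens[start:start + size])
            edges.append((start, start + size, text))
    return edges


tokens = "abraham lincoln likes macaroni and cheese".split()
edges = ngram_edges(tokens, 2, 3)
for start, end, text in edges:
    print(f"{start} -> {end}: {text}")
```

Every edge starts and ends on a boundary that already exists between single tokens, which is consistent with the stage adding lexical items without creating new vertices.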

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • ALL_LOWER_CASE - All of the characters in the token are lower-case characters.
  • ALL_UPPER_CASE - All of the characters in the token are upper-case characters (for example, acronyms).
  • MIXED_CASE - Handles any mixed upper & lower case scenario not covered above.
  • TOKEN - All tokens produced are tagged as TOKEN 
  • NGRAM - All tokens produced are tagged as NGRAM
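As a rough illustration of how the three case flags partition text, the sketch below classifies a string by its cased characters. The function name case_flag is hypothetical, and it is an assumption that the case of an n-gram is judged over its full text, and that text without any cased characters receives no case flag.

```python
def case_flag(text):
    """Return the case flag name for text, or None if it has no cased characters.

    Assumption: the flag is decided over the whole text of the item.
    """
    if text.islower():
        return "ALL_LOWER_CASE"
    if text.isupper():
        return "ALL_UPPER_CASE"
    if any(ch.islower() or ch.isupper() for ch in text):
        return "MIXED_CASE"  # both upper- and lower-case characters present
    return None  # digits or punctuation only


examples = ["abraham lincoln", "NASA", "Abraham Lincoln", "123"]
flags = [case_flag(text) for text in examples]
```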

Vertex Flags:

No vertices are created in this stage.