Creates n-grams of min to max tokens from lexical items flagged TOKEN. N-gram building stops at any token that carries one of the configured splitFlags.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Generic Configuration Parameters

Configuration Parameters

  • min - Minimum number of tokens in an n-gram. Type: integer. Default: 2.
  • max - Maximum number of tokens in an n-gram. Type: integer. Default: 2.
  • splitFlags - Flags that split an n-gram: if the next token carries any of these flags, building of the current n-gram stops at that token. Type: string array.
  • ignoreOnBoundaryFlags - Flags that mark tokens which will not be added to an n-gram. Type: string array. Default: cheese.
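
As a rough illustration of how min, max, and splitFlags interact, the following Python sketch enumerates n-grams over a list of flagged tokens. It is not the stage's actual implementation: the `build_ngrams` function, the `(text, flags)` token representation, and the `STOP` flag are hypothetical, and the sketch assumes that a token carrying a split flag is never included in an n-gram. The ignoreOnBoundaryFlags parameter is deliberately not modeled here.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical token representation: (text, set of flags).
Token = Tuple[str, Set[str]]


def build_ngrams(
    tokens: List[Token],
    min_n: int = 2,
    max_n: int = 2,
    split_flags: Iterable[str] = (),
) -> List[str]:
    """Return every n-gram of min_n..max_n consecutive tokens.

    Assumption: a token carrying any split flag ends the n-gram being
    built and is not itself part of any n-gram.
    """
    split = set(split_flags)
    ngrams: List[str] = []
    for start in range(len(tokens)):
        current: List[str] = []
        # Look at no more than max_n tokens from this start position.
        for text, flags in tokens[start:start + max_n]:
            if set(flags) & split:
                break  # split flag found: stop building from this start
            current.append(text)
            if len(current) >= min_n:
                ngrams.append(" ".join(current))
    return ngrams


sentence = "abraham lincoln likes macaroni and cheese"
plain = [(word, {"TOKEN"}) for word in sentence.split()]
all_ngrams = build_ngrams(plain, min_n=2, max_n=3)

# Same sentence, but "and" carries a hypothetical STOP flag.
flagged = [
    (word, {"TOKEN", "STOP"} if word == "and" else {"TOKEN"})
    for word in sentence.split()
]
split_ngrams = build_ngrams(flagged, min_n=2, max_n=3, split_flags=["STOP"])
```

With min 2 and max 3, the unflagged sentence yields five bigrams and four trigrams. When "and" carries the split flag, no n-gram contains "and" or crosses it.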


boundaryFlags: text block split
"min": 2,
"max": 3,
"splitFlags": [],
"ignoreOnBoundaryFlags": [] 

Example Output

V--------------[abraham lincoln likes macaroni and cheese]--------------------V
V 
 ^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^ 
 ^---[abraham             lincoln]---^---[likes macaroni]---^---{place}[and cheese]---^ 
             ^----{food}-[lincoln likes]---^---[macaroni and]---^            
 ^---{food}---^
^[abraham lincoln likes]-----^------{person}----[macaroni and cheese]-----^ 
                         ^-----[likes macaroni and]-----^            
             ^-------{food}---------[lincoln likes macaroni]-----^  

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • ALL_LOWER_CASE - All of the characters in the token are lower-case characters.
  • ALL_UPPER_CASE - All of the characters in the token are upper-case characters (for example, acronyms).
  • MIXED_CASE - The token contains a mix of upper-case and lower-case characters not covered by the flags above.
  • TOKEN - All tokens produced are tagged as TOKEN.
  • NGRAM - All tokens produced are tagged as NGRAM.

Vertex Flags:

No vertices are created in this stage.