Creates n-grams of between min and max tokens from lexical items flagged TOKEN. N-grams are broken at tokens that carry any of the configured split flags (splitFlags).


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Include Page: Generic Configuration Parameters

Configuration Parameters

  • Parameter
    name: min
    type: integer
    default: 2
    summary: Minimum number of tokens in an n-gram.
  • Parameter
    name: max
    type: integer
    default: 2
    summary: Maximum number of tokens in an n-gram.
  • Parameter
    name: splitFlags
    type: string array
    summary: Splits the n-gram if the next token has any of these flags (that is, a token carrying any of these flags stops the building of the n-gram).
  • Parameter
    name: ignoreOnBoundaryFlags
    type: string array
    default: cheese
    summary: Flags for tokens that will not be added to an n-gram.
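The way min, max, and splitFlags interact can be illustrated with a minimal Python sketch. This is not Saga's implementation: the function name `build_ngrams`, the `(text, flags)` token representation, and the `STOP` flag are hypothetical, and the sketch assumes that a token carrying a split flag is itself left out and that no n-gram spans across it. It does not model ignoreOnBoundaryFlags.

```python
from typing import List, Set, Tuple

# Hypothetical token representation: (text, set of flags on the token)
Token = Tuple[str, Set[str]]


def build_ngrams(tokens: List[Token], min_n: int, max_n: int,
                 split_flags: Set[str]) -> List[str]:
    """Return n-grams of min_n..max_n tokens that never cross a split token."""
    # Cut the sequence into runs wherever a token carries a split flag.
    runs: List[List[str]] = [[]]
    for text, flags in tokens:
        if flags & split_flags:
            runs.append([])  # assumption: the flagged token is dropped
        else:
            runs[-1].append(text)

    # Slide a window of each allowed size over every run.
    ngrams: List[str] = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_n, max_n + 1):
                if start + size <= len(run):
                    ngrams.append(" ".join(run[start:start + size]))
    return ngrams


sentence = [("abraham", {"TOKEN"}), ("lincoln", {"TOKEN"}),
            ("likes", {"TOKEN"}), ("macaroni", {"TOKEN"}),
            ("and", {"TOKEN", "STOP"}), ("cheese", {"TOKEN"})]

print(build_ngrams(sentence, 2, 2, {"STOP"}))
```

With `STOP` as a split flag, no bigram contains or spans "and"; with an empty set of split flags, every adjacent pair of tokens forms a bigram.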


Saga_config_stage
boundaryFlags: text block split
"min": 2,
"max": 3,
"splitFlags": [],
"ignoreOnBoundaryFlags": [] 
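
As a rough check of what these settings imply, the sketch below wraps the four keys above in a JSON object (the enclosing braces are an assumption, since the fragment is shown without them) and enumerates plain sliding-window n-grams over the example sentence used on this page. It ignores boundaryFlags and is only an approximation of the stage's behavior.

```python
import json

# The configuration fragment from this page, wrapped in braces so it parses.
config = json.loads(
    '{"min": 2, "max": 3, "splitFlags": [], "ignoreOnBoundaryFlags": []}'
)

tokens = "abraham lincoln likes macaroni and cheese".split()

# With no split flags, every window of min..max consecutive tokens is an n-gram.
grams = [
    " ".join(tokens[i:i + n])
    for n in range(config["min"], config["max"] + 1)
    for i in range(len(tokens) - n + 1)
]

for gram in grams:
    print(gram)
```

For six tokens this gives five bigrams and four trigrams, nine n-grams in total.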

Example Output

Saga_graph
V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^
              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • ALL_LOWER_CASE - All of the characters in the token are lower-case characters.
  • ALL_UPPER_CASE - All of the characters in the token are upper-case characters (for example, acronyms).
  • MIXED_CASE - The token mixes upper-case and lower-case characters in a way not covered above.
  • TOKEN - All tokens produced are tagged as TOKEN.
  • NGRAM - All tokens produced are tagged as NGRAM.

Vertex Flags:

No vertices are created in this stage.