...

Operates On:  Lexical Items with TOKEN

...

Generic Configuration Parameters

...

  • The tokens to process must lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).

...

  • Tokens marked with any of these flags are ignored by this stage; no processing is performed on them.

...

  • Tokens must have all of the specified flags in order to be processed.

...

  • Tokens must have at least one of the flags specified in this array in order to be processed.
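
The three flag filters above (skip, all required, at least one) can be read as a single predicate over a token's flags. The following is a minimal Python sketch of that reading, not the stage's actual implementation. The function and parameter names (should_process, skip, require_all, require_any) are hypothetical and do not correspond to real configuration keys, and treating an unset filter as "no constraint" is an assumption. The boundary-flag rule is not modeled because it depends on the surrounding vertices.

```python
def should_process(token_flags, skip=(), require_all=(), require_any=()):
    """Return True if a token with these flags passes all three filters.

    All parameter names are hypothetical; an empty filter is assumed
    to impose no constraint.
    """
    flags = set(token_flags)

    # Skip filter: any overlap with the skip flags excludes the token.
    if flags & set(skip):
        return False

    # "All" filter: every listed flag must be present on the token.
    if not set(require_all) <= flags:
        return False

    # "At least one" filter: at least one listed flag must be present.
    if require_any and not (flags & set(require_any)):
        return False

    return True
```

For example, a token flagged TOKEN and SKIP is rejected when SKIP is listed in the skip filter, regardless of the other two filters.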

Configuration Parameters

  • splitChars (string, optional) - List of characters which should be used to split tokens.
    • If not present, then tokens are split on any sequence of punctuation. 
  • dontSplitChars (string, optional) - List of characters which will NOT be used to split tokens.
    • This is typically used to identify exceptions (characters which are not used to split tokens) when splitChars is missing.
    • These characters are included in the produced tokens.
  • splitFlag (string, optional) - The flag to place on the vertex between the two resulting tokens.
    • If missing, defaults to ALL_PUNCTUATION.
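
The interaction between splitChars and dontSplitChars can be illustrated with a small Python sketch. It is an approximation under stated assumptions, not the stage's code: "punctuation" is taken to be ASCII string.punctuation, a run of split characters is treated as one boundary, and dontSplitChars is subtracted even when splitChars is given. The function name split_token is hypothetical, and the splitFlag placed on the vertex between tokens is not modeled.

```python
import re
import string


def split_token(text, split_chars=None, dont_split_chars=""):
    """Split text on runs of split characters, keeping exempt characters.

    Assumption: when split_chars is None, ASCII punctuation is used.
    """
    # Start from the explicit list, or from all punctuation if none is given.
    chars = set(string.punctuation if split_chars is None else split_chars)

    # Exempt characters never split and therefore stay inside the tokens.
    chars -= set(dont_split_chars)
    if not chars:
        return [text]

    # One character class matching a run of one or more split characters.
    pattern = "[" + "".join(re.escape(c) for c in sorted(chars)) + "]+"

    # Drop empty pieces produced by leading or trailing split characters.
    return [piece for piece in re.split(pattern, text) if piece]
```

With no parameters set, "state-of-the-art" yields four tokens; with dont_split_chars="-" it stays a single token, since the hyphen is exempt and is kept in the output.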

...