...

Operates On:  Lexical Items with TOKEN

Generic Configuration Parameters

  • boundaryFlags (string array, optional)
    • Tokens are processed only if they lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).
  • skipFlags (string array, optional) - Flags to be skipped by this stage
    • Tokens marked with these flags are ignored by this stage, and no processing is performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all of the specified flags in order to be processed.
  • atLeastOneFlag (string array, optional)
    • Tokens must have at least one of the flags specified in this array.
  • debug (boolean, optional)
    • Enables all debug logging for the stage, if any.
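
As a rough illustration of how the generic parameters combine, the sketch below restricts the stage to tokens inside TEXT_BLOCK_SPLIT boundaries. The flag names EXAMPLE_SKIP and EXAMPLE_REQUIRED are hypothetical placeholders, not flags defined by the product; substitute flags that earlier stages in your pipeline actually set.

```json
{
 "type":"CharacterSplitter",
 "boundaryFlags":["TEXT_BLOCK_SPLIT"],
 "skipFlags":["EXAMPLE_SKIP"],
 "requiredFlags":["EXAMPLE_REQUIRED"],
 "debug":true
}
```

Going by the parameter descriptions above, a token would be processed here only if it lies between two TEXT_BLOCK_SPLIT vertices, does not carry EXAMPLE_SKIP, and carries EXAMPLE_REQUIRED.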

Configuration Parameters

  • splitChars (string, optional) - List of characters on which tokens are split.
    • If omitted, tokens are split on any sequence of punctuation characters.
  • dontSplitChars (string, optional) - List of characters which will NOT be used to split tokens.
    • Typically used to exempt specific characters from splitting when splitChars is omitted.
    • These characters are kept in the produced tokens.
  • splitFlag (string, optional) - The flag to be put on the vertex between the two tokens.
    • If missing, defaults to ALL_PUNCTUATION.
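
A sketch of an explicit split configuration, assuming hyphens and slashes are the only characters you want to split on. CUSTOM_SPLIT is a hypothetical flag name chosen for illustration; leave splitFlag out to get the documented default, ALL_PUNCTUATION.

```json
{
 "type":"CharacterSplitter",
 "splitChars":"-/",
 "splitFlag":"CUSTOM_SPLIT"
}
```

With these settings, a token such as "client-server" would be expected to yield the tokens "client" and "server", with the vertex between them flagged CUSTOM_SPLIT.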

Examples

Example Configuration 1
{
 "type":"CharacterSplitter",
 "dontSplitChars":"."
}

...