Splits tokens on specified characters, typically punctuation. A run of consecutive split characters produces a single split, not one split per character.
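The single-split behavior for runs of split characters can be illustrated with a minimal Python sketch. This is not the stage's implementation; the function name `split_on_chars` and the choice to discard the split characters are assumptions made for illustration only.

```python
import re

def split_on_chars(token, split_chars):
    """Split a token on any run of characters from split_chars (hypothetical helper)."""
    if not split_chars:
        # Nothing to split on, so the token is returned unchanged.
        return [token]
    # A character class followed by "+" treats a run of split characters as ONE boundary.
    pattern = "[" + re.escape(split_chars) + "]+"
    # Drop empty strings produced by leading or trailing split characters.
    return [part for part in re.split(pattern, token) if part]

print(split_on_chars("well--known...fact", "-."))
```

Here `"well--known...fact"` yields three parts, because `--` and `...` each count as one split point.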

Uses Character Splitter Stage

Configuration

  • Split Flag - The flag to be put on the vertex between the two tokens.
    • If missing, defaults to ALL_PUNCTUATION.

Tokenize Tokens

  • Token Delimiters - List of characters which should be used to split tokens.
    • If not present, tokens are split on any sequence of punctuation.
  • Allowed Token Characters - List of characters which will NOT be used to split tokens.
    • This is typically used to identify exceptions (characters which are not used to split tokens) when Token Delimiters is missing.
    • These characters are included in the produced tokens.
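The interaction between Token Delimiters and Allowed Token Characters can be sketched as follows. This is an illustrative approximation under stated assumptions: the helper `tokenize` is hypothetical, "punctuation" is taken to mean ASCII punctuation from Python's `string.punctuation` (the stage may define it differently), and delimiters are discarded rather than emitted as tokens.

```python
import string

def tokenize(token, delimiters=None, allowed=""):
    """Split a token on delimiters, keeping 'allowed' characters inside tokens."""
    # Assumption: with no explicit delimiters, fall back to ASCII punctuation.
    split_set = set(delimiters) if delimiters else set(string.punctuation)
    # Allowed characters are exceptions: they never cause a split.
    split_set -= set(allowed)

    parts, current = [], []
    for ch in token:
        if ch in split_set:
            # A run of delimiters closes the current part only once.
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts

print(tokenize("e-mail@example.com"))               # split on all punctuation
print(tokenize("e-mail@example.com", allowed="-"))  # keep the hyphen inside the token
print(tokenize("e-mail@example.com", delimiters="@"))
```

With `allowed="-"`, the hyphen stays in the produced token (`e-mail`), matching the note that allowed characters are included in the output.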

Split Characters (Before/After)

  • Before character - if any character in this list occurs inside a token, that token will be split just before that character
  • After character - if any character in this list occurs inside a token, that token will be split just after that character
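The before/after rules above can be sketched in Python. The function `split_before_after` and its parameter names are hypothetical and only mirror the descriptions of the two settings; unlike delimiter splitting, the split characters stay in the output.

```python
def split_before_after(token, before="", after=""):
    """Split a token just before or just after the listed characters."""
    parts, current = [], ""
    for ch in token:
        # "Before character": close the current part, then start a new one with ch.
        if ch in before and current:
            parts.append(current)
            current = ""
        current += ch
        # "After character": ch ends the current part.
        if ch in after:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts

print(split_before_after("a(b)c", before="(", after=")"))
print(split_before_after("10%off", after="%"))
```

In the first call the token is cut just before `(` and just after `)`, so the parenthesized part survives as its own piece.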

Split Characters

  • At start of token - true/false; whether to split on all punctuation at the start of a token (default: true)
  • At end of token - true/false; whether to split on all punctuation at the end of a token (default: true)
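One possible reading of these two flags is that leading and trailing punctuation is separated from the body of the token. The sketch below illustrates that reading only; it is an assumption, not confirmed behavior. The helper `split_edge_punctuation` is hypothetical, and "punctuation" is approximated by Python's ASCII `string.punctuation`.

```python
import string

def split_edge_punctuation(token, at_start=True, at_end=True):
    """Separate leading/trailing punctuation from a token (assumed semantics)."""
    punct = set(string.punctuation)
    start, end = 0, len(token)
    if at_start:
        # Advance past the run of punctuation at the start of the token.
        while start < end and token[start] in punct:
            start += 1
    if at_end:
        # Back up over the run of punctuation at the end of the token.
        while end > start and token[end - 1] in punct:
            end -= 1
    parts = [token[:start], token[start:end], token[end:]]
    # Empty pieces mean there was nothing to split off on that side.
    return [p for p in parts if p]

print(split_edge_punctuation('"hello!"'))
print(split_edge_punctuation('"hello!"', at_end=False))
```

Setting either flag to false leaves the punctuation on that side attached to the token.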

General Settings

See Generic Processor Config.