...
Operates On: Lexical Items with TOKEN
Generic Configuration Parameters
- boundaryFlags (string array, optional)
- Tokens to be processed must lie between two vertices marked with these flags (e.g., ["TEXT_BLOCK_SPLIT"]).
- skipFlags (string array, optional) - Flags to be skipped by this stage
- Tokens marked with these flags are ignored by this stage; no processing is performed on them.
- requiredFlags (string array, optional)
- Tokens must have all of the specified flags in order to be processed.
- atLeastOneFlag (string array, optional)
- Tokens must have at least one of the flags specified in this array in order to be processed.
- debug (boolean, optional)
- Enables all debug logging for the stage, if any.
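The way the three flag filters above combine can be sketched as follows. This is a minimal illustration, not the stage's actual implementation: the function name `should_process` and the order of the checks are assumptions, and `boundaryFlags` is left out because it depends on the position of vertices rather than on the token's own flags.

```python
def should_process(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Return True if a token with token_flags passes the generic flag filters.

    Hypothetical helper: the check order (skip first) is an assumption.
    """
    flags = set(token_flags)
    # skipFlags: a token carrying any of these flags is ignored.
    if flags & set(skip_flags):
        return False
    # requiredFlags: the token must carry every listed flag.
    if not set(required_flags) <= flags:
        return False
    # atLeastOneFlag: if configured, the token must carry at least one of them.
    if at_least_one_flag and not (flags & set(at_least_one_flag)):
        return False
    return True


# The flag names below are made up for the example.
print(should_process({"A", "B"}, required_flags=["A"], at_least_one_flag=["B", "C"]))
print(should_process({"A", "SKIP"}, skip_flags=["SKIP"], required_flags=["A"]))
```

An unconfigured (empty) filter accepts every token in this sketch, which matches all of these parameters being optional.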
Configuration Parameters
- splitChars (string, optional) - List of characters which should be used to split tokens.
- If not present, tokens are split on any sequence of punctuation.
- dontSplitChars (string, optional) - List of characters which will NOT be used to split tokens.
- Typically used to specify exceptions (punctuation characters that should not split tokens) when splitChars is missing.
- These characters are included in the produced tokens.
- splitFlag (string, optional) - The flag to be put on the vertex between the two tokens.
- If missing, defaults to ALL_PUNCTUATION.
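The interaction between splitChars and dontSplitChars can be sketched as below. This is an illustration under stated assumptions, not the stage's code: `split_token` is a hypothetical helper, "punctuation" is approximated by ASCII `string.punctuation`, and the separators are simply dropped, whereas the real stage places a vertex carrying splitFlag between the resulting tokens.

```python
import re
import string


def split_token(text, split_chars=None, dont_split_chars=""):
    """Split one token's text on runs of split characters (hypothetical helper)."""
    # Assumption: with no splitChars, "punctuation" means ASCII punctuation.
    candidates = split_chars if split_chars is not None else string.punctuation
    # Characters listed in dont_split_chars never act as separators,
    # so they stay inside the produced tokens.
    separators = [c for c in candidates if c not in dont_split_chars]
    if not separators:
        return [text] if text else []
    # A run of consecutive separators counts as a single split point.
    pattern = "[" + re.escape("".join(separators)) + "]+"
    # Drop empty pieces produced by leading or trailing separators.
    return [piece for piece in re.split(pattern, text) if piece]


# Default punctuation splitting with "." exempted.
print(split_token("3.14-inch/model", dont_split_chars="."))  # ['3.14', 'inch', 'model']
# Explicit splitChars: only "-" splits, "_" is kept.
print(split_token("a-b_c", split_chars="-"))  # ['a', 'b_c']
```

The first call mirrors a configuration that sets only dontSplitChars to ".": the period stays inside "3.14" while the other punctuation still splits.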
Examples
Example Configuration 1

```json
{
  "type": "CharacterSplitter",
  "dontSplitChars": "."
}
```
...