Creates N-Grams from lexical items flagged as TOKEN. The N-Gram size is bounded by the minimum and maximum settings. N-Grams are also broken at tokens that carry any of the configured split flags.
Min N-Gram size ( type=integer | default=2 | required ) - Minimum number of tokens in an N-Gram.
Max N-Gram size ( type=integer | default=2 | required ) - Maximum number of tokens in an N-Gram.
Split Flags ( type=string array | default=ALL_PUNCTUATION, ALL_DIGITS, HAS_PUNCTUATION, HAS_DIGIT | optional ) - Splits the N-Gram if the next token has any of these flags (i.e. a token with any of these flags stops the building of the current N-Gram).
Ignore On Boundary Flags ( type=string | optional ) - Flags for tokens that will not be added to an N-Gram.
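As a rough illustration of how the size limits and split flags interact, here is a minimal Python sketch. It is not the processor's actual implementation: the `build_ngrams` function, the `(text, flags)` token representation, and the sample tokens are all assumptions made for this example.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical token representation: (text, set of flags).
Token = Tuple[str, Set[str]]


def build_ngrams(
    tokens: List[Token],
    min_size: int = 2,
    max_size: int = 2,
    split_flags: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Build N-Grams of min_size..max_size tokens, stopping at split flags."""
    ngrams: List[str] = []
    for start in range(len(tokens)):
        words: List[str] = []
        for text, flags in tokens[start:start + max_size]:
            if flags & split_flags:
                break  # a token with a split flag stops this N-Gram
            words.append(text)
            if len(words) >= min_size:
                ngrams.append(" ".join(words))
    return ngrams


sample = [
    ("new", {"TOKEN"}),
    ("york", {"TOKEN"}),
    (",", {"TOKEN", "ALL_PUNCTUATION"}),
    ("big", {"TOKEN"}),
    ("apple", {"TOKEN"}),
]
print(build_ngrams(sample, min_size=2, max_size=3,
                   split_flags=frozenset({"ALL_PUNCTUATION"})))
```

In this sketch the comma never joins an N-Gram and no N-Gram spans across it, so only "new york" and "big apple" are produced.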
General Settings
The general settings can be accessed by clicking on
Enable - Enables the processor to be used in the pipeline.
Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
Required Flags ( optional ) - Lexical item flags that every token must have in order to be processed.
At Least One Flags ( optional ) - List of lexical item flags, at least one of which must be present for a token to be processed.
Don't Process Flags ( optional ) - List of lexical item flags that are not processed. Unlike "Skip Flags", which skips only the token and continues along the same path, this setting drops the whole path in the Saga graph.
Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value of every match.
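The difference between "Skip Flags" and "Don't Process Flags" can be sketched as follows. This is only an illustration under assumptions: a path is modeled as a plain list of `(text, flags)` pairs, and `filter_path` and the `STOP_WORD` flag are hypothetical names, not part of the Saga API.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical token representation: (text, set of flags).
Token = Tuple[str, Set[str]]


def filter_path(
    path: List[Token],
    skip_flags: Set[str],
    dont_process_flags: Set[str],
) -> Optional[List[str]]:
    """Return the token texts left to process, or None if the path is dropped."""
    kept: List[str] = []
    for text, flags in path:
        if flags & dont_process_flags:
            return None  # "Don't Process": the whole path is dropped
        if flags & skip_flags:
            continue  # "Skip": only this token is ignored, the path continues
        kept.append(text)
    return kept


path = [("the", {"STOP_WORD"}), ("quick", {"TOKEN"}), ("fox", {"TOKEN"})]
print(filter_path(path, skip_flags={"STOP_WORD"}, dont_process_flags=set()))
print(filter_path(path, skip_flags=set(), dont_process_flags={"STOP_WORD"}))
```

With the flag listed under skip flags, the remaining tokens on the path are still processed; with the same flag listed under don't-process flags, nothing on that path is processed.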