Looks up matches to regular expressions in a dictionary and tags each match with one or more semantic tags as alternative representations.
Operates On: Lexical Items with TOKEN flag
Note: All possibilities are tagged, including overlaps and sub-patterns, with the expectation that later disambiguation stages will choose which tags are the correct interpretation.
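To illustrate the "tag everything" behavior described above, here is a minimal Python sketch. It is not the stage's implementation: the pattern database, the tag names (`{number}`, `{phone}`), the function `tag_all_spans`, and the choice to join tokens without a separator are all assumptions made for the example.

```python
import re

# Hypothetical pattern database: regex -> semantic tag (illustrative names only).
PATTERNS = {
    r"\d+": "{number}",
    r"\d{3}-\d{4}": "{phone}",
}

def tag_all_spans(tokens, patterns=PATTERNS, case_insensitive=True):
    """Return (start, end, tag) for every contiguous token span that a regex matches."""
    flags = re.IGNORECASE if case_insensitive else 0
    compiled = [(re.compile(rx, flags), tag) for rx, tag in patterns.items()]
    tags = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            # Assumption: tokens are joined with no separator to form the candidate text.
            candidate = "".join(tokens[start:end])
            for rx, tag in compiled:
                if rx.fullmatch(candidate):
                    # Keep every match, even when it overlaps or nests inside another.
                    tags.append((start, end, tag))
    return tags

result = tag_all_spans(["555", "-", "1234"])
```

Here `result` contains a `{number}` tag for each of `555` and `1234` as well as a `{phone}` tag spanning all three tokens. Nothing is discarded at this point; choosing among the overlapping tags is left to a later disambiguation step.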
Configuration Parameters
- patterns (string, required) - The resource that contains the pattern database.
  - See below for the format.
- maxLength (integer, optional) - The maximum length of text to test against the regexes. Default is 25 characters.
  - For each token, the stage grows the candidate text by adding tokens before and after it, until a match is found or the maxLength limit (25 characters by default) is reached.
- caseInsensitive (boolean, optional) - If true, all regexes are processed as case insensitive (default = true).
- boundaryFlags (string, optional)
  - The tokens to process must lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).
- skipFlags (string array, optional) - Flags to be skipped by this stage.
  - Tokens marked with these flags are ignored by this stage, and no processing is performed on them.
- requiredFlags (string array, optional)
  - Tokens must have all the specified flags in order to be processed.
- debug (boolean, optional) - Enables all debug log functionality of the stage, if any.
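The interaction between maxLength, skipFlags and requiredFlags can be sketched in Python as follows. This is one possible reading of the parameter descriptions, not the stage's actual algorithm: the function names, the alternating right-then-left growth order, joining tokens with a single space, and the flag names `SKIP` and `X` used in the usage lines are all assumptions.

```python
import re

def is_eligible(token_flags, skip_flags=(), required_flags=()):
    """A token is processed only if it has no skip flag and has every required flag."""
    flags = set(token_flags)
    if flags & set(skip_flags):
        return False
    return set(required_flags) <= flags

def expand_and_match(tokens, index, regex, max_length=25, case_insensitive=True):
    """Grow a window around tokens[index] until the regex matches or max_length is exceeded."""
    rx = re.compile(regex, re.IGNORECASE if case_insensitive else 0)
    start, end = index, index + 1
    grow_left = False  # assumption: alternate sides, starting with the token after
    while True:
        # Assumption: tokens are joined with a single space.
        candidate = " ".join(tokens[start:end])
        if len(candidate) > max_length:
            return None  # length limit reached without a match
        if rx.fullmatch(candidate):
            return (start, end, candidate)
        can_left, can_right = start > 0, end < len(tokens)
        if not (can_left or can_right):
            return None  # nothing left to add
        if (grow_left and can_left) or not can_right:
            start -= 1
        else:
            end += 1
        grow_left = not grow_left

tokens = ["born", "on", "12", "May", "1999", "in", "Ohio"]
match = expand_and_match(tokens, 3, r"\d{1,2} [a-z]+ \d{4}")
no_match = expand_and_match(tokens, 3, r"\d{1,2} [a-z]+ \d{4}", max_length=5)
ok = is_eligible({"TOKEN"}, skip_flags=["SKIP"], required_flags=["TOKEN"])
```

Starting from `May`, the window grows to `May 1999` and then `12 May 1999`, which matches because caseInsensitive is on. With `max_length=5` the second candidate already exceeds the limit, so no match is reported. The sketch follows a single growth path for brevity and does not try every window around the token.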
Example Configuration

    {
      "type": "RegexPatternStage",
      "patterns": "regex-provider:patterns",
      "maxLength": 25,
      "caseInsensitive": true
    }
...