Looks up matches to regular expressions in a dictionary and then tags each match with one or more semantic tags as alternative representations.

Operates On:  Lexical Items with TOKEN flag

Note

All possibilities are tagged, including overlaps and sub-patterns, with the expectation that later disambiguation stages will choose which tags are the correct interpretation.
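
As a rough illustration of what tagging "all possibilities" can mean, the sketch below tests every contiguous run of tokens against every pattern and keeps all hits, including overlapping and nested ones. The `PATTERNS` dictionary, the tag names, and the `tag_all` function are hypothetical and exist only for this example. They are not the stage's actual pattern format or API.

```python
import re

# Hypothetical pattern dictionary (regex -> semantic tags).
# This is NOT the stage's real resource format, only an illustration.
PATTERNS = {
    r"\d{3}-\d{4}": ["phone-local"],
    r"\d{3} \d{3}-\d{4}": ["phone-full"],
    r"\d+": ["number"],
}

def tag_all(tokens):
    """Return every (first_token, last_token, tag) whose joined text fully matches a pattern."""
    tags = []
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            candidate = " ".join(tokens[i:j + 1])
            for pattern, names in PATTERNS.items():
                if re.fullmatch(pattern, candidate):
                    # Keep every hit; nothing is discarded for overlapping another match.
                    tags.extend((i, j, name) for name in names)
    return tags

result = tag_all(["call", "555", "123-4567"])
# result == [(1, 1, "number"), (1, 2, "phone-full"), (2, 2, "phone-local")]
```

Here "555" is tagged as a number and also forms part of the longer phone-full match, while "123-4567" is tagged on its own as well. Choosing among these interpretations is left to later stages.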

Configuration Parameters

  • patterns (string, required) - The resource which contains the pattern database
    • See below for the format.
  • maxLength (integer, optional) - The maximum length, in characters, of the text tested against the regular expressions (default = 25).
    • For each token, the stage grows the candidate text by adding tokens before and after it until a pattern matches or the maxLength limit is reached.
  • caseInsensitive (boolean, optional) - If true, all regular expressions are processed as case insensitive (default = true).
  • boundaryFlags (string, optional)
    • The tokens to process must lie between two vertices marked with these flags (e.g. ["TEXT_BLOCK_SPLIT"]).
  • skipFlags (string array, optional) - Flags to be skipped by this stage.
    • Tokens marked with these flags are ignored by this stage, and no processing is performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all the specified flags in order to be processed.
  • debug (boolean, optional) - Enables all debug log functionality of the stage, if any.
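
The interaction between maxLength and caseInsensitive can be sketched as follows. This is a minimal illustration, not the stage's implementation: the function name `find_match`, the choice to join tokens with single spaces, and the order in which tokens are added (alternating after, then before) are all assumptions made for the example.

```python
import re

def find_match(tokens, index, pattern, max_length=25, case_insensitive=True):
    """Grow a window around tokens[index] until the pattern matches or max_length is exceeded."""
    flags = re.IGNORECASE if case_insensitive else 0
    regex = re.compile(pattern, flags)
    start, end = index, index
    grow_right = True  # assumed order: add a token after, then one before, and so on
    while True:
        text = " ".join(tokens[start:end + 1])
        if len(text) > max_length:
            return None  # character limit reached without a match
        if regex.fullmatch(text):
            return (start, end, text)
        can_right = end + 1 < len(tokens)
        can_left = start > 0
        if not (can_left or can_right):
            return None  # no more tokens to add
        if (grow_right and can_right) or not can_left:
            end += 1
        else:
            start -= 1
        grow_right = not grow_right

tokens = ["ship", "to", "PO", "Box", "42", "today"]
match = find_match(tokens, 3, r"po box \d+")
# match == (2, 4, "PO Box 42")
```

With `case_insensitive=False` the same call returns `None`, because the lowercase pattern no longer matches "PO Box 42". With `max_length=5` it also returns `None`, because the window "Box 42" already exceeds the limit before the pattern can match.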


Example Configuration
{
 "type":"RegexPatternStage",
 "patterns":"regex-provider:patterns",
 "maxLength": 25,
 "caseInsensitive": true
}
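
Since maxLength and caseInsensitive are optional, a configuration that sets only the required patterns parameter should behave the same as the one above. The sketch below shows one way a caller might apply the documented defaults when reading such a configuration. The `load_stage_config` helper and the `DEFAULTS` table are hypothetical and are not part of the stage.

```python
import json

# Defaults as stated in the parameter list (maxLength = 25, caseInsensitive = true).
DEFAULTS = {"maxLength": 25, "caseInsensitive": True}

def load_stage_config(raw):
    """Parse a stage configuration and fill in the documented defaults (hypothetical helper)."""
    config = json.loads(raw)
    if "patterns" not in config:
        raise ValueError("'patterns' is required")
    # Explicit settings in the configuration override the defaults.
    return {**DEFAULTS, **config}

config = load_stage_config(
    '{"type": "RegexPatternStage", "patterns": "regex-provider:patterns"}'
)
```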

...