Regex

Looks up matches to regular expressions in a dictionary across multiple tokens and then tags each match with one or more semantic tags as an alternative representation. For a simple regular expression, a match only needs to occur against a single token. Simple Regex is recommended for that case.

Uses Regex Pattern Stage

This stage requires a lot of processing time. Please follow these recommendations:

Keep the number of regex patterns to a minimum.
Use non-greedy regexes where possible.
Set the maximum length to the bare minimum necessary for the expected matches.
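
The non-greedy recommendation can be illustrated with a short, generic Python sketch. It is not taken from the stage itself; the sample text and patterns are made up for illustration. A greedy quantifier such as .* consumes as much text as possible, while its non-greedy form .*? stops at the first point where the rest of the pattern can match.

```python
import re

# Hypothetical sample text; not taken from the stage's documentation.
text = "<b>first</b> and <b>second</b>"

# Greedy: ".*" runs to the last "</b>", producing one oversized match.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: ".*?" stops at the first "</b>", producing two tight matches.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)  # ['<b>first</b> and <b>second</b>']
print(lazy)    # ['<b>first</b>', '<b>second</b>']
```

Shorter matches also make it easier to keep the maximum length setting low.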

Configuration

Max Length ( type=integer | default=25 | required ) - The maximum length of text to test for regex.
- For each token, the stage will increase the size of the text by adding tokens before and after, until a match is found or the maximum length (25 characters by default) is reached.
Case Insensitive ( type=boolean | default=checked | optional ) - Indicates whether matching against the regex is case insensitive.
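
The window expansion controlled by Max Length can be sketched roughly as follows. This is a minimal illustration, not the stage's actual implementation: the function name find_match, joining tokens with a single space, requiring a full match of the joined text, and measuring length in characters are all assumptions.

```python
import re

def find_match(tokens, index, pattern, max_length=25, case_insensitive=True):
    """Hypothetical helper: grow a window of tokens around tokens[index]
    until the joined text matches the pattern or exceeds max_length."""
    flags = re.IGNORECASE if case_insensitive else 0
    regex = re.compile(pattern, flags)
    n = len(tokens)
    for size in range(1, n + 1):
        any_fit = False
        # Try every window of this size that still contains the token.
        for lo in range(max(0, index - size + 1), min(index, n - size) + 1):
            candidate = " ".join(tokens[lo:lo + size])
            if len(candidate) > max_length:
                continue  # too long to test (assumed character-based limit)
            any_fit = True
            if regex.fullmatch(candidate):
                return lo, lo + size - 1, candidate
        if not any_fit:
            break  # larger windows can only be longer, so stop here
    return None

tokens = ["call", "555", "1234", "now"]
print(find_match(tokens, 1, r"\d{3} \d{4}"))                # (1, 2, '555 1234')
print(find_match(tokens, 1, r"\d{3} \d{4}", max_length=5))  # None
```

The second call shows why the limit matters for cost: with a small Max Length the search stops after the single-token window, so fewer candidate strings are tested.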

Adding a Pattern

Click on the button to open the "Add new Pattern" dialog

Regex ( type=string | required ) - Regex pattern to apply to the tokens.
Options
- Split Match ( type=boolean | default=unchecked | required ) - Creates a new tag when the regex matches only a section of one or more tokens.
- Case Insensitive ( type=boolean | default=unchecked | optional ) - Indicates whether matching against the regex is case insensitive.
- Literal ( type=boolean | default=unchecked | optional ) - Indicates whether the match to the regex must be a literal (the Entity Recognizer is a better choice for this).
- Max Length ( type=integer | default=5 | required ) - The maximum length of text to test for regex.
  - For each token, the stage will increase the size of the text by adding tokens before and after, until a match is found or the maximum length is reached.
Confidence Adjustment ( type=double | default=1 | required ) - Adjustment factor, from 0.0 to 2.0, to apply to the confidence value (applies to every pattern match).
- 0.0 to < 1.0 decreases confidence value
- 1.0 confidence value remains the same
- > 1.0 to 2.0 increases confidence value
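
The effect of the three ranges above can be sketched as follows. This is only an illustration: the page does not state the formula, so treating the factor as a plain multiplier is an assumption, and adjust_confidence is a hypothetical name.

```python
def adjust_confidence(confidence, factor=1.0):
    """Hypothetical helper: scale a confidence value by an adjustment factor.

    Assumption: the factor is applied as a simple multiplier.
    """
    if not 0.0 <= factor <= 2.0:
        raise ValueError("factor must be between 0.0 and 2.0")
    return confidence * factor

print(adjust_confidence(0.5, 0.5))  # 0.25 (decreased)
print(adjust_confidence(0.5, 1.0))  # 0.5  (unchanged)
print(adjust_confidence(0.5, 1.5))  # 0.75 (increased)
```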

General Settings

The general settings can be accessed by clicking on

More settings may be displayed in the same dialog, depending on the recognizer.

Enable - Enables the processor for use in pipelines.
Base Pipeline - Indicates the last pipeline stage needed by the recognizer.
Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
Required Flags ( optional ) - Lexical item flags that every token must have in order to be processed.
At Least One Flag ( optional ) - Lexical item flags of which every token must have at least one in order to be processed.
Don't Process Flags ( optional ) - List of lexical item flags that are not processed. Unlike "Skip Flags", which only skips the token and continues along the same path, this drops the path in the Saga graph.
Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, to apply to the confidence value (applies to every match).
- 0.0 to < 1.0 decreases confidence value
- 1.0 confidence value remains the same
- > 1.0 to 2.0 increases confidence value
Debug - Enable debug logging.
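
How the token-level flag settings might combine can be sketched as set checks. This is a rough illustration under assumptions, not the product's logic: should_process and the flag names are hypothetical, and Boundary Flags and Don't Process Flags are left out because they act on the graph rather than on a single token.

```python
def should_process(token_flags, required=(), at_least_one=(), skip=()):
    """Hypothetical helper: decide whether a token is processed.

    Assumed semantics:
    - skip: the token is ignored if it carries any of these flags
    - required: the token must carry all of these flags
    - at_least_one: the token must carry at least one of these flags
    """
    flags = set(token_flags)
    if flags & set(skip):
        return False
    if not set(required) <= flags:
        return False
    if at_least_one and not (flags & set(at_least_one)):
        return False
    return True

# Flag names below are made up for illustration.
print(should_process({"TOKEN", "NUMBER"}, required={"TOKEN"}))           # True
print(should_process({"TOKEN"}, at_least_one={"NUMBER", "ALL_UPPER"}))   # False
print(should_process({"TOKEN", "STOP_WORD"}, skip={"STOP_WORD"}))        # False
```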
