This stage works similarly to the Dictionary Tagger stage in that it looks up sequences of tokens in a dictionary to match the text being processed. The difference is that the match can also include up to N tokens to the right and/or left of the originally matched text.
Operates On: Lexical Items with TOKEN and possibly other flags as specified below.
"dictionary": "dict-provider:token_matcher_patterns", "groupTokens": false, "leftGroupTagName": "_leftTokens_", "matchGroupTagName": "_matchedTokens_", "rightGroupTagName": "_rightTokens_"
V-----------------[call 333-4444 for pizza]------------------V
^------[call]------V-----[333-4444]------V-[for]-V--[pizza]--^
^-[{_leftTokens_}]-^---[{phonenumber}]---^-[{_rightTokens_}]-^
                   ^-[{_matchedTokens_}]-^
^----------------------[{pizza_phone}]-----------------------^
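As a rough illustration of the grouping in the diagram above, the following Python sketch splits a token list into left, matched, and right groups. It is not the stage's implementation: `expand_match`, `ContextMatch`, and the `max_left`/`max_right` parameters are hypothetical names, and the right-hand limit of 2 is an assumption chosen so that the result mirrors the diagram.

```python
from typing import List, NamedTuple


class ContextMatch(NamedTuple):
    """Hypothetical container for the three token groups."""
    left: List[str]     # up to max_left tokens before the match
    matched: List[str]  # the tokens that matched the pattern
    right: List[str]    # up to max_right tokens after the match


def expand_match(tokens: List[str], start: int, end: int,
                 max_left: int, max_right: int) -> ContextMatch:
    """Split tokens around the half-open match span [start, end)."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("match span is outside the token list")
    # Clamp the context window to the bounds of the token list.
    left_start = max(0, start - max_left)
    right_end = min(len(tokens), end + max_right)
    return ContextMatch(
        left=tokens[left_start:start],
        matched=tokens[start:end],
        right=tokens[end:right_end],
    )


tokens = ["call", "333-4444", "for", "pizza"]
# Assume "333-4444" (index 1) was already tagged as {phonenumber}.
result = expand_match(tokens, start=1, end=2, max_left=1, max_right=2)
print(result)
```

When the match sits at the start or end of the text, the window is simply clamped, so fewer tokens than the configured maximum may be returned.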
No vertices are created in this stage.
The Token Matcher requires a dictionary of patterns, expressed as a list of JSON records.
"_id" : "Nlp3kXYBOOvNPbzJXDcQ", "tag": "qVRZi3YBvyJs83wnkfik", "pattern": "{phonenumber}", "confAdjust": 0.95, "options" : { "maxTokensLeft": 1, "maxTokensRight": 1 }
All tags are added to the interpretation graph with the SEMANTIC_TAG flag.
Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}
pattern ( type=string | required ) - Pattern to match in the content
_id ( type=string | required ) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
confAdjust ( type=number | required ) - Adjustment factor applied to the confidence value; must be between 0.0 and 2.0
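A dictionary can be sanity-checked against the field descriptions above before it is loaded. The sketch below is only an illustration under stated assumptions: `validate_entries` is a hypothetical helper, not part of the product, and it treats confAdjust as a number, as in the sample record (0.95). It checks the three required fields, the uniqueness of _id, and the 0.0 to 2.0 range.

```python
import json

# Fields marked as required in the field descriptions.
REQUIRED_FIELDS = ("_id", "pattern", "confAdjust")


def validate_entries(entries):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_ids = set()
    for index, entry in enumerate(entries):
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"entry {index}: missing required field '{field}'")
        entry_id = entry.get("_id")
        if entry_id is not None:
            # _id must be unique across all entries.
            if entry_id in seen_ids:
                problems.append(f"entry {index}: duplicate _id '{entry_id}'")
            seen_ids.add(entry_id)
        conf = entry.get("confAdjust")
        if conf is not None:
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
            if not is_number or not 0.0 <= conf <= 2.0:
                problems.append(f"entry {index}: confAdjust must be a number from 0.0 to 2.0")
    return problems


sample = json.loads("""
[
  {
    "_id": "Nlp3kXYBOOvNPbzJXDcQ",
    "tag": "qVRZi3YBvyJs83wnkfik",
    "pattern": "{phonenumber}",
    "confAdjust": 0.95,
    "options": {"maxTokensLeft": 1, "maxTokensRight": 1}
  }
]
""")
problems = validate_entries(sample)
print(problems)
```

Collecting all problems in a list, rather than raising on the first one, makes it easier to fix a large dictionary in a single pass.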