
Looks up regular-expression matches from a dictionary within a single token, then tags each match with one or more semantic tags as alternative representations.

Operates On:  Lexical Items with TOKEN flag

...

  • patterns (string, required) - The resource containing the pattern database.
    • See below for the format.

...

Note

Note that in the example for the Regex Pattern Stage, the "self-name" tag would potentially match "What's your name". However, the Simple Regex Stage does not look for matches beyond a single token (as the Regex Pattern Stage does).
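
The single-token restriction can be illustrated with a minimal Python sketch. This is not the toolkit's implementation: the `tag_tokens` helper, the whitespace tokenizer, the use of `re.fullmatch`, and the sample entries are all assumptions made for illustration.

```python
import re

# Hypothetical pattern entries; the ids, tags, and patterns are illustrative only.
ENTRIES = [
    {"id": "e1", "tags": ["self-name"], "patterns": [r"what's your name"]},
    {"id": "e2", "tags": ["zip-code"], "patterns": [r"\d{5}"]},
]

def tag_tokens(text, entries):
    """Return (token, tags) pairs, testing each pattern against one token at a time."""
    results = []
    for token in text.split():  # assumption: simple whitespace tokenization
        tags = []
        for entry in entries:
            # assumption: a pattern must match the whole token
            if any(re.fullmatch(p, token, re.IGNORECASE) for p in entry["patterns"]):
                tags.extend(entry["tags"])
        results.append((token, tags))
    return results

matches = tag_tokens("What's your name 90210", ENTRIES)
```

In this sketch the multi-word "self-name" pattern never matches, because each pattern is only ever compared against one token, while the five-digit pattern tags the final token.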

Output Flags

Lex-Item Flags

...

  • SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
  • PROCESSED - Placed on all the tokens that compose the semantic tag.

Resource Data

...

Pattern (Regex) Dictionary Format

The only required file is the pattern dictionary. It is a series of JSON records, typically indexed by entity ID.

...

  1. Multiple patterns can have the same entry.
  2. Additional fielded data can be added to the record.
    • As needed by downstream processes.

Fields

  • id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
    • Typically, this is an identifier with meaning to the larger application that is using the Language Processing Toolkit.
  • tags (required, array of string) - The list of semantic tags that will be added to the interpretation graph whenever any of the patterns are matched.
    • These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
  • patterns (required, array of string) - A list of patterns to match in the content.
  • splitMatch (optional, boolean) - Indicates whether a partial match will create a regex tag even if a full match was not found.
  • confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
    • This is the confidence of the entry, in comparison to all of the other entries. Essentially, the likelihood that this entry will be randomly encountered.
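
As a rough illustration of the fields listed above, the following Python sketch checks one record for the required and optional fields. The `validate_entry` helper and the sample record values are hypothetical; only the field names and types come from the list above, and the toolkit may validate records differently.

```python
import json

# Field names and types taken from the field list; the checks themselves are an assumption.
REQUIRED = {"id": str, "tags": list, "patterns": list}
OPTIONAL = {"splitMatch": bool, "confidence": (int, float)}

def validate_entry(entry):
    """Return a list of problems found in one pattern-dictionary record."""
    problems = []
    for name, expected in REQUIRED.items():
        if name not in entry:
            problems.append(f"missing required field: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"wrong type for field: {name}")
    for name, expected in OPTIONAL.items():
        if name in entry and not isinstance(entry[name], expected):
            problems.append(f"wrong type for field: {name}")
    # tags and patterns are arrays of strings
    for name in ("tags", "patterns"):
        value = entry.get(name)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{name} must contain only strings")
    return problems

# Hypothetical record; the values are illustrative only.
record = json.loads("""
{
  "id": "Q1001",
  "tags": ["zip-code"],
  "patterns": ["[0-9]{5}"],
  "splitMatch": false,
  "confidence": 0.9
}
""")

problems = validate_entry(record)
```

A check that `id` is unique across all entries and dictionaries would need to run over the full set of records rather than one record at a time, so it is left out of this sketch.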

Other

...

Optional Fields

  • display (optional, string) - What to show the user when browsing the entity.
  • context (optional, object) - A context vector that helps disambiguate the entity from others with the same pattern.
    • Format TBD, but probably a list of weighted words, phrases and tags.

...