Looks up matches to regular expressions in a dictionary across multiple tokens and then tags each match with one or more semantic tags as alternative representations. For a simple regex where a match only needs to occur against a single token, the Simple Regex Stage is recommended.

Operates On: Lexical Items with TOKEN flag

Note
All possibilities are tagged (including overlaps and sub-patterns), with the expectation that later disambiguation stages will choose which tags are the correct interpretation.

Warning

This stage requires a lot of processing time. Please follow these recommendations:

Keep the number of regex patterns to a minimum.
Try to use non-greedy regex.
Set maxLength to the bare minimum necessary for the expected matches.
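
To illustrate the non-greedy recommendation, here is a minimal sketch using Python's re module. The stage's own regex engine is not named on this page, so treat the snippet as an illustration of greedy versus non-greedy quantifiers in general, not of the stage's internals. The sample text is borrowed from the example further down the page.

```python
import re

# Sample text borrowed from the example on this page.
text = "12 @#$ 25"

# Greedy: ".*" consumes as much as possible, so the match stretches
# from the first digit to the last digit in the text.
greedy = re.findall(r"\d.*\d", text)

# Non-greedy: ".*?" consumes as little as possible, so each match
# ends at the nearest following digit.
non_greedy = re.findall(r"\d.*?\d", text)

print(greedy)      # ['12 @#$ 25']
print(non_greedy)  # ['12', '25']
```

A greedy quantifier scans to the end of the candidate text and then backtracks, which costs more work and tends to produce longer matches than intended.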

	Generic Configuration Parameters

...

patterns (string, required) - The resource that contains the pattern database.
- See below for the format.
maxLength (integer, optional) - The maximum length of text to test against the regex patterns. The default is 25 characters.
- For each token, the stage will increase the size by adding tokens before and after, until a match (or the maxLength limit, 25 characters by default) is reached.

caseInsensitive (boolean, optional) - If true, all regex patterns will be processed as case insensitive. The default is true.
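
The interplay of maxLength and caseInsensitive can be sketched as follows. This is one possible reading of the description above, not the stage's actual implementation. The function name expand_and_match, the symmetric growth of the window, and the joining of tokens with a single space are all assumptions made for illustration.

```python
import re

def expand_and_match(tokens, start, pattern, max_length=25, case_insensitive=True):
    """Grow a window around tokens[start] until the pattern matches
    or the joined text exceeds max_length characters.

    Hypothetical helper: the real stage may expand differently.
    """
    flags = re.IGNORECASE if case_insensitive else 0
    regex = re.compile(pattern, flags)
    lo, hi = start, start
    while True:
        # Assumption: tokens are joined with a single space.
        text = " ".join(tokens[lo:hi + 1])
        if len(text) > max_length:
            return None  # character limit reached without a match
        if regex.fullmatch(text):
            return (lo, hi, text)
        if lo == 0 and hi == len(tokens) - 1:
            return None  # nothing left to add on either side
        # Add one token before and one token after, where available.
        lo = max(lo - 1, 0)
        hi = min(hi + 1, len(tokens) - 1)

tokens = ["What's", "your", "name", "12"]
found = expand_and_match(tokens, 1, r"what's your name")
too_short = expand_and_match(tokens, 1, r"what's your name", max_length=10)
case_sensitive = expand_and_match(tokens, 1, r"what's your name", case_insensitive=False)
```

In this sketch, found covers tokens 0 through 2, while too_short and case_sensitive are both None: the first because the 16-character candidate exceeds the limit of 10, the second because the capital "W" no longer matches. A smaller maxLength cuts the expansion off earlier, which is why the recommendation is to keep it at the bare minimum.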

...

In the following example, "What's your name" is in the dictionary as a regex for self-name, and there are also regexes for numbers, "[0-9]+" and "[0-9]+\\.[0-9]+":

 V--------------------------------------[What's your name 12 @#$ 25 63.3]---------------------------------------V  
  ^-----[What's]-----V--[your]--V--[name]--V-----[12]-----V--[@#$]--V-----[25]-----V-----------[63.3]------------^  
  ^--[What]--V--[s]--^                     ^--[{number}]--^         ^--[{number}]--^-----[63]-----V-----[3]------^  
  ^-----[what's]-----^                                                             ^---------[{number}]----------^  
  ^--[what]--^                                                                     ^--[{number}]--^--[{number}]--^  
  ^-------------[{self-name}]--------------^
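
The {number} tags in the diagram can be reproduced with a small sketch. The candidates list is written by hand and stands in for the token spans the stage would generate, and tag_numbers is a hypothetical helper, not part of the toolkit. The patterns are the two number regexes from the example, written with a single backslash because Python raw strings need no extra escaping.

```python
import re

# The two number patterns from the example above.
NUMBER_PATTERNS = [r"[0-9]+", r"[0-9]+\.[0-9]+"]

def tag_numbers(candidates):
    """Tag every candidate that fully matches any number pattern.

    Overlapping candidates are all kept; choosing among them is left
    to later disambiguation stages.
    """
    return [
        (text, "{number}")
        for text in candidates
        if any(re.fullmatch(p, text) for p in NUMBER_PATTERNS)
    ]

# Hand-written stand-ins for the spans shown in the diagram.
candidates = ["12", "@#$", "25", "63.3", "63", "3"]
tags = tag_numbers(candidates)
```

Note that "63.3", "63" and "3" are all tagged, mirroring the overlapping {number} entries in the diagram.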

Output Flags

Lex-Item Flags

...

SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
PROCESSED - Placed on all the tokens composing the semantic tag.

Resource Data

The regex stage requires a "pattern dictionary" (a string to JSON map), which is a list of JSON records indexed by entity ID. In addition, there may also be a pattern map and a token index.

Pattern (Regex) Dictionary Format

The only file that is absolutely required is the pattern dictionary. It is a series of JSON records, typically indexed by entity ID.

...

Multiple patterns can have the same entry.
Additional fielded data can be added to the record.
- As needed by downstream processes.

Fields

id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
- Typically, this is an identifier that has meaning to the larger application that is using the Language Processing Toolkit.
tags (required, array of string) - The list of semantic tags to add to the interpretation graph whenever any of the patterns are matched.
- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
patterns (required, array of string) - A list of patterns to match in the content.
splitMatch (optional, boolean) - Indicates whether or not a partial match will create a regex tag even if a full match was not met.
confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
- This is the confidence of the entry, in comparison to all of the other entries. Essentially, the likelihood that this entry will be randomly encountered.

...

display (optional, string) - What to show the user when browsing the entity.
context (optional, object) - A context vector that can help disambiguate the entity from others with the same pattern.
- Format TBD, but probably a list of weighted words, phrases and tags.
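
As an illustration of the fields described above, the following sketch builds a single record and checks the required fields. All values (the ID "Q1", the tag, the pattern, the confidence and the display text) are hypothetical, and validate_entry is an illustrative helper, not part of the toolkit. The layout of the surrounding dictionary file is not shown here, and the context field is omitted because its format is still to be determined.

```python
import json

# Required fields and their expected JSON types, per the field list above.
REQUIRED_FIELDS = {"id": str, "tags": list, "patterns": list}

def validate_entry(entry):
    """Return a list of problems found in one pattern dictionary record."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in entry:
            problems.append(f"missing required field: {name}")
        elif not isinstance(entry[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")
    return problems

# Hypothetical record; every value is made up for illustration.
entry = {
    "id": "Q1",
    "tags": ["self-name"],
    "patterns": ["what's your name"],
    "confidence": 0.9,
    "display": "Asks for the user's name",
}

problems = validate_entry(entry)
serialized = json.dumps(entry)
print(serialized)
```

Checking records before loading them makes a missing id, tags or patterns field visible early, rather than surfacing it as a failed match at processing time.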

...
