Looks up matches to regular expressions from a dictionary across multiple tokens, then tags each match with one or more semantic tags as an alternative representation. For a simple regex, where a match only needs to occur against a single token, the Simple Regex Stage is recommended.

Operates On: Lexical Items with the TOKEN flag and possibly other flags as specified below.

Saga_is_recognizer

Note
All possibilities are tagged (including overlaps and sub-patterns) with the expectation that later disambiguation stages will choose which tags are the correct interpretation.

...

Include Page

	Generic Configuration Parameters

Configuration Parameters

patterns (string, required) - The resource that contains the pattern database.
- See below for the format.
maxLength (integer, optional) - The maximum length of text, in characters, to test against a regex. The default is 25.
- For each token, the stage grows the candidate text by adding tokens before and after it until a match is found or the maxLength limit (25 characters by default) is reached.
caseInsensitive (boolean, optional) - If true, all regexes are processed as case insensitive. The default is true.

Code Block

language	js
theme	Eclipse
title	Example Configuration

{
 "type":"RegexPattern",
 "patterns":"regex-provider:patterns",
 "maxLength": 25,
 "caseInsensitive": true
}
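
The effect of maxLength and caseInsensitive can be illustrated with a minimal sketch. This is not the Saga implementation: the function name findSpanMatches is hypothetical, tokens are assumed to be rejoined with a single space, and every contiguous token span within the limit is tested rather than stopping at the first match.

```javascript
// Hypothetical sketch, not the Saga implementation.
// Enumerates contiguous token spans whose joined text is at most maxLength
// characters and reports every span that a pattern matches in full.
function findSpanMatches(tokens, patterns, maxLength = 25, caseInsensitive = true) {
  const flags = caseInsensitive ? "i" : "";
  // Anchor each pattern so it must cover the whole candidate text.
  const compiled = patterns.map((p) => new RegExp(`^(?:${p})$`, flags));
  const matches = [];
  for (let start = 0; start < tokens.length; start++) {
    let text = "";
    for (let end = start; end < tokens.length; end++) {
      // Assumption: tokens are rejoined with a single space.
      text = end === start ? tokens[end] : `${text} ${tokens[end]}`;
      if (text.length > maxLength) break; // window limit reached
      compiled.forEach((re, i) => {
        if (re.test(text)) {
          matches.push({ start, end, text, pattern: patterns[i] });
        }
      });
    }
  }
  return matches;
}

const tokens = ["What's", "your", "name", "12", "@#$", "25", "63.3"];
const patterns = ["what's your name", "[0-9]+", "[0-9]+\\.[0-9]+"];
const found = findSpanMatches(tokens, patterns);
console.log(found.map((m) => m.text));
// → ["What's your name", "12", "25", "63.3"]
```

The sketch splits on whitespace only, so it does not produce the sub-token matches (such as "63" and "3") that the real tokenizer makes possible.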

Example Output

In the following example, "What's your name" is in the dictionary as a regex for self-name, and there are also regexes for numbers, "[0-9]+" and "[0-9]+\\.[0-9]+":

Code Block

language	text	theme	FadeToGrey

 V--------------------------------------[What's your name 12 @#$ 25 63.3]---------------------------------------V  
  ^-----[What's]-----V--[your]--V--[name]--V-----[12]-----V--[@#$]--V-----[25]-----V-----------[63.3]------------^  
  ^--[What]--V--[s]--^                     ^--[{number}]--^         ^--[{number}]--^-----[63]-----V-----[3]------^  
  ^-----[what's]-----^                                                             ^---------[{number}]----------^  
  ^--[what]--^                                                                     ^--[{number}]--^--[{number}]--^  
  ^-------------[{self-name}]--------------^

Output Flags

Lex-Item Flags:

SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
PROCESSED - Placed on all tokens composing the semantic tag.

Vertex Flags:

Info
No vertices are created in this stage.

Resource Data

The Regex Pattern stage requires a "pattern dictionary" (a string-to-JSON map), which is a list of JSON records indexed by entity ID. There may also be a pattern map and a token index.

...

Multiple patterns can have the same entry.
Additional fielded data can be added to the record.
- As needed by downstream processes.

Fields

id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
- Typically, this identifier has meaning to the larger application that is using Saga.
tags (required, array of string) - The list of semantic tags to add to the interpretation graph whenever any of the patterns are matched.
- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
patterns (required, array of string) - A list of patterns to match in the content.
splitMatch (optional, boolean) - Indicates whether a partial match will create a regex tag even if a full match was not met. The default is false.
confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
- This is the confidence of the entry, in comparison to all of the other entries. Essentially, the likelihood that this entry will be randomly encountered.

Other, Optional Fields

display (optional, string) - What to show the user when browsing the entity.
context (optional, object) - A context vector that can help disambiguate the entity from others with the same pattern.
- Format TBD, but probably a list of weighted words, phrases and tags.
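
As an illustration of the fields above, here is a sketch of a single record and a small check of the stated field rules. The values (id, tag, display text, confidence) are made up, validateEntry is a hypothetical helper rather than part of Saga, and the requirement that tags and patterns be non-empty is an assumption. The layout of the surrounding resource file is not shown here.

```javascript
// Hypothetical record; all values are invented for illustration.
const entry = {
  id: "Q-self-name",
  tags: ["self-name"],
  patterns: ["what's your name"],
  splitMatch: false,
  confidence: 0.9,
  display: "Self name question"
};

// Checks only the field rules stated on this page.
// Assumption: tags and patterns must contain at least one string.
function validateEntry(e) {
  const errors = [];
  const isStringArray = (v) =>
    Array.isArray(v) && v.length > 0 && v.every((s) => typeof s === "string");
  if (typeof e.id !== "string" || e.id === "") {
    errors.push("id must be a non-empty string");
  }
  if (!isStringArray(e.tags)) {
    errors.push("tags must be a non-empty array of strings");
  }
  if (!isStringArray(e.patterns)) {
    errors.push("patterns must be a non-empty array of strings");
  }
  if ("splitMatch" in e && typeof e.splitMatch !== "boolean") {
    errors.push("splitMatch must be a boolean");
  }
  if ("confidence" in e && typeof e.confidence !== "number") {
    errors.push("confidence must be a number");
  }
  if ("display" in e && typeof e.display !== "string") {
    errors.push("display must be a string");
  }
  return errors;
}

console.log(validateEntry(entry)); // → []
```

A check like this does not verify that id is unique across all dictionaries; that would need the full set of entries.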

...
