...

Generic Configuration Parameters

Configuration Parameters

patterns (string, required) - The resource that contains the pattern database.
- See below for the format.
maxLength (integer, optional) - The maximum length, in characters, of the text to test against the regular expressions. The default is 25.
- For each token, the stage widens the candidate text by adding tokens before and after it, until a match is found or the maxLength limit (25 characters by default) is reached.
caseInsensitive (boolean, optional) - If true, all regular expressions are processed as case insensitive. The default is true.

Code Block

{
 "type":"RegexPattern",
 "patterns":"regex-provider:patterns",
 "maxLength": 25,
 "caseInsensitive": true
}

...

Multiple patterns can have the same entry.
Additional fielded data can be added to the record, as needed by downstream processes.

id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
- Typically, this identifier has meaning to the larger application that is using the Language Processing Toolkit.
tags (required, array of string) - The list of semantic tags to add to the interpretation graph whenever any of the patterns is matched.
- These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
patterns (required, array of string) - A list of patterns to match in the content.
splitMatch (optional, boolean) - Indicates whether a partial match creates a regex tag even when a full match is not found. The default is false.
confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
- This is the confidence of the entry in comparison to all of the other entries. Essentially, the likelihood that this entry will be randomly encountered.
display (optional, string) - What to show the user when browsing the entity.
context (optional, object) - A context vector that can help disambiguate the entity from others with the same pattern.
- Format TBD, but probably a list of weighted words, phrases and tags.
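
For illustration, a single entry that combines the fields described above might look like the following. This is only a sketch, assuming entries are stored as JSON records like the stage configuration: the id, tag name, pattern, confidence and display values are hypothetical, and context is left out because its format has not been defined.

```json
{
 "id":"example-phone-1",
 "tags":["phone-number"],
 "patterns":["\\d{3}-\\d{4}"],
 "splitMatch": false,
 "confidence": 0.9,
 "display":"Phone Number"
}
```

Note that backslashes in a regular expression must be doubled inside a JSON string, so the pattern above is read as \d{3}-\d{4}.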

...