Matches regular expressions from a dictionary against single tokens, then tags each match with one or more semantic tags as an alternative representation.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.


Note

All possibilities are tagged, including overlaps and sub-patterns, with the expectation that later disambiguation stages will choose which tags are the correct interpretation.

Generic Configuration Parameters

Configuration Parameters

  • patterns (string, required) - The resource containing the pattern database.
    • See below for the format.


Example Configuration

{
 "type":"SimpleRegex",
 "patterns":"regex-provider:patterns"
}

Example Output

In the following example, the tag "number" is defined in the dictionary with the regex patterns "[0-9]+" and "[0-9]+\\.[0-9]+":

 V--------------------------------------[What's your name 12 @#$ 25 63.3]---------------------------------------V  
  ^-----[What's]-----V--[your]--V--[name]--V-----[12]-----V--[@#$]--V-----[25]-----V-----------[63.3]------------^  
  ^--[What]--V--[s]--^                     ^--[{number}]--^         ^--[{number}]--^-----[63]-----V-----[3]------^  
  ^-----[what's]-----^                                                             ^---------[{number}]----------^  
  ^--[what]--^                                                                     ^--[{number}]--^--[{number}]--^                                        
Note

In the example for the Regex Pattern Stage, the "self-name" tag would have a potential match with "What's your name". The Simple Regex Stage, however, does not look for matches beyond a single token (as the Regex Pattern Stage does).
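
The single-token behaviour shown in the graph can be sketched in a few lines of Python. This is only an illustration, not the stage's implementation: the `PATTERNS` map and the `tag_tokens` helper are hypothetical names, and the use of a full match (the pattern must cover the whole token) is an assumption that happens to reproduce the example output.

```python
import re

# Hypothetical in-memory dictionary: tag -> regex patterns
# (mirrors the "number" entry used in the example above).
PATTERNS = {"number": [r"[0-9]+", r"[0-9]+\.[0-9]+"]}


def tag_tokens(tokens, patterns=PATTERNS):
    """Return (token, tag) pairs for every token matched by a pattern."""
    matches = []
    for token in tokens:
        for tag, regexes in patterns.items():
            # Assumption: a pattern must cover the whole token (fullmatch).
            # Patterns are never tested against a span of several tokens.
            if any(re.fullmatch(rx, token) for rx in regexes):
                matches.append((token, "{" + tag + "}"))
    return matches


# Tokens from the example graph, including the sub-tokens of "63.3".
tokens = ["What's", "your", "name", "12", "@#$", "25", "63.3", "63", "3"]
for token, tag in tag_tokens(tokens):
    print(token, "->", tag)
```

Because each token is tested on its own, a multi-token phrase such as "What's your name" can never match here, whatever the pattern.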

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
  • PROCESSED - Placed on all tokens that compose the semantic tag.

Vertex Flags:

Info

No vertices are created in this stage.

Resource Data

The regex pattern resource must have a "pattern dictionary" (a string-to-JSON map), which is a list of JSON records indexed by entity ID. There may also be a pattern map and a token index.

...

Each JSON record represents an entity. The format is as follows:

Entity JSON Format

{
    "_id" : "ca84KGAAJGsBemSwA0nZTLXA",
    "tag" : [
        "number"
    ],
    "pattern" : [
        "[0-9]+",
        "[0-9]+\\.[0-9]+"
    ],
    "options" : {
        "caseInsensitive" : true,
        "literal" : false
    },
    "confAdjust" : 0.95
    . . . additional fields as needed go here . . .
}

Notes

  1. A single entry can hold multiple patterns.
  2. Additional fielded data can be added to the record as needed by downstream processes.
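
As a rough illustration of a pattern dictionary indexed by entity ID, the sketch below loads JSON records into a plain Python map. The helper `build_pattern_dictionary`, the extra `source` field, and the duplicate-ID check are assumptions made for the example. The field names `_id`, `tag` and `pattern` are read from the example record and may differ in your version of the resource.

```python
import json

# Example records; "source" stands in for additional fielded data that a
# downstream process might need (hypothetical field).
RECORDS_JSON = r"""
[
  {
    "_id": "ca84KGAAJGsBemSwA0nZTLXA",
    "tag": ["number"],
    "pattern": ["[0-9]+", "[0-9]+\\.[0-9]+"],
    "source": "example"
  }
]
"""


def build_pattern_dictionary(records):
    """Index records by entity ID, rejecting duplicate IDs."""
    dictionary = {}
    for record in records:
        entity_id = record["_id"]
        if entity_id in dictionary:
            raise ValueError(f"duplicate entity ID: {entity_id}")
        dictionary[entity_id] = record
    return dictionary


pattern_dictionary = build_pattern_dictionary(json.loads(RECORDS_JSON))
entry = pattern_dictionary["ca84KGAAJGsBemSwA0nZTLXA"]
print(entry["tag"], entry["pattern"], entry["source"])
```

Keeping the whole record as the map value means any additional fields travel with the entry unchanged.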

Fields

  • id (required) - Identifies the entity by unique ID. This identifier must be unique across all entries (across all dictionaries).
    • Typically, this is an identifier with meaning to the larger application that is using the Language Processing Toolkit.
  • tag (string array, required) - The list of semantic tags that identify any match in the graph as an interpretation. They are added to the interpretation graph whenever any of the patterns are matched.
    • These will all be added to the interpretation graph with the SEMANTIC_TAG flag.

      Tip

      Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}

  • pattern (string array, required) - A list of patterns to match in the content.
  • options
    • literal (default false) - When this flag is specified, the pattern string is treated as a sequence of literal characters. Metacharacters and escape sequences in it are given no special meaning.
    • caseInsensitive (boolean, default true) - Set to true if the pattern is not case sensitive.

Include Page

...

  • confidence (double) - Specifies the confidence level of the entity, independent of any patterns matched.
    • This is the confidence of the entry, in comparison to all of the other entries. Essentially, the likelihood that this entry will be encountered randomly.
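
The effect of the literal and caseInsensitive options can be approximated with Python's `re` module. This is a sketch under assumptions, not the stage's own code: `compile_pattern` is a hypothetical helper, `re.escape` stands in for literal handling, and the defaults are taken from the field descriptions above (literal false, caseInsensitive true).

```python
import re


def compile_pattern(pattern, options=None):
    """Compile one dictionary pattern according to its options."""
    options = options or {}
    literal = options.get("literal", False)
    case_insensitive = options.get("caseInsensitive", True)

    # literal: escape metacharacters so they carry no special meaning.
    source = re.escape(pattern) if literal else pattern
    # caseInsensitive: ignore letter case when matching.
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile(source, flags)


# "a.b" as a regex matches "axb"; as a literal it only matches "a.b".
print(bool(compile_pattern("a.b").fullmatch("axb")))
print(bool(compile_pattern("a.b", {"literal": True}).fullmatch("axb")))
```

Regex dialects differ between engines, so a pattern that behaves one way in this sketch may behave differently in the engine the stage actually uses.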

Other Optional Fields

...

Generic Resource Fields

...