Matches advanced recursive patterns of tokens and semantic tags. Pattern databases can be very large (millions of entries).

New patterns can be formed from previously defined tags and text literals. For example, in the pattern "{name} likes {product}", "{name}" and "{product}" are tags and "likes" is a text literal. Since this stage allows for complex entity tagging, it is also known as the "Advanced Recognizer".
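
As an illustration of how such a pattern string breaks down into tags and text literals, here is a minimal Python sketch. The function name `parse_pattern`, the regular expression, and the rule that a tag is any brace-wrapped run without spaces are assumptions made for this example; they are not taken from the stage's implementation.

```python
import re

# Assumption: a tag is a brace-wrapped name with no spaces, e.g. "{name}";
# anything else separated by whitespace is treated as a text literal.
ELEMENT_RE = re.compile(r"\{[^{}\s]+\}|[^\s{}]+")

def parse_pattern(pattern):
    """Split a pattern string into ("tag", value) / ("literal", value) pairs."""
    elements = []
    for piece in ELEMENT_RE.findall(pattern):
        if piece.startswith("{"):
            elements.append(("tag", piece))
        else:
            # Literals are lowercased here (an assumption of this sketch).
            elements.append(("literal", piece.lower()))
    return elements

print(parse_pattern("{name} likes {product}"))
# [('tag', '{name}'), ('literal', 'likes'), ('tag', '{product}')]
```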

Operates On:  Lexical Items with either TOKEN or SEMANTIC_TAG.

Saga_is_recognizer

Include Page: Generic Configuration Parameters

Configuration Parameters

  • patterns (type=string array | required) - The resource which contains the pattern database.


Default Config
  boundaryFlags: text block split
  requiredFlags: token, semantic tag
  skipFlags: skip
  "patterns":"saga_provider:advanced_patterns"

Example Output

The following shows sample output from the advanced pattern matcher, which has multiple patterns for the {product} and {person-product-preference} semantic tags.

 V--------------------[Abe Lincoln likes the iPhone-7]--------------------V 
 ^---[Abe]----V--[Lincoln]--V--[likes]--V--[the]--V------[iPhone-7]-------^ 
                                                  ^---[iPhone]----V--[7]--^
 ^---[abe]----^--[lincoln]--^                     ^---[iphone]----^
 ^--[{name}]--^--[{place}]--^           ^-----------[{product}]-----------^
                                        ^-------[{product}]-------^
 ^---------[{name}]---------^                     ^--[{product}]--^
 ^--------[{place}]---------^                     ^------[iphone-7]-------^
                                                  ^------[{product}]------^
 ^-----------------[{person-product-preference}]------------------^
 ^---------------------[{person-product-preference}]----------------------^
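
To make the nesting in this output easier to follow, the sketch below shows one naive way recursive patterns could produce such overlapping spans: it applies every pattern repeatedly until no new span appears. This is only an illustration under stated assumptions. The functions `match_patterns` and `_match_from`, the `(start, end, label)` span representation, and the sample patterns are all hypothetical; the actual stage is not described at this level of detail, and an approach like this would not scale to millions of patterns.

```python
def _match_from(chart, elements, pos):
    """Yield every end position where `elements` match starting at `pos`."""
    if not elements:
        yield pos
        return
    for start, end, label in list(chart):
        if start == pos and label == elements[0]:
            yield from _match_from(chart, elements[1:], end)

def match_patterns(tokens, patterns):
    """Return all (start, end, label) spans, applying patterns to a fixed point."""
    # Seed the chart with one span per (already lowercased) token.
    chart = {(i, i + 1, tok) for i, tok in enumerate(tokens)}
    changed = True
    while changed:
        changed = False
        for tag, elements in patterns:
            for start in range(len(tokens)):
                # Collect the ends first so the chart is not modified mid-search.
                for end in set(_match_from(chart, elements, start)):
                    span = (start, end, tag)
                    if span not in chart:
                        chart.add(span)
                        changed = True
    return chart

# Hypothetical patterns, loosely modelled on the sample output above.
PATTERNS = [
    ("{name}", ["abe"]),
    ("{name}", ["abe", "lincoln"]),
    ("{product}", ["iphone"]),
    ("{product}", ["iphone", "7"]),
    ("{product}", ["the", "{product}"]),  # recursive: a product inside a product
    ("{person-product-preference}", ["{name}", "likes", "{product}"]),
]

tokens = ["abe", "lincoln", "likes", "the", "iphone", "7"]
spans = match_patterns(tokens, PATTERNS)
print(sorted((s, e) for s, e, label in spans
             if label == "{person-product-preference}"))
# [(0, 5), (0, 6)]
```

The two resulting preference spans correspond to the two {person-product-preference} rows in the sample output: one ending at "iPhone" and one covering the full "iPhone-7".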

Output Flags

Lexical Item Flags

...

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.
  • PROCESSED - Placed on all tokens that compose the semantic tag.

Vertex Flags

...

Info

No vertices are created in this stage.

Resource Data

The resource data is a database of advanced patterns and the resulting semantic tags that they produce.

Resource Format

The pattern database is a series of JSON records, typically indexed by "pattern block ID".  Each JSON record represents a block of patterns (one or more) that all produce the same semantic tag.  The format is as follows:

Entity JSON Format
{
  "id":"Q28260",
  "tags":["{city}", "{administrative-area}", "{geography}"],
  "patterns":[
    "Lincoln", "Lincoln, Nebraska", "Lincoln, NE"
  ],
  "confidence":0.95
  
  . . . additional fields as needed go here . . . 
}
Note
  • Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
  • Additional fielded data can be added to the record, as needed by downstream processes.
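
The ambiguity described in the note can be sketched in a few lines of Python. The example assumes one JSON record per line and uses the field names from the sample record above ("id", "patterns"); the second record, its id "example-person-1", and the helper `index_by_pattern` are invented purely for illustration.

```python
import json
from collections import defaultdict

# Assumption: the database is read as one JSON record per line.
# The second record is made up to show two entries sharing a pattern.
RECORDS = """
{"id": "Q28260", "tags": ["{city}"], "patterns": ["Lincoln", "Lincoln, Nebraska"], "confidence": 0.95}
{"id": "example-person-1", "tags": ["{person}"], "patterns": ["Lincoln", "Abe Lincoln"], "confidence": 0.9}
"""

def index_by_pattern(text):
    """Map each lowercased pattern to the list of entry IDs that declare it."""
    index = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for pattern in record["patterns"]:
            index[pattern.lower()].append(record["id"])
    return dict(index)

index = index_by_pattern(RECORDS)
print(index["lincoln"])            # both entries share this pattern
print(index["lincoln, nebraska"])  # only the city entry declares this one
```

A match on "Lincoln" would therefore carry both entry IDs, and it is left to downstream processing to decide between them.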

Fields

  • id (required) - Identifies the entry. This identifier need not be unique; it is not the database id, but a use-case-specific id.

    Typically, this is an identifier with meaning to the larger application that is using the Language Processing Toolkit.

  • display (required) - What to show the user when browsing this entity.


  • tag (required) - Tag which identifies any match in the graph, as an interpretation.

    These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
    Tip

    Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}


  • pattern (required) - Pattern to match in the content.

    Note

    Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase.

  • confidence (type=number | required) - Specifies the confidence level of the entry, independent of any patterns matched.

    This is the confidence of the entry, in comparison to all of the other entries. (Essentially, the likelihood that this entity will be randomly encountered.)

  • fields (type=json | default=empty json) - Additional information for the entry. Currently, no additional information is expected here.


  • updatedAt (type=date epoch | required) - Time the entry was last updated, in epoch milliseconds.


  • createdAt (type=date epoch | required) - Time the entry was created, in epoch milliseconds.
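
The note on the pattern field says that tokens are separated on simple white-space and punctuation and then reduced to lowercase. A rough Python approximation of that normalization follows. The function name `normalize` and the exact definition of "punctuation" (here, anything that is not a letter or digit) are assumptions; the stage's real tokenizer may differ.

```python
import re

def normalize(text):
    """Split on whitespace and punctuation, then lowercase each token."""
    # Assumption: every character that is not a letter or digit is a separator.
    return [token.lower() for token in re.findall(r"[^\W_]+", text)]

print(normalize("Lincoln, NE"))
# ['lincoln', 'ne']
print(normalize("Abe Lincoln likes the iPhone-7"))
# ['abe', 'lincoln', 'likes', 'the', 'iphone', '7']
```

Under this approximation the patterns "Lincoln, NE" and "lincoln ne" normalize to the same token sequence.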