Identifies patterns with a combination of any number of specified tokens, regardless of the surrounding tokens.


Operates On:  Lexical Items with TOKEN or SEMANTIC_TAG and other possible flags as specified below.

Include Page
Generic Configuration Parameters

Configuration Parameters

  • patterns (string, required) - The resource that contains the pattern database.
    • See below for the format.

Stage: Fragmentation
Boundary flags: text block split
Required flags: token, semantic tag
Skip flags: skip

"patterns":"saga_provider:fragmented_patterns",
"maxRepeats": 5

Example Configuration

{
 "type":"Fragmentation",
 "patterns":"fragmented-provider:patterns"
}

Example Output

V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^
              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^

Output Flags

Lex-Item Flags

  • SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
  • FRAGMENT - Identifies all lexical items that were created from a fragmentation pattern.
  • PROCESSED - Placed on all the tokens that compose the semantic tag.

Resource Data

The resource data is a database of fragmented patterns and the semantic tags they produce.

Resource Format

The only required file is the entity dictionary. It is a series of JSON records, typically indexed by entity ID.

Entity JSON Format

{
  "id": "Q28260",
  "tags": ["{city}", "{administrative-area}", "{geography}"],
  "patterns": [
    "Lincoln", "Lincoln, Nebraska", "Lincoln, NE"
  ],
  "options": {
    "minTokens": 3,
    "maxTokens": 6,
    "combination": true
  },
  "confidence": 0.95

  . . . additional fields as needed go here . . .
}


Notes

  1. Multiple entities can have the same pattern.
    • If the pattern is matched, then it will be tagged with multiple (ambiguous) entity IDs.
  2. Additional fielded data can be added to the record.
    • As needed by downstream processes.
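
Note 1 can be illustrated with a small sketch. It assumes the records have already been parsed into Python dictionaries carrying the `id` and `patterns` fields described in this section. The function name `build_pattern_index`, the second record, and the lowercase comparison are hypothetical and are not part of the toolkit.

```python
from collections import defaultdict

def build_pattern_index(records):
    """Map each normalized pattern to the IDs of every entity that lists it."""
    index = defaultdict(list)
    for record in records:
        for pattern in record["patterns"]:
            # Assumption: patterns are compared case-insensitively.
            index[pattern.lower()].append(record["id"])
    return dict(index)

records = [
    {"id": "Q28260", "tags": ["{city}"], "patterns": ["Lincoln", "Lincoln, NE"]},
    # Hypothetical second entity that shares the pattern "Lincoln".
    {"id": "example-person-1", "tags": ["{person}"], "patterns": ["Lincoln"]},
]

pattern_index = build_pattern_index(records)
# A pattern listed by more than one entity yields more than one ID (ambiguous match).
print(pattern_index["lincoln"])
```

A pattern that maps to a single ID is unambiguous; one that maps to several would be tagged with all of them, and downstream processing would have to choose between them.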

Fields

  • id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entities (across all dictionaries).
    • Typically, this is an identifier with meaning to the larger application that uses the Language Processing Toolkit.
  • tags (required, array of string) - The list of semantic tags to be added to the interpretation graph whenever any of the patterns are matched. Each tag identifies the match in the graph as an interpretation.
    • These will be added to the interpretation graph with the SEMANTIC_TAG flag.
    • Tip: Typically, multiple tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}

  • patterns (required, array of string) - A list of patterns to match in the content.
    • Patterns will be tokenized and multiple variations may match.
      NOTE:  Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase.
      TODO:  This will need to be improved in the future, perhaps by specifying a pipeline to perform the tokenization and to allow for multiple variations.

  • options (optional, JSON object) - Object with options applicable for this entity.
    • minTokens (optional, int) - Minimum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
    • maxTokens (optional, int) - Maximum number of tokens the match may contain to be valid. The default is the number of tokens contained in each pattern.
    • combination (optional, boolean, default true) - Indicates if the given tokens can be matched in any order as long as all appear in the match. If false, the tokens must be in the order provided.
  • confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
    • This is the confidence of the entity, in comparison to all of the other entities. Essentially, the likelihood that this entity will be randomly encountered.

Other, Optional Fields

  • display (optional, string) - What to show the user when browsing this entity.
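
The tokenization NOTE and the `options` fields above can be sketched together. This is only one possible reading of the documented behaviour, not the stage's actual implementation: the function names `tokenize` and `window_matches` are hypothetical, the matcher only handles plain-text tokens (not semantic tags such as {city}), and it assumes that a match may contain extra tokens between the pattern tokens as long as its length stays within `minTokens` and `maxTokens`.

```python
import re

def tokenize(text):
    """Split on whitespace and punctuation, then lowercase (as in the NOTE above)."""
    # \w+ keeps runs of letters, digits and underscores; everything else separates tokens.
    return re.findall(r"\w+", text.lower())

def window_matches(pattern, window, min_tokens=None, max_tokens=None, combination=True):
    """Check one candidate window of tokens against one pattern.

    Assumed reading of the options: the window length must lie within
    [min_tokens, max_tokens] and every pattern token must appear in it.
    """
    pattern_tokens = tokenize(pattern)
    # Documented default: the number of tokens contained in the pattern.
    lo = len(pattern_tokens) if min_tokens is None else min_tokens
    hi = len(pattern_tokens) if max_tokens is None else max_tokens
    if not lo <= len(window) <= hi:
        return False
    if combination:
        # Any order: each pattern token must be present, counting repeats.
        remaining = list(window)
        for token in pattern_tokens:
            if token not in remaining:
                return False
            remaining.remove(token)
        return True
    # Fixed order: pattern tokens must appear as a subsequence of the window.
    rest = iter(window)
    return all(token in rest for token in pattern_tokens)

window = tokenize("Nebraska, city of Lincoln")
any_order = window_matches("Lincoln, Nebraska", window, 3, 6, combination=True)
in_order = window_matches("Lincoln, Nebraska", window, 3, 6, combination=False)
print(window, any_order, in_order)
```

With `combination` set to true the reversed token order is accepted; with it set to false the same window is rejected because "nebraska" does not follow "lincoln".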

Include Page
Generic Resource Fields