Identifies patterns with a combination of any number of specified tokens, regardless of the surrounding tokens.


Operates On:  Lexical Items with TOKEN or SEMANTIC_TAG and other possible flags as specified below, but not on TEXT_BLOCK.

Generic Configuration Parameters

  • boundaryFlags (string array, optional)
    • The tokens to process must lie between two vertices marked with these flags (e.g., ["TEXT_BLOCK_SPLIT"]).
  • skipFlags (string array, optional) - Flags to be skipped by this stage
    • Tokens marked with these flags are ignored by this stage, and no processing is performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all of the specified flags in order to be processed.
  • atLeastOneFlag (string array, optional)
    • Tokens must have at least one of the flags specified in this array.
  • debug (boolean, optional)
    • Enables all debug logging for the stage, if any.
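The flag parameters above combine into a single per-token decision. The following is a minimal sketch of that decision in Python, assuming the checks are applied in the order skip, required, at-least-one. The function name `should_process` and its arguments are hypothetical and are not part of the stage's API.

```python
def should_process(token_flags, skip_flags=(), required_flags=(), at_least_one_flag=()):
    """Hypothetical helper: decide whether a token is eligible for processing."""
    flags = set(token_flags)
    # skipFlags: any matching flag excludes the token.
    if flags & set(skip_flags):
        return False
    # requiredFlags: every listed flag must be present.
    if not set(required_flags) <= flags:
        return False
    # atLeastOneFlag: when given, at least one listed flag must be present.
    if at_least_one_flag and not (flags & set(at_least_one_flag)):
        return False
    return True
```

For example, a token flagged only with TOKEN passes `required_flags=["TOKEN"]`, but is rejected once `skip_flags` also lists TOKEN.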

Configuration Parameters

  • patterns (string, required) - The resource which contains the pattern database
    • See below for the format.
Example Configuration

{
  "type": "Fragmentation",
  "patterns": "fragmented-provider:patterns",
  "boundaryFlags": ["TEXT_BLOCK_SPLIT"]
}

  • preferLarge (boolean, default: true) - If true, the stage prefers larger patterns.


{
  "type": "Fragmentation",
  "boundaryFlags": ["TEXT_BLOCK_SPLIT"],
  "requiredFlags": ["TOKEN", "SEMANTIC_TAG"],
  "skipFlags": ["SKIP"],
  "patterns": "saga_provider:fragmented_patterns",
  "preferLarge": true
}

Example Output

V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^
              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items that are semantic tags.
  • FRAGMENT - Identifies all lexical items that were created from a fragmentation pattern.

Vertex Flags:

No vertices are created in this stage.


Resource Data

The resource data is a database of fragmented patterns, and the resulting semantic tags they produce.

Resource Format

The only required file is the entity dictionary. It is a series of JSON records, typically indexed by entity ID.

Entity JSON Format

"tag": "{city}",
"pattern": "(\"how many\"|\"how much\") {ingredient}",
"confAdjust": 0.95
. . . additional fields as needed go here . . .


"_id": "KGAAJGsBemSwA0nZTLXA",
"tag": ["recipe"],
"pattern": "{number} {ingredient}",
"options": {
  "minTokens": 3,
  "maxTokens": 2,
  "combination": true
},
"confAdjust": 0.95
. . . additional fields as needed go here . . .

Note
  • Multiple entries can have the same pattern. If the pattern is matched, then it will be tagged with multiple (ambiguous) entry IDs.
  • Additional fielded data can be added to the record, as needed by downstream processes.
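To illustrate the note about ambiguous entries, here is a small Python sketch that groups entity records by pattern, so that one pattern resolves to several entry IDs. The record IDs, the second tag, and the helper name `index_by_pattern` are made up for illustration. How the stage stores its pattern database internally is not described here, so this is only one plausible arrangement.

```python
import json
from collections import defaultdict

def index_by_pattern(records):
    """Hypothetical helper: map each pattern string to the IDs of all entries using it."""
    index = defaultdict(list)
    for record in records:
        index[record["pattern"]].append(record["_id"])
    return dict(index)

# Two illustrative records sharing one pattern (IDs and tags are invented).
records = json.loads("""[
  {"_id": "a1", "tag": ["recipe"], "pattern": "{number} {ingredient}", "confAdjust": 0.95},
  {"_id": "b2", "tag": ["shopping-item"], "pattern": "{number} {ingredient}", "confAdjust": 0.80}
]""")

ambiguous = index_by_pattern(records)
```

A match on "{number} {ingredient}" would then carry both entry IDs, leaving disambiguation to downstream processes.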


Fields

...

  • Typically this is an identifier with meaning to the larger application which is using the Language Processing Toolkit.
  • tag (required) - Tag which will identify any match in the graph, as an interpretation.
    • These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
    • Tip: Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area}
  • pattern (required) - Pattern to match in the content.
    • Patterns will be tokenized and there may be multiple variations which can match.
    • NOTE:  Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase.
    • TODO:  This will need to be improved in the future, perhaps by specifying a pipeline to perform the tokenization and to allow for multiple variations.
  • options (json) - Object with options applicable for this entity.
    • minTokens (integer) - Minimum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
    • maxTokens (integer) - Maximum number of tokens the match must contain to be valid. The default is the number of tokens contained in each pattern.
    • combination (default: true) - Indicates if the given tokens can be matched in any order, as long as all appear in the match. If false, the tokens must be in the order provided.

...

  • This is the confidence of the entity, in comparison to all of the other entities. Essentially, the likelihood that this entity will be randomly encountered.

Other, Optional Fields

See Generic Resource Fields.
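The tokenization note and the minTokens, maxTokens, and combination options can be read together as a matching rule. The Python sketch below is one interpretation under stated assumptions, not the stage's actual implementation. It assumes a candidate span is a list of lowercase tokens or tag names, that the token limits apply to the length of that span, and that tag references such as {ingredient} are kept whole during tokenization. The names `tokenize`, `in_order`, and `span_matches` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase, then split on white-space and punctuation, keeping {tag} references whole."""
    return re.findall(r"\{[^}]+\}|\w+", text.lower())

def in_order(pattern_tokens, span_tokens):
    """True if pattern_tokens occur in span_tokens in the given order (gaps allowed)."""
    remaining = iter(span_tokens)
    return all(token in remaining for token in pattern_tokens)

def span_matches(pattern, span_tokens, options=None):
    """Assumed reading of the options: length limits plus ordered or unordered containment."""
    options = options or {}
    pattern_tokens = tokenize(pattern)
    # Both limits default to the number of tokens in the pattern.
    min_tokens = options.get("minTokens", len(pattern_tokens))
    max_tokens = options.get("maxTokens", len(pattern_tokens))
    if not (min_tokens <= len(span_tokens) <= max_tokens):
        return False
    if options.get("combination", True):
        # Any order: every pattern token must appear, respecting multiplicity.
        return not (Counter(pattern_tokens) - Counter(span_tokens))
    return in_order(pattern_tokens, span_tokens)
```

Under these assumptions, the span ["{ingredient}", "{number}"] matches the pattern "{number} {ingredient}" with the default options, but not when combination is set to false.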