Looks up sequences of tokens in a dictionary and tags each matching sequence with one or more semantic tags as alternative representations. All possible matches are tagged, including overlapping matches and sub-patterns, with the expectation that later disambiguation stages will choose which tags are the correct interpretation.

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Configuration Parameters

  • dictionary (string, required) - The dictionary resource which holds the names to be located in the text.
    • This is specified as "provider:name" in the standard resource format (INSERT LINK HERE).
  • required (array of strings, optional) - Only process tokens that have the specified flags.
    • Specified as a JSON array of strings, such as ["TOKEN", "LOWER"].


Example Configuration
{
 "type":"DictionaryTagger",
 "dictionary":"dict-provider:people-lowercase",
 "required":["TOKEN", "LOWER"]
}

Note that the "people-lowercase" resource must be in the format specified under Resource Data below.
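
As a rough illustration of how the configuration above might be read, the following Python sketch parses the example JSON, splits the "provider:name" resource string, and filters lexical items by the required flags. The helper should_process is hypothetical, as are two assumptions it encodes: that the resource string splits at the first colon, and that an item qualifies only when it carries every flag listed in "required".

```python
import json

# The example configuration from this page, parsed as plain JSON.
config = json.loads("""
{
 "type":"DictionaryTagger",
 "dictionary":"dict-provider:people-lowercase",
 "required":["TOKEN", "LOWER"]
}
""")

# Assumption: "provider:name" is split at the first colon.
provider, _, name = config["dictionary"].partition(":")

# "required" is optional; an empty set places no restriction (assumed default).
required = set(config.get("required", []))


def should_process(flags):
    """Hypothetical helper: True if the item carries every required flag."""
    return required.issubset(flags)


print(provider, name)
print(should_process({"TOKEN", "LOWER"}))
print(should_process({"TOKEN"}))
```

Under these assumptions, a lexical item flagged only as TOKEN would be skipped by this configuration, while one flagged TOKEN and LOWER would be looked up in the dictionary.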

Example Output

In the following example, "abraham lincoln" is in the dictionary as a person, "lincoln" is in the dictionary as a place, and "macaroni", "cheese", and "macaroni and cheese" are all specified as foods:


V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^
              ^---{place}---^           ^----{food}----^         ^---{food}---^
^----------{person}---------^           ^-----------------{food}--------------^
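
The tagging behavior shown in the diagram can be approximated with a short Python sketch. This is not the stage's actual implementation: the in-memory DICTIONARY and the function tag_all are hypothetical stand-ins for the dictionary resource and the lookup logic. The sketch only illustrates that every match is reported, including overlapping matches and sub-patterns, with no disambiguation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory dictionary: token sequence -> semantic tags.
DICTIONARY: Dict[Tuple[str, ...], Set[str]] = {
    ("abraham", "lincoln"): {"person"},
    ("lincoln",): {"place"},
    ("macaroni",): {"food"},
    ("cheese",): {"food"},
    ("macaroni", "and", "cheese"): {"food"},
}


def tag_all(tokens: List[str],
            dictionary: Dict[Tuple[str, ...], Set[str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, tag) for every dictionary match; end is exclusive."""
    # No entry is longer than max_len tokens, so longer spans need not be tried.
    max_len = max((len(key) for key in dictionary), default=0)
    matches = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            # Keep every match, even if it overlaps or nests inside another.
            for tag in sorted(dictionary.get(tuple(tokens[start:end]), ())):
                matches.append((start, end, tag))
    return matches


tokens = "abraham lincoln likes macaroni and cheese".split()
matches = tag_all(tokens, DICTIONARY)
for start, end, tag in matches:
    print(f"{{{tag}}} -> {' '.join(tokens[start:end])}")
```

For the sentence above, the sketch reports five tags: one person, one place, and three foods, matching the diagram. Choosing between them, for example "abraham lincoln" as a person versus "lincoln" as a place, is left to later stages.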

Output Flags

Lex-Item Flags:

  • SEMANTIC_TAG - Identifies all lexical items which are semantic tags.

Resource Data

The dictionary format is currently being changed and will be documented when it is complete.

Synonym-Based Dictionary Format


Dictionary Index Format


Entity Based Dictionary Format


Dynamic Updates Format




