...
Entity JSON Format

    {
      "id": "Q28260",
      "tags": ["{city}", "{administrative-area}", "{geography}"],
      "patterns": [
        "Lincoln", "Lincoln, Nebraska", "Lincoln, NE"
      ],
      "confidence": 0.95,
      . . . additional fields as needed go here . . .
    }
Notes
- Multiple entities can have the same pattern.
  - If such a pattern is matched, the matched text will be tagged with multiple (ambiguous) entity IDs.
- Additional fielded data can be added to the record as needed by downstream processes.
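As a rough illustration of the first note, the sketch below shows two records that share the pattern "Lincoln" and a lookup that returns both IDs. The second record, its ID, and the function name `entityIdsForPattern` are hypothetical and exist only for this example. The exact string comparison is also a simplification, since real matching works on tokenized patterns.

```javascript
// Two entity records that share the pattern "Lincoln".
// The second record is hypothetical and exists only for illustration.
const entities = [
  {
    id: "Q28260",
    tags: ["{city}", "{administrative-area}", "{geography}"],
    patterns: ["Lincoln", "Lincoln, Nebraska", "Lincoln, NE"],
    confidence: 0.95,
  },
  {
    id: "EXAMPLE-PERSON-1", // hypothetical ID
    tags: ["{person}"],
    patterns: ["Lincoln", "Abraham Lincoln"],
    confidence: 0.9,
  },
];

// Return the IDs of every entity that lists the given pattern.
// Exact string comparison is a simplification of the real token matching.
function entityIdsForPattern(pattern, records) {
  return records
    .filter((record) => record.patterns.includes(pattern))
    .map((record) => record.id);
}

// "Lincoln" is ambiguous, "Lincoln, NE" is not.
console.log(entityIdsForPattern("Lincoln", entities));
console.log(entityIdsForPattern("Lincoln, NE", entities));
```

A downstream process would then need to choose among the returned IDs, for example by using the confidence or context fields.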
Fields
- id (required, string) - Identifies the entity by unique ID. This identifier must be unique across all entities (across all dictionaries), regardless of type.
  - Typically this is an identifier with meaning to the larger application which is using the Language Processing Toolkit.
- tags (required, array of string) - The list of semantic tags which will be added to the interpretation graph whenever any of the patterns is matched.
  - These will all be matched with the SEMANTIC_TAG flag.
- patterns (required, array of string) - A list of patterns to match in the content.
  - Patterns will be tokenized, and there may be multiple variations which can match.
  - (details TBD)
- confidence (optional, float) - Specifies the confidence level of the entity, independent of any patterns matched.
  - This is the confidence of the entity in comparison to all of the other entities; essentially, the likelihood that this entity will be randomly encountered.
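The field rules above could be checked before a dictionary is loaded. The following is a minimal sketch under that assumption. The function names `validateEntity` and `findDuplicateIds` are hypothetical and are not part of the Language Processing Toolkit.

```javascript
// Check one entity record against the field descriptions above.
// Returns a list of problems; an empty list means no problems were found.
function validateEntity(record) {
  const problems = [];
  const isStringArray = (value) =>
    Array.isArray(value) && value.every((item) => typeof item === "string");

  if (typeof record.id !== "string") {
    problems.push("id is required and must be a string");
  }
  if (!isStringArray(record.tags)) {
    problems.push("tags is required and must be an array of strings");
  }
  if (!isStringArray(record.patterns)) {
    problems.push("patterns is required and must be an array of strings");
  }
  // confidence is optional, so it is only checked when present.
  if (record.confidence !== undefined && !Number.isFinite(record.confidence)) {
    problems.push("confidence, when present, must be a number");
  }
  return problems;
}

// IDs must be unique across all entities, so report any ID seen more than once.
function findDuplicateIds(records) {
  const seen = new Set();
  const duplicates = new Set();
  for (const record of records) {
    if (seen.has(record.id)) duplicates.add(record.id);
    seen.add(record.id);
  }
  return [...duplicates];
}

const sampleRecord = {
  id: "Q28260",
  tags: ["{city}", "{administrative-area}", "{geography}"],
  patterns: ["Lincoln", "Lincoln, Nebraska", "Lincoln, NE"],
  confidence: 0.95,
};
console.log(validateEntity(sampleRecord));
console.log(findDuplicateIds([sampleRecord, sampleRecord]));
```

Additional fields are deliberately ignored here, since the format allows any extra data that downstream processes need.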
Other, Optional Fields
- display (optional, string) - What to show the user when browsing this entity.
- context (optional, object) - A context vector which can help disambiguate this entity from others with the same pattern.
  - Format TBD, but probably a list of weighted words, phrases and tags.
Pattern Map
...
Dictionary Index
To improve performance, especially for very large databases of entities, the entity dictionary is inverted and indexed.
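One way to picture the inversion is as a map from each pattern to the IDs of the entities that list it, so that a lookup does not have to scan every record. The sketch below assumes this shape. The names `buildPatternIndex` and `lookupPattern`, the lower-casing of keys, and the second sample record are all assumptions made for illustration, not the toolkit's actual index structure.

```javascript
// Build an inverted index: normalized pattern -> set of entity IDs.
// Lower-casing the key is an assumed normalization; the real tokenization is not shown.
function buildPatternIndex(records) {
  const index = new Map();
  for (const record of records) {
    for (const pattern of record.patterns) {
      const key = pattern.toLowerCase();
      if (!index.has(key)) index.set(key, new Set());
      index.get(key).add(record.id);
    }
  }
  return index;
}

// Look up all entity IDs for a pattern; returns an empty array when nothing matches.
function lookupPattern(index, pattern) {
  const ids = index.get(pattern.toLowerCase());
  return ids ? [...ids] : [];
}

const sampleEntities = [
  {
    id: "Q28260",
    tags: ["{city}", "{administrative-area}", "{geography}"],
    patterns: ["Lincoln", "Lincoln, Nebraska", "Lincoln, NE"],
  },
  {
    id: "EXAMPLE-PERSON-1", // hypothetical record, for illustration only
    tags: ["{person}"],
    patterns: ["Lincoln", "Abraham Lincoln"],
  },
];

const patternIndex = buildPatternIndex(sampleEntities);
console.log(lookupPattern(patternIndex, "Lincoln"));
console.log(lookupPattern(patternIndex, "Lincoln, NE"));
```

With the index built once, each lookup is a single map access regardless of how many entities the dictionary holds, and a shared pattern naturally yields every candidate ID.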