...
...
...
...
...
...
|
...
...
|
...
|
...
|
...
|
...
|
...
|
...
|
...
|
Code Block | Example Configuration |
---|---|
|
"algorithm": "Levenshtein",
"algorithm_params": {},
"dictionary": "dict-provider:people-lowercase",
"dontProcessTags": ["color", "currency" ... ],
"normalizeAccents": false,
"removeChars": false,
"charsList": "_-‿⁀⁔︳︴﹍﹎﹏_",
"spellchecking": false,
"cosineSimThreshold": 0.7,
"lowercase": true,
"matchAll": false,
"matchAllThreshold": 1.0 |
Note |
---|
The "people-lowercase" resource must be in the format ... specified below. |
In the following example, "abraham lincoln" is in the dictionary as a person, "lincoln" as a place, and "macaroni", "cheese" and "macaroni and cheese" are all specified as foods:
Code Block | ||
---|---|---|
|
...
V--------------[abraham lincoln likes macaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[macaroni]--V--[and]--V--[cheese]--^
              ^--[{place}]--^           ^---[{food}]---^         ^--[{food}]--^
^---------[{person}]--------^           ^----------------[{food}]-------------^ |
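The tagging shown in the diagram can be approximated by looking up every contiguous token span in a pattern map. The following is a minimal sketch, not the stage's actual implementation: the `PATTERNS` map and the `tag_spans` function are hypothetical names introduced for illustration, and the sketch only does exact matching (no fuzzy matching, accents, or confidence handling).

```python
# Hypothetical pattern-to-tags map mirroring the example sentence above.
# Keys are token tuples; values are the semantic tags to attach.
PATTERNS = {
    ("abraham", "lincoln"): ["{person}"],
    ("lincoln",): ["{place}"],
    ("macaroni",): ["{food}"],
    ("cheese",): ["{food}"],
    ("macaroni", "and", "cheese"): ["{food}"],
}


def tag_spans(tokens, patterns=PATTERNS):
    """Return (start, end, tag) for every contiguous token span found in patterns."""
    max_len = max(len(p) for p in patterns)
    matches = []
    for start in range(len(tokens)):
        # Only try spans up to the longest known pattern.
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            for tag in patterns.get(tuple(tokens[start:end]), []):
                matches.append((start, end, tag))
    return matches


tokens = "abraham lincoln likes macaroni and cheese".split()
for start, end, tag in tag_spans(tokens):
    print(tag, tokens[start:end])
```

Overlapping matches are all kept, which is why "lincoln" is tagged as a place while "abraham lincoln" is also tagged as a person.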
...
...
...
Info |
---|
No vertices are created in this stage |
The dictionary tagger requires an "entity dictionary" (a string-to-JSON map): a list of JSON records indexed by entity ID. In addition, there may also be a pattern map and a token index.
The only file that is absolutely required is the entity dictionary. It is a series of JSON records, typically indexed by entity ID.
Each JSON record represents an entity. The format is as follows:
Code Block | ||||
---|---|---|---|---|
|
{
  "_id": "KGAAJGsBemSwA0nZTLXA",
  "id": "Q28260",
  "tag": "{city}",
  "display": "Lincoln",
  "patterns": [
    "Lincoln", "Lincoln, Nebraska", "Lincoln, NE"
  ],
  "fields": {
    "coord": [40.813639, -96.702611]
  },
  "confAdjust": 0.95,
  . . . additional fields as needed go here . . .
} |
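As an illustration of the "string-to-JSON map" idea, the sketch below loads one record shaped like the example and indexes it by entity ID. It is an assumption-laden sketch: `index_by_id` is a hypothetical helper, and treating `id`, `tag` and `patterns` as the required keys is a guess made for the example, not something this page specifies.

```python
import json

# One record in the shape shown above (only the fields visible in the example).
RECORD = json.loads("""
{
  "id": "Q28260",
  "tag": "{city}",
  "display": "Lincoln",
  "patterns": ["Lincoln", "Lincoln, Nebraska", "Lincoln, NE"],
  "fields": {"coord": [40.813639, -96.702611]},
  "confAdjust": 0.95
}
""")

# Assumption: these keys are treated as mandatory for the purposes of this sketch.
REQUIRED_KEYS = ("id", "tag", "patterns")


def index_by_id(records):
    """Build the entity dictionary: entity ID -> JSON record."""
    entity_dict = {}
    for rec in records:
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if missing:
            raise ValueError(f"record is missing keys: {missing}")
        entity_dict[rec["id"]] = rec
    return entity_dict


entity_dict = index_by_id([RECORD])
print(entity_dict["Q28260"]["display"])
```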
...
...
Note |
---|
|
...
|
...
|
...
|
...
|
...
...
These will all be added to the interpretation graph with the SEMANTIC_TAG flag.
...
Tip |
---|
Tags are hierarchical representations of the same intent. For example, {city} → {administrative-area} → {geographical-area} |
...
...
|
Patterns will be tokenized and there may be multiple variations which can match.
Note |
---|
...
...
Currently, tokens are separated on simple white-space and punctuation, and then reduced to lowercase. |
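The tokenization rule described in the note can be sketched as follows. This is only an approximation: the exact set of separator characters used by the stage is not stated here, so the sketch assumes that any run of non-alphanumeric characters (including the underscore) acts as a separator. The `tokenize` name is hypothetical.

```python
import re


def tokenize(pattern):
    """Split on runs of whitespace/punctuation, then lowercase each token."""
    # [\W_]+ matches one or more characters that are not letters or digits.
    return [tok.lower() for tok in re.split(r"[\W_]+", pattern) if tok]


print(tokenize("Lincoln, Nebraska"))
print(tokenize("Lincoln, NE"))
```

Under this rule, "Lincoln, NE" and "lincoln ne" produce the same token sequence, which is what lets one pattern match several surface variations.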
...
...
|
...
To improve performance, especially for very large databases of entities, the entity dictionary is inverted and indexed.
This currently happens in RAM inside the DictionaryTagger stage. An offline option for pre-inverting the dictionary will be provided in the future.
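To make "inverted and indexed" concrete, here is a small in-memory sketch that maps each pattern token to the IDs of the entities whose patterns contain it. The layout of the real index inside the DictionaryTagger stage is not described on this page, so `invert`, `candidates`, the tokenizer, and the second sample entity are all hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed rule: split on non-alphanumeric runs, then lowercase.
    return [tok.lower() for tok in re.split(r"[\W_]+", text) if tok]


def invert(entity_dict):
    """Build token -> set of entity IDs from each record's patterns."""
    index = defaultdict(set)
    for entity_id, record in entity_dict.items():
        for pattern in record["patterns"]:
            for tok in tokenize(pattern):
                index[tok].add(entity_id)
    return index


def candidates(index, tokens):
    """Entity IDs sharing at least one token with the input."""
    found = set()
    for tok in tokens:
        found |= index.get(tok, set())
    return found


# Sample data: the first ID comes from the example record; "E2" is made up.
entities = {
    "Q28260": {"patterns": ["Lincoln", "Lincoln, Nebraska", "Lincoln, NE"]},
    "E2": {"patterns": ["Abraham Lincoln"]},
}
index = invert(entities)
print(sorted(candidates(index, tokenize("Abraham likes Nebraska"))))
```

With such an index, only entities sharing a token with the input need to be checked against their full patterns, rather than scanning the whole dictionary.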