Identifies geographic locations based on the patterns loaded.
Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

    "charList": "_-‿⁀⁔︳︴﹍﹎﹏_",
    "dictionary": "saga-provider:saga_geonames",
    "lowercase": true,
    "minimum"parameter": 3,
    "normalizeAccents": true,
    "removeChars": false


    V---------------[this is COSTA RICA !!! and Costa Verda]---------------V
    ^-[this]-V-[is]-V-[COSTA]-V-[RICA]-V-[!!!]-V-[and]-V-[Costa]-V-[Verda]-^
                    ^-[costa]-^-[rica]-^               ^-[costa]-^-[verda]-^
                    ^--[{_geoname_}]---^               ^---[{_geoname_}]---^

Info |
---|
No vertices are created in this stage |
The only file that is absolutely required is the geonames dictionary. It is a series of JSON records, typically indexed by entity ID.
Each JSON record represents an entity. The format is as follows:

    {
      "_id" : "KGAAJGsBemSwA0nZTLXA",
      "id" : 3621815,
      "display" : "San Juan",
      "patterns" : [ "San Juan" ],
      "tag" : "DDfO1HABPr3bu3tFxDT4",
      "fields" : {
        "feature class" : "P",
        "feature code" : "PPL",
        "admin3 code" : "20203",
        "timezone" : "America/Costa_Rica",
        "country code" : "CR",
        "admin1 code" : "01",
        "location" : { "lon" : -84.4654, "lat" : 10.10676 },
        "modification date" : "2016-09-07",
        "admin2 code" : "202",
        "dem" : 1093,
        "population" : 0
      },
      "confAdjust" : 1.0
    }

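
As a rough illustration of how such a record might be consumed, the sketch below parses an abbreviated copy of the sample record and pulls out the values a matcher would most likely need. The `summarize` helper is hypothetical and not part of Saga; the record is wrapped in braces here on the assumption that each record is a JSON object.

```python
import json

# Abbreviated copy of the sample record, wrapped in braces so that it
# parses as a JSON object (an assumption about the on-disk format).
RECORD = """
{
  "_id": "KGAAJGsBemSwA0nZTLXA",
  "id": 3621815,
  "display": "San Juan",
  "patterns": ["San Juan"],
  "tag": "DDfO1HABPr3bu3tFxDT4",
  "fields": {
    "feature class": "P",
    "country code": "CR",
    "location": {"lon": -84.4654, "lat": 10.10676},
    "population": 0
  },
  "confAdjust": 1.0
}
"""


def summarize(record_text):
    """Hypothetical helper: extract the values a matcher would need."""
    record = json.loads(record_text)
    location = record["fields"]["location"]
    return {
        "id": record["id"],
        "display": record["display"],
        "patterns": record["patterns"],
        "lat_lon": (location["lat"], location["lon"]),
    }


summary = summarize(RECORD)
print(summary)
```

Note that several keys under `fields` contain spaces (for example `"country code"`), so they have to be read with dictionary-style access rather than attribute access.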
All of an entity's tags are added to the interpretation graph with the SEMANTIC_TAG flag.
Patterns are tokenized, and multiple variations of a pattern may match.
Note |
---|
Currently, tokens are split on whitespace and punctuation, and then converted to lowercase. |
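
The tokenization described in the note can be approximated as follows. This is only a sketch of the described behavior, not Saga's actual tokenizer: the `tokenize` function is hypothetical, and it assumes that "punctuation" means any character that is not a letter, digit, or underscore.

```python
import re


def tokenize(text):
    """Split on whitespace and punctuation, then lowercase each token.

    Assumption: runs of word characters (letters, digits, underscore)
    are tokens; every other character acts as a separator.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


pattern_tokens = tokenize("San Juan")
text_tokens = tokenize("this is COSTA RICA !!! and Costa Verda")
print(pattern_tokens)  # ['san', 'juan']
print(text_tokens)     # ['this', 'is', 'costa', 'rica', 'and', 'costa', 'verda']
```

Under this assumption a punctuation-only run such as `!!!` produces no token at all, so it can never be part of a pattern.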
To improve performance, especially for very large databases of entities, the entity dictionary is inverted and indexed.
This currently happens in RAM inside the GeoNames stage. An offline option for pre-inverting the dictionary will be provided in the future.
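
To make the idea of an inverted dictionary concrete, here is a minimal in-memory sketch. It is not the GeoNames stage's implementation: `build_index` and `find_matches` are hypothetical names, the index is keyed on the first token of each pattern (one of several possible designs), and the "Costa Rica" record with ID 1 is invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Same simplifying assumption as the note above: split on
    # non-word characters and lowercase.
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(records):
    """Map the first token of each pattern to (pattern tokens, entity id)."""
    index = defaultdict(list)
    for record in records:
        for pattern in record["patterns"]:
            tokens = tuple(tokenize(pattern))
            if tokens:
                index[tokens[0]].append((tokens, record["id"]))
    return index


def find_matches(index, text):
    """Return (start, end, entity id) for each pattern found in the text."""
    tokens = tokenize(text)
    matches = []
    for start, token in enumerate(tokens):
        # Only patterns that begin with this token are compared.
        for pattern, entity_id in index.get(token, []):
            end = start + len(pattern)
            if tuple(tokens[start:end]) == pattern:
                matches.append((start, end, entity_id))
    return matches


records = [
    {"id": 3621815, "patterns": ["San Juan"]},
    {"id": 1, "patterns": ["Costa Rica"]},  # hypothetical record
]
index = build_index(records)
matches = find_matches(index, "this is COSTA RICA !!! and San Juan")
print(matches)  # [(2, 4, 1), (5, 7, 3621815)]
```

Keying on the first token means each input token triggers a single dictionary lookup, so matching cost depends on the number of patterns that share that token rather than on the total size of the dictionary.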