Excerpt |
---|
Lemmatize tokens matched to words in a dictionary. |
Operates On: Lexical Items with TOKEN
Note |
---|
This stage lemmatizes by dictionary lookup only; it does not apply rules. |
Configuration Parameters
- dictionary (string, optional) - The resource containing the list of words and relationships
  - If no dictionary is provided, a default dictionary will be used.
- include (list, optional) - A list of the relationships to include
- exclude (list, optional) - A list of the relationships to exclude
- skipFlags (string array, optional) - Flags to be skipped by this stage
  - Tokens marked with these flags will be ignored by this stage, and no processing will be performed.
- requireFlags (string array, optional)
  - Tokens need to have all the specified flags in order to be processed.
- debug (boolean, optional)
  - Enable all debug log functionality of the stage, if any.
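The skipFlags and requireFlags gates described above can be sketched as a small predicate. This is only an illustration, not the stage's implementation: the function name `should_process` and the flag names in the usage lines are hypothetical, and it assumes that carrying any single skip flag is enough for a token to be ignored.

```python
def should_process(token_flags, skip_flags=(), require_flags=()):
    """Return True when a token passes the skipFlags / requireFlags gates."""
    flags = set(token_flags)
    # Assumption: one matching skip flag is enough to ignore the token.
    if flags & set(skip_flags):
        return False
    # Per the parameter description, every required flag must be present.
    return set(require_flags) <= flags


# Hypothetical flag names, used only for illustration.
print(should_process({"TOKEN", "SKIP_ME"}, skip_flags=["SKIP_ME"]))
print(should_process({"TOKEN", "ALL_LOWER"}, require_flags=["TOKEN", "ALL_LOWER"]))
```

With no flags configured, every token passes, which matches both parameters being optional.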
Note |
---|
The default dictionary is only available in English. |
Code Block |
---|
language | js |
---|
title | Example Configuration |
---|
|
{
"type": "LemmatizeStage",
"include" : ["pl", "vf"],
"exclude" : ["ob"],
"dictionary" : "lemmatize-provider:lemmatize_words"
} |
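How include and exclude interact is not spelled out here, so the following is a sketch under stated assumptions rather than the stage's actual logic: an entry is kept when none of its relationships is excluded and, if an include list is given, at least one of its relationships is included. The function name `relationship_allowed` is hypothetical.

```python
def relationship_allowed(rel, include=None, exclude=None):
    """Decide whether a dictionary entry's relationships pass the filters."""
    rel = set(rel)
    # Assumption: exclusion takes precedence over inclusion.
    if exclude and rel & set(exclude):
        return False
    # Assumption: one included relationship is enough to keep the entry.
    if include:
        return bool(rel & set(include))
    # No include list configured: keep everything that was not excluded.
    return True


# Filters taken from the example configuration above.
include, exclude = ["pl", "vf"], ["ob"]
print(relationship_allowed(["vf", "wnm", "pl"], include, exclude))
print(relationship_allowed(["wnm", "sp"], include, exclude))
```

Under these assumptions an entry tagged `["vf", "wnm", "pl"]` is kept, while one tagged only `["wnm", "sp"]` is dropped because neither relationship is in the include list.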
Example Output
Code Block |
---|
language | text |
---|
|
V--------------------[I am liking this projects very much]--------------------V
^--[I]--V--[am]--V--[liking]--V--[this]--V--[projects]--V--[very]--V--[much]--^
        ^--[be]--^---[like]---^          ^--[project]---^
am - {"confidence":0.0084,"rel":["vf","wnm"],"to":"be"}
liking - {"confidence":0.0084,"rel":["vf","wnm"],"to":"like"}
projects - {"confidence":0.012,"rel":["vf","wnm","pl"],"to":"project"} |
Output Flags
Lex-Item Flags:
- LEMMATIZE - All words retrieved will be marked as LEMMATIZE
Resource Data
The resource data is a JSON file with an array of word entries in a field named words
Code Block |
---|
|
{
"words": [
{
"confidence": 0.0049,
"rel": [
"wnm",
"sp"
],
"from": "encyclopaedia",
"to": "encyclopedia"
},
{
"confidence": 0.0752,
"rel": [
"wnm",
"sp"
],
"from": "word",
"to": "worth"
}
]
}
|
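To illustrate how a resource in this shape might be consumed, the sketch below parses it and groups entries by their from word. The function name `index_words` is hypothetical, and exact string matching on from is an assumption; the stage's real lookup (for example, its case handling) is not described here.

```python
import json
from collections import defaultdict

# A compact copy of the resource shown above.
RESOURCE = """
{
  "words": [
    {"confidence": 0.0049, "rel": ["wnm", "sp"], "from": "encyclopaedia", "to": "encyclopedia"},
    {"confidence": 0.0752, "rel": ["wnm", "sp"], "from": "word", "to": "worth"}
  ]
}
"""


def index_words(resource_text):
    """Group dictionary entries by their 'from' word for direct lookup."""
    index = defaultdict(list)
    for entry in json.loads(resource_text)["words"]:
        # Assumption: lookup is an exact match on the 'from' string.
        index[entry["from"]].append(entry)
    return dict(index)


index = index_words(RESOURCE)
print(index["encyclopaedia"][0]["to"])  # prints "encyclopedia"
```

Keeping a list per word allows for several entries sharing the same from value, which the format does not rule out.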
Relationships
The required fields for each entry are:
- from - Original word to search for
  - This field is removed once the entry is added to the entities of the LexItem.
- to - Resulting word
  - It becomes a new LexItem of its own.
- rel - List of relationships between the original word and the resulting word
  - List of relationships in the default dictionary:
    - pl - pluralization
    - vf - verb form
    - ob - obsolete
    - syn - synonym
    - alt - alternative
    - wwm - word with meaning (more than one)
    - wnm - word no meaning (no additional meaning)
Tip |
---|
Any other field will be included in the entities of the LexItem |
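The field handling described above can be sketched as follows: from is dropped, to names the new LexItem, and every remaining field travels along as metadata. This is an illustration only; the function name `to_lemma`, the `source` field, and the sample entry (rebuilt from the example output for "am") are assumptions, not part of the stage's API.

```python
def to_lemma(entry):
    """Split a dictionary entry into the lemma text and its entity metadata."""
    # 'from' is dropped; all other fields, including any extra ones, are kept.
    metadata = {key: value for key, value in entry.items() if key != "from"}
    return entry["to"], metadata


# Entry rebuilt from the example output; "source" is a hypothetical extra field.
entry = {
    "confidence": 0.0084,
    "rel": ["vf", "wnm"],
    "from": "am",
    "to": "be",
    "source": "custom",
}
lemma, metadata = to_lemma(entry)
print(lemma, metadata)
```

Without the extra `source` field, the resulting metadata has the same shape as the entries listed in the example output.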