...

  • dictionary (string, optional) - The resource containing the list of words and relationships

    • If no dictionary is provided, a default dictionary will be used.
  • include (list, optional) - A list of the relationships to include

  • exclude (list, optional) - A list of the relationships to exclude

  • skipFlags (string array, optional) - Flags to be skipped by this stage
    • Tokens marked with these flags will be ignored by this stage, and no processing will be performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all of the specified flags in order to be processed.
  • debug (boolean, optional)
    • Enable all debug log functionality of the stage, if any.
  • languageISO3 (string, optional) - The language the lemmatize stage should use. The value must be a three-letter ISO language code.
    • English is used by default. At the moment only English (ENG) and Spanish (SPA) are available.
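
The way skipFlags and requiredFlags gate a token can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the stage's actual implementation: the function name should_process and the flag names used in the example are hypothetical, and the sketch assumes that a single matching skip flag is enough to skip a token.

```python
def should_process(token_flags, skip_flags=(), required_flags=()):
    """Return True if a token carrying `token_flags` would be processed."""
    flags = set(token_flags)
    # Assumption: one matching skip flag is enough to ignore the token.
    if flags & set(skip_flags):
        return False
    # The token must carry every required flag.
    return set(required_flags) <= flags


# Hypothetical flag names, for illustration only.
print(should_process(["WORD"], skip_flags=["SKIP"], required_flags=["WORD"]))
print(should_process(["WORD", "SKIP"], skip_flags=["SKIP"], required_flags=["WORD"]))
```

With no flags configured, every token passes the check, which matches both parameters being optional.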


Note

The default dictionary is only available in English.

Spanish is supported when the languageISO3 parameter is properly configured.

Example Configuration
{
  "type": "LemmatizeStage",
  "include" : ["pl", "vf"],
  "exclude" : ["ob"],
  "dictionary" : "lemmatize-provider:lemmatize_words",
  "languageISO3":"SPA"
}


Example Output

  V--------------------[I am liking this projects very much]--------------------V  
  ^--[I]--V--[am]--V--[liking]--V--[this]--V--[projects]--V--[very]--V--[much]--^  
          ^--[be]--^---[like]---^          ^--[project]---^  

am - {"confidence":0.0084,"rel":["vf","wnm"],"to":"be"}
liking - {"confidence":0.0084,"rel":["vf","wnm"],"to":"like"}
projects - {"confidence":0.012,"rel":["vf","wnm","pl"],"to":"project"}
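
One plausible reading of the include and exclude parameters, applied to entries like the ones above, is sketched below in Python. The exact matching rule is an assumption, since the page does not spell it out: here an entry is dropped if it carries any excluded relationship, and otherwise kept if it carries at least one included relationship (or if no include list is given). The helper name keep_entry is hypothetical.

```python
def keep_entry(entry, include=None, exclude=None):
    """Decide whether a dictionary entry survives the include/exclude lists."""
    rels = set(entry["rel"])
    # Assumption: any excluded relationship removes the entry.
    if exclude and rels & set(exclude):
        return False
    # Assumption: with an include list, one matching relationship is enough.
    if include:
        return bool(rels & set(include))
    return True


# Entries copied from the example output above.
entries = {
    "am": {"confidence": 0.0084, "rel": ["vf", "wnm"], "to": "be"},
    "projects": {"confidence": 0.012, "rel": ["vf", "wnm", "pl"], "to": "project"},
}
kept = {
    word: entry["to"]
    for word, entry in entries.items()
    if keep_entry(entry, include=["pl", "vf"], exclude=["ob"])
}
print(kept)
```

Under these assumptions both entries are kept with the example configuration, because each has the "vf" relationship and neither has "ob".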

...

When the 'dictionary' parameter is used, the resource data will be a JSON file with an array of words in a field named words.

{
  "words": [
    {
      "confidence": 0.0049,
      "rel": [
        "wnm",
        "sp"
      ],
      "from": "encyclopaedia",
      "to": "encyclopedia"
    },
    {
      "confidence": 0.0752,
      "rel": [
        "wnm",
        "sp"
      ],
      "from": "word",
      "to": "worth"
    }
  ]
}
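
To check a custom dictionary before uploading it as a resource, a file in this shape can be read and indexed by its "from" field. The Python below is a minimal sketch assuming only the field names shown above; load_dictionary is a hypothetical helper and is not part of the stage.

```python
import json
from collections import defaultdict


def load_dictionary(text):
    """Parse a dictionary resource and index its entries by source word."""
    data = json.loads(text)
    lookup = defaultdict(list)
    # A word may map to several targets, so each key holds a list of entries.
    for entry in data["words"]:
        lookup[entry["from"]].append(entry)
    return dict(lookup)


sample = """
{"words": [
  {"confidence": 0.0049, "rel": ["wnm", "sp"],
   "from": "encyclopaedia", "to": "encyclopedia"}
]}
"""
lookup = load_dictionary(sample)
print(lookup["encyclopaedia"][0]["to"])
```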


When the 'dictionary' parameter is not used, an embedded Wiktionary file will be used. This file is formatted as one JSON entry per line:

Wiktionary file format
{"confidence":0,"rel":["syn"],"from":"japonés","to":"nipón"}
{"confidence":0,"rel":["syn"],"from":"alemán","to":"germano"}
{"confidence":0,"rel":["syn"],"from":"alemán","to":"tedesco"}
{"confidence":0,"rel":["syn"],"from":"alemán","to":"teutón"}
{"confidence":0,"rel":["syn"],"from":"alemán","to":"gringo"}
{"confidence":0,"rel":["syn"],"from":"mayo","to":"guainica"}
{"confidence":0,"rel":["syn"],"from":"mayo","to":"maisito"}
{"confidence":0,"rel":["syn"],"from":"mayo","to":"mayito"}
{"confidence":0,"rel":["syn"],"from":"mayo","to":"turpial de sureste"}
{"confidence":0,"rel":["syn"],"from":"domingo","to":"paga"}
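
Because each line is a complete JSON object, a file in this format can be read one line at a time rather than parsed as a single document. The following Python sketch shows that approach on two of the lines above; parse_json_lines is a hypothetical helper, and skipping blank lines is an assumption about how stray empty lines should be handled.

```python
import json


def parse_json_lines(lines):
    """Yield one entry per non-empty line of a one-JSON-object-per-line file."""
    for line in lines:
        line = line.strip()
        if line:  # Assumption: blank lines are ignored.
            yield json.loads(line)


sample_lines = [
    '{"confidence":0,"rel":["syn"],"from":"alemán","to":"germano"}',
    '{"confidence":0,"rel":["syn"],"from":"alemán","to":"tedesco"}',
    "",
]
targets = [e["to"] for e in parse_json_lines(sample_lines) if e["from"] == "alemán"]
print(targets)
```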

Relationships

...