The Lemmatize processor matches tokens to words in a dictionary.

Configuration

Method

  • Built-in dictionary - Out-of-the-box dictionaries

    • Language - English and Spanish are currently supported

  • Custom dictionary - A user-supplied dictionary. For the dictionary format, see Lemmatize Stage

    • Dictionary Resource - The name of the resource where the dictionary can be found 

You can use the default Saga Server file system provider, filesystem-provider, which points to the config folder, and add your dictionary there (e.g. filesystem-provider:dictionary-name).
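As a rough illustration of how the resource name in the example above breaks down, the following sketch splits a provider:name string into its two parts. The function split_resource is hypothetical and is not part of Saga; it only assumes the provider:name shape shown in the example.

```python
def split_resource(resource: str) -> tuple[str, str]:
    """Split a 'provider:name' resource string into (provider, name).

    Hypothetical helper for illustration only; not a Saga API.
    """
    provider, sep, name = resource.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider:name', got {resource!r}")
    return provider, name


if __name__ == "__main__":
    provider, name = split_resource("filesystem-provider:dictionary-name")
    print(provider)  # provider part of the resource
    print(name)      # dictionary name inside that provider
```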



Built-in Dictionary

 

General Settings

The general settings can be accessed by clicking on 


  • Enable - Enables the processor to be used in the pipeline.
  • Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
  • Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
  • Required Flags ( optional ) - Lexical item flags that every token must have in order to be processed.
  • At Least One Flags ( optional ) - List of lexical item flags of which at least one must be present for a token to be processed.
  • Don't Process Flags ( optional ) - List of lexical item flags that are not processed. Unlike "Skip Flags", which skips only the token and continues along the same path, this drops the whole path in the Saga graph.
  • Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value of every match.
    • 0.0 to < 1.0 decreases confidence value
    • 1.0 confidence value remains the same
    • > 1.0 to 2.0 increases confidence value
  • Debug - Enable debug logging.
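
The flag and confidence settings above can be sketched roughly as follows. This is not Saga's implementation: the function names and the flag names in the usage lines are hypothetical, and the sketch assumes that the adjustment factor is multiplied into the confidence value, which matches the ranges listed above but is not stated on this page.

```python
def should_process(flags, required=(), at_least_one=(), skip=()):
    """Decide whether a token with the given flags would be processed.

    Hypothetical reading of Skip Flags, Required Flags and
    At Least One Flags; not Saga's actual code.
    """
    flags = set(flags)
    if flags & set(skip):
        return False  # token carries a skip flag
    if not set(required) <= flags:
        return False  # a required flag is missing
    if at_least_one and not (flags & set(at_least_one)):
        return False  # none of the "at least one" flags is present
    return True


def adjust_confidence(confidence, adjustment=1.0):
    """Scale a match confidence by the adjustment factor (assumed multiplicative)."""
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("adjustment must be between 0.0 and 2.0")
    return confidence * adjustment


if __name__ == "__main__":
    # Flag names below are made up for the example.
    print(should_process({"TOKEN", "LOWER"}, required={"TOKEN"}, skip={"NUMBER"}))
    print(adjust_confidence(0.5, 1.5))  # factor > 1.0 increases the value
    print(adjust_confidence(0.5, 1.0))  # factor 1.0 leaves it unchanged
```

Whether the adjusted value is capped afterwards is not covered here, so the sketch leaves the result unclamped.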
