This stage reviews tokens using the Elasticsearch suggestions functionality and creates a new token with a "suggestion" for each word it doesn't recognize. The stage processes all available tokens (usually already tokenized by the "WhitespaceTokenizerStage") along the highest-confidence route. Tokens carrying flags such as "STOP_WORD" or "ALL_UPPER_CASE" can be excluded from spellchecking by adding those flags to the "Skip Flags" list.
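The filtering and lookup described above can be sketched roughly as follows. This is an illustrative sketch only, not the stage's actual implementation: the `Token` class, the helper names, and the `word` field are hypothetical, and it assumes the lookup maps onto Elasticsearch's term suggester, which the documentation does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical stand-in for a pipeline token: its text plus any flags set on it.
    text: str
    flags: set = field(default_factory=set)


def tokens_to_check(tokens, skip_flags):
    """Return only the tokens that carry none of the skip flags."""
    skip = set(skip_flags)
    return [t for t in tokens if not (t.flags & skip)]


def build_suggest_body(tokens, field_name="word"):
    """Build a term-suggester request body with one named entry per token.

    `field_name` is an assumed name for the indexed dictionary field.
    """
    return {
        "suggest": {
            f"token_{i}": {"text": t.text, "term": {"field": field_name}}
            for i, t in enumerate(tokens)
        }
    }


tokens = [
    Token("THE", {"STOP_WORD", "ALL_UPPER_CASE"}),
    Token("quik"),
    Token("brown"),
]

# "THE" is dropped because it carries flags listed in Skip Flags.
candidates = tokens_to_check(tokens, ["STOP_WORD", "ALL_UPPER_CASE"])
body = build_suggest_body(candidates)
```

The resulting `body` could be sent to the `_search` endpoint of the dictionary index; each suggestion Elasticsearch returns would then correspond to one new "suggestion" token.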

Info

Uses Spellchecker Stage

Configuration

  • Load from Dataset - Load dictionary from pre-loaded dictionaries.
  • Load from File - Load a dictionary from your local machine.
  • Delete Dictionary - Delete a dictionary.

General Settings

See Generic Processor Config.

Elasticsearch Connection Settings

  • Schema (default: http) - Schema used by the Elasticsearch connection.
  • Port (integer, default: 9200) - Elasticsearch connection port.
  • Host (default: localhost) - Host used by the Elasticsearch connection.
  • Index Name (default: saga_spellcheck_dictionary) - Index used by the stage to store dictionary data. This is an Elasticsearch index.
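
As a rough illustration of how these four settings fit together, the sketch below combines them into a connection URL and an index URL. The `ElasticsearchSettings` class and its method names are hypothetical and not part of the product. Only the default values are taken from the parameters above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElasticsearchSettings:
    # Defaults mirror the documented parameter defaults.
    schema: str = "http"
    host: str = "localhost"
    port: int = 9200
    index_name: str = "saga_spellcheck_dictionary"

    def base_url(self) -> str:
        """Combine schema, host, and port into the cluster's base URL."""
        return f"{self.schema}://{self.host}:{self.port}"

    def index_url(self) -> str:
        """URL of the index that holds the dictionary data."""
        return f"{self.base_url()}/{self.index_name}"


settings = ElasticsearchSettings()
print(settings.base_url())
print(settings.index_url())
```

Overriding a single field, for example `ElasticsearchSettings(schema="https", port=9243)`, leaves the remaining defaults in place.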
