
Warning
This stage is a work in progress; expect things to break while using it.

Excerpt
This stage reviews tokens using the Elasticsearch suggestion functionality and creates a new token with a "suggestion" for each word it does not recognize.

The process takes all the tokens available to the stage (usually already tokenized by the "WhitespaceTokenizerStage"), using the highest-confidence route. Flags such as "STOP_WORD" or "ALL_UPPER_CASE" can be used as filters by including them in the "Skip Flags" list.
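The filtering described above can be sketched roughly as follows. This is only an illustration, not the stage's actual code: the `Token` class and the `tokens_to_check` function are hypothetical names, and the sketch assumes that a token carrying any flag from the "Skip Flags" list is simply left out of spellchecking.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical stand-in for a Saga lexical item."""
    text: str
    flags: set = field(default_factory=set)


def tokens_to_check(tokens, skip_flags):
    """Keep TOKEN items that carry none of the skip flags (assumed behavior)."""
    skip = set(skip_flags)
    return [t for t in tokens if "TOKEN" in t.flags and not (t.flags & skip)]


tokens = [
    Token("abraham", {"TOKEN"}),
    Token("NASA", {"TOKEN", "ALL_UPPER_CASE"}),
    Token("and", {"TOKEN", "STOP_WORD"}),
    Token("makaroni", {"TOKEN"}),
]

checked = tokens_to_check(tokens, ["STOP_WORD", "ALL_UPPER_CASE"])
print([t.text for t in checked])  # prints ['abraham', 'makaroni']
```

With an empty skip list, every TOKEN item in this sketch would be passed on for checking.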

Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Saga_is_recognizer

Note
This recognizer requires a dictionary to work, so the dictionary must be loaded from either a dataset or a file before the stage is used. Check that your Elasticsearch version is compatible with this stage.

Include Page

	Generic Configuration Parameters

Configuration Parameters

index - Elasticsearch index used by the stage to store dictionary data. Default: spellcheck_dictionary.
schema - Scheme (protocol) used for the Elasticsearch connection. Default: http.
host - Hostname of the Elasticsearch server. Default: localhost.
port (integer) - Port of the Elasticsearch server. Default: 9200.

Saga_config_stage

boundaryFlags	text block split

"index": "saga_spellchecker_dictionary",
"schema": "http",
"host": "localhost",
"port": 9200
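As a rough illustration of how these four values fit together, the sketch below assembles an Elasticsearch URL and a term-suggester request body from the configuration. It only builds the request and does not send it. The function names, the suggestion label `spellcheck`, and the field name `term` are hypothetical, and the use of the term suggester is itself an assumption: the mapping of the dictionary index and the suggester the stage actually uses are not documented here.

```python
import json


def build_base_url(config):
    """Combine schema, host and port into a base URL."""
    # int() accepts the port whether it is given as 9200 or "9200".
    return f'{config["schema"]}://{config["host"]}:{int(config["port"])}'


def build_suggest_request(config, word, field_name="term"):
    """Return (url, body) for an Elasticsearch term-suggester search.

    field_name is a hypothetical field of the dictionary index.
    """
    url = f'{build_base_url(config)}/{config["index"]}/_search'
    body = {
        "suggest": {
            "spellcheck": {  # arbitrary label for this suggestion
                "text": word,
                "term": {"field": field_name},
            }
        }
    }
    return url, body


config = {
    "index": "saga_spellchecker_dictionary",
    "schema": "http",
    "host": "localhost",
    "port": 9200,
}

url, body = build_suggest_request(config, "makaroni")
print(url)
print(json.dumps(body, indent=2))
```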

Example Output

Saga_graph
V--------------[abraham lincoln likes makaroni and cheese]--------------------V
^--[abraham]--V--[lincoln]--V--[likes]--V--[makaroni]--V--[and]--V--[cheese]--^
                                        ^--[macaroni]--^

Output Flags

Lex-Item Flags:

MISSPELL - Identifies a token as a potential misspelling.
SUGGESTION - Added to the newly created token to identify it as a generated token that comes from the dictionary.
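The sketch below shows how these two flags could relate to each other on the example sentence. It is not the stage's implementation: the `spellcheck` function is hypothetical, and Python's `difflib.get_close_matches` is swapped in as a local stand-in for the Elasticsearch suggestion lookup. Only the flags added by this stage are shown.

```python
import difflib


def spellcheck(tokens, dictionary):
    """Return (text, flags) pairs, adding a suggestion after each unknown word."""
    results = []
    for text in tokens:
        if text in dictionary:
            results.append((text, set()))
            continue
        # Unknown word: flag it as a potential misspelling.
        results.append((text, {"MISSPELL"}))
        # Stand-in for the Elasticsearch lookup: closest dictionary term, if any.
        matches = difflib.get_close_matches(text, dictionary, n=1)
        if matches:
            results.append((matches[0], {"SUGGESTION"}))
    return results


dictionary = ["abraham", "lincoln", "likes", "macaroni", "and", "cheese"]
sentence = "abraham lincoln likes makaroni and cheese".split()

for text, flags in spellcheck(sentence, dictionary):
    print(text, sorted(flags))
```

In this sketch "makaroni" receives MISSPELL and is followed by a new "macaroni" entry carrying SUGGESTION; the other words pass through without new flags.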

Vertex Flags:

Info
No vertices are created in this stage

Resource Data

The data used by the dictionary may come from two sources:

Dataset
Plain text file

Both options are available through Saga Server or the service endpoints. To create a dictionary from a dataset, select the dataset and the pipeline that will process it; the pipeline must end with a Spellchecker Stage. To create a dictionary from a file, you only need a plain text file with one term per line.

Resource Format

Dictionary Plain Text File

abraham
lincoln
likes
macaroni
and
cheese
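A file in this format is simple to read. The following sketch parses newline-separated terms into a list; the function name `parse_dictionary` is hypothetical, and skipping blank lines and duplicate terms is an assumption made for the illustration, not documented Saga behavior.

```python
def parse_dictionary(text):
    """Return the terms of a plain text dictionary, one term per line."""
    terms = []
    seen = set()
    for line in text.splitlines():
        term = line.strip()
        # Assumption: blank lines and repeated terms are ignored.
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


sample = "abraham\nlincoln\nlikes\nmacaroni\nand\ncheese\n"
terms = parse_dictionary(sample)
print(terms)
```

To read from disk, the file contents could be passed in the same way, for example `parse_dictionary(open(path, encoding="utf-8").read())`, where `path` is whatever location the dictionary file is stored at.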
