This stage flags tokens that match stop words. Subsequent stages skip the flagged tokens when configured to do so.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

This stage also accepts the Generic Configuration Parameters.

Configuration Parameters

  • caseInsensitive (boolean, default: true) - If true, all stop words and tokens are processed as case insensitive.

  • stopWords - The resource containing the list of stop words, or the list of stop words itself.

    • See below for the format. If no resource or list is provided, the stage uses the default list of stop words.

Default list of stop words

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
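
The fallback described above can be sketched in a few lines of Python. This is only an illustration, not the Saga implementation: the function name `resolve_stop_words` and the constant `DEFAULT_STOP_WORDS` are hypothetical, and only the word list and the meaning of `caseInsensitive` are taken from this page.

```python
# Default list copied from this page. All names here are hypothetical.
DEFAULT_STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]


def resolve_stop_words(stop_words=None, case_insensitive=True):
    """Return the set of stop words to match tokens against.

    Falls back to the default list when no list is given (assumption:
    "not provided" is modeled here as None).
    """
    words = DEFAULT_STOP_WORDS if stop_words is None else stop_words
    if case_insensitive:
        # Fold the configured words so "The" and "the" are the same entry.
        return {word.lower() for word in words}
    return set(words)
```

Calling `resolve_stop_words()` with no arguments yields the default set, while `resolve_stop_words(["The"], case_insensitive=False)` keeps the original casing.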

Config as Resource (required flags: token)
"caseInsensitive" : true,
"stopWords" : "words-provider:stop_words"

Config as List (required flags: token)
"caseInsensitive" : true,
"stopWords" : ["a", "about", "above", "after", "again", "all",
  "am", "an", "and", "the", "i", "who", ...]

Example Output

V--------------[A test to be skipped]--------------V  
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^  
^--[a]--^  


Item [A] - [TOKEN, STOP_WORD ]
Item [to] - [TOKEN, STOP_WORD ]
Item [be] - [TOKEN, STOP_WORD ]
Item [a] - [TOKEN, STOP_WORD ]
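
The flagging shown above can be approximated with a short Python sketch. It is a simplified illustration under stated assumptions, not the stage's actual code: tokens are modeled as plain strings rather than graph items, the function `flag_stop_words` is hypothetical, and the lowercase `[a]` item (an alternative path in the graph) is not modeled.

```python
def flag_stop_words(tokens, stop_words, case_insensitive=True):
    """Return (text, flags) pairs, adding STOP_WORD to matching tokens."""
    if case_insensitive:
        lookup = {word.lower() for word in stop_words}
    else:
        lookup = set(stop_words)

    flagged = []
    for text in tokens:
        # Compare in lowercase only when case-insensitive matching is on.
        key = text.lower() if case_insensitive else text
        flags = ["TOKEN"]
        if key in lookup:
            flags.append("STOP_WORD")
        flagged.append((text, flags))
    return flagged


tokens = ["A", "test", "to", "be", "skipped"]
result = flag_stop_words(tokens, ["a", "to", "be"])
for text, flags in result:
    if "STOP_WORD" in flags:
        print(f"Item [{text}] - [{', '.join(flags)}]")
```

With `case_insensitive=False`, the token `A` would no longer match the stop word `a` and would keep only the TOKEN flag.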

Output Flags

Lex-Item Flags

  • STOP_WORD - Set on every token that matches a stop word.

Vertex Flags

No vertices are created in this stage.

Resource Data

The resource data is a JSON file containing an array of words in a field named stopWords.

"stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
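
As a rough sketch of reading such a resource, the Python below parses the file content and extracts the word list. It assumes the file is a complete JSON object wrapping the `stopWords` field (the page shows only the field itself); the function `load_stop_words` and the sample content are hypothetical.

```python
import json


def load_stop_words(resource_text):
    """Parse resource JSON and return the list in its stopWords field."""
    data = json.loads(resource_text)
    words = data["stopWords"]
    # Reject anything that is not an array of strings.
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError("stopWords must be an array of strings")
    return words


# Hypothetical resource content, shortened for illustration.
example = '{"stopWords": ["a", "about", "above", "after"]}'
stop_words = load_stop_words(example)
```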