This stage flags tokens that match stop words. Flagged tokens are skipped in subsequent stages, if the configuration of those stages so indicates.


Operates On:  Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Generic Configuration Parameters (included page)

Configuration Parameters

  • caseInsensitive (boolean, default: true)
    If true, all stop words and tokens are processed case-insensitively.

  • stopWords
    The resource containing the list of stop words, or the list of stop words given directly.

    • See below for the format. If no resource or list is provided, the stage uses the default list of stop words.
Default list of stop words

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
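
The way the two parameters above interact can be illustrated with a minimal Python sketch. It is not the stage's actual implementation: the function name `build_stop_word_matcher` is hypothetical, only the direct-list form of `stopWords` is handled, and lowercasing via `casefold` is an assumed reading of "case insensitive".

```python
# Default list copied from the documentation above.
DEFAULT_STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]


def build_stop_word_matcher(config):
    """Return a predicate that tells whether a token text is a stop word.

    Hypothetical helper: `config` is a dict holding the two documented
    parameters, `caseInsensitive` and `stopWords`.
    """
    case_insensitive = config.get("caseInsensitive", True)  # documented default
    words = config.get("stopWords") or DEFAULT_STOP_WORDS   # fall back to default list
    if isinstance(words, str):
        # A string such as "words-provider:stop_words" names a resource;
        # resolving it is outside the scope of this sketch.
        raise ValueError("resource references are not resolved in this sketch")
    normalize = str.casefold if case_insensitive else (lambda s: s)
    stop_set = {normalize(w) for w in words}
    return lambda text: normalize(text) in stop_set


is_stop = build_stop_word_matcher({})  # no list given: default list, case-insensitive
print(is_stop("The"), is_stop("test"))  # True False
```

With `"caseInsensitive": false` the same sketch would match `a` but not `A`, which is the only behavioral difference the flag makes here.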

Config as Resource (requiredFlags: token)
"caseInsensitive" : true,
"stopWords" : "words-provider:stop_words"

Config as List (requiredFlags: token)
"caseInsensitive" : true,
"stopWords" : ["a", "about", "above", "after", "again", "all",
  "am", "an", "and", "the", "i", "who", ...]

Example Output

V--------------[A test to be skipped]--------------V  
  ^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^  
  ^--[a]--^  


Item [A] - [TOKEN, SKIP, STOP_WORD]
Item [to] - [TOKEN, SKIP, STOP_WORD]
Item [be] - [TOKEN, SKIP, STOP_WORD]
Item [a] - [TOKEN, SKIP, STOP_WORD]

Output Flags

Lex-Item Flags

  • SKIP, STOP_WORD: all matched stop words are marked with these flags.
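
To make the effect of the output flag concrete, here is a small Python sketch of flagging followed by skipping, using the example sentence from the Example Output section. Everything in it is illustrative: `flag_stop_words` and `unskipped` are hypothetical names, the flag name is passed in as a parameter rather than fixed, and the way a later stage decides what to skip is an assumption, since that is governed by that stage's own configuration.

```python
def flag_stop_words(tokens, stop_words, flag="STOP_WORD", case_insensitive=True):
    """Return (text, flags) pairs; every token starts with the TOKEN flag."""
    norm = str.casefold if case_insensitive else (lambda s: s)
    stop_set = {norm(w) for w in stop_words}
    items = []
    for text in tokens:
        flags = {"TOKEN"}
        if norm(text) in stop_set:
            flags.add(flag)  # mark the match; the token itself is kept
        items.append((text, flags))
    return items


def unskipped(items, skip_flags):
    """Hypothetical downstream stage: ignore items carrying any skip flag."""
    return [text for text, flags in items if not flags & set(skip_flags)]


items = flag_stop_words("A test to be skipped".split(), ["a", "to", "be"])
remaining = unskipped(items, ["STOP_WORD"])
print(remaining)  # ['test', 'skipped']
```

Note that flagged tokens are not removed: a later stage that is not configured to skip the flag still sees all five tokens.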

Vertex Flags:

No vertices are created in this stage.

Resource Data

The resource data is a JSON file with an array of words in a field named stopWords.

"stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]