
This stage flags tokens that match stop words. The flagged tokens will be skipped in subsequent stages (if so indicated in the configuration).


Operates On: Lexical Items with the TOKEN flag and possibly other flags, as specified below.

Recognizer: No

Generic Configuration Parameters (see the Generic Configuration Parameters page)

Configuration Parameters

  • caseInsensitive (boolean, default: true): If true, all stop words and tokens are processed case-insensitively.
  • stopWords: The resource containing the list of stop words, or the list of stop words given directly. See Resource Data below for the format. If no resource or list is provided, the stage uses the default list of stop words.
  • Tokens must have all the specified flags in order to be processed.
  • Enables all debug log functionality of the stage, if any.
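As a rough illustration of the two matching rules above (case-insensitive comparison and the all-flags requirement), here is a minimal JavaScript sketch. It is an assumption-based sketch, not the StopWordsStage source: the helper names are hypothetical, and `requiredFlags` is used only as an illustrative name for the flag list.

```javascript
// Minimal sketch, NOT the actual StopWordsStage code.
// All function names here are hypothetical.

// Build the lookup set. With caseInsensitive = true, entries are lower-cased
// so that "The" and "the" map to the same key.
function buildStopWordSet(words, caseInsensitive) {
  return new Set(caseInsensitive ? words.map((w) => w.toLowerCase()) : words);
}

// A token is only considered if it carries every required flag
// ("requiredFlags" is an illustrative name for that list).
function hasAllFlags(tokenFlags, requiredFlags) {
  return requiredFlags.every((flag) => tokenFlags.includes(flag));
}

// Look the token text up, lower-casing it first when caseInsensitive is set.
function isStopWord(text, stopWordSet, caseInsensitive) {
  return stopWordSet.has(caseInsensitive ? text.toLowerCase() : text);
}

const words = ["the", "a"];
console.log(isStopWord("The", buildStopWordSet(words, true), true));   // true
console.log(isStopWord("The", buildStopWordSet(words, false), false)); // false
console.log(hasAllFlags(["TOKEN"], ["TOKEN"]));                        // true
// "OTHER_FLAG" is a made-up flag used only to show a failing check.
console.log(hasAllFlags(["TOKEN"], ["TOKEN", "OTHER_FLAG"]));          // false
```

Normalizing both the list and the token text is what makes the comparison symmetric; normalizing only one side would miss matches such as "THE" against a list entry "The".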
Default list of stop words:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
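The default list can be written down as a constant. The sketch below assumes the stopWords parameter has already been resolved to an array, or left undefined/null when nothing was configured, and shows one plausible way the documented fallback could work; `effectiveStopWords` is a hypothetical helper, not part of Saga.

```javascript
// Sketch only: the 33 default stop words listed above.
const DEFAULT_STOP_WORDS = Object.freeze([
  "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
  "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
  "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
]);

// Hypothetical helper: `configured` is the list resolved from the stopWords
// parameter, or undefined/null when neither a resource nor a list was given.
function effectiveStopWords(configured) {
  return configured == null ? DEFAULT_STOP_WORDS : configured;
}

console.log(effectiveStopWords(undefined).length);      // 33
console.log(effectiveStopWords(["a", "about"]).length); // 2
```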

Example Configuration:
{
  "type": "StopWordsStage",
  "caseInsensitive" : true,
  "stopWords" : "words-provider:stop_words"
}


Config as Resource (required flags: token):

  "caseInsensitive" : true,
  "stopWords" : "words-provider:stop_words"

Config as List (required flags: token):

  "caseInsensitive" : true,
  "stopWords" : ["a", "about", "above", "after", "again", "all",
    "am", "an", "and", "the", "i", "who", ...]

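The two configurations above differ only in the type of the stopWords value: a resource reference string or an inline array. A minimal sketch of resolving either form follows. Splitting the reference at the first colon into provider and resource names is an assumption inferred from the example value, and `loadResource` is a hypothetical callback standing in for however Saga actually reads the resource.

```javascript
// Sketch only: resolve the stopWords parameter, which may be a resource
// reference string or an inline array. `loadResource` is a hypothetical
// callback; the "provider:resource" split is an assumption.
function resolveStopWords(stopWords, loadResource) {
  if (Array.isArray(stopWords)) {
    return stopWords; // inline list: use it as-is
  }
  if (typeof stopWords === "string") {
    const sep = stopWords.indexOf(":");
    if (sep <= 0 || sep === stopWords.length - 1) {
      throw new Error(`Unrecognized resource reference: ${stopWords}`);
    }
    return loadResource(stopWords.slice(0, sep), stopWords.slice(sep + 1));
  }
  return undefined; // nothing configured: the caller falls back to the default list
}

// In-memory stand-in for a resource provider, for demonstration only.
const fakeLoader = (provider, resource) =>
  provider === "words-provider" && resource === "stop_words" ? ["a", "the"] : [];

const fromResource = resolveStopWords("words-provider:stop_words", fakeLoader); // ["a", "the"]
const fromList = resolveStopWords(["a", "about"], fakeLoader);                  // ["a", "about"]
```

Returning undefined for a missing value keeps the "no resource or list" case distinct from an explicitly configured list.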
Example Output

V--------------[A test to be skipped]--------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^
^--[a]--^

Item [A] - [TOKEN, STOP_WORD]
Item [to] - [TOKEN, STOP_WORD]
Item [be] - [TOKEN, STOP_WORD]
Item [a] - [TOKEN, STOP_WORD]
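The example output can be approximated with plain objects. In this JavaScript sketch each lexical item is a hypothetical `{ text, flags }` object, the stop-word set is a small subset of the default list, and the lower-cased alternative `a` is assumed to carry the TOKEN flag, as the output above suggests. It shows the flagging logic only, not Saga's internal data model.

```javascript
// Sketch only: lexical items are modelled as { text, flags } objects, a
// hypothetical stand-in for Saga's lexical items.
const STOP_WORDS = new Set(["a", "be", "the", "to"]); // subset of the default list

function flagStopWords(items, stopWords, options = {}) {
  const { caseInsensitive = true, requiredFlags = ["TOKEN"] } = options;
  for (const item of items) {
    // Items missing any required flag are left untouched.
    if (!requiredFlags.every((flag) => item.flags.includes(flag))) continue;
    // stopWords is assumed to be lower-case already when caseInsensitive is true.
    const key = caseInsensitive ? item.text.toLowerCase() : item.text;
    if (stopWords.has(key) && !item.flags.includes("STOP_WORD")) {
      item.flags.push("STOP_WORD");
    }
  }
  return items;
}

// The tokens from the example graph, including the lower-cased alternative "a".
const items = ["A", "test", "to", "be", "skipped", "a"].map((text) => ({
  text,
  flags: ["TOKEN"],
}));

for (const item of flagStopWords(items, STOP_WORDS)) {
  if (item.flags.includes("STOP_WORD")) {
    console.log(`Item [${item.text}] - [${item.flags.join(", ")}]`);
  }
}
// Item [A] - [TOKEN, STOP_WORD]
// Item [to] - [TOKEN, STOP_WORD]
// Item [be] - [TOKEN, STOP_WORD]
// Item [a] - [TOKEN, STOP_WORD]
```

Passing `{ caseInsensitive: false }` in this sketch leaves [A] unflagged, because the set only holds lower-case entries.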

Output Flags

Lex-Item Flags:

  • STOP_WORD: All matched stop words will be marked as STOP_WORD.

Vertex Flags:

No vertices are created in this stage.

Resource Data

The resource data will be a JSON file with an array of words in a field named stopWords:

{
  "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
}
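To check a resource file before use, the following sketch reads the stopWords field with the built-in JSON.parse. The validation rule (the field must be an array of strings) is an assumption for illustration, not documented behavior of the stage, and `parseStopWordsResource` is a hypothetical name.

```javascript
// Sketch only: extract and validate the stopWords array from a resource
// file's JSON text. The validation rule is an assumption.
function parseStopWordsResource(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  const words = data && data.stopWords;
  if (!Array.isArray(words) || !words.every((w) => typeof w === "string")) {
    throw new Error('Resource must contain a "stopWords" array of strings');
  }
  return words;
}

const sample = '{ "stopWords": ["a", "about", "above"] }';
console.log(parseStopWordsResource(sample).length); // 3
```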