...

This Stage flags tokens that are matched to

...

Stop-Words. The flagged tokens will be skipped in subsequent stages (if so indicated in the configuration).


Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

Generic Configuration Parameters

Configuration Parameters

...

  • caseInsensitive (type=boolean | default=true) - If true, all stop words and tokens will be processed as case insensitive.

  • stopWords - The resource containing the list of stop words, or the list of stop words given directly.
    • See below for the format.

...

    • If no resource or list is provided, the stage will use the default list of stop words.
Default list of stop words:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
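
The interaction between stopWords and caseInsensitive can be illustrated with a minimal Python sketch. It is not Saga's implementation: `build_matcher` and `is_stop_word` are hypothetical names introduced here, and the only material taken from this page is the default list and the two parameter meanings.

```python
# Default stop-word list, copied from the list above.
DEFAULT_STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]


def build_matcher(stop_words=None, case_insensitive=True):
    """Return a predicate that tells whether a token is a stop word.

    Hypothetical helper mirroring the documented stopWords and
    caseInsensitive parameters; the real stage may work differently.
    """
    # Fall back to the default list when no list is provided.
    words = DEFAULT_STOP_WORDS if stop_words is None else stop_words
    if case_insensitive:
        # Normalize both the list and the token before comparing.
        lookup = {w.casefold() for w in words}
        return lambda token: token.casefold() in lookup
    lookup = set(words)
    return lambda token: token in lookup


is_stop_word = build_matcher()
print([t for t in "A test to be skipped".split() if is_stop_word(t)])
```

With the defaults, "A", "to" and "be" are reported as stop words; calling `build_matcher(case_insensitive=False)` would no longer match the capitalized "A".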

...

Config as Resource (requiredFlags: token)
"caseInsensitive" : true,
"stopWords" : "words-provider:stop_words"
Config as List (requiredFlags: token)
"caseInsensitive" : true,
"stopWords" : ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]

Example Output

V--------------[A test to be skipped]--------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^
^--[a]--^

Item [A] - [TOKEN, ..., STOP_WORD]
Item [to] - [TOKEN, ..., STOP_WORD]
Item [be] - [TOKEN, ..., STOP_WORD]
Item [a] - [TOKEN, ..., STOP_WORD]
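
The flagging shown in this output, and the way a later stage might skip flagged tokens, can be sketched roughly as follows. Everything here is an assumption made for illustration: `LexItem`, `flag_stop_words` and `skip_flagged` are hypothetical names, the stop-word set is a tiny subset, and the sketch models neither the graph nor the normalized [a] item.

```python
from dataclasses import dataclass, field

# Tiny illustrative subset of stop words, enough for the example sentence.
STOP_WORDS = {"a", "to", "be"}


@dataclass
class LexItem:
    """Hypothetical stand-in for a lexical item carrying a set of flags."""
    text: str
    flags: set = field(default_factory=lambda: {"TOKEN"})


def flag_stop_words(items, required_flag="TOKEN"):
    """Add STOP_WORD to each item that has the required flag and matches."""
    for item in items:
        if required_flag in item.flags and item.text.casefold() in STOP_WORDS:
            item.flags.add("STOP_WORD")
    return items


def skip_flagged(items, skip_flag="STOP_WORD"):
    """What a later stage could do when configured to skip flagged tokens."""
    return [item.text for item in items if skip_flag not in item.flags]


items = flag_stop_words([LexItem(w) for w in "A test to be skipped".split()])
for item in items:
    print(f"Item [{item.text}] - {sorted(item.flags)}")
print(skip_flagged(items))
```

In this sketch only "test" and "skipped" remain visible to the later step, because the other three items carry STOP_WORD.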

Output Flags

Lex-Item Flags:

  • STOP_WORD - All matched stop words will be marked as STOP_WORD.

Vertex Flags:

No vertices are created in this stage.

Resource Data

The resource data is a JSON file with an array of words in a field named stopWords.

"stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]

...
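
A resource file in this shape can be checked before it is uploaded with a short Python sketch like the one below. It assumes the stopWords field sits inside a top-level JSON object and uses a shortened word list; `load_stop_words` is a hypothetical helper, not part of Saga.

```python
import json


def load_stop_words(resource_text):
    """Parse resource data and return the stopWords array as a list of strings."""
    data = json.loads(resource_text)
    words = data.get("stopWords") if isinstance(data, dict) else None
    # Reject anything that is not an array of strings under "stopWords".
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError('expected a "stopWords" field holding an array of strings')
    return words


# Assumption: the fragment is wrapped in a JSON object; the list is shortened.
resource_text = '{"stopWords": ["a", "about", "above", "after", "again", "all", "am"]}'
print(load_stop_words(resource_text))
```

Failing early with a `ValueError` keeps a malformed resource from being mistaken for an empty stop-word list.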