This stage flags tokens that match stop words. The flagged tokens will be skipped in subsequent stages (if so indicated in the configuration).


Operates On: Lexical Items with TOKEN and possibly other flags as specified below.

Recognizer: false

See: Generic Configuration Parameters

Configuration Parameters

  • caseInsensitive (boolean, default: true): If true, all stop words and tokens are processed case-insensitively.

  • stopWords: The resource containing the list of stop words, or the direct list of stop words.
    • See below for the format.
    • If no resource or list is provided, the stage will use the default list of stop words.

Default list of stop words:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with

Config as Resource (requiredFlags: token):
"caseInsensitive" : true,
"stopWords" : "words-provider:stop_words"

Config as List (requiredFlags: token):
"caseInsensitive" : true,
"stopWords" : ["a", "about", "above", "after", "again", "all",
  "am", "an", "and", "the", "i", "who", ...]
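
The two configuration forms above, together with the fallback to the default list, can be sketched as a small resolver. This is a minimal Python illustration, not Saga's implementation: `resolve_stop_words` and the `resources` mapping are hypothetical names, and the sketch assumes that a string value is a resource reference whose resource data holds a `stopWords` array.

```python
# Hypothetical sketch: choose the stop-word list from a stage configuration.
# The default list is copied from the documentation above.
DEFAULT_STOP_WORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
]


def resolve_stop_words(config, resources=None):
    """Return the stop-word list selected by `config`.

    `resources` is an assumed in-memory mapping from resource names
    (for example "words-provider:stop_words") to parsed resource data.
    """
    value = config.get("stopWords")
    if value is None:
        # No resource or list provided: fall back to the default list.
        return list(DEFAULT_STOP_WORDS)
    if isinstance(value, str):
        # A string is treated as a resource reference (assumption).
        return list((resources or {})[value]["stopWords"])
    # Otherwise the value is taken as the direct list of stop words.
    return list(value)
```

In this sketch an unknown resource name raises a `KeyError`; how the real stage reports a missing resource is not covered by this page.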

Example Output

V--------------[A test to be skipped]--------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^
^--[a]--^

Item [A] - [TOKEN, STOP_WORD]
Item [to] - [TOKEN, STOP_WORD]
Item [be] - [TOKEN, STOP_WORD]
Item [a] - [TOKEN, STOP_WORD]
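
The flagging shown in this example can be approximated with the rough Python sketch below. It rests on several assumptions: `flag_stop_words` is a hypothetical helper, tokens are plain strings obtained by whitespace splitting, and only three words from the default list are used. It does not model the normalized lower-case item `[a]` that appears in the stage output.

```python
def flag_stop_words(tokens, stop_words, case_insensitive=True):
    """Return (token, flags) pairs, adding STOP_WORD to matching tokens."""
    if case_insensitive:
        # Compare both sides in lower case, mirroring caseInsensitive = true.
        lookup = {word.lower() for word in stop_words}
    else:
        lookup = set(stop_words)

    flagged = []
    for token in tokens:
        key = token.lower() if case_insensitive else token
        flags = ["TOKEN"]
        if key in lookup:
            flags.append("STOP_WORD")
        flagged.append((token, flags))
    return flagged


stop_words = ["a", "be", "to"]  # small subset of the default list
for token, flags in flag_stop_words("A test to be skipped".split(), stop_words):
    if "STOP_WORD" in flags:
        print(f"Item [{token}] - [{', '.join(flags)}]")
```

With `case_insensitive=False`, the token `A` would no longer match the stop word `a`, so only `to` and `be` would be flagged.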

Output Flags

Lex-Item Flags:

  • STOP_WORD: All matched stop words will be marked as STOP_WORD.

Vertex Flags:

No vertices are created in this stage.

Resource Data

The resource data is a JSON file with an array of words in a field named stopWords.

{
  "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
}
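
Reading such a resource might look like the sketch below. It is illustrative only: `load_stop_words` is a hypothetical helper, and it assumes the resource is a JSON object whose `stopWords` field is an array of strings, as described above.

```python
import json


def load_stop_words(text):
    """Parse resource JSON text and return the words in its stopWords field."""
    data = json.loads(text)
    words = data.get("stopWords") if isinstance(data, dict) else None
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError('expected a "stopWords" field holding an array of strings')
    return words


# Example resource content (shortened word list, for illustration only).
resource = '{"stopWords": ["a", "about", "above", "after", "again", "all"]}'
print(load_stop_words(resource))
```

Validating the field up front keeps a malformed resource from silently producing an empty or partial stop-word list.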