Flags tokens that match stop words so that subsequent stages skip them.


Operates On:  Lexical Items with TOKEN

...

  • caseInsensitive (string, optional) - If true, all stop words and tokens will be processed as case insensitive (default = true).

  • stopWords (string, optional) - The resource containing the list of stop words.

    • See below for the format. If no resource is provided, the stage will use the default list of stop words.

...

Example Configuration

{
  "type": "StopWords",
  "caseInsensitive" : true,
  "stopWords" : "words-provider:stop_words"
}

Example Output

V--------------[A test to be skipped]--------------V  
  ^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^  
  ^--[a]--^  

Item [A] - [TOKEN, SKIP]
Item [to] - [TOKEN, SKIP]
Item [be] - [TOKEN, SKIP]
Item [a] - [TOKEN, SKIP]
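
The flagging shown above can be approximated with a short sketch. This is not the stage's actual implementation: `LexItem`, `flag_stop_words`, and the three-word stop list are hypothetical names and data chosen only to reproduce the example output, assuming that matching is a simple set lookup on the token text.

```python
from dataclasses import dataclass, field


@dataclass
class LexItem:
    """Hypothetical stand-in for a lexical item: its text plus a set of flags."""
    text: str
    flags: set = field(default_factory=lambda: {"TOKEN"})


def flag_stop_words(items, stop_words, case_insensitive=True):
    """Add SKIP to every TOKEN item whose text is in stop_words."""
    if case_insensitive:
        # Normalize the stop list once so each lookup is a plain set test.
        lookup = {word.lower() for word in stop_words}
    else:
        lookup = set(stop_words)

    for item in items:
        if "TOKEN" not in item.flags:
            continue  # the stage only operates on TOKEN items
        key = item.text.lower() if case_insensitive else item.text
        if key in lookup:
            item.flags.add("SKIP")
    return items


# Tokens from the example, including the lowercase variant [a].
items = [LexItem(t) for t in ["A", "test", "to", "be", "skipped", "a"]]
flag_stop_words(items, ["a", "to", "be"])  # assumed stop list for illustration
skipped = [item.text for item in items if "SKIP" in item.flags]
```

With `case_insensitive=False` in this sketch, `A` would no longer match the stop word `a`, which is the distinction the caseInsensitive parameter describes.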

Output Flags

Lex-Item Flags

...

  • SKIP - All matched stop words will be marked as SKIP.

Resource Data

The resource data is a JSON file with an array of words in a field named stopWords.

{
  "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
}
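
As a rough illustration of consuming this format, the sketch below parses a resource document and builds a lookup set. The function name `load_stop_words` and the sample data are assumptions made for this example; how the stage itself loads and validates the resource is not documented here.

```python
import json


def load_stop_words(raw_json, case_insensitive=True):
    """Parse a stop-word resource and return the words as a set."""
    data = json.loads(raw_json)
    words = data.get("stopWords")
    if not isinstance(words, list):
        # The format calls for an array in a field named stopWords.
        raise ValueError('expected an array in the "stopWords" field')
    if case_insensitive:
        return {word.lower() for word in words}
    return set(words)


# Hypothetical resource content, shortened for illustration.
sample = '{"stopWords": ["a", "about", "The", "I"]}'
stop_words = load_stop_words(sample)
```

Lowercasing the list at load time is one possible design choice: it keeps per-token matching to a single set lookup when case-insensitive processing is enabled.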