Flags tokens that match stop words so that the following stages skip them.


Operates On:  Lexical Items with TOKEN

...

  • caseInsensitive (string, optional) - If true, all stop words and tokens will be processed as case insensitive (default = true)

  • stopWords (string, optional) - The resource containing the list of stop words

    • See below for the format. If no resource is provided, the stage will use the default list of stop words.
Default list of stop words:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
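
The matching behaviour described by the parameters above can be illustrated with a minimal Python sketch. It is not the stage's actual implementation: the function name `flag_stop_words` and the `(token, flags)` return shape are hypothetical, and only the default word list, the `caseInsensitive` semantics, and the TOKEN and SKIP flag names come from this page.

```python
# Illustrative sketch only; names and data shapes are assumptions,
# not the StopWordsStage implementation.

# Default stop words, copied from the list above.
DEFAULT_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)


def flag_stop_words(tokens, stop_words=DEFAULT_STOP_WORDS, case_insensitive=True):
    """Return (token, flags) pairs; tokens matching a stop word also get SKIP."""
    if case_insensitive:
        # Lower-case both sides so "The" matches "the".
        lookup = {word.lower() for word in stop_words}
    else:
        lookup = set(stop_words)

    flagged = []
    for token in tokens:
        key = token.lower() if case_insensitive else token
        flags = ["TOKEN"]
        if key in lookup:
            flags.append("SKIP")
        flagged.append((token, flags))
    return flagged


if __name__ == "__main__":
    for token, flags in flag_stop_words(["The", "stage", "is", "skipped"]):
        print(f"Item [{token}] - [{','.join(flags)}]")
```

With `case_insensitive=False`, the same call would leave "The" unflagged, because only the lower-case form "the" is in the default list.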


Example Configuration
{
  "type": "StopWordsStage",
  "caseInsensitive" : true,
  "stopWords" : "words-provider:stop_words"
}

Example Output

V--------------[A test to be and skipped]----------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[and]--V--[skipped]--^
^--[a]--^

Item [A] - [TOKEN,ALL_UPPER_CASE,SKIP]
Item [to] - [TOKEN,ALL_LOWER_CASE,SKIP]
Item [be] - [TOKEN,ALL_LOWER_CASE,SKIP]
Item [a] - [TOKEN,ALL_LOWER_CASE,SKIP]

Output Flags

Lex-Item Flags:

  • SKIP - All matched stop words will be marked as SKIP

Resource Data

The resource data will be a JSON file with an array of words in a field named stopWords.

{
  "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
}
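
As a rough illustration of consuming this format, the Python sketch below parses a resource body and returns the stop words as a set. It assumes only what is stated here: a JSON object with a `stopWords` array. The function name `load_stop_words` and the error handling are hypothetical and say nothing about how the stage itself reads its resource.

```python
import json


def load_stop_words(resource_text, case_insensitive=True):
    """Parse a stop-word resource (JSON text) into a set of words.

    Hypothetical helper: assumes a top-level object with a "stopWords" array.
    """
    data = json.loads(resource_text)
    words = data.get("stopWords") if isinstance(data, dict) else None
    if not isinstance(words, list):
        raise ValueError('resource must contain a "stopWords" array')
    # Normalise to lower case when matching is case insensitive.
    return {w.lower() if case_insensitive else w for w in words}


if __name__ == "__main__":
    sample = '{"stopWords": ["a", "about", "The"]}'
    print(sorted(load_stop_words(sample)))
```

Returning a set keeps each lookup constant-time, which matters when every token in a document is checked against the list.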