This stage flags tokens that match stop words. The flagged tokens will be skipped in subsequent stages (if so indicated in the configuration).
Operates On: Lexical Items with TOKEN and possibly other flags as specified below.
caseInsensitive ( type=boolean | default=true | optional ) - If true, stop words and tokens are compared case-insensitively.
stopWords ( type=string | optional ) - The resource containing the list of stop words, or the direct list of stop words.
Default list of stop words
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
"caseInsensitive" : true, "stopWords" : "words-provider:stop_words"
"caseInsensitive" : true, "stopWords" : ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
V--------------[A test to be skipped]--------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^
^--[a]--^

Item [A] - [TOKEN, STOP_WORD]
Item [to] - [TOKEN, STOP_WORD]
Item [be] - [TOKEN, STOP_WORD]
Item [a] - [TOKEN, STOP_WORD]
No vertices are created in this stage.
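The flagging behavior described above can be approximated with a short Python sketch. This is not the stage's implementation: `flag_stop_words` and `DEFAULT_STOP_WORDS` are hypothetical names chosen for illustration, and the sketch works on a flat list of tokens rather than on the lexical-item graph shown in the example, so the separate lowercase item [a] is not modeled.

```python
# Hypothetical sketch; these names are not part of the stage's API.
# The set below copies the default stop-word list given in this document.
DEFAULT_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def flag_stop_words(tokens, stop_words=DEFAULT_STOP_WORDS, case_insensitive=True):
    """Return (token, flags) pairs; tokens matching a stop word gain STOP_WORD."""
    if case_insensitive:
        # Normalize the stop words so that "A" matches "a".
        stop_words = {word.lower() for word in stop_words}
    flagged = []
    for token in tokens:
        key = token.lower() if case_insensitive else token
        flags = ["TOKEN"]
        if key in stop_words:
            flags.append("STOP_WORD")
        flagged.append((token, flags))
    return flagged


# Whitespace splitting stands in for the tokenizer stages that run earlier.
result = flag_stop_words("A test to be skipped".split())
for token, flags in result:
    print(token, flags)
```

Tokens are only flagged here, never removed, which mirrors the description above: whether a later stage skips flagged tokens is decided by that stage's configuration.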
The resource data is a JSON file with an array of words in a field named stopWords.
"stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
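To illustrate the resource format, here is a minimal Python sketch that reads a stop-word list from such a JSON document. It assumes the `stopWords` field sits at the top level of a JSON object. The sample content and the `load_stop_words` helper are hypothetical and do not show how the stage itself loads the resource.

```python
import json

# Hypothetical resource content; a real file would come from the configured resource.
resource_text = '{"stopWords": ["a", "about", "above", "after", "again"]}'


def load_stop_words(text, case_insensitive=True):
    """Parse the resource JSON and return its stop words as a set."""
    data = json.loads(text)
    words = data["stopWords"]
    if not isinstance(words, list):
        raise ValueError("stopWords must be an array of words")
    # Lowercase the entries when matching is meant to ignore case.
    return {word.lower() if case_insensitive else word for word in words}


stop_words = load_stop_words(resource_text)
print(sorted(stop_words))
```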