Flags tokens that match a stop word so that the following stages skip them
Operates On: Lexical Items with TOKEN
caseInsensitive (string, optional) - If true, all stop words and tokens will be processed as case insensitive (default = true)
stopWords (string, optional) - The resource containing the list of stop words
Default list of stop words
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with
{ "type": "StopWordsStage", "caseInsensitive" : true, "stopWords" : "words-provider:stop_words" }
V--------------[A test to be skipped]--------------V
^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^
^--[a]--^
Item [A] - [TOKEN, SKIP]
Item [to] - [TOKEN, SKIP]
Item [be] - [TOKEN, SKIP]
Item [a] - [TOKEN, SKIP]
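The flagging behavior in the example above can be approximated with a short Python sketch. This is only an illustration of the matching logic, not the stage's actual implementation: the function name flag_stop_words, the tuple-based token representation, and the DEFAULT_STOP_WORDS constant are hypothetical names introduced here, and the sketch does not model the separate lowercased item [a] shown in the graph.

```python
# Hypothetical sketch of stop-word flagging; not the stage's real code.
# The word list is the default list documented for this stage.
DEFAULT_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def flag_stop_words(tokens, stop_words=DEFAULT_STOP_WORDS, case_insensitive=True):
    """Return (token, flags) pairs; stop words get an extra SKIP flag."""
    if case_insensitive:
        # Normalize the stop-word list once so lookups compare like with like.
        stop_words = {word.lower() for word in stop_words}
    flagged = []
    for token in tokens:
        key = token.lower() if case_insensitive else token
        flags = ["TOKEN", "SKIP"] if key in stop_words else ["TOKEN"]
        flagged.append((token, flags))
    return flagged


if __name__ == "__main__":
    for token, flags in flag_stop_words(["A", "test", "to", "be", "skipped"]):
        print(f"Item [{token}] - {flags}")
```

With caseInsensitive left at its default, "A" is matched against "a" and flagged; with case_insensitive=False in this sketch, "A" would keep only the TOKEN flag.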
The resource data will be a JSON file with an array of words in a field named stopWords
{ "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...] }
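As a rough illustration of the resource format, the following Python sketch parses such a document and returns the word list as a set. The function name load_stop_words is hypothetical, and the sketch assumes the resource content is already available as a string; how the resource provider actually delivers it is not covered here.

```python
import json


def load_stop_words(resource_text):
    """Parse a stop-word resource and return its words as a set.

    Hypothetical helper: assumes a JSON object with a "stopWords" array.
    """
    data = json.loads(resource_text)
    words = data.get("stopWords")
    if not isinstance(words, list):
        raise ValueError('resource must contain a "stopWords" array')
    return set(words)


if __name__ == "__main__":
    sample = '{ "stopWords": ["a", "about", "above", "the"] }'
    print(sorted(load_stop_words(sample)))
```

Raising an error when the field is missing is a design choice of this sketch; it makes a misspelled field name visible instead of silently producing an empty list.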