
Flags tokens that match stop words so that they are skipped by the following stages.

Operates On:  Lexical Items with TOKEN

Configuration Parameters

  • caseInsensitive (boolean, optional) - If true, stop words and tokens are compared case-insensitively (default = true)

  • stopWords (string, optional) - The resource containing the list of stop words

    • See below for the format. If no resource is provided, the stage uses the default list of stop words.
  • skipFlags (string array, optional) - Flags to be skipped by this stage
    • Tokens marked with any of these flags are ignored by this stage, and no processing is performed on them.
  • requiredFlags (string array, optional)
    • Tokens must have all of the specified flags in order to be processed.
  • debug (boolean, optional)
    • Enables all debug logging functionality of the stage, if any.
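The skipFlags and requiredFlags parameters act as a gate that runs before any stop-word matching. The following is a minimal Python sketch of that gate, not the stage's actual code. It assumes a token's flags are available as a set of strings and that a token is ignored if it carries any of the skipFlags. The function name `should_process` is hypothetical.

```python
def should_process(token_flags, skip_flags=(), required_flags=()):
    """Hypothetical gate mirroring the skipFlags / requiredFlags parameters.

    Assumption: a token is ignored if it carries ANY skip flag, and it is
    processed only if it carries ALL required flags.
    """
    flags = set(token_flags)
    if flags & set(skip_flags):
        return False  # at least one skip flag present: leave the token untouched
    return set(required_flags) <= flags  # every required flag must be present


# Usage: only TOKEN items without a SKIP flag pass the gate.
print(should_process({"TOKEN"}, skip_flags=["SKIP"], required_flags=["TOKEN"]))          # True
print(should_process({"TOKEN", "SKIP"}, skip_flags=["SKIP"], required_flags=["TOKEN"]))  # False
print(should_process({"WORD"}, required_flags=["TOKEN"]))                                # False
```

Checking skipFlags first means a token that is both skipped and fully qualified is still ignored, which matches "no processing is performed" for skipped tokens.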


Default list of stop words

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with


Example Configuration
{
  "type": "StopWordsStage",
  "caseInsensitive" : true,
  "stopWords" : "words-provider:stop_words"
}


Example Output

V--------------[A test to be skipped]--------------V  
  ^--[A]--V--[test]--V--[to]--V--[be]--V--[skipped]--^  
  ^--[a]--^  

Item [A] - [TOKEN, SKIP]
Item [to] - [TOKEN, SKIP]
Item [be] - [TOKEN, SKIP]
Item [a] - [TOKEN, SKIP]
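The example output above can be reproduced with a short Python sketch. It assumes each lexical item is a hypothetical `(text, flags)` pair with flags held in a list. It uses the default stop-word list from this page and models caseInsensitive by lower-casing both the stop words and the token text. `flag_stop_words` and the item representation are illustrative only, not the stage's real classes.

```python
# Default stop-word list from this page.
DEFAULT_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def flag_stop_words(items, stop_words=DEFAULT_STOP_WORDS, case_insensitive=True):
    """Append SKIP to every TOKEN item whose text is a stop word.

    items: list of (text, flags) pairs, where flags is a mutable list of strings.
    """
    if case_insensitive:
        stop_words = {w.lower() for w in stop_words}
    for text, flags in items:
        if "TOKEN" not in flags:
            continue  # the stage only operates on TOKEN items
        key = text.lower() if case_insensitive else text
        if key in stop_words and "SKIP" not in flags:
            flags.append("SKIP")
    return items


# Tokens for "A test to be skipped", plus the lower-cased alternative "a".
items = [(t, ["TOKEN"]) for t in ["A", "test", "to", "be", "skipped", "a"]]
for text, flags in flag_stop_words(items):
    if "SKIP" in flags:
        print(f"Item [{text}] - [{', '.join(flags)}]")
# Item [A] - [TOKEN, SKIP]
# Item [to] - [TOKEN, SKIP]
# Item [be] - [TOKEN, SKIP]
# Item [a] - [TOKEN, SKIP]
```

With `case_insensitive=False`, the capitalized "A" would no longer match the lower-case default list, and only "to", "be" and "a" would be flagged.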

Output Flags

Lex-Item Flags:

  • SKIP - All matched stop words will be marked as SKIP

Resource Data

The resource data is a JSON file with an array of words in a field named stopWords:

{
  "stopWords": ["a", "about", "above", "after", "again", "all", "am", "an", "and", "the", "i", "who", ...]
}
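The following is a minimal Python sketch of reading such a resource. It validates the stopWords field and falls back to the default list when no resource is given, as described under the stopWords parameter. It assumes the resource contents are already available as a JSON string. How the stage actually resolves a reference such as `words-provider:stop_words` is not shown, and `load_stop_words` is a hypothetical name.

```python
import json

# Default stop-word list from this page, used when no resource is provided.
DEFAULT_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def load_stop_words(resource_text=None):
    """Parse a stop-word resource of the form {"stopWords": [...]}.

    Returns the default list when resource_text is None.
    """
    if resource_text is None:
        return DEFAULT_STOP_WORDS
    data = json.loads(resource_text)
    words = data.get("stopWords") if isinstance(data, dict) else None
    if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
        raise ValueError('resource must contain a "stopWords" array of strings')
    return frozenset(words)


custom = load_stop_words('{"stopWords": ["a", "about", "above", "the"]}')
print(sorted(custom))          # ['a', 'about', 'above', 'the']
print(len(load_stop_words()))  # 33
```

Rejecting a missing or misspelled field (for example "stopWrods") surfaces configuration mistakes early instead of silently running with an empty list.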