This recognizer uses OpenNLP's DocumentCategorizer to load classification models and tag the sentences that a binary classification model (a sentence either is or isn't in a certain category) matches at or above a specified probability threshold.

This is a plugin recognizer. It uses the Classification Stage.

Configuration

  • Model ( type=string | default=--LATEST-- | required ) - Model to use for classification. By default the latest model is always used, but a specific one can be configured.
  • Minimum Probability ( type=double | default=.95 | required ) - Probability threshold. Only sentences whose match probability is greater than or equal to this value are tagged.
  • Normalize Tags ( type=string array | optional ) - Indicates which tags must be first recognized in order to do the classification and training
    • This helps to reduce the noise in the text, for example reducing every set of numbers to {number}
  • Positive Sample Ratio ( type=double | default=0.5 | optional ) - The fraction of the training data made up of positive samples
    • For example, with a PSR of 0.6 and 5000 positive samples, those 5000 make up 60% of the total training data; the remaining 40% is negative samples, around 3333
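
The three numeric settings above can be illustrated with a minimal Python sketch. This is not the recognizer's code: the function names are hypothetical, and the regular expression only stands in for the tag-based normalization, which in Saga relies on tags recognized by earlier stages.

```python
import re


def negative_sample_count(positive_samples: int, positive_sample_ratio: float) -> int:
    """Negative samples needed so that positives make up the given share of the total."""
    if not 0.0 < positive_sample_ratio <= 1.0:
        raise ValueError("positive_sample_ratio must be in (0, 1]")
    total = positive_samples / positive_sample_ratio
    return round(total - positive_samples)


def normalize_numbers(sentence: str) -> str:
    """Stand-in for Normalize Tags: replace every run of digits with {number}."""
    return re.sub(r"\d+", "{number}", sentence)


def should_tag(probability: float, minimum_probability: float = 0.95) -> bool:
    """Tag only when the probability is greater than or equal to the threshold."""
    return probability >= minimum_probability


if __name__ == "__main__":
    print(negative_sample_count(5000, 0.6))
    print(normalize_numbers("Order 123 shipped in 4 days"))
    print(should_tag(0.97), should_tag(0.90))
```

With 5000 positive samples and a ratio of 0.6, the first function reproduces the figure of roughly 3333 negative samples given in the example above.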

Training a Model

Click on the button that opens the "Start Training Run" dialog

  • Datasets ( type=boolean | required ) - Selects the Dataset to use as training data
  • Iterations ( type=integer | default=200 | required )
  • Cut Off ( type=integer | default=5 | required )
  • Threads ( type=integer | default=2 | required )
  • Feature Selection ( type=string | default=BoW | required )
    • BoW (Bag of Words)
    • N-Gram
  • Algorithm ( type=string | default=MAXENT_QN | required )
    • Available algorithms
      • MAXENT_QN
      • MAXENT
      • NAIVEBAYES
      • PERCEPTRON
  • L1Cost ( type=double | default=0.1 | required )
  • L2Cost ( type=double | default=0.1 | required )
  • Number of Updates ( type=integer | default=15 | required )
  • Max FctEval ( type=integer | default=30000 | required )
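
As a rough illustration of the dialog's fields, the sketch below collects the defaults listed above into a Python dictionary and checks the two enumerated choices. The key names, the `build_training_run` helper, and the dataset name are hypothetical; the source does not describe how Saga stores or submits these settings.

```python
# Defaults as listed in the "Start Training Run" dialog (key names are hypothetical).
DEFAULTS = {
    "iterations": 200,
    "cut_off": 5,
    "threads": 2,
    "feature_selection": "BoW",
    "algorithm": "MAXENT_QN",
    "l1_cost": 0.1,
    "l2_cost": 0.1,
    "number_of_updates": 15,
    "max_fct_eval": 30000,
}

FEATURE_SELECTIONS = {"BoW", "N-Gram"}
ALGORITHMS = {"MAXENT_QN", "MAXENT", "NAIVEBAYES", "PERCEPTRON"}


def build_training_run(dataset: str, **overrides) -> dict:
    """Merge overrides into the dialog defaults and validate the enumerated fields."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides, "dataset": dataset}
    if settings["feature_selection"] not in FEATURE_SELECTIONS:
        raise ValueError("feature_selection must be BoW or N-Gram")
    if settings["algorithm"] not in ALGORITHMS:
        raise ValueError("algorithm is not one of the available algorithms")
    return settings


if __name__ == "__main__":
    # "support-tickets" is a made-up dataset name used only for illustration.
    print(build_training_run("support-tickets", algorithm="PERCEPTRON", iterations=100))
```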

General Settings

The general settings can be accessed by clicking on

More settings may be displayed in the same dialog; they vary per recognizer.

  • Enable - Enable the processor to be used in pipelines.
  • Base Pipeline - Indicates the last stage, from a pipeline, needed by the recognizer.
  • Skip Flags ( optional ) - Lexical item flags to be ignored by this processor.
  • Boundary Flags ( optional ) - List of vertex flags that indicate the beginning and end of a text block.
  • Required Flags ( optional ) - Lexical item flags that every token must have to be processed.
  • At Least One Flag ( optional ) - Lexical item flags of which every token must have at least one to be processed.
  • Don't Process Flags ( optional ) - List of lexical item flags that are not processed. Unlike "Skip Flags", which skips just the token and continues in the same path, this drops the path in the Saga graph.
  • Confidence Adjustment - Adjustment factor, from 0.0 to 2.0, applied to the confidence value (applies to every match).
    • 0.0 to < 1.0 decreases the confidence value
    • 1.0 leaves the confidence value unchanged
    • > 1.0 to 2.0 increases the confidence value
  • Debug - Enable debug logging.
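
The flag and confidence settings above can be sketched in Python as follows. This is an interpretation, not Saga's implementation: it assumes the adjustment factor is a plain multiplier (consistent with the three ranges listed, but not stated outright), it does not clamp the result, and it does not model "Don't Process Flags", which act on graph paths rather than single tokens. All names are hypothetical.

```python
def adjust_confidence(confidence: float, adjustment: float = 1.0) -> float:
    """Scale a match's confidence; assumes the factor is a simple multiplier."""
    if not 0.0 <= adjustment <= 2.0:
        raise ValueError("adjustment must be between 0.0 and 2.0")
    return confidence * adjustment


def is_processable(
    token_flags: set,
    required: frozenset = frozenset(),
    at_least_one: frozenset = frozenset(),
    skip: frozenset = frozenset(),
) -> bool:
    """Decide whether a token is eligible, based on its lexical item flags."""
    if token_flags & skip:
        return False  # token carries a skip flag
    if not required <= token_flags:
        return False  # a required flag is missing
    if at_least_one and not (token_flags & at_least_one):
        return False  # none of the "at least one" flags is present
    return True


if __name__ == "__main__":
    # Flag names below are made up for illustration.
    print(adjust_confidence(0.8, 0.5))
    print(is_processable({"TOKEN", "ALL_LOWER_CASE"}, required=frozenset({"TOKEN"})))
```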
