This recognizer uses OpenNLP's DocumentCategorizer to load a binary classification model (a sentence either is or is not in a given category) and tags each sentence whose predicted probability for that category meets or exceeds a configured minimum probability.
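
The tagging decision described above can be sketched as follows. This is a minimal illustration, not the recognizer's actual code: the function name `tag_sentences`, the category name "support", and the probability values are all hypothetical stand-ins for whatever the categorizer returns per sentence.

```python
# Hypothetical sketch: the scores below are made-up stand-ins for the
# per-category probabilities a document categorizer would return.

def tag_sentences(scored_sentences, category, min_probability=0.95):
    """Keep sentences whose probability for `category` is >= min_probability."""
    return [
        sentence
        for sentence, probabilities in scored_sentences
        if probabilities.get(category, 0.0) >= min_probability
    ]

scored = [
    ("Please reset my password.", {"support": 0.98, "other": 0.02}),
    ("Nice weather today.", {"support": 0.10, "other": 0.90}),
    ("I cannot log in.", {"support": 0.95, "other": 0.05}),
]

# A sentence scoring exactly at the threshold is still tagged.
tagged = tag_sentences(scored, "support")
print(tagged)
```

The comparison is inclusive, so a sentence that scores exactly the minimum probability is tagged.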

Note

This is a plugin recognizer. It uses the Classification Stage.

Configuration

  • Model
    Summary: Model to use for classification. By default the latest model is always used, but a specific model can be configured instead.
    Default: --LATEST--
    Required: true
  • Minimum Probability
    Summary: Probability threshold. Only sentences whose match probability is greater than or equal to this value are tagged.
    Type: double
    Default: 0.95
    Required: true
  • Normalize Tags
    Summary: Tags that must be recognized first, before classification and training are performed.
    Type: string array

    • Normalizing tags helps reduce noise in the text, for example by reducing every set of numbers to {number}.
  • Positive Sample Ratio
    Summary: The fraction of the training data made up of positive samples.
    Type: double
    Default: 0.5

    • With a Positive Sample Ratio (PSR) of 0.6 and 5000 positive samples, those 5000 samples make up 60% of the total training data (about 8333 samples). The remaining 40%, around 3333 samples, will be negative samples.
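
The arithmetic behind the positive sample ratio can be checked with a short calculation. The sketch below is illustrative only: the function name `negative_sample_count` is hypothetical, and rounding to the nearest whole sample is an assumption (the text above only says "around 3333").

```python
def negative_sample_count(positive_samples, positive_sample_ratio):
    """Estimate how many negative samples accompany the positive ones.

    Assumption: total = positives / ratio, and the result is rounded to
    the nearest whole sample; the real implementation may round differently.
    """
    if not 0.0 < positive_sample_ratio <= 1.0:
        raise ValueError("positive_sample_ratio must be in (0, 1]")
    total_samples = positive_samples / positive_sample_ratio
    return round(total_samples - positive_samples)

# 5000 positives at a ratio of 0.6: total is about 8333, negatives about 3333.
negatives_at_06 = negative_sample_count(5000, 0.6)

# With the default ratio of 0.5, positives and negatives are balanced.
negatives_at_05 = negative_sample_count(5000, 0.5)

print(negatives_at_06, negatives_at_05)
```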

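The noise reduction that tag normalization provides, as in the {number} example above, can be approximated with a regular expression. This is only a sketch: the recognizer normalizes text based on tags produced by other recognizers, whereas the pattern and the function name `normalize_numbers` here are hypothetical stand-ins.

```python
import re

# Assumption: a "set of numbers" is a run of digits, optionally with
# '.' or ',' separators between digit groups (e.g. 19.99 or 1,000).
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*")

def normalize_numbers(text):
    """Replace every number in the text with the placeholder {number}."""
    return NUMBER_PATTERN.sub("{number}", text)

normalized = normalize_numbers("Order 12345 shipped 3 items for 19.99")
print(normalized)
```

After normalization, sentences that differ only in their numeric values produce the same text, so the model sees one pattern instead of many.
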
Training a Model

Click the button that opens the "Start Training Run" dialog.

  • Datasets
    Summary: Selects the dataset to use as training data.
    Type: boolean
    Required: true
  • Iterations
    Type: integer
    Default: 200
    Required: true
  • Cut Off
    Type: integer
    Default: 5
    Required: true
  • Threads
    Type: integer
    Default: 2
    Required: true
  • Feature Selection
    Default: BoW
    Required: true
    • BoW (Bag of Words)
    • N-Gram
  • Algorithm
    Default: MAXENT_QN
    Required: true
    • Available algorithms
      • MAXENT_QN
      • MAXENT
      • NAIVEBAYES
      • PERCEPTRON
  • L1Cost
    Type: double
    Default: 0.1
    Required: true
  • L2Cost
    Type: double
    Default: 0.1
    Required: true
  • Number of Updates
    Type: integer
    Default: 15
    Required: true
  • Max FctEval
    Type: integer
    Default: 30000
    Required: true
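
To illustrate the difference between the two Feature Selection options, here is a minimal sketch of bag-of-words and n-gram feature extraction, followed by a cut-off filter. All function names are hypothetical. The cut-off behavior shown (a feature is kept only if it occurs at least that many times in the training data) is an assumption based on how OpenNLP describes its cutoff training parameter, not something this page states.

```python
from collections import Counter

def bag_of_words(tokens):
    """Bag of words: every token is a feature on its own."""
    return list(tokens)

def ngrams(tokens, n=2):
    """N-grams: every run of n consecutive tokens is a feature."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def apply_cutoff(feature_lists, cutoff):
    """Assumed cut-off semantics: keep features seen at least `cutoff` times."""
    counts = Counter(f for features in feature_lists for f in features)
    return {feature for feature, count in counts.items() if count >= cutoff}

tokens = ["reset", "my", "password"]
bow_features = bag_of_words(tokens)
bigram_features = ngrams(tokens, n=2)

# Two tiny "training samples"; only features seen in both survive a cutoff of 2.
samples = [["reset", "my", "password"], ["reset", "password", "now"]]
kept = apply_cutoff([bag_of_words(s) for s in samples], cutoff=2)

print(bow_features, bigram_features, sorted(kept))
```

Bag of words ignores word order, while n-grams preserve local order at the cost of a larger and sparser feature set, which makes the cut-off more relevant.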

General Settings

See Generic Recognizer Config.