...

Training is done the same way as for the Name Entity recognizer. We need another recognizer to use as a base and a dataset with a good number of samples of the text we want to classify.

The following steps describe how to do a training. As an example, we'll use the Aviation dataset and try to tag sentences that talk about incidents with birds.

...

c. Add the Classification recognizer to the {bird-incident} tag and select '--NONE--' in the 'Model' field. NOTE: Always remember to set this field to --NONE-- when training.

d. Click on the 'Train' button. When the dialog opens:

...

This will train a model using the Aviation dataset and the pattern in the Fragmented recognizer.


e. Check the 'Background Processes' tab and wait for the Classification training to be complete. Once done, go back to the {bird-incident} tag and disable the Fragmented recognizer.

f. In the Classification recognizer, select your most recently created model in the 'Model' field. It should be named like bird-incident-[datetime stamp here].bin, for example: 'bird-incident-20190208173305.bin'. You can also use the option '--LATEST--' to always use the latest trained model.

g. Start a test run using the Aviation dataset.

h. Check the 'Background Processes' tab for completion. When complete, click on the 'Open Search' button to check results in the Search screen:

As you can see, the Classification recognizer is tagging some sentences that, in theory, are supposed to be related to incidents with birds. In this case, though, because the dataset is small and the Fragmented recognizer identified only a few positive samples, the Classification recognizer is not doing a very good job of identifying the sentences.
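
To get a feel for how few positive samples a pattern produces, it can help to count the matching sentences before training. The following is only a minimal sketch and not part of the product: the regular expression is a hypothetical stand-in for the pattern in the Fragmented recognizer, and the sample sentences and the `count_positives` helper are made up for illustration.

```python
import re

# Hypothetical stand-in for the pattern used by the Fragmented recognizer.
BIRD_PATTERN = re.compile(r"\b(bird|birds|goose|geese|gull|gulls)\b", re.IGNORECASE)


def count_positives(sentences, pattern=BIRD_PATTERN):
    """Return (positives, total) for the sentences matched by the pattern."""
    positives = sum(1 for sentence in sentences if pattern.search(sentence))
    return positives, len(sentences)


# Made-up sentences, for illustration only (not taken from the Aviation dataset).
sample = [
    "The aircraft struck a bird during takeoff.",
    "The pilot reported a hydraulic failure.",
    "A flock of geese was ingested by the left engine.",
    "The flight was delayed due to weather.",
]

positives, total = count_positives(sample)
print(f"{positives} of {total} sentences labeled positive ({positives / total:.0%})")
```

If the count of positives is very low compared to the size of the dataset, the trained model has little to learn from, which matches the weak results seen above.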

With more and better training data, the accuracy of the Classification recognizer is expected to improve. You can also experiment with the different settings on the training screen and decide which ones bring better results for your specific use case.
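
One way to decide which settings bring better results is to compare the sentences tagged in each test run against a small hand-checked set. The sketch below assumes you can collect identifiers for the tagged sentences by some means; the `precision_recall_f1` function and the identifiers are hypothetical and are not part of the product.

```python
def precision_recall_f1(predicted, expected):
    """Compare tagged sentence ids against hand-checked ids."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ids of sentences that really describe bird incidents.
expected = {1, 3, 5, 8}

# Hypothetical ids tagged by two test runs trained with different settings.
run_a = {1, 2, 3}
run_b = {1, 3, 5, 9}

for name, run in (("run A", run_a), ("run B", run_b)):
    p, r, f1 = precision_recall_f1(run, expected)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Precision tells you how many of the tagged sentences were correct, and recall tells you how many of the relevant sentences were found. Comparing both across runs gives a more reliable picture than inspecting a few results by eye.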

...