...
In this section we'll go through the process of creating a set of tags, adding some recognizers to them, and finally testing how they perform against a dataset. This will give you a better idea of the overall flow of working with the SAGA UI.
We currently have a dataset about aviation incidents loaded into SAGA. We will try to identify incidents where an engine catches fire due to a bird.
One important thing to consider is which stages you want in the base pipeline used by the recognizers. The base pipeline usually contains stages that pre-process the text before passing it to the recognizers.
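As a rough mental model only (not SAGA's actual code), the base pipeline can be pictured as an ordered list of stages that each transform the text before any recognizer sees it. The Python sketch below assumes two hypothetical stages, `tokenize` and `lowercase`; SAGA's real stage names and behavior may differ. What it illustrates is that every recognizer downstream works on the same normalized tokens, which is why related tags should share the same base pipeline.

```python
import re
from typing import Callable, List

# Hypothetical pre-processing stages; SAGA's real stages and their names may differ.
def tokenize(text: str) -> List[str]:
    # Split on anything that is not a letter or digit, dropping empty pieces.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def lowercase(tokens: List[str]) -> List[str]:
    # Normalize case so recognizers can match "FLAME" and "flame" alike.
    return [t.lower() for t in tokens]

Stage = Callable[[List[str]], List[str]]

def run_base_pipeline(text: str, stages: List[Stage]) -> List[str]:
    # Tokenize first, then apply each token-level stage in order.
    tokens = tokenize(text)
    for stage in stages:
        tokens = stage(tokens)
    return tokens

print(run_base_pipeline("SMOKE AND FLAME.", [lowercase]))  # ['smoke', 'and', 'flame']
```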
...
e. Now, do the same for the fire and engine tags:
...
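Once the {bird}, {fire} and {engine} tags exist, each one simply marks the tokens it recognizes. The sketch below assumes each tag is backed by a plain word list; the terms in `TAG_TERMS` are hypothetical examples, not SAGA defaults, and matching is case-insensitive because each token is lowercased before lookup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical word lists for the three simple tags; adjust them to your data.
TAG_TERMS: Dict[str, Set[str]] = {
    "bird": {"bird", "birds", "seagull", "goose", "geese"},
    "fire": {"fire", "flame", "flames", "smoke"},
    "engine": {"engine", "engines", "turbine"},
}

def tag_tokens(tokens: List[str]) -> List[Tuple[str, int]]:
    """Return (tag, token_index) pairs for every token found in a tag's word list."""
    matches = []
    for i, token in enumerate(tokens):
        for tag, terms in TAG_TERMS.items():
            if token.lower() in terms:
                matches.append((tag, i))
    return matches

print(tag_tokens("ENGINE FIRE AFTER BIRD STRIKE".split()))
# [('engine', 0), ('fire', 1), ('bird', 3)]
```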
The idea here is to create a tag that uses {fire}, {engine} and {bird} to identify a single concept: an engine catching fire due to birds. For this special tag we'll use the Fragmented recognizer. This is an advanced recognizer that tags text containing all three of the other tags, in any order, as long as they appear close enough to each other within the aviation report.
a. Create a tag called 'fire-by-bird'. Use the same steps you used to create the other tags.
...
c. Add the following pattern: {fire} {engine} {bird}. Make sure to check the 'In Any Order' option, and set Max tokens to 16 and Min tokens to 4.
d. Make sure all of the recognizers in all of your tags are using the same pipeline, or the pipeline you need them to use. Click the gear icon on each recognizer to open its settings and check the 'Base Pipeline' field:
...
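The behavior of the Fragmented recognizer with 'In Any Order' checked can be approximated as a windowed search: find spans that contain all three sub-tags, in any order, whose length falls within the Min/Max tokens bounds. This is a sketch under an assumption, not SAGA's implementation: here the length is counted from the first to the last matched token (inclusive), and only the shortest window starting at each hit is considered.

```python
from typing import List, Set, Tuple

# An annotation is (tag_name, token_position), e.g. ("bird", 0).
Annotation = Tuple[str, int]

def fragmented_matches(
    annotations: List[Annotation],
    required: Set[str],
    min_tokens: int = 4,
    max_tokens: int = 16,
) -> List[Tuple[int, int]]:
    """Find (start, end) token spans that contain every required tag in any order.

    Assumption: a span is measured from its first to its last matched token,
    inclusive, and its length must lie between min_tokens and max_tokens.
    """
    ordered = sorted(annotations, key=lambda a: a[1])
    spans: List[Tuple[int, int]] = []
    for i, (first_tag, start) in enumerate(ordered):
        if first_tag not in required:
            continue  # windows only start on one of the required tags
        seen: Set[str] = set()
        for tag, pos in ordered[i:]:
            length = pos - start + 1
            if length > max_tokens:
                break  # later annotations would only make the window longer
            if tag in required:
                seen.add(tag)
            if seen == required:
                # Shortest window starting here; keep it only if it is long enough.
                if length >= min_tokens and (start, pos) not in spans:
                    spans.append((start, pos))
                break
    return spans

# Hypothetical positions of {bird}, {engine} and {fire} hits in an 11-token report.
hits = [("bird", 0), ("engine", 3), ("fire", 8), ("fire", 10)]
print(fragmented_matches(hits, {"fire", "engine", "bird"}))  # [(0, 8)]
```

With the values of 4 and 16, the three hits in the example span 9 tokens and match. If the only {fire} hit were at token 20, no window would fit within 16 tokens and nothing would be tagged.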
You can test any of your tags using the preview functionality. Let's test the {fire-by-bird} tag. Click on it in the tag tree, then enter the following text into the preview text box: "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."
...
A dialog showing the SAGA graph will appear. Note how the {bird}, {fire}, {engine} and {fire-by-bird} tags have all identified the text:
...
Once you have tested the performance of your tags using the preview, it is a good idea to test them against a larger body of text.
SAGA currently comes with several test datasets, but you can also create your own and upload it to a special folder in the SAGA file system.
a. Still inside the {fire-by-bird} tag, click the "Test Run" button and then click the "--- New Test Run ---" option.
...
d. Wait for the test run to complete or, if you just want partial results while it is running, click the "Open search" button to open the search interface. On this screen you will find your tags as facets, so when you select one you'll see search results containing that tag. In the following image we clear the facets and then select only {fire-by-bird} to check the comments that talk about engines catching fire due to birds.
e. After reviewing the results, you can keep iterating on this process, tweaking your tags and pipelines to build the best model for your specific use case.
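Conceptually, a test run records which tags fired on each document, and the search interface turns those results into facets. The sketch below models that with hypothetical results for three made-up reports: `facet_counts` counts documents per tag, and `filter_by_facets` keeps documents that carry every selected tag. How several selected facets combine in the real UI is an assumption here, so check the actual behavior.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical test-run output: each document id mapped to the tags found in it.
results: Dict[str, Set[str]] = {
    "report-1": {"bird", "engine", "fire", "fire-by-bird"},
    "report-2": {"bird", "engine"},
    "report-3": {"fire", "engine"},
}

def facet_counts(results: Dict[str, Set[str]]) -> Counter:
    # Count how many documents carry each tag, like a facet list in a search UI.
    counts: Counter = Counter()
    for tags in results.values():
        counts.update(tags)
    return counts

def filter_by_facets(results: Dict[str, Set[str]], selected: Set[str]) -> List[str]:
    # Keep documents carrying every selected tag; an empty selection keeps all.
    return sorted(doc for doc, tags in results.items() if selected <= tags)

print(facet_counts(results)["engine"])                # 3
print(filter_by_facets(results, {"fire-by-bird"}))   # ['report-1']
```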
The Name Entity recognizer uses Apache OpenNLP to tag text with an existing, previously trained model.
In addition, the recognizer can be used together with other recognizers to train a new model.
To use it as a recognizer, you only need to add it to your tag, then choose a model, set the probability threshold used to decide whether something is a match, and finally choose normalization tags if you want to cleanse and normalize the input to the machine learning model.
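The probability threshold works as a cut-off on the model's confidence. OpenNLP's name finder can report a probability for each span it finds; the sketch below assumes a caller-supplied model that returns `(start, end, probability)` spans, normalizes the input first (a hypothetical cleanse step that strips punctuation), and keeps only spans at or above the threshold. `toy_model` and its scores are invented stand-ins, not output of a real OpenNLP model.

```python
import re
from typing import Callable, List, Tuple

# (start_token, end_token_exclusive, probability), as a name finder might report it.
Span = Tuple[int, int, float]
Model = Callable[[List[str]], List[Span]]

def normalize_tokens(text: str) -> List[str]:
    # Hypothetical cleanse/normalize step: drop punctuation, split into tokens.
    return re.findall(r"[A-Za-z0-9']+", text)

def recognize(text: str, model: Model, threshold: float) -> List[str]:
    tokens = normalize_tokens(text)
    matches = []
    for start, end, prob in model(tokens):
        if prob >= threshold:  # spans below the threshold are not treated as matches
            matches.append(" ".join(tokens[start:end]))
    return matches

# Invented scores standing in for a trained model's confidence.
TOY_SCORES = {"Maria": 0.95, "Support": 0.30}

def toy_model(tokens: List[str]) -> List[Span]:
    return [(i, i + 1, TOY_SCORES[t]) for i, t in enumerate(tokens) if t in TOY_SCORES]

print(recognize("Maria called Support, yesterday.", toy_model, threshold=0.5))  # ['Maria']
```

Lowering the threshold to 0.2 would also accept "Support", which shows the trade-off: a lower threshold finds more names but also lets in more false positives.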
Let's use the {human} tag to test this functionality:
a. Click on the {human} tag
b. Click on the Entity recognizer and then click the gear button to bring up its settings
c. Disable the Entity recognizer
d. Attach the Name Entity recognizer to the {human} tag
e. Choose one of the default models: "en-ner-person.bin". This model was trained to identify English names of people.
f. Enter this text in the preview to check the SAGA graph: "Several employees work from home, Joseph is one of them, Paul too". As you can see in the following image, the recognizer tags 'Joseph' and 'Paul' as humans:
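Steps c and d change which recognizers are active on the {human} tag. The sketch below models only that bookkeeping: a tag holds a list of recognizers, and disabled ones are skipped. `word_list` and `fake_person_model` are hypothetical placeholders; in SAGA the second recognizer would be OpenNLP running "en-ner-person.bin", not a fixed list of names.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set

@dataclass
class Recognizer:
    name: str
    find: Callable[[List[str]], List[str]]  # returns the matched token strings
    enabled: bool = True

@dataclass
class Tag:
    name: str
    recognizers: List[Recognizer] = field(default_factory=list)

    def apply(self, tokens: List[str]) -> List[str]:
        hits: List[str] = []
        for recognizer in self.recognizers:
            if recognizer.enabled:  # a disabled recognizer contributes nothing
                hits.extend(recognizer.find(tokens))
        return hits

# Hypothetical stand-ins: a word-list recognizer and a fake "person model".
def word_list(words: Set[str]) -> Callable[[List[str]], List[str]]:
    return lambda tokens: [t for t in tokens if t.lower() in words]

KNOWN_PEOPLE = {"Joseph", "Paul"}  # placeholder for what en-ner-person.bin might find

def fake_person_model(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t in KNOWN_PEOPLE]

human = Tag("human", [Recognizer("Entity", word_list({"employees", "staff"}))])
human.recognizers[0].enabled = False  # step c: disable the Entity recognizer
human.recognizers.append(Recognizer("Name Entity", fake_person_model))  # step d

tokens = re.findall(r"\w+", "Several employees work from home, Joseph is one of them, Paul too")
print(human.apply(tokens))  # ['Joseph', 'Paul']
```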