1. Introduction

For a better understanding of what Saga is and what the purpose of the UI is, please check out this presentation in Teams.


2. Exploring the UI

The UI is simple. A main tab selector at the top lets you choose which of the 4 areas you want to work on:

...

  • Search Interface: On this screen the user can review the results of a Test Run. A typical flow looks like this:
    1. The user adds a semantic tag.
    2. The user adds recognizers to this tag and configures them.
    3. The user tests the effectiveness of the tag and its recognizers by running a Test Run against a dataset file.
    4. To review the results, the user opens the Search Interface to check how well (or poorly) the text was tagged.



3. Using the UI

3.1. End to end use case

In this section we'll go through the process of creating a set of tags, adding some recognizers to them and finally testing how they perform against a dataset. This will give the user a better idea of what the flow looks like when using the Saga UI.

We currently have a dataset about aviation incidents loaded into Saga. We will try to identify incidents where an engine catches fire because of a bird.

Step #1: Check and choose your base pipeline

One important thing to consider is what stages you want in the base pipeline used by recognizers. The base pipeline usually has some stages to pre-process text before passing it to the recognizers.

...

  1. WhiteSpaceTokenizer: it splits sentences into words, using white space as the separator.
  2. StopWords: it identifies words that are very common and add no value to the process, such as 'the', 'a', 'this', 'those', etc.
  3. CaseAnalysis: it identifies whether a word is all UPPERCASE or all lowercase and then converts the text to lowercase. This is usually done to normalize words so they match more easily when we create patterns in our recognizers.
  4. CharChangeSplitter: it splits tokens where the character type changes: lowercase to uppercase, letter to number, and alphanumeric to punctuation. No character is dropped at the split point (vertex), and capital letters are preserved.
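
To make the four stages more concrete, here is a minimal Python sketch of what each one does to a piece of text. It is an illustration only and not Saga's implementation: the function names, the short stop-word list and the output format are all assumptions, and the sketch splits the original token (before lowercasing) so that lowercase-to-uppercase changes are still visible.

```python
import re

# Assumed sample list, taken from the examples above; a real list would be longer.
STOP_WORDS = {"the", "a", "this", "those"}


def whitespace_tokenize(text):
    """Split text into tokens on runs of white space."""
    return text.split()


def case_analysis(token):
    """Return the lowercased token plus a label describing its original case."""
    if token.isupper():
        case = "upper"
    elif token.islower():
        case = "lower"
    else:
        case = "mixed"
    return token.lower(), case


def char_change_split(token):
    """Split a token where the character class changes
    (lowercase-uppercase, letter-number, alphanumeric-punctuation).
    No character is dropped; a capital letter stays with the word it starts."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+|[^A-Za-z0-9\s]+"
    return re.findall(pattern, token)


def run_pipeline(text):
    """Apply the four stages, in order, to every white-space token."""
    results = []
    for token in whitespace_tokenize(text):
        normalized, case = case_analysis(token)
        results.append({
            "token": token,
            "stop_word": token.lower() in STOP_WORDS,
            "case": case,
            "normalized": normalized,
            "parts": [part.lower() for part in char_change_split(token)],
        })
    return results


for entry in run_pipeline("The fireByBird B737."):
    print(entry)
```

In this sketch 'The' is flagged as a stop word, 'fireByBird' is split into 'fire', 'by' and 'bird', and 'B737.' is split into 'b', '737' and '.'.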


Step #2: Create basic tags you will need

We want to identify 3 things:

...

e. Now, do the same for the fire and engine tags:


Step #3: Add the {fire-by-bird} tag that will use the other tags

The idea here is to create a tag that uses {fire}, {engine} and {bird} to identify a single concept: an engine caught fire because of birds. For this special tag we'll use the Fragmented recognizer. This is an advanced recognizer that tags text containing the other 3 tags, in any order of appearance, as long as they are close enough to each other within the aviation report.
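
The "any order, close enough" idea can be sketched in a few lines of Python. This is a simplified illustration, not the Fragmented recognizer's actual algorithm: the function name, the `max_distance` parameter (measured in tokens) and the token positions in the example are all hypothetical.

```python
from itertools import product


def find_fragmented_matches(tag_positions, required_tags, max_distance):
    """Return sorted (start, end) token windows that contain one match of
    every required tag, in any order, spanning at most max_distance tokens."""
    # If any required tag never matched, there is nothing to combine.
    if any(not tag_positions.get(tag) for tag in required_tags):
        return []
    windows = set()
    # Try every combination of one position per required tag.
    for combo in product(*(tag_positions[tag] for tag in required_tags)):
        start, end = min(combo), max(combo)
        if end - start <= max_distance:
            windows.add((start, end))
    return sorted(windows)


# Hypothetical token positions at which each basic tag matched in one report.
positions = {"bird": [0], "engine": [3], "fire": [10]}
required = ["bird", "engine", "fire"]

print(find_fragmented_matches(positions, required, max_distance=12))
print(find_fragmented_matches(positions, required, max_distance=5))
```

With a generous distance the three tags are combined into one window; with a tight distance nothing is returned, which is why the allowed distance is worth tuning when the tag matches too much or too little.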

...

d. Make sure that every recognizer in every tag uses the same pipeline, or the specific pipeline you need. Click the gear icon on each recognizer to open its settings and check the 'Base Pipeline' field:


Step #4: Quick test using the preview

You can test any of your tags using the preview functionality. Let's test the {fire-by-bird} tag. Click on it in the Tag tree, then enter the following text into the preview text box: "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."


A dialog with the Saga graph will be shown. Note how the {bird}, {fire}, {engine} and {fire-by-bird} tags have all matched the text:



Step #5: Perform a Test Run with a dataset

Once you have checked how your tags perform in the preview, it is a good idea to test them against a larger body of text.

At the moment Saga comes with several test datasets, but you can also create your own and upload it to a special folder in the Saga file system.


a. Still inside the {fire-by-bird} tag, click the "Test Run" button and then the "--- New Test Run ---" option

...

e. After reviewing the results, you can keep iterating: review the results and tweak your tags and pipelines until you have the best model for your specific use case.


3.2. Machine learning recognizers

3.2.1 Name Entity recognizer

The name entity recognizer uses Apache OpenNLP to tag text using an existing model (previously trained).

In addition, the recognizer can be used together with other recognizers to train a new model.

3.2.1.1 Using it as recognizer

To use it as a recognizer, add it to your tag and then choose a model, the probability threshold used to decide whether something is a match, and, optionally, normalization tags if you want to cleanse and normalize the input to the machine learning model.
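
The role of the probability threshold can be shown with a small Python sketch. It is not the recognizer's code: the function name, the candidate format and the probability values are made up for illustration, and treating a score equal to the threshold as a match is an assumption.

```python
def filter_by_threshold(candidates, threshold):
    """Keep only the candidates whose model probability reaches the threshold."""
    return [c for c in candidates if c["probability"] >= threshold]


# Made-up candidates and scores, standing in for what a trained model might return.
candidates = [
    {"text": "Joseph", "probability": 0.93},
    {"text": "home", "probability": 0.35},
    {"text": "Paul", "probability": 0.88},
]

matches = filter_by_threshold(candidates, threshold=0.8)
print([c["text"] for c in matches])
```

Raising the threshold makes the recognizer stricter (fewer, more certain matches); lowering it lets more candidates through at the cost of more false matches.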

...

f. Enter this text in the preview to check the Saga graph: "Several employees work from home, Joseph is one of them, Paul too". As you can see in the following image, the recognizer tags 'Joseph' and 'Paul' as humans:

3.2.1.2 Using it as a trainer