1. Introduction

For a better understanding of what SAGA is and what the purpose of the UI is, please check out this presentation in Teams.

2. Exploring the UI

The UI is simple. A tab selector at the top lets you choose which of the five areas you want to work in:

Tags: In this tab the user defines all of the semantic tags to be used, along with the recognizers and settings each tag uses.

Pipelines: Here the user can add, delete or update pipelines. A pipeline defines which stages run before the recognizer stages are added. For example, a pipeline could include stages such as a white space tokenizer, a text case analyzer or a stop words identifier.

Datasets: Here the user can view the datasets loaded into the application for test runs and/or training of machine learning models. In this screen the user can define which fields of the dataset file to process and how to split the text that is fed to the pipeline. The user cannot upload datasets at the moment, so datasets need to be placed in a special folder on the SAGA server's file system.

Background Processes: In this screen the user can monitor running background processes. For example, a test run against a dataset could take a long time to complete, so the user can check its progress in this screen.

Search Interface: In this screen the user can review the results of a test run. The typical flow is:
1. The user adds a semantic tag.
2. The user adds recognizers to this tag and configures them.
3. The user tests the effectiveness of the tag and its recognizers by running a test run against a dataset file.
4. To review the results, the user opens the search interface to check how well (or badly) the text was tagged.

3. Using the UI

3.1. End to end use case

In this section we'll go through the process of creating a couple of tags, adding some recognizers to them and finally testing how they perform against a dataset. This will give the user a better idea of the flow when using the SAGA UI.

We currently have a dataset about aviation incidents loaded into SAGA. We will try to identify incidents where an engine catches fire due to a bird.

Step #1: Check and choose your pipeline

One important thing to consider is what stages you want in the pipeline used by the recognizers. The base pipeline usually has some stages to pre-process text before passing it to the recognizers.

As you can see in the following image, we'll use the baseline-pipeline which has the following stages:

WhiteSpaceTokenizer: It splits sentences into words, using white space as the separator.
StopWords: It identifies words that are very common and add no value to the process, such as 'the', 'a', 'this' and 'those'.
CaseAnalysis: It identifies whether the word is all UPPERCASE or lowercase and then converts the text to lowercase. This is usually done to normalize words so they match easily when we create patterns in our recognizers.
CharChangeSplitter: It splits tokens where the character type changes: lowercase to uppercase, letter to number, or alphanumeric to punctuation. No character at the boundary is dropped, and capital letters are preserved.
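
To make the effect of these stages concrete, here is a minimal Python sketch of what each one does to a piece of text. It is not SAGA's implementation: the function names, the stop-word list and the output format are all assumptions made for illustration, and only ASCII letters and digits are handled. For simplicity the sketch splits on character changes first, so the later checks see clean pieces; the stage order SAGA uses is the one listed above.

```python
import re

# Hypothetical stop-word list; SAGA's real list is not given in this manual.
STOP_WORDS = {"the", "a", "this", "those"}

# Zero-width boundaries where the character type changes (ASCII only).
_BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"                 # lowercase -> uppercase
    r"|(?<=[A-Za-z])(?=[0-9])"             # letter -> digit
    r"|(?<=[0-9])(?=[A-Za-z])"             # digit -> letter
    r"|(?<=[A-Za-z0-9])(?=[^A-Za-z0-9])"   # alphanumeric -> punctuation
    r"|(?<=[^A-Za-z0-9])(?=[A-Za-z0-9])"   # punctuation -> alphanumeric
)


def whitespace_tokenize(text):
    """WhiteSpaceTokenizer: split text on runs of white space."""
    return text.split()


def char_change_split(token):
    """CharChangeSplitter: split at type changes without dropping characters."""
    return [piece for piece in _BOUNDARY.split(token) if piece]


def case_of(token):
    """CaseAnalysis, part 1: report whether a token is all upper or lower case."""
    if token.isupper():
        return "upper"
    if token.islower():
        return "lower"
    return "other"


def run_pipeline(text):
    """Run all four sketched stages and return one dict per resulting token."""
    tokens = []
    for raw in whitespace_tokenize(text):
        for piece in char_change_split(raw):
            lowered = piece.lower()  # CaseAnalysis, part 2: normalize to lowercase
            tokens.append({
                "text": lowered,
                "case": case_of(piece),
                "stop": lowered in STOP_WORDS,  # StopWords: flag, do not remove
            })
    return tokens


result = run_pipeline("The B737 engine caughtFire.")
print([t["text"] for t in result])
# ['the', 'b', '737', 'engine', 'caught', 'fire', '.']
```

Note that the stop word is only flagged, not removed, which matches the description of StopWords as a stage that "identifies" common words.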

Step #2: Create basic tags you will need

We want to identify 3 things:

birds
fire
engine

If those 3 things are present in an incident report, then we can say that the incident is about an engine catching fire due to birds.

a. So let's start by creating the {bird} tag:

b. Add 'SimpleRegex' and 'Entity' recognizers to the bird tag:

c. Add the following patterns to the Entity recognizer. See the image below to know how to do it. Repeat those steps for each of the following patterns:

duck
hawk
seagull

d. Also add the following regex to the SimpleRegex recognizer. (Note: the steps are very similar to how the entities were added to the Entity recognizer.)

bird[s]?

e. Now, do the same for the fire and engine tags:
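
The entity patterns and the regex from steps c and d can be tried outside SAGA with a few lines of Python. This is only a sketch under assumptions: the function `tag_birds` is hypothetical, the input is assumed to be already tokenized and lowercased by the pipeline, and the regex is assumed to be matched against one whole token at a time (the manual does not say how the SimpleRegex recognizer applies it).

```python
import re

# Patterns taken from steps c and d above.
BIRD_ENTITIES = {"duck", "hawk", "seagull"}
BIRD_REGEX = re.compile(r"bird[s]?")


def tag_birds(tokens):
    """Return (index, token) pairs for tokens that would be tagged {bird}."""
    hits = []
    for i, token in enumerate(tokens):
        # Assumption: a token must match an entity or the whole regex exactly.
        if token in BIRD_ENTITIES or BIRD_REGEX.fullmatch(token):
            hits.append((i, token))
    return hits


tokens = "a seagull and two birds hit the engine".split()
print(tag_birds(tokens))
# [(1, 'seagull'), (4, 'birds')]
```

Because the regex is applied to whole tokens here, a word such as "birdcage" is not tagged; a substring search would behave differently.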

Step #3: Add the {fire-by-bird} tag that will use the other tags to identify when an engine catches fire due to birds

The idea here is to create a tag that uses {fire}, {engine} and {bird} to identify a concept: an engine caught fire due to birds. For this special tag we'll use the Fragmented recognizer. This is an advanced recognizer that tags text containing the other 3 tags, in any order of appearance, as long as they are close enough to each other.
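
The "any order, close enough" idea can be sketched in Python as follows. This is not how SAGA implements the Fragmented recognizer. The function names are hypothetical, the {fire} and {engine} word lists are invented for the example because this manual does not list them, and the meaning of the minimum and maximum token settings (the length of the span that covers all three tags) is an assumption.

```python
import re
from itertools import product

# {bird} patterns come from Step #2; the {fire} and {engine} patterns are
# hypothetical, since this manual does not list them.
TAG_PATTERNS = {
    "bird": re.compile(r"duck|hawk|seagull|bird[s]?"),
    "fire": re.compile(r"fire|flame|smoke"),
    "engine": re.compile(r"engine|turbine"),
}


def tag_positions(tokens):
    """Map each tag name to the token indexes where it matches."""
    return {
        tag: [i for i, tok in enumerate(tokens) if pattern.fullmatch(tok)]
        for tag, pattern in TAG_PATTERNS.items()
    }


def fragmented_match(tokens, required, min_tokens, max_tokens):
    """Return the shortest (start, end) span that holds every required tag.

    Tags may appear in any order. The span length, counted in tokens, must
    lie between min_tokens and max_tokens. Returns None when nothing fits.
    """
    positions = tag_positions(tokens)
    best = None
    # Try every way of picking one occurrence of each required tag.
    for combo in product(*(positions[tag] for tag in required)):
        start, end = min(combo), max(combo)
        length = end - start + 1
        if min_tokens <= length <= max_tokens:
            if best is None or length < best[1] - best[0] + 1:
                best = (start, end)
    return best


text = "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."
tokens = re.findall(r"[a-z0-9]+", text.lower())
print(fragmented_match(tokens, ["fire", "engine", "bird"], 4, 16))
# (0, 8)
```

With a minimum of 4 and a maximum of 16 tokens, the span from "seagull" (token 0) to "smoke" (token 8) is 9 tokens long and therefore qualifies, even though the tags appear as bird, engine, fire rather than in the order written in the pattern.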

a. Create a tag called 'fire-by-bird'. Use similar steps you used to create the other tags.

b. Attach the Fragmented recognizer to the tag

c. Add the following pattern: {fire} {engine} {bird}. Make sure to check the 'In Any Order' option, and set max tokens to 16 and min tokens to 4.

d. Make sure that all the recognizers of all your tags use the same pipeline, or whichever pipeline each one needs. Click on the gear icon of each recognizer to open its settings and check the field 'Base Pipeline':

Step #4: Let's do a quick test using the preview

You are able to test any of your tags using the preview functionality. Let's test the {fire-by-bird} tag. Make sure to click on it in the Tag tree, then enter the following text into the preview textbox: "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."

A dialog with the SAGA graph will be shown. Note how the {bird}, {fire}, {engine} and also the {fire-by-bird} tags have identified the text:

Step #5: Now let's do a test run using the full Aviation dataset

a. Still inside the {fire-by-bird} tag, click on the "Test Run" button and then click the "--- New Test Run ---" option.

b. Select the Aviation-Incidents dataset and click on the Execute button.

c. Click the "Background Processes" tab to check the progress of the run

d. Wait for completion of the test run or just click on the "Open search" button to check the results on the search interface.
