
1. Introduction

For a better understanding of what SAGA is and what the purpose of the UI is, see this presentation in Teams.


2. Exploring the UI

The UI is simple: a main tab selector at the top lets you choose which of the following areas you want to work on:

  • Tags: In this tab the user defines all of the semantic tags to be used, along with the recognizers and settings each tag will use.


  • Pipelines: Here the user can add, delete, or update pipelines. A pipeline defines which stages run before the recognizer stages are added to it. For example, you could have stages such as a white space tokenizer, a text case analyzer, or a stop words identifier.



  • Datasets: Here the user can view the datasets loaded into the application for test runs and/or training of machine learning models. In this screen the user can define which fields to process from the dataset file and how to split the text that feeds the pipeline. The user cannot upload datasets at the moment, so datasets need to be placed in a special folder in the Saga Server file system.



  • Background Processes: In this screen the user can monitor running background processes. For example, a test run against a dataset can take a long time to complete, so the user can check its progress in this screen.


  • Search Interface: In this screen the user can review the results of a test run. The typical flow is:
    1. The user adds a semantic tag.
    2. The user adds recognizers to this tag and configures them.
    3. The user tests the effectiveness of the tag and its recognizers by running a test run against a dataset file.
    4. To review the results, the user opens the search interface to check how well (or badly) the text was tagged.



3. Using the UI

3.1. End to end use case

In this section we'll go through the process of creating a couple of tags, adding some recognizers to them, and finally testing how they perform against a dataset. This will give the user a better idea of the flow when using the SAGA UI.

We currently have a dataset about aviation incidents loaded into SAGA. We will try to identify incidents where an engine catches fire due to a bird.

Step #1: Check and choose your pipeline

One important thing to consider is what stages you want in the pipeline used by the recognizers. The base pipeline usually has some stages to pre-process text before passing it to the recognizers.

As you can see in the following image, we'll use the baseline-pipeline, which has the following stages:

  1. TextBreaker: Splits text into sentences. Notice that the language is set to English; if you are processing another language, keep this setting in mind.
  2. WhiteSpaceTokenizer: Splits sentences into words, using white space as the separator.
  3. StopWords: Identifies words that are very common and add no value to the process, such as 'the', 'a', 'this', 'those', etc.
  4. CaseAnalysis: Identifies whether a word is all UPPERCASE or lowercase, and then converts the text to lowercase. This is usually done to normalize words so they match easily when we create patterns in our recognizers.
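
To make the effect of these four stages more concrete, here is a minimal Python sketch that mimics them in the same order. It is only an illustration and not SAGA's implementation: the function names, the stop-word list, the sentence-splitting rule, and the token dictionary layout are all assumptions made for this example.

```python
import re

# Assumed, illustrative stop-word list (the real stage's list is not shown here).
STOP_WORDS = {"the", "a", "an", "this", "those", "it"}


def text_breaker(text):
    """Split text into sentences after '.', '!' or '?' (a naive English rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def whitespace_tokenizer(sentence):
    """Split a sentence into words using white space as the separator."""
    return sentence.split()


def stop_words(token):
    """Flag very common words; the comparison ignores case and punctuation."""
    bare = token["text"].strip(".,!?;:").lower()
    token["stop_word"] = bare in STOP_WORDS
    return token


def case_analysis(token):
    """Record the original casing, then normalize the text to lowercase."""
    text = token["text"]
    if text.isupper():
        token["case"] = "upper"
    elif text.islower():
        token["case"] = "lower"
    else:
        token["case"] = "mixed"
    token["text"] = text.lower()
    return token


def run_pipeline(text):
    """Apply the four stages in order and return one token list per sentence."""
    result = []
    for sentence in text_breaker(text):
        tokens = []
        for raw in whitespace_tokenizer(sentence):
            token = case_analysis(stop_words({"text": raw}))
            tokens.append(token)
        result.append(tokens)
    return result


if __name__ == "__main__":
    for sentence in run_pipeline("The ENGINE caught fire. A bird hit it."):
        print(sentence)
```

Because every token ends up in lowercase, a recognizer pattern such as 'engine' would match 'ENGINE', 'Engine' and 'engine' alike, which is the normalization benefit described in the CaseAnalysis stage.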


Step #2: Create the tags you will need

We want to identify 3 things:

  1. birds
  2. fire
  3. engine
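
As a rough illustration of how these three concepts combine, the following Python sketch flags a report only when all three are found in it. This is not how SAGA recognizers are implemented; the term lists and the function names are hypothetical and stand in for whatever recognizers you configure for each tag.

```python
import re

# Hypothetical term lists; in SAGA each tag's recognizers are configured in the UI.
TAG_TERMS = {
    "bird": {"bird", "birds", "goose", "geese", "gull"},
    "fire": {"fire", "flames", "burning"},
    "engine": {"engine", "engines", "turbine"},
}


def tags_in(report):
    """Return the set of tags whose terms appear in the report (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", report.lower()))
    return {tag for tag, terms in TAG_TERMS.items() if words & terms}


def is_bird_engine_fire(report):
    """True only when the bird, fire and engine tags are all present."""
    return tags_in(report) == set(TAG_TERMS)


if __name__ == "__main__":
    print(is_bird_engine_fire("Left ENGINE caught fire after ingesting a goose."))
    print(is_bird_engine_fire("Engine fire during taxi."))
```

A plain word list like this misses synonyms and misspellings, which is why the effectiveness of each tag is checked with a test run against the dataset.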

If those 3 things are present in an incident report, then we can say that the incident is about an engine catching fire due to birds. So let's start by creating the {bird} tag:











