1. Introduction

The purpose of SAGA UI is to help with the creation of the model (tags, patterns, recognizer settings) that the SAGA Library will eventually use to process text.

The UI is aimed at SMEs (subject matter experts): business people who are not necessarily tech-savvy but know a lot about the business need and about the things that need to be identified within the text to be processed.

The UI enables collaboration between SMEs and tech people, all using the application at the same time. Creating and testing the model is a difficult and time-consuming task, so this collaboration can accelerate it and produce results sooner.

As will be described in the following sections, the functionality of the UI is organized by semantic tags. First, the user thinks about the things he/she wants to identify in the text (the 'what'); then about 'how' to identify those tags using the recognizers provided in the application. For a better understanding of what SAGA is and what the purpose of the UI is, please check out this presentation in Teams.


2. Exploring the UI

The UI is simple: it has a main tab selector at the top where you can select which of the 4 areas you want to work on:

...

  • Background Processes: In this screen the user can monitor running background processes. For example, a test run against a dataset could take a long time to complete, so the user can check its progress in this screen.

...

  • Search Interface: In this screen the user can review results from a Test Run. The flow is something like:
    1. The user adds a semantic tag.
    2. Then the user adds recognizers to this tag and configures them.
    3. The user tests the effectiveness of the tag and its recognizers by running a test run against a dataset file.
    4. To review the results, the user opens the search interface to check how well (or badly) the text was tagged.

...

In this section we'll go through the process of creating a couple of tags, adding some recognizers to them, and finally testing how they perform against a dataset. This will give the user a better idea of the process/flow when using the SAGA UI.
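To make the cycle concrete, here is a minimal Python sketch of the tag, recognizers, test run, review loop. Everything in it (the `SemanticTag` class, `pattern_recognizer`, `execute_test_run`, and the two sample reports) is hypothetical and only models the idea; it is not the SAGA Library's API, and in SAGA the recognizers are configured through the UI.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical recognizer type: takes a document's text, returns the matched substrings.
Recognizer = Callable[[str], list[str]]


def pattern_recognizer(pattern: str) -> Recognizer:
    """Build a toy regex-based recognizer (illustrative only, not SAGA's API)."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda text: [m.group(0) for m in compiled.finditer(text)]


@dataclass
class SemanticTag:
    """Hypothetical model of a tag: the 'what' (name) plus the 'how' (recognizers)."""
    name: str
    recognizers: list[Recognizer] = field(default_factory=list)


def execute_test_run(tag: SemanticTag, dataset: list[str]) -> dict[int, list[str]]:
    """Apply every recognizer of the tag to each document; keep the hits per document index."""
    results: dict[int, list[str]] = {}
    for doc_id, text in enumerate(dataset):
        hits = [hit for recognizer in tag.recognizers for hit in recognizer(text)]
        if hits:
            results[doc_id] = hits
    return results


# Steps 1-2: add a tag and attach a recognizer to it.
bird = SemanticTag("bird", [pattern_recognizer(r"\b(?:birds?|geese|goose)\b")])

# Step 3: run it against a made-up two-document dataset.
dataset = [
    "Bird strike on departure, engine fire reported.",
    "Hydraulic leak during taxi.",
]

# Step 4: review which documents were tagged and with what text.
print(execute_test_run(bird, dataset))  # {0: ['Bird']}
```

Keeping the hits per document, rather than a single count, mirrors the review step: the user wants to see which text was tagged, not just how often.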

We currently have a dataset loaded into SAGA about aviation incidents. We will try to identify incidents where an engine catches fire due to a bird.

Step #1: Check

...

and choose your pipeline

One important thing to consider is which stages you want in the pipeline used by the recognizers. The base pipeline usually has some stages that pre-process the text before passing it to the recognizers.

As you can see in the following image, we'll use the baseline-pipeline, which has the following stages:

  1. TextBreaker: It splits text into sentences. Notice that the language is set to English; if you are processing another language, keep this setting in mind.
  2. WhiteSpaceTokenizer: It splits sentences into words, using white space as the separator.
  3. StopWords: It identifies words that are very common and add no value to the process, such as 'the', 'a', 'this', 'those', etc.
  4. CaseAnalysis: It identifies whether the word is all UPPERCASE or lowercase and then converts the text to lowercase. This is usually done to normalize words so they match easily when we create patterns in our recognizers.
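As a rough illustration of what these four stages do to a piece of text, here is a Python approximation. The function names mirror the stage names above, but the implementations (regex sentence splitting, a tiny hard-coded stop-word list, stop words flagged rather than deleted) are assumptions made for the example, not how SAGA implements them.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "this", "those", "of", "and"}


def text_breaker(text: str) -> list[str]:
    """TextBreaker stand-in: naive English sentence split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def whitespace_tokenizer(sentence: str) -> list[str]:
    """WhiteSpaceTokenizer stand-in: split on runs of white space."""
    return sentence.split()


def stop_words(tokens: list[str]) -> list[dict]:
    """StopWords stand-in: flag very common words instead of removing them."""
    return [{"token": t, "stop_word": t.lower() in STOP_WORDS} for t in tokens]


def case_analysis(tokens: list[dict]) -> list[dict]:
    """CaseAnalysis stand-in: record the original casing, then lowercase the token."""
    for tok in tokens:
        text = tok["token"]
        if text.isupper():
            tok["case"] = "UPPER"
        elif text.islower():
            tok["case"] = "lower"
        else:
            tok["case"] = "mixed"
        tok["token"] = text.lower()
    return tokens


def baseline_pipeline(text: str) -> list[list[dict]]:
    """Run the four stages in order, producing one token list per sentence."""
    return [case_analysis(stop_words(whitespace_tokenizer(s))) for s in text_breaker(text)]


result = baseline_pipeline("The ENGINE caught fire. A bird was ingested!")
print([t["token"] for t in result[0]])  # ['the', 'engine', 'caught', 'fire.']
```

In this sketch, whitespace splitting leaves punctuation attached to tokens (`fire.`), and flagging stop words instead of deleting them keeps every token available to later stages.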


Step #2: Create the tags you will need

We want to identify 3 things:

  1. birds
  2. fire
  3. engine
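The idea of one tag per concept, with a report counting as relevant only when all three tags appear in it, can be illustrated with a small Python sketch. The term lists and regular expressions below are made up for the example; in SAGA the tags would be defined through recognizers in the UI rather than hard-coded.

```python
import re

# Hypothetical term lists for the three tags (illustrative, not SAGA configuration).
TAG_PATTERNS = {
    "bird": re.compile(r"\b(?:birds?|geese|goose|seagulls?)\b", re.IGNORECASE),
    "fire": re.compile(r"\b(?:fire|flames?|burning)\b", re.IGNORECASE),
    "engine": re.compile(r"\b(?:engines?|turbines?)\b", re.IGNORECASE),
}


def tags_in(report: str) -> set[str]:
    """Return the names of the tags whose pattern matches somewhere in the report."""
    return {name for name, pattern in TAG_PATTERNS.items() if pattern.search(report)}


def is_bird_engine_fire(report: str) -> bool:
    """True only when all three tags ({bird}, {fire}, {engine}) appear in the same report."""
    return {"bird", "fire", "engine"} <= tags_in(report)


print(is_bird_engine_fire("Geese ingested into the left engine; fire warning followed."))  # True
print(is_bird_engine_fire("Engine fire after a fuel leak."))  # False
```

This is a plain co-occurrence check: it only tests that all three concepts are mentioned, not that the bird actually caused the fire.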

If those 3 things are present in an incident report, then we could say that the incident is about engines catching fire due to birds. So let's start by creating the {bird} tag: