
1. Introduction

The purpose of SAGA UI is to help create the model (tags, patterns, recognizer settings) that the SAGA Library will eventually use to process text.

The UI is aimed at SMEs (subject matter experts): business people who are not necessarily tech savvy but know a lot about the business need and the things that need to be identified within the text to be processed.

The UI enables collaboration between SMEs and technical people, all using the application at the same time. This way, creating and testing the model, which is a difficult and time-consuming task, can be accelerated to produce results sooner.

As will be described in the following sections, the functionality of the UI is organized around semantic tags. First, the user thinks about the things he/she wants to identify in the text (the 'what'), and then about 'how' to identify those tags using the recognizers provided in the application.


2. Exploring the UI

The UI is fairly simple: a main tab selector at the top lets you choose which of the five areas you want to work on:

  • Tags: in this tab the user defines all of the semantic tags to be used, as well as the recognizers and settings each tag will use.


  • Pipelines: here the user can add, delete or update pipelines. A pipeline defines which stages run before the recognizer stages are added to it. For example, you could have stages like a white space tokenizer, a text case analyzer or a stop words identifier.
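Conceptually, a pipeline is just an ordered list of stages applied to the text before any recognizers run. The sketch below is purely illustrative (the name "example-pipeline" and the dictionary layout are assumptions, not SAGA's actual configuration format); the stage names come from the examples above:

```python
# Illustrative only -- not SAGA's real config format.
# A pipeline is an ordered list of pre-processing stages.
pipeline = {
    "name": "example-pipeline",
    "stages": [
        "WhiteSpaceTokenizer",  # split text into words
        "CaseAnalysis",         # track and normalize letter case
        "StopWords",            # flag very common, low-value words
    ],
}

# Stages execute in the order they are listed.
print(" -> ".join(pipeline["stages"]))
```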



  • Datasets: here the user can view the datasets loaded into the application for test runs and/or training of machine learning models. In this screen the user defines which fields of the dataset file to process and how to split the text fed to the pipeline.  The user cannot upload datasets at the moment, so datasets need to be placed in a special folder in the Saga Server file system.
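The two choices made on the Datasets screen (which field to process and how to split its text) can be illustrated with plain Python. This is a generic sketch, not SAGA's implementation; the column name "body", the sample rows and the sentence-splitting regex are all assumptions:

```python
# Generic sketch (not SAGA code) of selecting a dataset field
# and splitting its text before feeding it to a pipeline.
import csv
import io
import re

# Hypothetical dataset sample; a real dataset would live as a file
# in the Saga Server file system.
sample = io.StringIO(
    "id,body\n"
    "1,Great phone. Love the camera.\n"
    "2,Battery died fast\n"
)

field_to_process = "body"         # which field the pipeline will see
split_pattern = r"(?<=[.!?])\s+"  # split each value into sentences

chunks = []
for row in csv.DictReader(sample):
    for chunk in re.split(split_pattern, row[field_to_process]):
        if chunk:
            chunks.append(chunk)

print(chunks)
# ['Great phone.', 'Love the camera.', 'Battery died fast']
```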



  • Background Processes: in this screen the user can monitor running background processes. For example, a test run against a dataset can take a long time to complete, so the user can view its progress here.


  • Search Interface: in this screen the user can review the results of a test run.  The flow is something like: 
    1. The user adds a semantic tag.
    2. The user adds recognizers to this tag and configures them.
    3. The user tests the effectiveness of the tag and its recognizers by executing a test run against a dataset file.
    4. To review the results, the user opens the Search Interface to check how well (or how badly) the text was tagged.


3. Using the UI

3.1. End to end use case

In this section we'll go through the process of creating a couple of tags, adding some recognizers to them and finally testing how they perform against a dataset. This will give the user a better idea of the process/flow when using SAGA.

We currently have an Android Reddit dataset loaded into SAGA.  We want to identify positive comments that talk about the Samsung Galaxy S6 phone.

Step #1: Check the pipeline

One important thing to consider is which stages you want in the pipeline used by the recognizers. The base pipeline usually has some stages that pre-process the text before handing it to the recognizers.

As you can see in the following image, we'll use the baseline-pipeline which has the following stages:

  1. TextBreaker: it splits text into sentences. Notice the language is set to English; if you are processing another language, you need to keep this setting in mind.
  2. WhiteSpaceTokenizer: it splits sentences into words using white space as the separator.
  3. StopWords: it identifies very common words that add no value to the process, such as 'the', 'a', 'this', 'those', etc.
  4. CaseAnalysis: it identifies whether a word is all UPPERCASE or all lowercase and then converts the text to lowercase. This is usually done to normalize words so they match more easily when we create patterns in our recognizers.
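To make the four stages above concrete, here is a plain-Python simulation of what they conceptually do to a piece of text. This is not SAGA code: the stage names come from the list above, but the implementations, the token dictionary shape and the sample stop-word list are simplified assumptions:

```python
# Plain-Python sketch (not SAGA code) of the baseline-pipeline stages.
import re

STOP_WORDS = {"the", "a", "this", "those"}  # sample stop words only


def text_breaker(text):
    # TextBreaker: split text into sentences (naive English splitter)
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def whitespace_tokenizer(sentence):
    # WhiteSpaceTokenizer: split a sentence into words on white space
    return sentence.split()


def case_analysis(word):
    # CaseAnalysis: record the original casing, then lowercase the word
    case = "UPPER" if word.isupper() else "lower" if word.islower() else "mixed"
    return {"text": word.lower(), "case": case}


def run_pipeline(text):
    tokens = []
    for sentence in text_breaker(text):
        for word in whitespace_tokenizer(sentence):
            # StopWords: flag very common words that add no value
            is_stop = word.lower() in STOP_WORDS
            token = case_analysis(word)
            token["stopword"] = is_stop
            tokens.append(token)
    return tokens


print(run_pipeline("The Galaxy S6 is GREAT"))
```

Running the sketch shows each word lowercased, its original casing recorded, and common words like "The" flagged as stop words, which is the normalized stream the recognizers would then pattern-match against.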







