
  • Search Interface: On this screen the user can review the results of a Test Run. The typical flow is:
    1. The user adds a semantic tag.
    2. The user adds recognizers to this tag and configures them.
    3. The user tests the effectiveness of the tag and its recognizers by running a test run against a dataset file.
    4. To review the results, the user opens the search interface and checks how well (or badly) the text was tagged.



3. Using the UI

3.1. End-to-end use case

In this section we'll go through the process of creating a couple of tags, adding some recognizers to them, and finally testing how they perform against a dataset. This should give you a better idea of the overall flow when using SAGA.

We currently have an Android Reddit dataset loaded into SAGA.  We want to identify positive comments that talk about the Samsung Galaxy S6 phone.
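To make the goal concrete before walking through the UI, here is a minimal Python sketch of the idea behind the flow: tags own recognizers, and a test run reports which tags fire on each comment. This is not SAGA's API. The `Tag` class, the `run_test` function, the regex patterns and the three sample comments are all hypothetical and made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical stand-in for a semantic tag; recognizers are plain regex patterns."""
    name: str
    patterns: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        # The tag fires if any of its recognizer patterns is found in the text.
        return any(re.search(p, text, re.IGNORECASE) for p in self.patterns)


def run_test(tags, dataset):
    """Return, for each comment, the names of the tags whose recognizers fired."""
    return [(comment, [t.name for t in tags if t.matches(comment)])
            for comment in dataset]


# Two illustrative tags: one for the phone model, one for positive wording.
device = Tag("device", [r"\bgalaxy s6\b", r"\bs6\b"])
positive = Tag("positive", [r"\blove\b", r"\bgreat\b", r"\bawesome\b"])

# Made-up comments, NOT taken from the real Reddit dataset.
sample_comments = [
    "I love my Galaxy S6, the camera is great",
    "The S6 battery drains too fast",
    "Great deal on a Nexus 6P today",
]

results = run_test([device, positive], sample_comments)

# Comments that are both about the S6 and positive.
hits = [c for c, names in results if {"device", "positive"} <= set(names)]
```

In this toy version only the first comment ends up in `hits`; the second mentions the phone without positive wording, and the third is positive but about another device. Reviewing that kind of per-comment outcome is what the search interface is for.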

Step #1: Check the pipeline

One important thing to consider is which stages you want in the pipeline used by the recognizers. The base pipeline usually includes some stages that pre-process the text before handing it to the recognizers.

As you can see in the following image, we'll use the baseline-pipeline which has the following stages:

  1. TextBreaker: Splits the text into sentences. Note that the language is set to English; if you are processing another language, remember to change this setting.
  2. WhiteSpaceTokenizer: Splits sentences into words, using white space as the separator.
  3. StopWords: Identifies words that are very common and add no value to the process, such as 'the', 'a', 'this' and 'those'.
  4. CaseAnalysis: Identifies whether a word is all UPPERCASE or all lowercase, then converts the text to lowercase. This normalization makes words easier to match when we create patterns in our recognizers.

[Image: baseline-pipeline stages]
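The effect of the four stages can be sketched in a few lines of Python. This is a simplified illustration rather than SAGA's implementation: the function names, the token dictionary layout, the sentence-splitting rule and the four-word stop list (taken from the examples above) are all assumptions.

```python
import re

# Assumed stop list, limited to the examples in the text; a real list is much longer.
STOP_WORDS = {"the", "a", "this", "those"}


def text_breaker(text):
    """Naive English sentence splitter: break after '.', '!' or '?' followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def whitespace_tokenizer(sentence):
    """Split a sentence into words on white space only (punctuation stays attached)."""
    return sentence.split()


def case_analysis(word):
    """Record the original casing, then normalize the text to lowercase."""
    if word.isupper():
        case = "upper"
    elif word.islower():
        case = "lower"
    else:
        case = "mixed"
    return {"text": word.lower(), "case": case}


def run_pipeline(text):
    """Apply the stages in order: TextBreaker, WhiteSpaceTokenizer, StopWords, CaseAnalysis."""
    sentences = []
    for sentence in text_breaker(text):
        tokens = []
        for word in whitespace_tokenizer(sentence):
            # StopWords stage: flag the token (compared case-insensitively in this sketch).
            token = {"original": word, "stop_word": word.lower() in STOP_WORDS}
            token.update(case_analysis(word))
            tokens.append(token)
        sentences.append(tokens)
    return sentences


result = run_pipeline("The S6 is GREAT. I love this phone!")
```

Here `result` holds two sentences. In the first, "The" is flagged as a stop word and "S6" is recorded as uppercase with the normalized text "s6", which is the form a recognizer pattern would then match against.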