...

As the following image shows, we'll use the baseline-pipeline, which has the following stages:

  1. TextBreaker: splits text into sentences. Notice that the language is set to English; if you are processing another language, keep this setting in mind.
  2. WhiteSpaceTokenizer: splits sentences into words, using white space as the separator.
  3. StopWords: identifies words that are very common and add no value to the process, such as 'the', 'a', 'this', 'those', etc.
  4. CaseAnalysis: identifies whether a word is all UPPERCASE or all lowercase, then converts the text to lowercase. This is usually done to normalize words so that they match the patterns we create in our recognizers more easily.
  5. CharChangeSplitter: separates tokens wherever the character type changes: lowercase to uppercase, letter to number, and alphanumeric to punctuation. No characters are dropped at the split point, and a capital letter stays attached to the word it starts.


    [Image]
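To make the five stages more concrete, here is a minimal Python sketch of what a pipeline like this does to a piece of text. It is not SAGA's implementation or API: the function names, the regex-based sentence splitting and the small stop word list are illustrative assumptions. The per-token order also differs slightly from the list above: the character-change split runs before lowercasing so that it can still see case changes.

```python
import re

# Illustrative stop word list only; the real StopWords stage uses its own dictionary.
STOP_WORDS = {"the", "a", "an", "this", "those", "into", "on", "and"}


def text_breaker(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def whitespace_tokenizer(sentence):
    """Split a sentence into tokens, using white space as the separator."""
    return sentence.split()


def char_change_splitter(token):
    """Split where the character type changes (lowercase->uppercase,
    letter<->digit, alphanumeric<->punctuation). ASCII-only sketch: no
    characters are dropped, and a capital letter stays with the lowercase
    letters that follow it."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+|[^A-Za-z0-9\s]", token)


def case_analysis(token):
    """Record whether a token is ALL UPPERCASE or all lowercase, then lowercase it."""
    if not any(c.isalpha() for c in token):
        case = "none"
    elif token.isupper():
        case = "upper"
    elif token.islower():
        case = "lower"
    else:
        case = "mixed"
    return token.lower(), case


def baseline_pipeline(text):
    """Return one list of token records per sentence."""
    sentences = []
    for sentence in text_breaker(text):
        tokens = []
        for raw in whitespace_tokenizer(sentence):
            for piece in char_change_splitter(raw):
                norm, case = case_analysis(piece)
                tokens.append({"text": norm, "case": case,
                               "stop": norm in STOP_WORDS})
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    report = "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."
    for tokens in baseline_pipeline(report):
        # Print the normalized tokens, skipping the ones flagged as stop words.
        print([t["text"] for t in tokens if not t["stop"]])
```

Keeping each stage as a small function mirrors the idea of a pipeline: every stage consumes the previous stage's output, so a stage can be swapped (for example, a different sentence breaker for another language) without touching the others.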

Step #2: Create

...

basic tags you will need

We want to identify 3 things:

...

If all three are present in an incident report, we can say that the incident is about an engine catching fire because of birds.

a. Let's start by creating the {bird} tag:



[Image]


b. Add the 'SimpleRegex' and 'Entity' recognizers to the {bird} tag:

[Image]

c. Add the following patterns to the Entity recognizer. The image below shows how to do it; repeat those steps for each pattern:

  • duck
  • hawk
  • seagull


[Image]
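Conceptually, an entity recognizer is a dictionary lookup against the normalized tokens. The sketch below assumes the tokens have already been lowercased by the CaseAnalysis stage (so "SEAGULL" in a report matches the "seagull" pattern). The function name, the multi-word support and the (start, end, tag) span format are hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in for the Entity recognizer of the {bird} tag.
BIRD_ENTITIES = ["duck", "hawk", "seagull"]


def entity_recognizer(tokens, patterns, tag):
    """Return (start, end, tag) spans, end exclusive, wherever all the words
    of a pattern appear as consecutive tokens. Tokens are assumed to be
    lowercased already, so patterns are lowercased before comparing."""
    compiled = [p.lower().split() for p in patterns]
    matches = []
    for start in range(len(tokens)):
        for words in compiled:
            if tokens[start:start + len(words)] == words:
                matches.append((start, start + len(words), tag))
    return matches


if __name__ == "__main__":
    tokens = ["seagull", "strike", "into", "turbine", "on", "takeoff"]
    print(entity_recognizer(tokens, BIRD_ENTITIES, "bird"))
```

Because the lookup is exact, "birds" would not match any of these entries; plural and variant forms are better handled by a regex, which is why the {bird} tag also gets a SimpleRegex recognizer.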


d. Also add the following regex to the SimpleRegex recognizer. (Note: the steps are very similar to those for adding entities to the Entity recognizer.)

  • bird[s]?

[Image]
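The expression `bird[s]?` matches "bird" followed by an optional "s" (it is equivalent to `birds?`). Here is a minimal sketch, assuming the SimpleRegex recognizer tests one normalized token at a time and requires the whole token to match, which is modeled with Python's `re.fullmatch`. Under that assumption "birdsong" would not be tagged. The function name and span format are hypothetical.

```python
import re

# Assumption: each normalized token is tested on its own and must match in full.
BIRD_REGEX = re.compile(r"bird[s]?")


def simple_regex_recognizer(tokens, regex, tag):
    """Return (start, end, tag) spans for every token the regex matches completely."""
    return [(i, i + 1, tag) for i, tok in enumerate(tokens) if regex.fullmatch(tok)]


if __name__ == "__main__":
    tokens = ["birds", "hit", "the", "bird", "strike", "birdsong"]
    print(simple_regex_recognizer(tokens, BIRD_REGEX, "bird"))
```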

e. Now do the same for the {fire} and {engine} tags:

[Image]


[Image]

Step #3: Add the {fire-by-bird} tag, which uses the other tags to identify when an engine catches fire because of birds

The idea is to create a tag that combines {fire}, {engine} and {bird} to identify a single concept: an engine that caught fire because of birds. For this tag we'll use the Fragmented recognizer, an advanced recognizer that tags text containing the other three tags in any order, provided they are close enough to each other.

a. Create a tag called 'fire-by-bird', following the same steps you used to create the other tags.

b. Attach the Fragmented recognizer to the tag.

c. Add the following pattern: {fire} {engine} {bird}. Make sure to check the 'In Any Order' option, and set Max tokens to 16 and Min tokens to 4.


[Image]
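The behavior of the Fragmented recognizer with 'In Any Order' checked can be approximated as a window search over the spans produced by the other tags. This is an assumption-laden illustration, not SAGA's algorithm: it treats Min/Max tokens as bounds on the window length measured from the first matched token to the last, counts every token (including stop words and punctuation), and uses hypothetical {fire} and {engine} matches on "smoke", "flame" and "turbine", since those tags' patterns are not listed here.

```python
from typing import List, Tuple

# (start token, end token exclusive, tag name)
Span = Tuple[int, int, str]


def fragmented_any_order(spans: List[Span], required: List[str],
                         min_tokens: int = 4, max_tokens: int = 16) -> List[Tuple[int, int]]:
    """Return (start, end) token windows that contain every required tag,
    in any order, and whose length is between min_tokens and max_tokens."""
    ordered = sorted(spans)
    windows = set()
    for anchor, _, _ in ordered:
        # For each candidate window start, take the earliest occurrence
        # of every required tag at or after that position.
        chosen = {}
        for start, end, tag in ordered:
            if start >= anchor and tag in required and tag not in chosen:
                chosen[tag] = (start, end)
        if len(chosen) < len(set(required)):
            continue  # at least one required tag is missing
        win_start = min(s for s, _ in chosen.values())
        win_end = max(e for _, e in chosen.values())
        if min_tokens <= win_end - win_start <= max_tokens:
            windows.add((win_start, win_end))
    return sorted(windows)


if __name__ == "__main__":
    # Tokens of the preview sentence after splitting off punctuation and lowercasing.
    tokens = ["seagull", "strike", "into", "turbine", "on", "takeoff", ".",
              "severe", "vibration", ",", "smoke", "and", "flame", "."]
    # Hypothetical output of the {bird}, {engine} and {fire} recognizers.
    spans = [(0, 1, "bird"), (3, 4, "engine"), (10, 11, "fire"), (12, 13, "fire")]
    for start, end in fragmented_any_order(spans, ["fire", "engine", "bird"]):
        print("{fire-by-bird}:", " ".join(tokens[start:end]))
```

The Min/Max limits are what keep the combined tag meaningful: without Max tokens, a {bird} in the first sentence of a long report and a {fire} in the last would still be tagged, even if they describe unrelated events.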

d. Make sure that every recognizer in every tag uses the same pipeline (or whichever pipeline you need it to use). Click the gear icon on each recognizer to open its settings and check the 'Base Pipeline' field:

[Image]


Step #4: Let's do a quick test using the preview

You can test any of your tags using the preview functionality. Let's test the {fire-by-bird} tag: click it in the Tag tree, then enter the following text into the preview textbox: "SEAGULL STRIKE INTO TURBINE ON TAKEOFF. SEVERE VIBRATION, SMOKE AND FLAME."


[Image]

A dialog showing the SAGA graph appears. Note how the {bird}, {fire} and {engine} tags, as well as the {fire-by-bird} tag, have identified the text:

[Image]