These stages are distributed as libraries separate from the Saga Core library and must be added to your application as dependencies.

Text Block Breakers

Breakers read text blocks and break them into smaller text blocks.

  • Sentence Breaker - Breaks a text block into sentences using the OpenNLP Sentence Detector.
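The OpenNLP Sentence Detector is a trained model, so it cannot be reproduced here without its model file. As a rough, stdlib-only stand-in, the sketch below uses Java's rule-based java.text.BreakIterator to show the input and output shape of a sentence breaker: one text block in, an ordered list of sentences out. It is not the Saga stage or the OpenNLP API, and the class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in: java.text.BreakIterator replaces OpenNLP's model-based detector.
final class SentenceBreakerSketch {

    /** Splits one text block into trimmed, non-empty sentences, in document order. */
    static List<String> breakIntoSentences(String textBlock, Locale locale) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(locale);
        boundaries.setText(textBlock);
        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Each consecutive pair of boundaries delimits one sentence: [start, end)
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = textBlock.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String block = "Saga reads text blocks. This stage splits them! Does it work?";
        breakIntoSentences(block, Locale.ENGLISH).forEach(System.out::println);
    }
}
```

A real breaker stage would more likely keep the start and end offsets of each sentence rather than copying substrings, so later stages can map results back to the original text block.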

Recognizers

Recognizers identify and flag tokens based on their character patterns.

  • Parts Of Speech - Tags each word in a text (corpus) with its part of speech, such as noun, verb, or adjective, based on both the word's definition and its context. Uses OpenNLP (https://opennlp.apache.org/) and its POS Tagger.
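OpenNLP's POS Tagger is a statistical model trained on annotated text. The toy sketch below is not that model; it only illustrates the two ideas in the description above. A hypothetical mini-lexicon lists the tags each word can take (its "definition"), and one context rule resolves ambiguous words by looking at the previous tag. The Penn Treebank-style tag names are an illustrative choice.

```java
import java.util.List;
import java.util.Map;

// Toy illustration only: a lexicon lookup plus one context rule, not OpenNLP's statistical tagger.
final class ToyPosTagger {

    // Hypothetical mini-lexicon: the tags each word may take, most common first
    private static final Map<String, List<String>> LEXICON = Map.of(
            "i", List.of("PRP"),
            "the", List.of("DT"),
            "a", List.of("DT"),
            "to", List.of("TO"),
            "want", List.of("VBP"),
            "like", List.of("VBP"),
            "book", List.of("NN", "VB"),
            "flight", List.of("NN"));

    /** Returns one tag per token, in the same order as the tokens. */
    static String[] tag(String[] tokens) {
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Unknown words fall back to NN (noun), a common baseline guess
            List<String> candidates = LEXICON.getOrDefault(tokens[i].toLowerCase(), List.of("NN"));
            String previous = i > 0 ? tags[i - 1] : "";
            if (previous.equals("DT")) {
                tags[i] = prefer(candidates, "NN");   // "the book" -> noun
            } else if (previous.equals("TO") || previous.equals("PRP")) {
                tags[i] = prefer(candidates, "VB");   // "to book" -> verb
            } else {
                tags[i] = candidates.get(0);
            }
        }
        return tags;
    }

    /** Picks the first candidate whose tag starts with the prefix, else the most common tag. */
    private static String prefer(List<String> candidates, String prefix) {
        for (String candidate : candidates) {
            if (candidate.startsWith(prefix)) {
                return candidate;
            }
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        String[] tokens = "I want to book a flight".split(" ");
        System.out.println(String.join(" ", tag(tokens))); // prints: PRP VBP TO VB DT NN
    }
}
```

The same word "book" is tagged VB after "to" but NN after "the", which is the kind of context-dependent decision a trained tagger learns from data instead of from hand-written rules.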

Spell Checkers

Spell checkers process specific tokens to identify misspellings and add alternatives to the interpretation graph.

  • Spellchecker - This stage reviews tokens using the Elasticsearch suggesters functionality and, for each word it does not recognize, creates a new token holding a "suggestion".
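The stage relies on Elasticsearch to produce suggestions, which requires a running cluster and index. To illustrate the underlying idea without one, the sketch below swaps in a plain Levenshtein edit-distance search over a small in-memory vocabulary: a token found in the vocabulary gets no suggestion, and an unknown token gets the closest word within a maximum number of edits. The vocabulary, the maxEdits limit, and all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Swapped-in technique: in-memory edit distance, not the Elasticsearch suggester API.
final class SpellSuggestSketch {

    /** Classic two-row dynamic-programming Levenshtein distance (insert, delete, substitute). */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Closest vocabulary word within maxEdits; empty if the token is known or nothing is close. */
    static Optional<String> suggest(String token, List<String> vocabulary, int maxEdits) {
        String word = token.toLowerCase();
        if (vocabulary.contains(word)) {
            return Optional.empty(); // recognized word: no suggestion token needed
        }
        return vocabulary.stream()
                .filter(v -> levenshtein(word, v) <= maxEdits)
                // Smallest distance wins; ties are broken alphabetically for determinism
                .min(Comparator.comparingInt((String v) -> levenshtein(word, v))
                        .thenComparing(Comparator.naturalOrder()));
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("elastic", "plastic", "classic");
        System.out.println(suggest("elastik", vocabulary, 2)); // prints: Optional[elastic]
    }
}
```

Elasticsearch's term suggester is also based on edit distance and can additionally rank candidates by how often they occur in the index; this sketch ignores frequency entirely.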

Language Detectors

Language detectors use OpenNLP (https://opennlp.apache.org/) and its language detector model to identify the language of a text block.
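OpenNLP's language detector needs its trained model file, so it is not reproduced here. The sketch below swaps in a much simpler technique, character trigram profiles compared with cosine similarity, to show the general idea of identifying a text block's language from character n-grams. The two sample texts and the language keys ("eng", "spa") are hypothetical; a usable profile would be built from far more text.

```java
import java.util.HashMap;
import java.util.Map;

// Swapped-in technique: trigram-profile matching, not OpenNLP's trained language detector model.
final class TrigramLanguageSketch {

    /** Counts character trigrams after lowercasing and collapsing non-letters into single spaces. */
    static Map<String, Integer> trigrams(String text) {
        String padded = " " + text.toLowerCase().replaceAll("[^\\p{L}]+", " ").trim() + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity between two sparse count vectors (0 when either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0).doubleValue();
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> counts) {
        double sum = 0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    /** Returns the key of the sample whose trigram profile is most similar to the text. */
    static String detect(String text, Map<String, String> samplesByLanguage) {
        Map<String, Integer> profile = trigrams(text);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, String> sample : samplesByLanguage.entrySet()) {
            double score = cosine(profile, trigrams(sample.getValue()));
            if (score > bestScore) {
                bestScore = score;
                best = sample.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> samples = Map.of(
                "eng", "the quick brown fox jumps over the lazy dog and the cat is on the table",
                "spa", "el zorro salta sobre el perro perezoso y el gato duerme en la mesa");
        System.out.println(detect("the cat and the dog", samples)); // prints: eng
        System.out.println(detect("el perro y el gato", samples));  // prints: spa
    }
}
```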

Machine Learning

These stages load a machine learning (ML) model and use it to evaluate input text processed by Saga.

  • Name Entity Recognizer - The name predictor stage uses OpenNLP's NameFinder to load named entity recognition (NER) models and tags tokens that the model identifies as entities, provided the prediction meets a given threshold.
  • Sentence Classifier - The sentence classifier stage uses OpenNLP's DocumentCategorizer to load binary classification models (a sentence either is or isn't in a certain category) and tags sentences that the model places in that category with a score at or above a specified threshold.
  • FAQ - The FAQ stage uses TensorFlow to semantically compare a sentence against a list of questions, each paired with its answer. If the confidence value meets the threshold, it creates a tag holding the matched question and answer.
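The FAQ stage relies on a TensorFlow model to turn text into something comparable, which is not reproduced here. Assuming each sentence and each stored question has already been encoded as an embedding vector of the same length, the sketch below shows only the comparison step: cosine similarity between the sentence and every question, keeping the best match only if it meets the threshold. The record types, vectors, and threshold value are hypothetical, not Saga's API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: embeddings are assumed to be precomputed by some model (e.g. TensorFlow).
final class FaqMatchSketch {

    /** A hypothetical FAQ entry whose question has already been embedded. */
    record FaqEntry(String question, String answer, double[] embedding) {}

    /** A hypothetical tag holding the matched question, its answer, and the similarity score. */
    record FaqTag(String question, String answer, double confidence) {}

    /** Cosine similarity of two equal-length vectors (0 if either is all zeros). */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Tag for the most similar question, or empty if no similarity reaches the threshold. */
    static Optional<FaqTag> match(double[] sentenceEmbedding, List<FaqEntry> faqs, double threshold) {
        FaqTag best = null;
        for (FaqEntry faq : faqs) {
            double score = cosine(sentenceEmbedding, faq.embedding());
            if (score >= threshold && (best == null || score > best.confidence())) {
                best = new FaqTag(faq.question(), faq.answer(), score);
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<FaqEntry> faqs = List.of(
                new FaqEntry("How do I reset my password?", "Use the reset link.", new double[] {1, 0, 0}),
                new FaqEntry("What are your opening hours?", "9 to 5.", new double[] {0, 1, 0}));
        double[] sentence = {0.9, 0.1, 0}; // hypothetical embedding of "I forgot my password"
        match(sentence, faqs, 0.8).ifPresent(tag ->
                System.out.println(tag.question() + " -> " + tag.answer()));
    }
}
```

In principle, the same "tag only if the score meets the threshold" check also applies to the NER and sentence classifier stages, with the model's probability taking the place of the similarity score.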
