ABA (American Bankers Association)

Implements an entity extractor for ABA (American Bankers Association) routing transit numbers (RTNs). ABA RTNs are only for use in payment transactions within the United States. They are used on paper checks, wire transfers, and ACH transactions.
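
A valid RTN is nine digits whose checksum, computed with the standard 3-7-1 digit weights, is divisible by 10. A minimal validation sketch in Java (an illustration, not Saga's internal implementation):

    public final class AbaRtnValidator {
        /** Returns true if the string is a 9-digit ABA RTN with a valid 3-7-1 checksum. */
        public static boolean isValid(String rtn) {
            if (rtn == null || !rtn.matches("\\d{9}")) {
                return false;
            }
            int[] weights = {3, 7, 1, 3, 7, 1, 3, 7, 1};
            int sum = 0;
            for (int i = 0; i < 9; i++) {
                sum += (rtn.charAt(i) - '0') * weights[i];
            }
            return sum % 10 == 0;
        }
    }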

Advanced

Matches advanced recursive patterns of tokens and semantic tags. Pattern databases can be very large (millions of entries).

Best Bets

This stage maintains a list of tokens used to identify possible subjects of interest and suggest a URL reference along with "title" and "description".

BIC (Bank/Business Identifier Codes)

Implements an entity extractor for Bank/Business Identifier Codes. These codes are assigned to each bank and/or business in every country and are administered by the Society for Worldwide Interbank Financial Telecommunication (SWIFT).
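
For reference, a BIC is 8 or 11 characters: a 4-letter bank code, a 2-letter country code, a 2-character location code, and an optional 3-character branch code. A minimal format check (a sketch, not Saga's extractor) could look like this:

    import java.util.regex.Pattern;

    public final class BicFormat {
        // 4-letter bank code, 2-letter country code, 2-character location code,
        // optional 3-character branch code (8 or 11 characters total).
        private static final Pattern BIC =
                Pattern.compile("[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}([A-Z0-9]{3})?");

        public static boolean looksLikeBic(String token) {
            return token != null && BIC.matcher(token).matches();
        }
    }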

Classification Recognizer

This recognizer uses OpenNLP's DocumentCategorizer to load classification models and tag sentences that match the binary classification model (is or isn't in a certain category) given a specified threshold of accuracy.

This is a plugin recognizer. Uses Classification Stage
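
Outside of Saga, the underlying OpenNLP call looks roughly like the sketch below; the model path, example tokens, and threshold value are illustrative, not part of Saga's configuration.

    import java.io.FileInputStream;
    import java.io.InputStream;
    import opennlp.tools.doccat.DoccatModel;
    import opennlp.tools.doccat.DocumentCategorizerME;

    public final class CategorizerExample {
        public static void main(String[] args) throws Exception {
            // "my-doccat.bin" is a placeholder path to a trained classification model.
            try (InputStream in = new FileInputStream("my-doccat.bin")) {
                DocumentCategorizerME categorizer = new DocumentCategorizerME(new DoccatModel(in));

                String[] tokens = {"open", "a", "new", "savings", "account"};
                double[] outcomes = categorizer.categorize(tokens);
                String best = categorizer.getBestCategory(outcomes);

                // Only tag the sentence when the best score clears the configured threshold.
                double threshold = 0.7; // illustrative value
                if (outcomes[categorizer.getIndex(best)] >= threshold) {
                    System.out.println("Tag sentence with category: " + best);
                }
            }
        }
    }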

Credit Card Number

Implements an entity extractor for Credit Card numbers. This is often referred to as the ‘long number’ on the front of your credit card, which is usually 16 digits but can be up to 19 digits in some instances. The first digit indicates the provider; for example, Mastercard numbers start with a 2 or 5, Visa card numbers start with a 4, and American Express numbers start with a 3.
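
Candidate numbers are typically verified with the Luhn checksum before being tagged; a sketch of that check (not necessarily how Saga implements it):

    public final class LuhnCheck {
        /** Returns true if a 13-19 digit string passes the Luhn checksum. */
        public static boolean isValid(String number) {
            if (number == null || !number.matches("\\d{13,19}")) {
                return false;
            }
            int sum = 0;
            boolean doubleIt = false;
            // Walk the digits right to left, doubling every second digit.
            for (int i = number.length() - 1; i >= 0; i--) {
                int d = number.charAt(i) - '0';
                if (doubleIt) {
                    d *= 2;
                    if (d > 9) d -= 9;
                }
                sum += d;
                doubleIt = !doubleIt;
            }
            return sum % 10 == 0;
        }
    }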

DateTime

Identifies tokens that look like dates or time indicators and flags them with the "DATE" flag.

Email

Identifies tokens that look like email addresses and flags them with the "EMAIL" flag.
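
A simplified pattern for what "looks like an email" might be the following; real-world address syntax is more permissive, so treat this as a sketch rather than Saga's actual rule:

    import java.util.regex.Pattern;

    public final class EmailFlagSketch {
        // Simplified shape: local part, "@", domain with at least one dot.
        private static final Pattern EMAIL =
                Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

        public static boolean looksLikeEmail(String token) {
            return token != null && EMAIL.matcher(token).matches();
        }
    }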

Entity

Looks up sequences of tokens in a dictionary and then tags the sequence with one or more semantic tags as an alternative representation. Typically, these tags represent entities such as {person}, {place}, {company}, etc.
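
Conceptually, the dictionary maps normalized token sequences to semantic tags; a minimal sketch of that idea (the class and entries are hypothetical, not Saga's data structures):

    import java.util.List;
    import java.util.Map;

    public final class EntityLookupSketch {
        // Hypothetical dictionary: normalized token sequence -> semantic tag.
        private static final Map<List<String>, String> DICTIONARY = Map.of(
                List.of("new", "york"), "{place}",
                List.of("jane", "doe"), "{person}",
                List.of("acme", "corp"), "{company}");

        /** Returns the semantic tag for a token sequence, or null if there is no entry. */
        public static String tagFor(List<String> tokens) {
            return DICTIONARY.get(tokens);
        }
    }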

FAQ

This recognizer uses a frozen Universal Sentence Encoder TensorFlow model to encode a list of Frequently Asked Questions as sentence embedding vectors, and tags sentences that match a question/answer pair from the FAQ given a specified threshold of accuracy.

The recognizer also gives the option to use a Python model instead of TensorFlow; this requires the Python Bridge to be running.

This is a plugin recognizer. Uses FAQ Stage

Federal ID

Detects federal identification numbers such as the U.S. SSN, Canada SIN, UK NINo, and Costa Rica cédula.
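
As one example of these formats, a U.S. SSN has the shape AAA-GG-SSSS with a few excluded ranges; a format-only sketch (each national ID has its own rules, and this is not Saga's implementation):

    import java.util.regex.Pattern;

    public final class SsnFormat {
        // AAA-GG-SSSS, excluding area numbers 000, 666, and 900-999,
        // group number 00, and serial number 0000.
        private static final Pattern SSN = Pattern.compile(
                "(?!000|666|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}");

        public static boolean looksLikeSsn(String token) {
            return token != null && SSN.matcher(token).matches();
        }
    }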

Fragmented

Identifies patterns with a combination of any number of specified tokens, regardless of the surrounding tokens.

Geonames

Identifies geographic locations based on the patterns loaded.

Google Entity Recognizer

This recognizer tags entities based on a NER model trained with the Google AutoML Entity Extraction Cloud API. The recognizer connects to the cloud API to use a model that was trained by the API (Saga doesn't perform the training yet).

This is a plugin recognizer

Saga uses a keys.json file from the service account configured to interact with this API to authenticate all GCP REST calls. More information here.

Google Knowledge

This recognizer uses the Google Knowledge API (closed alpha) for FAQ matching. FAQs can be loaded directly from an HTML page via a URL, or created manually in Saga and then uploaded to the Google Knowledge service to construct the model.

This is a plugin recognizer

Saga uses a keys.json file from the service account configured to interact with this API to authenticate all GCP REST calls. More information here.

IBAN

Implements an entity extractor for International Bank Account Numbers. These codes are assigned to individual bank accounts (mostly in the EU, Middle East, and Caribbean).
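
IBANs can be checked with the ISO 13616 mod-97 rule: move the first four characters to the end, replace letters with the values 10-35, and the resulting number modulo 97 must equal 1. A sketch (country-specific length rules are not verified here, and this is not Saga's implementation):

    public final class IbanCheck {
        /** Returns true if the IBAN passes the mod-97 check. */
        public static boolean isValid(String iban) {
            String s = iban == null ? "" : iban.replace(" ", "").toUpperCase();
            if (!s.matches("[A-Z]{2}\\d{2}[A-Z0-9]{11,30}")) {
                return false;
            }
            // Move the first four characters to the end, then read letters as 10..35.
            String rearranged = s.substring(4) + s.substring(0, 4);
            int remainder = 0;
            for (char c : rearranged.toCharArray()) {
                int value = Character.isDigit(c) ? c - '0' : c - 'A' + 10;
                // Fold each digit (or two-digit letter value) into a running mod-97 remainder.
                remainder = ((value > 9 ? remainder * 100 : remainder * 10) + value) % 97;
            }
            return remainder == 1;
        }
    }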

Intent Recognizer

The Intent stage does a semantic comparison of a provided sentence against the possible intents the recognizer already has. If the confidence value is within the threshold, it creates a tag holding the intent.


This is a plugin recognizer. Uses Intent Stage


This recognizer can be used with two model types:

  1. A frozen Universal Sentence Encoder TensorFlow model, stored in the "[saga-home]\tf-models" directory.
  2. Any of the models that create embeddings that are available in the Python Bridge.


Both the intents stored in the recognizer and the query entered by the user are encoded (using sentence embedding vectors) and then compared. The intent recognizer will choose the intent that best matches the query.
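
The comparison itself boils down to cosine similarity between the query embedding and each stored intent embedding. A conceptual sketch (the vectors would come from the TensorFlow or Python Bridge model; names and the threshold are illustrative):

    import java.util.Map;

    public final class IntentMatchSketch {
        /** Cosine similarity between two embedding vectors of equal length. */
        static double cosine(double[] a, double[] b) {
            double dot = 0, na = 0, nb = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                na += a[i] * a[i];
                nb += b[i] * b[i];
            }
            return dot / (Math.sqrt(na) * Math.sqrt(nb));
        }

        /** Returns the best-matching intent, or null if nothing clears the threshold. */
        static String bestIntent(double[] query, Map<String, double[]> intents, double threshold) {
            String best = null;
            double bestScore = threshold;
            for (Map.Entry<String, double[]> e : intents.entrySet()) {
                double score = cosine(query, e.getValue());
                if (score >= bestScore) {
                    bestScore = score;
                    best = e.getKey();
                }
            }
            return best;
        }
    }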

IP Address

Identifies Internet Protocol (IP) addresses, both IPv4 and IPv6.
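
IPv4 candidates can be screened with a simple octet-range pattern; IPv6 (hex groups with "::" compression) needs more involved handling and is omitted from this sketch, which is illustrative rather than Saga's rule:

    import java.util.regex.Pattern;

    public final class IpFormat {
        // Four octets, each in the range 0-255.
        private static final Pattern IPV4 = Pattern.compile(
                "((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)");

        public static boolean looksLikeIpv4(String token) {
            return token != null && IPV4.matcher(token).matches();
        }
    }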

Latitude and Longitude

Identifies latitude and longitude including the cardinal direction (North, West, East, South).
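
A coordinate written in decimal degrees with a cardinal direction (for example "40.7128° N") could be screened with a pattern like the following sketch; degrees/minutes/seconds notation is not covered here, and this is not Saga's actual pattern set:

    import java.util.regex.Pattern;

    public final class LatLonFormat {
        // Decimal degrees, optional degree sign, then a cardinal direction letter.
        private static final Pattern COORDINATE = Pattern.compile(
                "\\d{1,3}(\\.\\d+)?\\s*°?\\s*[NSEW]");

        public static boolean looksLikeCoordinate(String text) {
            return text != null && COORDINATE.matcher(text).matches();
        }
    }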

MAC Address

Identifies media access control (MAC) addresses. MAC addresses are recognizable as six groups of two hexadecimal digits, separated by hyphens or colons, or written without a separator.
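
A format sketch covering both the separated and unseparated spellings (illustrative, not Saga's pattern):

    import java.util.regex.Pattern;

    public final class MacAddressFormat {
        // Six groups of two hex digits with a consistent ":" or "-" separator,
        // or twelve contiguous hex digits.
        private static final Pattern MAC = Pattern.compile(
                "[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\\1){4}[0-9A-Fa-f]{2}|[0-9A-Fa-f]{12}");

        public static boolean looksLikeMac(String token) {
            return token != null && MAC.matcher(token).matches();
        }
    }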

MAID

Identifies the format of Global Device Advertising Identifiers (e.g., IDFA, GAID, Roku ID) used in the digital advertising ecosystem.
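
IDFA and GAID values are UUID-shaped (8-4-4-4-12 hexadecimal digits separated by hyphens), so a format-only screen might look like this sketch (other identifier types may differ; this is not Saga's implementation):

    import java.util.regex.Pattern;

    public final class MaidFormat {
        // UUID shape: 8-4-4-4-12 hexadecimal digits separated by hyphens.
        private static final Pattern UUID_SHAPE = Pattern.compile(
                "[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}");

        public static boolean looksLikeMaid(String token) {
            return token != null && UUID_SHAPE.matcher(token).matches();
        }
    }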

Name Recognizer

The name predictor stage uses OpenNLP's NameFinder to load Named Entity Recognition models and tag tokens that match entities based on the model, given a certain threshold of accuracy.
(If you need the model, go to OpenNLP Models and look for en-ner-person.bin.)

This is a plugin recognizer. Uses Name Predictor Stage
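
Outside of Saga, the underlying OpenNLP usage looks roughly like the sketch below; the example tokens and threshold value are illustrative.

    import java.io.FileInputStream;
    import java.io.InputStream;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.Span;

    public final class NameFinderExample {
        public static void main(String[] args) throws Exception {
            try (InputStream in = new FileInputStream("en-ner-person.bin")) {
                NameFinderME finder = new NameFinderME(new TokenNameFinderModel(in));

                String[] tokens = {"Maria", "Garcia", "joined", "the", "call"};
                Span[] spans = finder.find(tokens);
                double[] probs = finder.probs(spans);

                // Keep only spans whose probability clears the accuracy threshold.
                double threshold = 0.8; // illustrative value
                for (int i = 0; i < spans.length; i++) {
                    if (probs[i] >= threshold) {
                        System.out.println(spans[i] + " prob=" + probs[i]);
                    }
                }
            }
        }
    }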

Number

Identifies tokens that look like numbers and flags the tokens with the "NUMBER" flag.

Phone Number

This stage identifies tokens that look like phone numbers and flags them as "PHONE".

Postal Code

This stage identifies tokens that look like postal codes and flags them as "POSTCODE".

Python Classification Watcher

Connects directly to the Python Bridge to send text or sections of the interpretation graph to be processed by ML algorithms in Python.

This recognizer is used when there is a need to classify an entire document, for example. That is the difference compared to the Python Model Recognizer, which runs for each token or text block.

Processing an entire document has its benefits; it may be the best way to classify a document as a whole. By running only once per document, we get a performance boost compared to running the recognizer for each individual token or text block.

The other benefit is that we can normalize the text before sending it to the Python model, and also specify dependent tags so it runs in the order we need in the processing pipeline.


Python Model Recognizer

Connects directly to the Python Bridge to send text or sections of the interpretation graph to be processed by ML algorithms in Python.

If authentication is needed to connect to the Python Bridge, enable the "Authenticate" checkbox and enter the service username and password.

If authentication is not required or authentication is successful, select the Model Name, Version, and/or Method, then click the Save button.

Regex

Looks up matches to regular expressions in a dictionary across multiple tokens and then tags the match with one or more semantic tags as an alternative representation. For a simple regular expression, a match only needs to occur against a single token; in that case, the Simple Regex recognizer is recommended.

Simple Regex

Accepts Java regular expressions and tries to match them against the tokens coming through the pipeline. When there is a match, the text is tagged with a semantic tag. If you need the regex to match several tokens, you can use the Regex recognizer; just be aware that it is heavier on processing.
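
For example, a single entry might pair a Java regex with the semantic tag to apply when a token matches; the pattern, tag, and tokens below are hypothetical, not Saga's configuration syntax:

    import java.util.regex.Pattern;

    public final class SimpleRegexSketch {
        public static void main(String[] args) {
            // Hypothetical entry: a Java regex paired with the semantic tag to apply.
            Pattern orderId = Pattern.compile("ORD-\\d{6}");
            String tag = "{order_id}";

            for (String token : new String[] {"Shipped", "ORD-123456", "today"}) {
                if (orderId.matcher(token).matches()) {
                    System.out.println(token + " -> " + tag);
                }
            }
        }
    }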

Token Matcher Recognizer

This recognizer works in a similar way to the Entity Recognizer in the sense that it looks up sequences of tokens in a dictionary to match the text being processed. The difference is that it also includes N tokens to the right and/or left of the original matched text in the match.

URL

This stage identifies tokens that look like URLs and flags them as "URL".
