ABA (American Bankers Association)

Implements an entity extractor for ABA (American Bankers Association) routing transit numbers (RTNs). ABA RTNs are only for use in payment transactions within the United States. They appear on paper checks and are used for wire transfers and ACH transactions.
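As an illustration of how a candidate RTN can be checked, the sketch below applies the standard ABA checksum: the nine digits are weighted 3, 7, 1 (repeating) and the weighted sum must be divisible by 10. Whether the Saga extractor performs this exact validation is an assumption; the function name is hypothetical.

```python
def is_valid_aba_rtn(candidate: str) -> bool:
    """Return True if candidate is nine ASCII digits that pass the ABA checksum."""
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    # Standard ABA weighting: 3, 7, 1 repeated across the nine digits.
    weights = [3, 7, 1, 3, 7, 1, 3, 7, 1]
    total = sum(d * w for d, w in zip(digits, weights))
    return total % 10 == 0
```

A checksum only filters out obviously wrong tokens. It does not confirm that the number is assigned to a real institution.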

Advanced

Matches advanced recursive patterns of tokens and semantic tags. Pattern databases can be very large (millions of entries).

Best Bets

This stage maintains a list of tokens used to identify possible subjects of interest and suggests a URL reference for each, along with a "title" and "description".

BIC (Bank/Business Identifier Codes)

Implements an entity extractor for Bank/Business Identifier Codes. These codes are assigned to each bank and/or business in every country and are administered by the Society for Worldwide Interbank Financial Telecommunication (SWIFT).
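A minimal sketch of a shape check for BIC candidates, assuming the common ISO 9362 layout: four letters for the institution, two letters for the country, two alphanumeric characters for the location, and an optional three-character branch code. The pattern and function name are illustrative and are not taken from Saga.

```python
import re

# Assumed layout: institution (4 letters), country (2 letters),
# location (2 alphanumerics), optional branch (3 alphanumerics).
BIC_PATTERN = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?")


def looks_like_bic(token: str) -> bool:
    """Return True if the whole token has the shape of an 8- or 11-character BIC."""
    return BIC_PATTERN.fullmatch(token) is not None
```

Any uppercase eight-letter word also has this shape, so a real extractor would likely add further checks, such as validating the country code.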

Classification Recognizer

This recognizer uses OpenNLP's DocumentCategorizer to load classification models and tag sentences that match the binary classification model (is or isn't in a certain category) given a specified threshold of accuracy.

This is a plugin recognizer. It uses the Classification Stage.

DateTime

Identifies tokens that look like dates or time indicators and flags them with the "DATE" flag.

Email

Identifies tokens that look like emails and flags them with the "EMAIL" flag.

Entity

Looks up sequences of tokens in a dictionary and then tags the sequence with one or more semantic tags as an alternative representation. Typically, these tags represent entities such as {person}, {place}, {company}, etc.
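The lookup can be pictured with the sketch below, which scans every token position for sequences present in a small dictionary and records the tags for each hit. The dictionary contents, the lowercase normalization, and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: token sequences mapped to semantic tags.
ENTITY_DICTIONARY: Dict[Tuple[str, ...], List[str]] = {
    ("new", "york"): ["{place}"],
    ("abraham", "lincoln"): ["{person}"],
    ("lincoln",): ["{person}", "{place}"],
}


def tag_entities(tokens: List[str]) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, tags) for every dictionary sequence found in tokens."""
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in ENTITY_DICTIONARY)
    matches = []
    for start in range(len(lowered)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(lowered):
                break
            tags = ENTITY_DICTIONARY.get(tuple(lowered[start:end]))
            if tags:
                # Overlapping matches are all kept as alternative readings.
                matches.append((start, end, tags))
    return matches
```

For the tokens "Abraham Lincoln visited New York", this sketch reports the two-token person match, the single-token "Lincoln" match with both tags, and the place match for "New York".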

FAQ

This recognizer uses a frozen Universal Sentence Encoder TensorFlow model to encode a list of Frequently Asked Questions as sentence embedding vectors. It then tags sentences that match a question/answer pair from the FAQ, given a specified threshold of accuracy.

The recognizer also gives the option to use a Python model instead of TensorFlow. You'll need the Python Bridge running for this.

This is a plugin recognizer.
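To show what threshold-based matching over embedding vectors can look like, the sketch below compares a sentence vector against stored FAQ vectors and returns the best match at or above a threshold. Cosine similarity as the score, the tiny two-dimensional vectors standing in for encoder output, and all names are assumptions; the source does not say how Saga scores a match.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_faq_match(
    sentence_vector: List[float],
    faq_vectors: Dict[str, List[float]],
    threshold: float,
) -> Optional[str]:
    """Return the id of the most similar FAQ entry, or None if below threshold."""
    best_id = None
    best_score = -1.0
    for faq_id, vector in faq_vectors.items():
        score = cosine_similarity(sentence_vector, vector)
        if score >= threshold and score > best_score:
            best_id, best_score = faq_id, score
    return best_id
```

In practice the vectors would come from the sentence encoder, and the threshold would be tuned against real questions.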

Fragmented

Identifies patterns with a combination of any number of specified tokens, regardless of the surrounding tokens.
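One way to read this is as a match on the specified tokens that ignores whatever sits between them. The sketch below assumes the specified tokens must appear in order, which the description above does not state; the function name is hypothetical.

```python
from typing import List, Optional


def find_fragmented(tokens: List[str], pattern: List[str]) -> Optional[List[int]]:
    """Return the positions of the pattern tokens, in order, or None if absent."""
    lowered = [t.lower() for t in tokens]
    positions = []
    search_from = 0
    for wanted in pattern:
        try:
            # Skip over any number of unrelated tokens to find the next one.
            index = lowered.index(wanted, search_from)
        except ValueError:
            return None
        positions.append(index)
        search_from = index + 1
    return positions
```

With the tokens of "please turn the kitchen lights off now" and the pattern "turn", "lights", "off", the sketch matches even though "the kitchen" sits in the middle.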

Google Entity Recognizer

This recognizer tags entities based on an NER model trained with the Google AutoML Entity Extraction Cloud API. The recognizer connects to the cloud API to use a model that was trained by the API (Saga does not yet perform the training).

This is a plugin recognizer.

Saga uses a keys.json file from the service account configured to interact with this API to authenticate all GCP REST calls. More information here.

Google Knowledge

This recognizer uses the Google Knowledge API (closed alpha) for FAQ matching. FAQs can be loaded directly from an HTML page via a URL, or created manually in Saga and then uploaded to the Google Knowledge service to construct the model.

This is a plugin recognizer.

Saga uses a keys.json file from the service account configured to interact with this API to authenticate all GCP REST calls. More information here.

Name Recognizer

The name predictor stage uses OpenNLP's NameFinder to load Named Entity Recognition (NER) models and tags tokens that match entities based on the model, given a certain threshold of accuracy.

This is a plugin recognizer. It uses the Name Predictor Stage.

Number

Identifies tokens that look like numbers and flags the tokens with the "NUMBER" flag.

Phone Number

This stage identifies tokens that look like phone numbers and flags them as "PHONE".

Postal Code

This stage identifies tokens that look like postal codes and flags them as "POSTCODE".

Regex

Looks up matches to regular expressions in a dictionary across multiple tokens and then tags the match with one or more semantic tags as an alternative representation. For a simple regular expression, a match only needs to occur against a single token. Simple Regex is recommended.
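Matching a regular expression across several tokens can be sketched by joining the tokens with single spaces, running the expression on the joined text, and mapping each match back to token positions. This is an assumed approach for illustration, not a description of Saga's internals; the function name and the example tag are hypothetical.

```python
import re
from typing import List, Tuple


def regex_over_tokens(
    tokens: List[str], pattern: str, tag: str
) -> List[Tuple[int, int, str]]:
    """Return (first_token, end_token, tag) for matches aligned to token boundaries."""
    text = " ".join(tokens)
    # Character offsets where each token starts and ends in the joined text.
    starts, ends = [], []
    offset = 0
    for token in tokens:
        starts.append(offset)
        ends.append(offset + len(token))
        offset += len(token) + 1  # +1 for the joining space
    matches = []
    for m in re.finditer(pattern, text):
        # Ignore matches that begin or end in the middle of a token.
        if m.start() in starts and m.end() in ends:
            matches.append((starts.index(m.start()), ends.index(m.end()) + 1, tag))
    return matches
```

For example, the tokens "order", "AB", "-", "1234", "shipped" with the pattern `[A-Z]{2} - \d{4}` yield one match covering the three middle tokens.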

Simple Regex

Token Matcher Recognizer

This recognizer works in a similar way to the Entity Recognizer, in that it looks up sequences of tokens in a dictionary to match the text being processed. The difference is that the match also includes N tokens to the right and/or left of the originally matched text.
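The widening of a match by N tokens on either side can be sketched as follows. The function name, the `left` and `right` parameters, and the lowercase comparison are assumptions made for illustration.

```python
from typing import List, Tuple


def match_with_context(
    tokens: List[str], phrase: List[str], left: int = 0, right: int = 0
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of phrase matches, widened by left/right tokens."""
    if not phrase:
        return []
    lowered = [t.lower() for t in tokens]
    size = len(phrase)
    spans = []
    for start in range(len(lowered) - size + 1):
        if lowered[start:start + size] == phrase:
            # Clamp the widened span to the bounds of the token list.
            spans.append((max(0, start - left), min(len(tokens), start + size + right)))
    return spans
```

With the tokens of "call me at extension 42 before noon", the phrase "extension" and `right=1`, the returned span covers "extension 42".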

URL

This stage identifies tokens that look like URLs and flags them as "URL".
