Release date: September 12th, 2023

Tech Stack

NoSQL DB provider supported

  • Elasticsearch versions 7.14 - 7.17 and 8.8.1
  • OpenSearch v. 1.1

Java supported

  • OpenJDK 17

Python supported

  • Python version 3.11.4

Node.js

  • Node.js LTS v. 18.17.1


New Features

UI/UX

  • Copy ProcessorID button for pipeline stages

Processors and Recognizers

  • Addition of Japanese, Korean and Chinese tokenizer settings in pattern-based recognizers (Entity, Bestbets, GeoNames)


Improvements

Aspire Saga-Parser

  • Upgraded Aspire libraries to version 5.2
  • Refactor to decouple Saga from Saga-Parser (it will be available in Aspire 5.3)
  • Addition of configurable cache to improve processing performance
  • New setting to run a Python Bridge instance for every Saga Engine created in the Aspire Worker
  • Removed maximum number of engines setting

Server

  • Fixed Critical and High vulnerabilities found in security scans
  • Support Elasticsearch 8.8.1
  • Addition of HealthCheck endpoint to Saga and Python Bridge
  • Change in SSO authentication to use OIDC instead of SAML
  • Addition of T5 and MiniLM models to Python Bridge
  • Addition of SSL and Authentication to the Python Bridge
  • Migration to latest version 5 of Javalin
  • Javalin max payload size is now configurable in the config.json file
  • Improved security by moving logs of processed data from Info to Debug level
  • Migration to Apache 2 licensed version of Elasticsearch client library
  • Migrated to Java 17
  • Returning a user error when text is processed with a Processing Unit that doesn't exist
  • Addition of the "id" field in the importer for the FAQ recognizer
  • HTTPS port is now configurable

UI/UX

  • Countries/Languages settings in the PostalCode and Number recognizers are now dropdowns instead of free-text boxes
  • Increased maximum length of the hostname in the Python Model recognizer
  • Upgrade to Angular 15

Recognizers/Processors

  • Accuracy improvements in PostalCode for USA
  • Support for additional scientific notation (ex. 2.1E+11) in the Number recognizer
  • Supporting identification of SSN without format in the FederalId recognizer
  • Accuracy improvement in FederalId recognizer by discarding invalid SSNs
  • New setting in DateTime recognizer to identify dates in the past/future/both
  • Accuracy improvement in CreditCard recognizer by implementing the Luhn check
  • Performance improvement at loading time in Regex and SimpleRegex recognizers
  • Performance improvement at loading time in the CreditCard recognizer
  • Performance improvement at loading time in the Entity recognizer
  • Supporting many-to-many synonyms in the Synonym stage
  • Phone Number recognizer matches UK number formats more accurately, including area codes
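
For readers unfamiliar with the Luhn check mentioned for the CreditCard recognizer above, the following is a minimal Python sketch of the standard Luhn checksum. It is not taken from the Saga code base; the function name luhn_valid is hypothetical, and the recognizer's actual implementation may differ.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum.

    Hypothetical helper for illustration only; not Saga's implementation.
    Non-digit characters such as spaces or dashes are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) to the left,
    # doubling every second digit and subtracting 9 when the result exceeds 9.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit

    # A number is valid when the weighted sum is a multiple of 10.
    return total % 10 == 0
```

A candidate match that fails this check, such as a 16-digit number with a single mistyped digit, can be discarded instead of being tagged as a credit card number.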

Docker

  • Image tag now includes base layer name (ex. saga-server:1.3.3-javacio17-base)
  • Now using CIO recommended base layer


Bug fixes

  • NPE in Saga-Parser when retrieving tags
  • Missing entries in Saga export file
  • Importer fails in Linux
  • Error when selecting a model in FAQ recognizer when default model is not present
  • Error starting Saga when no indices are present in Elasticsearch
  • Python Bridge is not downloading Bert models in Windows
  • Error loading a dictionary on start up after importing a bad .sg file
  • Saga-Parser failing on initialization when numeric settings are passed as string
  • Saga failing when processing big payload from Elasticsearch
  • FederalId recognizer not recognizing 11 digit numbers
  • NPE in synonym stage
  • Wrong matching text in Regex recognizer when special characters are processed
  • Index out of bounds error when processing big content with no breaks
  • Lat/Long recognizer not supporting degree sign when next to a number
  • Stack overflow error in LucenePipeline processor when configured with Korean tokenizer
  • Export for data science not working
  • UI pagination shows wrong numbers after a search is done
  • Intent recognizer not working with TensorFlow model
  • Intent and FAQ recognizers not working
  • Error in Sentence Breaker OpenNLP when used from Saga-Parser
  • Corrupted data in the Classification Watcher
  • Processing Unit is not cleaning the Saga Graph when an error occurs traversing the graph
  • API is creating Processing Unit without a tag
  • NPE in FederalId recognizer
  • Preview pop-up overflows with large text
  • Hitting Enter in the preview textbox adds a new line
  • Error in Bestbets recognizer when property is null
  • Releasing vector to memory pool error when processing lots of documents
  • Unknown query error in GeoNames recognizer
  • Bert models stuck loading in Python Bridge
  • UI issues adding a stage in the Lucene Pipeline processor
  • Invalid combination of arguments error in Python Bridge when using a Bert model in the Intent Recognizer
  • Error when using Lucene Pipeline processor with no tokenizer, added validation and default tokenizer
  • NPE in Saga-Parser when using Aspire with embedded JRE
  • Phone Number recognizer was not checking the area code correctly
  • Engine provider should wait for resources to be loaded before creating new engines
  • Error when selecting many tags in Export for Data Science 
  • Pagination at the bottom of the UI is not refreshing correctly
  • Error when trying to authenticate without host/port in Python Model recognizer


Note: you can find all the details in Jira here.



Known Issues

  • Spell checker only works with Elasticsearch
  • Saga-Parser recognizers that need Python Bridge cannot connect to a Python Bridge set up with HTTPS, only HTTP
  • The Debug setting in Saga-Parser causes an NPE during crawl. This setting needs to be disabled
  • Statistics screen is not rendering correctly after an Evaluation with a dataset is performed
  • Check Tag button in preview screen is not working properly
  • Depending on connection quality, a false "Server Down" message could appear in the UI













