Release date: September 12th, 2023
Tech Stack
NoSQL DB provider supported
- Elasticsearch versions 7.14 - 7.17 and 8.8.1
- OpenSearch v. 1.1
Java supported
Python supported
Node.js
- Node.js LTS v. 18.1217.1.
New Features
UI/UX
- Copy ProcessorID button for pipeline stages
Processors and Recognizers
- Addition of Japanese, Korean and Chinese tokenizer settings in pattern-based recognizers (Entity, Bestbet, Geonames)
Improvements
Aspire Saga-Parser
- Upgraded Aspire libraries to version 5.2
- Refactor to decouple Saga from Saga-Parser (it will be available in Aspire 5.3)
- Addition of configurable cache to improve processing performance
- New setting to run a Python Bridge instance for every Saga Engine created in the Aspire Worker
- Removed maximum number of engines setting
Server
- Fixed Critical and High vulnerabilities found in security scans
- Support to work with Elasticsearch 8.8.1
- Addition of HealthCheck endpoint to Saga and Python Bridge
- Change in SSO authentication to use OIDC instead of SAML
- Addition of T5 and MiniLM models to Python Bridge
- Addition of SSL and Authentication to the Python Bridge
- Migration to latest version 5 of Javalin
- Javalin max payload size is now configurable in config.json file
- Improving security by moving logs of data processed from Info to Debug
- Migration to Apache 2 licensed version of Elasticsearch client library
- Migrated to Java 17
- Returning a user error when text is processed with a Processing Unit that doesn't exist
- Addition of the "id" field in the importer for the FAQ recognizer
- HTTPS port is now configurable
UI/UX
- Countries/Languages configuration in PostalCode and Number recognizers now uses dropdowns instead of free-text boxes
- Increased maximum length of the hostname in the Python Model recognizer
- Upgrade to Angular 15
Recognizers/Processors
- Accuracy improvements in PostalCode for USA
- Support for additional scientific notation (ex. 2.1E+11) in the Number recognizer
- Supporting identification of SSN without format in the FederalId recognizer
- Accuracy improvement in FederalId recognizer by discarding invalid SSNs
- New setting in DateTime recognizer to identify dates in the past/future/both
- Accuracy improvement in CreditCard recognizer by implementing LUHN check
- Performance improvement at loading time in Regex and SimpleRegex recognizers
- Performance improvement at loading time in the CreditCard recognizer
- Performance improvement at loading time in the Entity recognizer
- Supporting many-to-many synonyms in the Synonym stage
- Phone Number recognizer supports the UK number format more closely, along with area codes
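The CreditCard and FederalId items above mention a LUHN check and the discarding of invalid SSNs. These notes do not describe how Saga implements either one, so the following is only a minimal, generic sketch of both ideas. The function names `luhn_valid` and `ssn_plausible` are hypothetical and are not part of the Saga API. The SSN rules shown are the publicly documented structural ones (area 000, 666 and 900-999, group 00 and serial 0000 are never issued). The rules actually applied by the recognizer may differ.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Any non-digit characters (spaces, dashes, ...) are ignored.
    """
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def ssn_plausible(candidate: str) -> bool:
    """Return True if `candidate` holds nine digits that form a structurally valid SSN.

    Works with or without separators, e.g. "078-05-1120" or "078051120".
    """
    digits = "".join(ch for ch in candidate if "0" <= ch <= "9")
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    # Assumption: only the well-known structural rules are checked here.
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True
```

A checksum such as Luhn is a cheap way to discard digit sequences that merely look like card numbers, which is the kind of false positive a pattern-based recognizer tends to produce.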
Docker
- Image tag now includes base layer name (ex. saga-server:1.3.3-javacio17-base)
- Now using CIO recommended base layer
...
- NPE in Saga-Parser when retrieving tags
- Error in Bestbets recognizer when property is null
- Missing entries in Saga export file
- Importer fails in Linux
- Error when selecting a model in FAQ recognizer when default model is not present
- Error starting Saga when no indices are present in Elasticsearch
- Python bridge is not downloading Bert models in Windows
- Error loading a dictionary on start up after importing a bad .sg file
- Saga-Parser failing on initialization when numeric settings are passed as string
- Saga failing when processing big payload from Elasticsearch
- FederalId recognizer not recognizing 11 digit numbers
- NPE in synonym stage
- Wrong matching text in Regex recognizer when special characters are processed
- Index out of bounds error when processing big content with no breaks
- Lat/Long recognizer not supporting degree sign when next to a number
- Stack overflow error in LucenePipeline stage when the processor is configured with the Korean tokenizer
- Export for data science not working
- UI pagination shows wrong numbers after a search is done
- Intent recognizer not working with TensorFlow model
- Intent and FAQ recognizer not working
- Error in Sentence Breaker OpenNLP when used from Saga-Parser
- Corrupted data in the Classification Watcher
- Processing Unit is not cleaning the Saga Graph when an error occurs traversing the graph
- API is creating Processing Unit without a tag
- NPE in FederalId recognizer
- Preview pop-up overflows with large text
- Hitting Enter in the preview textbox adds a new line
- Releasing vector to memory pool error when processing lots of documents
- Unknown query error in GeoNames recognizer
- Bert models stuck loading in Python Bridge
- UI issues adding a stage in the Lucene Pipeline processor
- Invalid combination of arguments error in Python Bridge when using a Bert model in the Intent Recognizer
- Error when using Lucene Pipeline processor with no tokenizer, added validation and default tokenizer
- NPE in Saga-Parser when using Aspire with embedded JRE
- Phone Number recognizer was not checking the area code correctly
- Engine provider should wait for resources to be loaded before creating new engines
- Error when selecting many tags in Export for Data Science
- Pagination at the bottom of the UI is not refreshing correctly
- Error when trying to authenticate without host/port in Python Model recognizer
Note: you can find all the details in Jira here.
Known Issues
- Spell checker only works with Elasticsearch
- Saga-Parser, when using recognizers that need Python Bridge, cannot connect to a Python Bridge set up with HTTPS; only HTTP works
- Debug setting in Saga-Parser causes an NPE during crawl. This setting needs to be disabled
- Statistics screen is not rendering correctly after an Evaluation with a dataset is performed
- Check Tag button in preview screen is not working properly
- Depending on connection quality, a false "Server Down" message could appear in the UI