Language processing requires linguistic resources:

  • Dictionaries of word variations (e.g. for lemmatizers)
  • Dictionaries of names, places, products, etc.
  • The pipeline configuration itself


Saga has a "Resource Management" system for reading and using these resources for language processing.

Resources are intended to be shared across all engines within an instance of Saga (and possibly across multiple nodes as well, depending on the implementation).

Goals

  • Separated storage layer
    • Allow for resources to be stored in files or different database systems
  • Isolate storage details from pipeline functionality
    • Change providers without changing pipeline configuration
  • Allow extremely large dictionary resources to be stored and used centrally
    • For example, in Redis or a similar distributed key-value system
  • Allow for Dev, Staging, and Production publishing
  • Allow for business user editors to edit dictionaries and publish updates
  • Allow for publishing of dynamic updates to linguistic resources


Many of these are just goals for now and are in the process of being implemented.

Resource Providers 

A resource provider gives access to a specific set of resources held in a particular storage technology, for example a file system directory of resource files or a collection of tables.

Available Resource Provider Implementations

Add-on Resources

Built-in Resources

Only the FileSystem implementation is provided as part of the core system. Other implementations are provided in separate JAR files, which must be added to the class path when needed.

Locating the Resource Provider Class

The "type" parameter is used in resource configuration to locate the resource provider class. For example:

"type":"FileSystem"

or

"type":"com.accenture.saga.resourcemgr.filesystem.FileSystemProvider"

The type parameter is used to locate the class using the following steps:

  1. If the type parameter has periods in it:

    • Try to load the class exactly as specified

      For example:  com.accenture.saga.resourcemgr.filesystem.FileSystemProvider

  2. Otherwise:

    • Try to find the class in the "com.accenture.saga.resourcemgr.filesystem" package

      For example:  "FileSystemProvider" → "com.accenture.saga.resourcemgr.filesystem.FileSystemProvider"

    • Try to find the class with a "Provider" suffix in the "com.accenture.saga.resourcemgr.filesystem" package

      For example:  "FileSystem" → "com.accenture.saga.resourcemgr.filesystem.FileSystemProvider"
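
The lookup steps above can be sketched in Java as follows. This is only an illustration of the described order, not the actual Saga implementation: the class name ProviderClassResolver and its methods are hypothetical, and the sketch assumes the candidates are tried in the order listed, using the standard Class.forName call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the Saga API: mirrors the lookup steps described above.
final class ProviderClassResolver {
    // Default package named in the lookup steps.
    static final String DEFAULT_PACKAGE = "com.accenture.saga.resourcemgr.filesystem";

    // Returns the fully qualified class names to try, in order.
    static List<String> candidateClassNames(String type) {
        List<String> candidates = new ArrayList<>();
        if (type.contains(".")) {
            // Step 1: a type with periods is treated as a fully qualified class name.
            candidates.add(type);
        } else {
            // Step 2, first attempt: the type as given, inside the default package.
            candidates.add(DEFAULT_PACKAGE + "." + type);
            // Step 2, second attempt: the type with a "Provider" suffix.
            candidates.add(DEFAULT_PACKAGE + "." + type + "Provider");
        }
        return candidates;
    }

    // Loads the first candidate that exists on the class path, if any.
    static Optional<Class<?>> resolve(String type) {
        for (String name : candidateClassNames(type)) {
            try {
                return Optional.of(Class.forName(name));
            } catch (ClassNotFoundException e) {
                // Not on the class path; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, candidateClassNames("FileSystem") yields the bare name in the default package first and the "Provider"-suffixed name second, while resolve returns an empty Optional if neither class is on the class path.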

Types of Resources

Currently, there are three types of resources defined:

  1. Blob
    • This resource contains a simple blob of binary data.
    • The typical use is to hold a JSON file, such as the pipeline configuration file.
  2. JSON Map
    • This resource contains a key/value map, where the key is a string and the value is a JSON record.
  3. String[] Map
    • This resource contains a key/value map, where the key is a string and the value is an array of strings.

A resource provider must provide methods to read and write all three types of resources. Most resources are read-only, but some may need to be created by the pipeline stage that uses them, for example to create a dictionary index.
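
To make the three resource types concrete, here is a minimal Java sketch of what such a read/write contract could look like. All names in it (ResourceProviderSketch, InMemoryProviderSketch and their methods) are hypothetical and are not taken from the Saga API. JSON records are held as raw JSON text to keep the example free of third-party libraries, and the in-memory implementation exists only so the sketch is runnable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: one read/write pair per resource type described above.
interface ResourceProviderSketch {
    // Blob: a simple block of binary data, e.g. a pipeline configuration file.
    byte[] readBlob(String resourceName);
    void writeBlob(String resourceName, byte[] data);

    // JSON Map: string key -> JSON record (kept here as raw JSON text).
    Map<String, String> readJsonMap(String resourceName);
    void writeJsonMap(String resourceName, Map<String, String> entries);

    // String[] Map: string key -> array of strings.
    Map<String, String[]> readStringArrayMap(String resourceName);
    void writeStringArrayMap(String resourceName, Map<String, String[]> entries);
}

// Illustrative in-memory storage; a real provider would use files or a database.
final class InMemoryProviderSketch implements ResourceProviderSketch {
    private final Map<String, byte[]> blobs = new HashMap<>();
    private final Map<String, Map<String, String>> jsonMaps = new HashMap<>();
    private final Map<String, Map<String, String[]>> stringArrayMaps = new HashMap<>();

    // Returns null when no blob with that name has been written.
    @Override public byte[] readBlob(String resourceName) {
        return blobs.get(resourceName);
    }

    @Override public void writeBlob(String resourceName, byte[] data) {
        blobs.put(resourceName, data.clone());
    }

    // Returns an empty map when the resource does not exist.
    @Override public Map<String, String> readJsonMap(String resourceName) {
        Map<String, String> found = jsonMaps.get(resourceName);
        return found != null ? found : Collections.<String, String>emptyMap();
    }

    @Override public void writeJsonMap(String resourceName, Map<String, String> entries) {
        jsonMaps.put(resourceName, new HashMap<>(entries));
    }

    // Returns an empty map when the resource does not exist.
    @Override public Map<String, String[]> readStringArrayMap(String resourceName) {
        Map<String, String[]> found = stringArrayMaps.get(resourceName);
        return found != null ? found : Collections.<String, String[]>emptyMap();
    }

    @Override public void writeStringArrayMap(String resourceName, Map<String, String[]> entries) {
        stringArrayMaps.put(resourceName, new HashMap<>(entries));
    }
}
```

A pipeline stage that builds its own dictionary index could, under these assumptions, call writeStringArrayMap once and have later stages read the result back through readStringArrayMap.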

Resource Providers Configuration

The following is a sample resource providers configuration:

Sample Resource Configuration
{
  "providers": [
    {
      "name": "filesystem-provider",
      "type": "FileSystem",
      "baseDir": "./config"
    },
    {
      "name": "saga-provider",
      "type": "Elastic",
      "scheme": "http",
      "hostnamesAndPorts": ["localhost:9200"],
      "timestamp": "updatedAt",
      "exclude": [
        "updatedAt",
        "createdAt"
      ]
    }
  ]
}


This configuration has a single field, "providers", which holds a list of provider configurations. More fields may be added later. See the documentation for each provider type for details of its settings.
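
As an illustration of how the "providers" list might be consumed once the JSON has been parsed, the following Java sketch indexes provider entries by their "name" field. It is not Saga code: the class ProviderConfigIndex is hypothetical, each entry is assumed to have already been parsed into a plain Map, and the requirements that "name" and "type" are strings and that names are unique are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Saga API.
final class ProviderConfigIndex {
    // Indexes provider entries by "name", keeping the order of the "providers" list.
    // Assumption: every entry has string "name" and "type" fields, and names are unique.
    static Map<String, Map<String, Object>> byName(List<Map<String, Object>> providers) {
        Map<String, Map<String, Object>> index = new LinkedHashMap<>();
        for (Map<String, Object> provider : providers) {
            Object name = provider.get("name");
            Object type = provider.get("type");
            if (!(name instanceof String) || !(type instanceof String)) {
                throw new IllegalArgumentException(
                        "Each provider needs string \"name\" and \"type\" fields");
            }
            if (index.putIfAbsent((String) name, provider) != null) {
                throw new IllegalArgumentException("Duplicate provider name: " + name);
            }
        }
        return index;
    }
}
```

Given the two entries from the sample configuration, looking up "saga-provider" in the returned index would give the entry whose "type" is "Elastic"; provider-specific settings such as "baseDir" stay untouched in the entry's map.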


