
Language processing requires linguistic resources:

  • Dictionaries of word variations (e.g. for lemmatizers)
  • Dictionaries of names, places, products, etc.
  • The pipeline configuration itself

The Language Processing Toolkit provides a "Resource Management" system for reading these resources and making them available to language processing.

Resources are intended to be shared across all engines within an instance of Saga and, depending on the implementation, possibly across multiple nodes as well.

Goals:

  • Separated storage layer
    • Allow for resources to be stored in files or different database systems
  • Isolate storage details from pipeline functionality
    • Change providers without changing pipeline configuration
  • Allow extremely large dictionary resources to be stored and used centrally
    • For example, in Redis or a similar distributed key-value system
  • Allow for Dev, Staging, and Production publishing
  • Allow for business user editors to edit dictionaries and publish updates
  • Allow for publishing of dynamic updates to linguistic resources

Note that many of these goals are not yet met; they are still in the process of being implemented.

Resource Providers

A resource provider provides access to a specific set of resources. This can be a directory of resource files or a collection of tables.
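The idea of isolating storage from pipeline functionality can be illustrated with a minimal sketch. This is not the toolkit's actual API: the names `ResourceProvider`, `DirectoryProvider`, `TableProvider`, and `lemma_of`, as well as the `lemmas` resource and its contents, are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ResourceProvider(ABC):
    """Hypothetical interface: gives access to a named set of resources."""

    @abstractmethod
    def get(self, name: str) -> dict:
        """Return the resource called `name` as a dictionary."""


class DirectoryProvider(ResourceProvider):
    """Assumed file-backed provider: one `<name>.json` file per resource."""

    def __init__(self, root):
        self.root = Path(root)

    def get(self, name):
        with open(self.root / f"{name}.json", encoding="utf-8") as fh:
            return json.load(fh)


class TableProvider(ResourceProvider):
    """Assumed table-backed provider; a dict stands in for database tables."""

    def __init__(self, tables):
        self.tables = tables

    def get(self, name):
        return dict(self.tables[name])


def lemma_of(provider, word):
    """Pipeline-side code: depends only on the provider interface."""
    lemmas = provider.get("lemmas")
    return lemmas.get(word, word)  # fall back to the word itself


# The same pipeline function works against either storage layer.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "lemmas.json").write_text(
        json.dumps({"ran": "run"}), encoding="utf-8"
    )
    from_files = lemma_of(DirectoryProvider(tmp), "ran")

from_tables = lemma_of(TableProvider({"lemmas": {"ran": "run"}}), "ran")
```

Because `lemma_of` only calls `get`, swapping the directory-backed provider for the table-backed one requires no change on the pipeline side, which is the property the goals above describe.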

Available Resource Provider Types

Types of Resources

Currently, there are two types of resources:

  • JSON Config
  • JSON Map
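This page does not describe the schema of either resource type, so the following is only a rough sketch under a stated assumption: a JSON Config is taken to be a single JSON document read as a whole (for example, a pipeline configuration), and a JSON Map is taken to be a set of keys that each map to a JSON value (for example, dictionary entries). The field names `stages` and `lemma` are hypothetical.

```python
import json

# Assumed shape of a JSON Config resource: one document, read as a whole.
config_text = '{"stages": ["tokenizer", "lemmatizer"]}'

# Assumed shape of a JSON Map resource: many keys, each with a JSON value.
map_text = '{"ran": {"lemma": "run"}, "mice": {"lemma": "mouse"}}'

pipeline_config = json.loads(config_text)
lemma_map = json.loads(map_text)

# A config is consumed in full; a map is consumed one key at a time.
stage_names = pipeline_config["stages"]
entry = lemma_map.get("mice")    # entry for a known key
missing = lemma_map.get("dog")   # None when the key is absent
```

Under this reading, a map resource suits very large dictionaries held in a key-value store, since individual entries can be looked up without loading the whole resource.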

Resources Configuration



