Structure of a Language Processing Toolkit Program

The overall structure of Saga is shown in the diagram below:

A Saga engine is a pipeline of text processing stages

1. The first stage in the pipeline is a "reader".

- The reader reads raw text from a text stream and returns it as text blocks for the subsequent stages to process.

2. The reader is followed by an ordered list of pipeline stages.

- Each stage takes an "interpretation graph" and extends it.
- See Understanding Interpretation Graphs for more information.

3. The result is the final interpretation graph of text blocks, tokens, and semantic tags.

4. The order of the stages matters: running the same stages in a different order can produce a different interpretation graph.
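The flow described above can be sketched in a few lines of Python. This is only an illustration, not Saga's actual API: the InterpretationGraph class here is a toy stand-in (a real interpretation graph is richer than two lists), and every function name, including read_blocks, whitespace_tokenizer, lowercase, city_tagger, and run_pipeline, is hypothetical. It shows a reader feeding text blocks to a list of stages, and why stage order changes the result.

```python
import io
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class InterpretationGraph:
    """Toy stand-in for an interpretation graph (hypothetical, not Saga's class)."""
    text: str
    tokens: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


Stage = Callable[[InterpretationGraph], InterpretationGraph]


def read_blocks(stream) -> Iterable[str]:
    """Reader: yield each non-empty line of the stream as one text block."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


def whitespace_tokenizer(graph: InterpretationGraph) -> InterpretationGraph:
    graph.tokens = graph.text.split()
    return graph


def lowercase(graph: InterpretationGraph) -> InterpretationGraph:
    graph.tokens = [t.lower() for t in graph.tokens]
    return graph


def city_tagger(graph: InterpretationGraph) -> InterpretationGraph:
    # Tags tokens that exactly match a tiny, made-up lowercase dictionary.
    cities = {"paris", "london"}
    graph.tags = [f"{t}:city" for t in graph.tokens if t in cities]
    return graph


def run_pipeline(stream, stages: List[Stage]) -> List[InterpretationGraph]:
    """Run every text block from the reader through the stages, in order."""
    results = []
    for block in read_blocks(stream):
        graph = InterpretationGraph(text=block)
        for stage in stages:
            graph = stage(graph)  # each stage extends the graph it receives
        results.append(graph)
    return results


text = "Flights from Paris to London\n"
a = run_pipeline(io.StringIO(text), [whitespace_tokenizer, lowercase, city_tagger])
b = run_pipeline(io.StringIO(text), [whitespace_tokenizer, city_tagger, lowercase])
print(a[0].tags)  # ['paris:city', 'london:city']
print(b[0].tags)  # []
```

In the second run the tagger sees "Paris" and "London" before they are lowercased, so its exact-match lookup finds nothing. The stages are the same in both runs; only the order differs.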

Resources

Resources are the data structures that support an engine. They include pipeline configurations, dictionaries, pattern databases (perhaps produced by text mining), machine learning models, etc.

Resource Providers

Resources are supplied by "resource providers", which insulate the pipeline stages from the details of the underlying storage technology. Example providers are "FileSystem" and "Elasticsearch".

Resource providers are configured in the "config.json" configuration file. It contains a "providers" section with parameters for each provider such as server connection strings, username, password, base directory path, etc.
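As a rough illustration, the snippet below parses a made-up "config.json" and indexes its providers by name. Only the "providers" section itself comes from the description above. The list layout and every key inside it ("name", "type", "baseDir", "hosts", "username", "password") are assumptions chosen for the example, as is the load_providers helper; consult the actual Saga configuration reference for the real schema.

```python
import json

# Hypothetical config.json content; the real schema may differ.
CONFIG_TEXT = """
{
  "providers": [
    {
      "name": "filesystem-provider",
      "type": "FileSystem",
      "baseDir": "./resources"
    },
    {
      "name": "es-provider",
      "type": "Elasticsearch",
      "hosts": ["localhost:9200"],
      "username": "user",
      "password": "secret"
    }
  ]
}
"""


def load_providers(config_text: str) -> dict:
    """Parse the configuration and index provider settings by provider name."""
    config = json.loads(config_text)
    return {p["name"]: p for p in config.get("providers", [])}


providers = load_providers(CONFIG_TEXT)
print(sorted(providers))                            # ['es-provider', 'filesystem-provider']
print(providers["filesystem-provider"]["baseDir"])  # ./resources
```

Keeping connection details in one place like this means a stage only needs to refer to a provider by name.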

A Key Design Goal: Changing the storage location of a resource will not require changing the pipeline configuration.

For example, you might first develop your NLP program against simple files and later move its resources to a NoSQL database to get real-time updates. The same pipeline configuration should work in both places.
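A minimal sketch of this design goal, under stated assumptions: the ResourceProvider interface, both provider classes, the PIPELINE_CONFIG layout, and the tag_tokens function are all hypothetical and are not Saga's API. The in-memory provider stands in for a database-backed store so that the example runs without a server. The point is that the pipeline configuration names a resource ("cities") but says nothing about where it is stored.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class ResourceProvider(ABC):
    """Hypothetical interface: stages ask for a resource by name only."""

    @abstractmethod
    def load(self, resource_name: str) -> List[str]:
        ...


class FileSystemProvider(ResourceProvider):
    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def load(self, resource_name: str) -> List[str]:
        path = os.path.join(self.base_dir, resource_name + ".json")
        with open(path, encoding="utf-8") as f:
            return json.load(f)


class InMemoryProvider(ResourceProvider):
    """Stand-in for a database-backed provider such as Elasticsearch."""

    def __init__(self, documents: Dict[str, List[str]]):
        self.documents = documents

    def load(self, resource_name: str) -> List[str]:
        return list(self.documents[resource_name])


# The pipeline configuration refers to the dictionary by name, not by location.
PIPELINE_CONFIG = {"stages": [{"type": "DictionaryTagger", "dictionary": "cities"}]}


def tag_tokens(tokens: List[str], pipeline_config: dict,
               provider: ResourceProvider) -> List[str]:
    """Return the tokens that appear in each stage's dictionary resource."""
    tags = []
    for stage in pipeline_config["stages"]:
        entries = set(provider.load(stage["dictionary"]))
        tags.extend(t for t in tokens if t in entries)
    return tags


tokens = ["fly", "to", "paris"]

with tempfile.TemporaryDirectory() as base_dir:
    with open(os.path.join(base_dir, "cities.json"), "w", encoding="utf-8") as f:
        json.dump(["paris", "london"], f)
    from_files = tag_tokens(tokens, PIPELINE_CONFIG, FileSystemProvider(base_dir))

from_store = tag_tokens(tokens, PIPELINE_CONFIG,
                        InMemoryProvider({"cities": ["paris", "london"]}))
print(from_files == from_store)  # True
```

Only the provider object changes between the two calls; PIPELINE_CONFIG is passed unmodified both times.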
