The overall structure of a Saga program is shown in the diagram below:

[Diagram: overall structure of a Saga program]

A Saga engine is a pipeline of text processing stages:

  1. The first stage in the pipeline is a "reader"
    • This reads raw text from a text stream and returns it as text blocks to be processed by the stages.
  2. Then there is a list of pipeline stages.
  3. The result is the final interpretation graph of text blocks, tokens, and semantic tags.

Notes:

  1. It is a "pull" architecture
    1. Content is pulled from the last stage, which fetches content from the previous stage, etc. all the way up to the reader.
  2. SagaEngine is single-threaded
    1. If you want to process text with multiple threads, you will need to create multiple SagaEngine objects
  3. The order of the stages matters
    1. Different orders will produce different results
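
The "pull" behaviour and the effect of stage order can be illustrated with a minimal sketch. It uses Python generators rather than the Saga API, and every name in it (reader, lowercase, drop_acronyms, build_pipeline) is hypothetical. Text blocks are modelled as plain strings instead of an interpretation graph.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical names throughout; this is an illustration, not the Saga API.
Stage = Callable[[Iterator[str]], Iterator[str]]


def reader(stream: Iterable[str]) -> Iterator[str]:
    """Reader: yields one text block per non-empty line of the raw stream."""
    for line in stream:
        block = line.strip()
        if block:
            yield block


def lowercase(blocks: Iterator[str]) -> Iterator[str]:
    """Stage: lower-cases each block it pulls from the previous stage."""
    for block in blocks:
        yield block.lower()


def drop_acronyms(blocks: Iterator[str]) -> Iterator[str]:
    """Stage: removes all-uppercase words (a crude acronym filter)."""
    for block in blocks:
        yield " ".join(word for word in block.split() if not word.isupper())


def build_pipeline(stream: Iterable[str], stages: List[Stage]) -> Iterator[str]:
    """Chains the stages lazily; nothing runs until the result is iterated."""
    current = reader(stream)
    for stage in stages:
        current = stage(current)
    return current


text = ["The NLP engine\n", "\n", "uses SAGA stages\n"]

# Iterating the last stage pulls from each earlier stage, back to the reader.
filtered_first = list(build_pipeline(text, [drop_acronyms, lowercase]))
lowercased_first = list(build_pipeline(text, [lowercase, drop_acronyms]))

print(filtered_first)    # ['the engine', 'uses stages']
print(lowercased_first)  # ['the nlp engine', 'uses saga stages']
```

The same two stages give different output depending on their order. Each chained pipeline in this sketch also holds its own iteration state, so a multi-threaded program would build one pipeline per thread, which mirrors the one-SagaEngine-per-thread note above.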


Resources

Resources are the data structures that typically support an engine like this: pipeline configurations, dictionaries, pattern databases (perhaps from text mining), machine learning models, etc.

Resource Providers

Resources are provided by "resource providers", which insulate the pipeline stages from having to know the details of the underlying storage technology. Example providers are "FileSystem" and "Elasticsearch".

Resource providers are configured in the "config.json" configuration file. It contains a "providers" section with parameters for each provider, such as server connection strings, username, password, base directory path, etc.
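
As a rough illustration of what such a file might look like, the sketch below parses a hypothetical "config.json" and indexes its providers by name. Only the "providers" section and the provider types "FileSystem" and "Elasticsearch" come from the text above. The list layout, the key names (name, type, baseDir, hosts, username, password) and all values are assumptions, and the real Saga schema may differ.

```python
import json

# Hypothetical config.json content. The key names and values below are
# assumptions for illustration; they are not the documented Saga schema.
CONFIG_TEXT = """
{
  "providers": [
    {"name": "filesystem-provider", "type": "FileSystem",
     "baseDir": "./resources"},
    {"name": "elastic-provider", "type": "Elasticsearch",
     "hosts": ["localhost:9200"],
     "username": "example-user", "password": "example-password"}
  ]
}
"""


def load_provider_settings(config_text: str) -> dict:
    """Returns the provider entries indexed by their configured name."""
    config = json.loads(config_text)
    return {entry["name"]: entry for entry in config["providers"]}


settings = load_provider_settings(CONFIG_TEXT)
print(settings["filesystem-provider"]["baseDir"])  # ./resources
```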

A Key Design Goal:  Changing the storage location of a resource will not require changing the pipeline configuration.

For example, you might first develop your NLP program using simple files, then move it to a NoSQL database to get real-time updates. The same pipeline configuration should work in both places.
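
One way to picture this design goal is a small sketch in which the pipeline configuration refers to a resource only by a provider alias and a resource name. None of this is the Saga API: the classes, the "alias:resource" reference format, the "DictionaryTagger" stage type and the resolve helper are all hypothetical, and an in-memory store stands in for a database such as Elasticsearch.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ResourceProvider(Protocol):
    """Hypothetical interface: a provider loads a resource by its name."""

    def load(self, resource_id: str) -> dict: ...


class FileSystemProvider:
    """Loads resources from JSON files under a base directory."""

    def __init__(self, base_dir: str) -> None:
        self.base_dir = Path(base_dir)

    def load(self, resource_id: str) -> dict:
        path = self.base_dir / f"{resource_id}.json"
        return json.loads(path.read_text(encoding="utf-8"))


class InMemoryStoreProvider:
    """Stand-in for a database-backed provider such as Elasticsearch."""

    def __init__(self, documents: Dict[str, dict]) -> None:
        self.documents = documents

    def load(self, resource_id: str) -> dict:
        return self.documents[resource_id]


# Hypothetical pipeline configuration: it names a provider alias and a
# resource, never a file path or a server address.
PIPELINE_CONFIG = {
    "stages": [{"type": "DictionaryTagger", "dictionary": "main-provider:colors"}]
}


def resolve(reference: str, providers: Dict[str, ResourceProvider]) -> dict:
    """Splits 'alias:resource' and asks the aliased provider for the resource."""
    alias, resource_id = reference.split(":", 1)
    return providers[alias].load(resource_id)


colors = {"red": "COLOR", "blue": "COLOR"}
reference = PIPELINE_CONFIG["stages"][0]["dictionary"]

# Development setup: the alias points at plain files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "colors.json").write_text(json.dumps(colors), encoding="utf-8")
    from_files = resolve(reference, {"main-provider": FileSystemProvider(tmp)})

# Later setup: the same alias points at a store; PIPELINE_CONFIG is unchanged.
from_store = resolve(
    reference, {"main-provider": InMemoryStoreProvider({"colors": colors})}
)
```

In this sketch only the mapping from alias to provider changes between the two setups, which is the part that would live in the provider configuration rather than in the pipeline configuration.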