...

Below is an example data flow for background processing.

[Diagram: example data flow for background processing]

A typical Aspire workflow has a connector that extracts data from a repository, performs text extraction, and publishes to a search engine. To enable background processing, we simply add an extra workflow task to “publish” the document to a queue. This is shown on the top row of the diagram.
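The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Aspire configuration or an Aspire API: the stage functions, the in-memory `background_queue` and the `search_index` list are all hypothetical stand-ins, and the position of the queue step after the search publish is an assumption.

```python
import queue

# Hypothetical stand-ins for workflow components; none of these are Aspire APIs.
background_queue = queue.Queue()
search_index = []

def extract_text(doc):
    # Pretend text extraction: decode the raw bytes fetched by the connector.
    doc["text"] = doc["content"].decode("utf-8")
    return doc

def publish_to_search(doc):
    # Stand-in for publishing to a search engine.
    search_index.append({"id": doc["id"], "text": doc["text"]})
    return doc

def publish_to_queue(doc):
    # The extra workflow task: hand the document off for background processing.
    background_queue.put({"id": doc["id"], "text": doc["text"]})
    return doc

def run_foreground(doc):
    # Run the stages in order (the ordering here is an assumption).
    for stage in (extract_text, publish_to_search, publish_to_queue):
        doc = stage(doc)
    return doc

def run_background():
    # Drain the queue and return the ids that were processed.
    processed = []
    while not background_queue.empty():
        processed.append(background_queue.get()["id"])
    return processed
```

Calling `run_foreground({"id": "doc-1", "content": b"hello"})` indexes the document and leaves one entry on the queue for `run_background()` to pick up later.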

...

There is, however, a limitation to the flow above: the binary file extracted from the repository. In Aspire we try to be as efficient as we can and use streams wherever possible. The binary stream of data from the repository is consumed by the extract text stage and is no longer available afterwards. If we need that stream for background processing, we need to do something extra. Using Aspire’s [link to Binary Store], we add an extra stage to both parts of the processing. The data flow is now as below.

[Diagram: data flow with the binary store stages added]

Before we use the stream to extract the text when processing the original document, we add a stage that uses Aspire’s binary store capability (see [link to store documentation]) to write the binary to the store. Now, when we extract the text, we still have a copy of the original binary to process again (as many times as we want). When we perform the “background” processing, we “read” the file from the store so it’s available to any subsequent workflow stages.
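The write-then-read pattern can be illustrated with a minimal Python sketch. It does not use Aspire's Binary Store; the `BinaryStore` class, its `write` and `read` methods, and the `demo` function are hypothetical names chosen for this example, and a temporary directory stands in for the real store.

```python
import io
import tempfile
from pathlib import Path

class BinaryStore:
    """Hypothetical file-backed store; not Aspire's Binary Store API."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, doc_id, stream):
        # Copy the one-shot stream to disk, then hand back a fresh stream.
        (self.root / doc_id).write_bytes(stream.read())
        return self.read(doc_id)

    def read(self, doc_id):
        # Every call returns a new stream positioned at the start.
        return io.BytesIO((self.root / doc_id).read_bytes())

def extract_text(stream):
    # Consumes the stream: afterwards it is positioned at end of data.
    return stream.read().decode("utf-8")

def demo():
    with tempfile.TemporaryDirectory() as root:
        store = BinaryStore(root)
        repo_stream = io.BytesIO(b"original binary")  # stream from the repository
        stream = store.write("doc-1", repo_stream)    # store stage, before extraction
        text = extract_text(stream)                   # original processing
        leftover = stream.read()                      # nothing left to read
        again = extract_text(store.read("doc-1"))     # background processing
        return text, leftover, again
```

In `demo()`, the second read of the already-consumed stream yields no data, while reading from the store yields the full content again, which is the property the background workflow relies on.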