| Warning |
| --- |
| This feature is currently in Beta and is only available in a SNAPSHOT version. |
In Aspire, we try to be as efficient as possible, especially when passing around large binary objects such as files or attachments. We have processes called “Fetchers”, but they typically don’t actually fetch anything. Instead, they open a stream to the object, so the transfer happens only when something such as text extraction reads the data. This means we’re not transferring the data (possibly over some slow network route) until we know we need it. It also means we don’t have to hold the entire object (which could be gigabytes) in memory, since streams process the data a few bytes at a time.
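The difference between eagerly fetching an object and lazily opening a stream to it can be sketched as follows. This is purely illustrative – the `LazyFetcher` class and its method names are assumptions for the sketch, not Aspire’s actual API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only: a "fetcher" that returns a stream rather than
// loading the whole object, so no bytes move until a consumer reads them.
public class LazyFetcher {

    // Eager approach: the entire object is transferred and held in memory.
    static byte[] fetchAllBytes(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Lazy approach: only a stream handle is opened; the transfer happens
    // later, when (and only if) a downstream stage reads from it.
    static InputStream openStream(Path source) throws IOException {
        return Files.newInputStream(source);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("aspire-demo", ".bin");
        Files.write(tmp, "example payload".getBytes());

        // The stream is open, but nothing has been transferred yet.
        try (InputStream in = openStream(tmp)) {
            // Data moves only now, as text extraction (or similar) reads it.
            byte[] first = in.readNBytes(7);
            System.out.println(new String(first)); // prints "example"
        }
        Files.delete(tmp);
    }
}
```

The lazy variant is what makes streaming cheap: a stage that decides it doesn’t need the content simply never reads from the stream, and the bytes never cross the network.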
However, this approach comes with a disadvantage. Whilst it is sometimes possible to “back up” on a stream to process the data more than once, typically once you’ve processed the data it’s gone. If you wanted to process it a second time, you’d have to open another stream and transfer it again – possibly over that same slow network. A binary store would give us the ability, when we know we need to process a file more than once, to store the file locally.
In other use cases, we might produce binary objects (through document rendering, for example) and need a place to store them before presenting them via a UI. A binary store would also work in this scenario.
However, what is a good medium for the store? A local file system, S3, Azure, something else? Any binary store we build would benefit from an architecture that abstracts the Aspire workflow components from the code that actually writes to the medium. That way the workflow can just “read” or “write”, and something else worries about where the data is written to or read from.
I’ve mentioned reading and writing, but it would also be useful to be able to delete objects from the store, or clear it entirely.
The binary store’s architecture is shown below.
Each binary store inside Aspire is implemented as an Aspire Service that implements a Java interface. This interface covers all the common storage actions – read, write, delete, clear store and so on. Each service implements the code required to perform those actions against its particular medium – local file system, S3 or Azure. Workflow components then call into the services to perform the desired storage actions – write, read, delete.
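As a sketch, the service contract and a local file-system implementation of it might look something like the following. All names here (`BinaryStore`, `LocalFileStore`, the method signatures) are illustrative assumptions, not Aspire’s actual interfaces:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch of the storage contract; not Aspire's real API.
interface BinaryStore {
    void write(String key, InputStream data) throws IOException; // store an object
    InputStream read(String key) throws IOException;             // stream it back
    void delete(String key) throws IOException;                  // remove one object
    void clear() throws IOException;                             // empty the store
}

// One possible backing medium: a local file-system directory. An S3 or
// Azure service would implement the same interface against its own SDK,
// and the workflow components calling it would not change at all.
public class LocalFileStore implements BinaryStore {
    private final Path root;

    public LocalFileStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    @Override
    public void write(String key, InputStream data) throws IOException {
        Files.copy(data, root.resolve(key), StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public InputStream read(String key) throws IOException {
        return Files.newInputStream(root.resolve(key));
    }

    @Override
    public void delete(String key) throws IOException {
        Files.deleteIfExists(root.resolve(key));
    }

    @Override
    public void clear() throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path p : entries) {
                Files.deleteIfExists(p);
            }
        }
    }
}
```

Note that `read` hands back a stream rather than a byte array, keeping the same lazy-transfer behaviour described earlier even once a store sits in the middle.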
This approach has a number of advantages: the workflow components are decoupled from the storage medium, so the backing store can be changed – say from a local directory to an S3 bucket – without altering the workflow itself.
This architecture also supports having multiple stores of the same type: installing more than one service with different configurations allows (say) the use of more than one file-system directory or more than one S3 bucket.
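Under that model, running two stores of the same type is just a matter of instantiating the same service twice with different configuration. The sketch below uses assumed names (`FileStoreService`, the “renditions”/“attachments” store names) purely for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: two file-system stores installed side by side, each with its own
// configuration (the directory it writes to). All names are illustrative.
public class MultiStoreDemo {

    // Minimal stand-in for one configured store service instance.
    static class FileStoreService {
        final Path root;

        FileStoreService(Path root) throws IOException {
            this.root = Files.createDirectories(root);
        }

        void write(String key, byte[] data) throws IOException {
            Files.write(root.resolve(key), data);
        }

        byte[] read(String key) throws IOException {
            return Files.readAllBytes(root.resolve(key));
        }
    }

    // A registry of installed store services, looked up by name, so a
    // workflow component only needs a store's name, not its medium.
    static Map<String, FileStoreService> installStores(Path base) throws IOException {
        Map<String, FileStoreService> stores = new LinkedHashMap<>();
        stores.put("renditions", new FileStoreService(base.resolve("renditions")));
        stores.put("attachments", new FileStoreService(base.resolve("attachments")));
        return stores;
    }

    public static void main(String[] args) throws IOException {
        Map<String, FileStoreService> stores =
                installStores(Files.createTempDirectory("aspire-stores"));

        // Same service type, two independent configurations.
        stores.get("renditions").write("page1.png", new byte[] {1, 2, 3});
        stores.get("attachments").write("mail.eml", "raw mail".getBytes());

        System.out.println(new String(stores.get("attachments").read("mail.eml"))); // prints "raw mail"
    }
}
```

Because each instance carries its own configuration, the same key can safely exist in both stores without collision.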