Batching
Batches are configured in the connector developer settings. If no batching is defined, the Publisher Framework creates a one-time batch with only one document included.
The developer can choose among three batch types: BUFFER, STREAM, or NONE:
- For the STREAM batch type, the Publisher Framework gets a connection from the pool at batch start and passes that same connection to the PAP methods for the duration of the batch.
- The connection is released when the batch is closed.
- For the BUFFER batch type, the connection is claimed from the pool at the beginning of batch close, passed to the PAP endBatch method, and released afterwards.
- This means that the developer should buffer all documents for the duration of the batch. For this purpose, a so-called batch data buffer is available in the PublisherBatch object.
- The Publisher Framework also supports multi-server batches.
- The batch factory creates this kind of batch when more than one URL is provided in the configuration.
- The purpose is to allow documents to be published to multiple servers.
- Both broadcasting and round robin are supported.
- A BatchAdapter object is available in PublisherBatch.
- This object can be used for reporting errors and other messages to the Aspire framework.
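The difference between the STREAM and BUFFER connection lifecycles can be sketched as follows. This is a minimal simulation under stated assumptions, not the Publisher Framework implementation: `SketchConnection`, `SketchPool`, `SketchBatch`, and `BatchType` are hypothetical stand-ins invented for illustration, and the plain list inside `SketchBatch` only plays the role of the batch data buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Aspire classes.
enum BatchType { STREAM, BUFFER }

class SketchConnection {
    final List<String> sent = new ArrayList<>();
    void send(String payload) { sent.add(payload); }
}

class SketchPool {
    private final Deque<SketchConnection> idle = new ArrayDeque<>();
    int claimed = 0; // number of connections currently checked out

    SketchPool(int size) {
        for (int i = 0; i < size; i++) idle.push(new SketchConnection());
    }
    SketchConnection claim() { claimed++; return idle.pop(); }
    void release(SketchConnection c) { claimed--; idle.push(c); }
}

class SketchBatch {
    private final BatchType type;
    private final SketchPool pool;
    private final List<String> buffer = new ArrayList<>(); // plays the role of the batch data buffer
    private SketchConnection held;                          // only set for STREAM batches

    SketchBatch(BatchType type, SketchPool pool) {
        this.type = type;
        this.pool = pool;
        if (type == BatchType.STREAM) held = pool.claim(); // STREAM: claimed at batch start
    }

    void add(String doc) {
        if (type == BatchType.STREAM) held.send(doc); // same connection for every document
        else buffer.add(doc);                          // BUFFER: no connection is held yet
    }

    SketchConnection close() {
        // BUFFER claims its connection only now, at the beginning of batch close.
        SketchConnection c = (type == BatchType.STREAM) ? held : pool.claim();
        if (type == BatchType.BUFFER) c.send(String.join("\n", buffer)); // one bulk send
        pool.release(c);
        return c; // returned only so the demo can inspect what was sent
    }
}

class BatchLifecycleDemo {
    public static void main(String[] args) {
        SketchPool pool = new SketchPool(1);
        SketchBatch batch = new SketchBatch(BatchType.BUFFER, pool);
        batch.add("doc-1");
        batch.add("doc-2");
        System.out.println("claimed while buffering: " + pool.claimed);
        System.out.println("payloads sent on close: " + batch.close().sent.size());
    }
}
```

In this sketch a BUFFER batch holds no pooled connection while documents accumulate, which is the trade-off the two batch types express: STREAM occupies a connection for the whole batch, BUFFER occupies memory instead.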
![](/download/attachments/707319758/batching.png?version=1&modificationDate=1542307754000&api=v2)
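The two multi-server strategies mentioned above can be illustrated with a small dispatcher. This is a sketch under assumptions: `MultiServerDispatcher` and its `Mode` enum are hypothetical names, and the way the real batch factory selects servers is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical dispatcher for illustration; not part of the Publisher Framework API.
class MultiServerDispatcher {
    enum Mode { BROADCAST, ROUND_ROBIN }

    private final List<String> urls;
    private final Mode mode;
    private int next = 0; // index of the server that receives the next round-robin document

    MultiServerDispatcher(List<String> urls, Mode mode) {
        this.urls = new ArrayList<>(urls);
        this.mode = mode;
    }

    /** Returns the server URLs that should receive the next document. Not thread-safe. */
    List<String> nextTargets() {
        if (mode == Mode.BROADCAST) {
            return new ArrayList<>(urls);       // every server gets a copy
        }
        String url = urls.get(next);
        next = (next + 1) % urls.size();        // rotate through the servers
        return Collections.singletonList(url);  // exactly one server gets the document
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://es-1:9200", "http://es-2:9200");
        MultiServerDispatcher rr = new MultiServerDispatcher(urls, Mode.ROUND_ROBIN);
        for (int i = 0; i < 3; i++) {
            System.out.println("doc-" + i + " -> " + rr.nextTargets());
        }
    }
}
```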
Transformers
Transformers convert the AspireObjects arriving in jobs into the string representation required by the target repository. For example, when publishing to Elasticsearch, you need to create a JSON structure from the Aspire document.
The Publisher Framework supports XML, JSON, and simple String transformers.
- Transformers are configured by specifying a transform file: a Groovy script for the JSON transformer or an XSLT template for the XML transformer.
- Transform files are typically provided by the developer of the specific publisher.
- For example, the Elasticsearch publisher bundle comes prepackaged with a transform.groovy script.
- At run time, users can configure the publisher with their own transform file.
- Transformer functionality can be used by calling the PublisherInfo.transform(AspireObject doc) method, which produces a string result of the transformation.
Note: For lower-level control of the transformation process, use the PublisherInfo.getTransformerFactory method to create transformers, and pass streams to them as parameters.
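To make the idea of a template-driven transformer concrete, the sketch below applies an XSLT template to an XML document and returns the result as a string. It uses only the JDK's standard javax.xml.transform API and does not call PublisherInfo.transform or any other Aspire class. The template, the input document shape, and the class name `XsltTransformSketch` are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustration only: a plain JDK XSLT transformation, not the Aspire transformer classes.
class XsltTransformSketch {
    // Hypothetical template: turns <doc><id/><title/></doc> into a one-line JSON string.
    // Values are copied as-is; a real template would also need to escape JSON characters.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/doc'>"
      + "<xsl:text>{\"id\":\"</xsl:text><xsl:value-of select='id'/>"
      + "<xsl:text>\",\"title\":\"</xsl:text><xsl:value-of select='title'/>"
      + "<xsl:text>\"}</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the template to the given XML and returns the resulting string. */
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers do not have to handle a checked exception in this sketch.
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<doc><id>42</id><title>Hello</title></doc>"));
    }
}
```

Swapping in a different template string changes the output format without touching the Java code, which is the same separation a user-supplied transform file provides.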
![](/download/attachments/707319758/transformers.png?version=1&modificationDate=1542307754000&api=v2)
HttpClient
HttpClient is provided by the HttpConnection object. When developing a publisher for REST-based target repositories, consider using this class.
- HttpClient was primarily developed for writing AspireObject documents.
- If required, HttpClient uses transformers for converting AspireObjects before writing.
- HttpClient supports REST-based APIs and can execute the GET, PUT, POST, and DELETE methods.
- HttpClient also supports streaming.
- This can be used in batching. For example, the Elasticsearch publisher first writes individual documents to the HttpClient stream. Then, on batch close, this stream is posted to Elasticsearch.
- HttpClient can be configured by the HttpProperties object.
- HttpClient configuration is flexible enough to accept changes even after the object is constructed.
- This opens possibilities for reconfiguring already created and possibly already pooled objects.
- For example, you may need to modify a URL parameter value in an existing configuration because bulk POST requests and other actions, such as cleaning the index, require different URLs.
- HttpClient supports retry logic.
- HttpClient can be configured by HttpErrorHandler.
- If this handler is provided, the developer is informed about connection errors and other HTTP-related errors and can act accordingly, either by throwing an exception or by continuing with the retry logic.
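The interplay between retry logic and an error handler can be sketched as below. This is an assumption-laden illustration: the `ErrorHandler` interface and `executeWithRetry` method are hypothetical and do not reproduce the signatures of HttpErrorHandler or HttpClient, and the request is simulated with a Callable so that the example runs without a network.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch; not the HttpClient or HttpErrorHandler API.
class RetrySketch {
    /** Decides what happens after a failed attempt: return true to retry, false to give up. */
    interface ErrorHandler {
        boolean shouldRetry(int attempt, Exception error);
    }

    static <T> T executeWithRetry(Callable<T> request, int maxAttempts, ErrorHandler handler) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                last = e;
                // The handler sees every error and chooses between retrying and aborting.
                if (!handler.shouldRetry(attempt, e)) break;
            }
        }
        throw new IllegalStateException("request failed", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request: fails twice with a connection error, then succeeds.
        String result = executeWithRetry(() -> {
            if (++calls[0] < 3) throw new IOException("connection refused");
            return "200 OK";
        }, 5, (attempt, e) -> e instanceof IOException);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```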
![](/download/attachments/707319758/HttpClient.png?version=1&modificationDate=1542307754000&api=v2)
Delete By Query
When a document with the action “deleteByQuery” arrives at a publisher, the Publisher Framework handles it as described below.
- The query document is first automatically transformed by the configured transformer if transformation is configured for the publisher.
- The transformation script must handle this case.
- For example, in a JSON (Groovy) transform script, add a section guarded by the condition “if (action == "deleteByQuery")”.
- Leave this section empty if the deleteByQuery document should not be transformed.
- In the PAP class, implement the delete-by-query logic in the processDeleteByQuery method by interpreting the syntax of the “deleteByQuery” document.
- When it arrives in PAP.processDeleteByQuery(DeleteByQuery), the DeleteByQuery object (which wraps the original “deleteByQuery” document) can be translated by the supported Visitor objects into a meaningful string representation.
- The provided visitor classes support the delete-by-query format created by the ArchiveExtractor utility (QueryForArchiveDefaultVisitorImpl).
- For example, in the Elasticsearch publisher, you can build the part of an Elasticsearch REST request that selects all previously published documents with the same “parentId”, and thereby delete all documents belonging to the archive.
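The visitor-based translation described above can be sketched as follows. The query model (`QueryNode`, `Term`, `And`) and the visitor (`QueryVisitor`, `ElasticsearchQueryVisitor`) are hypothetical and do not reflect the actual structure of the DeleteByQuery object or of QueryForArchiveDefaultVisitorImpl. The sketch only shows how a visitor could turn a parsed query into an Elasticsearch query body, and it does not escape JSON characters in field names or values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical query model for illustration; not Aspire's DeleteByQuery classes.
interface QueryNode {
    <R> R accept(QueryVisitor<R> visitor);
}

interface QueryVisitor<R> {
    R visitTerm(Term term);
    R visitAnd(And and);
}

final class Term implements QueryNode {
    final String field;
    final String value;
    Term(String field, String value) { this.field = field; this.value = value; }
    public <R> R accept(QueryVisitor<R> visitor) { return visitor.visitTerm(this); }
}

final class And implements QueryNode {
    final List<QueryNode> clauses;
    And(QueryNode... clauses) { this.clauses = Arrays.asList(clauses); }
    public <R> R accept(QueryVisitor<R> visitor) { return visitor.visitAnd(this); }
}

// Translates the query tree into Elasticsearch query DSL (term and bool/must queries).
class ElasticsearchQueryVisitor implements QueryVisitor<String> {
    public String visitTerm(Term t) {
        return "{\"term\":{\"" + t.field + "\":\"" + t.value + "\"}}";
    }

    public String visitAnd(And a) {
        StringJoiner joined = new StringJoiner(",", "{\"bool\":{\"must\":[", "]}}");
        for (QueryNode clause : a.clauses) joined.add(clause.accept(this));
        return joined.toString();
    }

    /** Wraps the translated query in the top-level "query" element of a request body. */
    static String toRequestBody(QueryNode query) {
        return "{\"query\":" + query.accept(new ElasticsearchQueryVisitor()) + "}";
    }

    public static void main(String[] args) {
        // Select every document that was published with the given parentId.
        System.out.println(toRequestBody(new Term("parentId", "archive-1")));
    }
}
```

A body of this shape could then be sent to the target's delete-by-query endpoint; the visitor keeps the repository-specific syntax out of the query model itself.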
![](/download/attachments/707319758/deleteByQuery.png?version=1&modificationDate=1542307754000&api=v2)