Batching
Batches are configured in the connector configuration, and the Publisher Framework respects these settings. If no batching is defined, the Publisher Framework creates a one-time batch containing a single document.
At the publisher level, a developer can choose among the batch types BUFFER, STREAM, and NONE:
- For the STREAM batch type, the Publisher Framework gets a connection from the pool on batch start and keeps passing this connection to the PAP methods for the whole batch.
- The connection is released when the batch is closed.
- For the BUFFER batch type, the connection is claimed from the pool at the beginning of batch close, passed to the PAP endBatch method, and released afterwards.
- This means that the developer should buffer all documents during the batch. For this purpose, a batch data buffer is available in the PublisherBatch object.
- The Publisher Framework also supports multi-server batches.
- The batch factory creates this kind of batch when more than one URL is provided in the configuration.
- The purpose is to allow documents to be published to more than one server.
- Broadcasting and round robin are supported.
- There is a BatchAdapter object available in PublisherBatch.
- This object can be used for reporting errors and other messages to the Aspire framework.
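The connection handling for the STREAM and BUFFER batch types can be sketched as follows. This is a minimal, self-contained illustration and not the Publisher Framework API: the Pool class, the runBatch method, and the event log are hypothetical names invented for the example. Only the batch type names and the order in which the connection is claimed and released are taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types belong to the Aspire Publisher Framework.
class BatchLifecycleSketch {
    enum BatchType { STREAM, BUFFER }

    // Stand-in for a connection pool; it only records claim and release events.
    static class Pool {
        final List<String> log = new ArrayList<>();
        String claim() { log.add("claim"); return "connection"; }
        void release(String connection) { log.add("release"); }
    }

    // Simulates one batch and returns the ordered list of events.
    static List<String> runBatch(BatchType type, List<String> docs) {
        Pool pool = new Pool();
        List<String> buffer = new ArrayList<>(); // plays the role of the batch data buffer
        String connection = null;

        // STREAM: the connection is claimed when the batch starts.
        if (type == BatchType.STREAM) {
            connection = pool.claim();
        }
        for (String doc : docs) {
            if (type == BatchType.STREAM) {
                pool.log.add("send " + doc);   // each document uses the same connection
            } else {
                buffer.add(doc);               // BUFFER: no connection is held yet
                pool.log.add("buffer " + doc);
            }
        }
        // Batch close. BUFFER: the connection is claimed only now.
        if (type == BatchType.BUFFER) {
            connection = pool.claim();
            pool.log.add("endBatch sends " + buffer.size() + " docs");
        }
        pool.release(connection);
        return pool.log;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc1", "doc2");
        System.out.println("STREAM: " + runBatch(BatchType.STREAM, docs));
        System.out.println("BUFFER: " + runBatch(BatchType.BUFFER, docs));
    }
}
```

In this sketch, STREAM holds the connection for the whole batch, while BUFFER holds it only for the final send. That difference is why a BUFFER publisher has to keep the documents itself until the batch closes.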
Transformers convert the AspireObjects arriving in jobs into the String representation required by the target repository. For example, when publishing to Elasticsearch, you need to create a JSON structure of the Aspire document.
The Publisher Framework supports XML, JSON, and simple String transformers:
- Transformers are configured by specifying a transform file: a Groovy script for the JSON transformer or an XSLT template for the XML transformer.
- Transform files are typically provided by the developer of the specific publisher.
- For example, the Elasticsearch publisher bundle comes pre-packaged with a transform.groovy script.
- At run time, users can configure the publisher with their own transform file.
- Transformer functionality can be used by calling the PublisherInfo.transform(AspireObject doc) method, which produces a string result of the transformation.
Note: For more low-level handling of the transformation process,
use the PublisherInfo.getTransformerFactory method to create transformers and use streams passed as parameters to transformers.
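The idea of a transformer, a document going in and a String in the target format coming out, can be illustrated with the sketch below. It deliberately does not use AspireObject, PublisherInfo, or the real transformer classes, whose signatures are not shown here. The Transformer interface, the Map-based document, and the JSON and SIMPLE instances are all hypothetical stand-ins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: a flat Map stands in for an Aspire document.
class TransformerSketch {
    // A transformer takes a document and produces its String representation.
    interface Transformer {
        String transform(Map<String, String> doc);
    }

    // Minimal JSON string quoting: handles only backslashes and double quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the document as a flat JSON object.
    static final Transformer JSON = doc -> {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            json.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return json.toString();
    };

    // Renders the document as simple key=value pairs separated by semicolons.
    static final Transformer SIMPLE = doc -> {
        StringJoiner line = new StringJoiner(";");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            line.add(e.getKey() + "=" + e.getValue());
        }
        return line.toString();
    };

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("title", "Annual report");
        System.out.println(JSON.transform(doc));
        System.out.println(SIMPLE.transform(doc));
    }
}
```

Keeping the output format behind one small interface mirrors the reason transform files are configurable: the publisher code stays the same while the representation sent to the repository changes.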
HttpClient
HttpClient is provided by the HttpConnection object.
When developing a publisher for REST-based target repositories, consider using this class.
- HttpClient was primarily developed for writing AspireObject documents.
- If required, HttpClient uses transformers for converting AspireObjects before writing.
- HttpClient supports REST-based APIs and can execute the GET, PUT, POST, and DELETE methods.
- HttpClient also supports streaming.
- This can be used in batching. For example, the Elasticsearch publisher writes single documents to the HttpClient stream first. Then, on batch close, this stream is posted to Elasticsearch.
- HttpClient can be configured by the HttpProperties object.
- HttpClient configuration is flexible enough to accept changes even after the object is constructed.
- This opens possibilities for reconfiguring already created and possibly already pooled objects.
- For example, we may need to modify a URL parameter value in an already created configuration because some requests need a different URL than the normal one.
- HttpClient supports retry logic configured by specific parameters.
- HttpClient can be configured by HttpErrorHandler.
- If this handler is provided, the developer can get information about possible connection errors or other HTTP-related errors and act accordingly, either by throwing an exception or by continuing with the retry logic.
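The interplay between retry logic and an error handler can be sketched as below. This is an assumption-laden illustration, not the HttpClient or HttpErrorHandler API: the ErrorHandler interface, its onError method, the execute method, and the simulated status codes are all hypothetical. The sketch shows only the two reactions described above, throwing an exception or continuing with retries.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch: not the Aspire HttpClient or HttpErrorHandler API.
class RetrySketch {
    // The handler sees each failure. It returns true to continue with the
    // retry logic, returns false to stop retrying, or throws to abort at once.
    interface ErrorHandler {
        boolean onError(int statusCode, int attempt);
    }

    // Runs the request until it returns a 2xx status or the attempts are used up.
    // Returns the number of attempts that were needed.
    static int execute(IntSupplier request, int maxAttempts, ErrorHandler handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = request.getAsInt();
            if (status >= 200 && status < 300) {
                return attempt;
            }
            if (!handler.onError(status, attempt)) {
                break; // the handler decided that retrying is pointless
            }
        }
        throw new IllegalStateException("request did not succeed");
    }

    public static void main(String[] args) {
        // Simulated request: fails twice with 503, then succeeds with 200.
        int[] calls = {0};
        IntSupplier flaky = () -> ++calls[0] < 3 ? 503 : 200;

        // Retry server errors (5xx) only.
        int attempts = execute(flaky, 5, (status, attempt) -> status >= 500);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

A handler that throws, for example on a 4xx status, ends the loop immediately, because a client error is unlikely to be fixed by sending the same request again.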
Delete By Query
If a document with the action “deleteByQuery” arrives in a publisher, the Publisher Framework takes the appropriate action.
- The query document is first automatically transformed by the configured transformer if transformation is configured for the publisher.
- You must support the transformation in a transformation script.
- For example, for JSON, introduce an “if (action == "deleteByQuery")” section in the script.
- If this section is left empty, the deleteByQuery document is considered to ...
- In the PAP class, implement the delete by query logic in the processDeleteByQuery method by interpreting the syntax of the “deleteByQuery” document.
- When arriving in PAP.processDeleteByQuery(DeleteByQuery), the DeleteByQuery object (the object in which the original “deleteByQuery” document is wrapped) can be translated by the supported Visitor objects into a meaningful string representation.
- The prepared visitor classes support the delete by query format created by the ArchiveExtractor utility (QueryForArchiveDefaultVisitorImpl).
- For example, in the Elasticsearch publisher, the query is turned into an Elasticsearch REST request that gets all documents with the same “parentId” published previously, which makes it possible to handle deletion of all of those documents.
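Translating a wrapped query into a string with a visitor can be sketched as follows. The real DeleteByQuery and Visitor classes are not reproduced here. The Query, Term, And, and StringVisitor types and the output syntax are hypothetical, chosen only to show how a visitor walks a query and builds a string that the publisher could then turn into a repository-specific request.

```java
// Hypothetical sketch: these types only imitate the idea of a query object
// that is translated by a visitor; they are not the Aspire classes.
class DeleteByQuerySketch {
    // A query node accepts a visitor and returns whatever the visitor produces.
    interface Query {
        <R> R accept(Visitor<R> visitor);
    }

    // One visit method per node type.
    interface Visitor<R> {
        R visitTerm(Term term);
        R visitAnd(And and);
    }

    // Leaf node: a field that must equal a value.
    static final class Term implements Query {
        final String field;
        final String value;
        Term(String field, String value) { this.field = field; this.value = value; }
        public <R> R accept(Visitor<R> visitor) { return visitor.visitTerm(this); }
    }

    // Inner node: both sub-queries must match.
    static final class And implements Query {
        final Query left;
        final Query right;
        And(Query left, Query right) { this.left = left; this.right = right; }
        public <R> R accept(Visitor<R> visitor) { return visitor.visitAnd(this); }
    }

    // Renders the query in a simple, made-up string syntax.
    static final class StringVisitor implements Visitor<String> {
        public String visitTerm(Term term) {
            return term.field + ":\"" + term.value + "\"";
        }
        public String visitAnd(And and) {
            return "(" + and.left.accept(this) + " AND " + and.right.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Query query = new And(new Term("parentId", "archive-1"), new Term("type", "file"));
        System.out.println(query.accept(new StringVisitor()));
    }
}
```

A publisher for a different repository would supply another Visitor implementation producing that repository's query syntax, without changing the query classes.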