Developing a specific publisher mainly means implementing the PublisherAccessProvider (PAP) interface for the target repository, for example Solr. The developer implements methods like processAddUpdate and processDelete.

When Aspire loads the publisher bundle, it always also loads another bundle called PF (Publisher Framework). The publisher bundle contains the DXF file with the publisher-specific parameters, while the PF bundle contains the general DXF file with which the publisher DXF is merged.

The PublisherControllerImpl object is the first point where all documents coming from connectors arrive. From there they are passed on to the other PF objects.
- When loaded, the publisher contains one instance of the PublisherControllerImpl class and one instance of the PublisherInfo class.
- PublisherControllerImpl holds the PublisherInfo object. PublisherInfo contains the user configuration and holds the PAP class, which is shared with PublisherControllerImpl.
- PublisherControllerImpl handles a pool of PublisherRepositoryConnection implementation objects, using the PublisherConnectionController implementation provided by PublisherInfo.
- The developer must implement the connection objects when writing publishers for repositories that PF itself does not cover. Some connection implementations are provided by PF itself, like HttpClient for REST.
- PublisherControllerImpl also creates PublisherBatch objects by wrapping the Aspire standard ComponentBatch objects.
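
A minimal sketch of the developer's side of this scheme follows. SimplePublisherAccessProvider is a hypothetical, simplified stand-in for the real PublisherAccessProvider interface: only the method names processAddUpdate and processDelete come from this page, while the parameter types (a document id and an already transformed string) and the in-memory target used in place of a repository such as Solr are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for PublisherAccessProvider; the method names come
// from the PF description, the simplified parameters are assumptions.
interface SimplePublisherAccessProvider {
    void processAddUpdate(String id, String transformedDoc);
    void processDelete(String id);
}

// Toy publisher keeping documents in memory instead of a real repository.
class InMemoryPublisher implements SimplePublisherAccessProvider {
    private final Map<String, String> index = new LinkedHashMap<>();

    @Override
    public void processAddUpdate(String id, String transformedDoc) {
        index.put(id, transformedDoc); // add a new document or overwrite an existing one
    }

    @Override
    public void processDelete(String id) {
        index.remove(id); // deleting an unknown id is a no-op
    }

    int size() { return index.size(); }

    String get(String id) { return index.get(id); }
}
```

In a real publisher the PF objects described above call these methods; the developer only supplies the repository-specific logic.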

Batching

Batches are configured in the connector configuration and the Publisher Framework respects this. If no batching is defined, the Publisher Framework creates a one-time batch with only one document included.

On the publisher level, a developer can choose among three batch types: BUFFER, STREAM, and NONE.

For the STREAM batch type, the Publisher Framework gets a connection from the pool on batch start and keeps passing this connection to the PAP methods for the whole batch.
- The connection is released when the batch is closed.

For the BUFFER batch type, the connection is claimed from the pool at the beginning of batch close, passed to the PAP endBatch method, and released afterwards.
- This means that the developer should buffer all documents during the batch. For this purpose, a so-called batch data buffer is available in the PublisherBatch object.
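
The difference between the two batch types is mainly when the pooled connection is claimed and released. The following sketch models that under stated assumptions: Connection, ConnectionPool, StreamBatch and BufferBatch are hypothetical names, not PF classes, and the loop in BufferBatch.close() merely plays the role of the PAP endBatch method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical connection that just records what was sent through it.
class Connection {
    final List<String> sent = new ArrayList<>();
    void send(String doc) { sent.add(doc); }
}

// Hypothetical fixed-size pool that counts currently claimed connections.
class ConnectionPool {
    private final Deque<Connection> free = new ArrayDeque<>();
    int claimed = 0;

    ConnectionPool(int size) {
        for (int i = 0; i < size; i++) free.push(new Connection());
    }
    Connection claim() { claimed++; return free.pop(); }
    void release(Connection c) { claimed--; free.push(c); }
}

// STREAM: claim on batch start, reuse for every document, release on close.
class StreamBatch {
    private final ConnectionPool pool;
    private final Connection conn;

    StreamBatch(ConnectionPool pool) { this.pool = pool; this.conn = pool.claim(); }
    void add(String doc) { conn.send(doc); }
    Connection close() { pool.release(conn); return conn; }
}

// BUFFER: documents wait in a buffer; the connection is claimed only at
// batch close, used for the whole buffer at once, then released.
class BufferBatch {
    private final ConnectionPool pool;
    private final List<String> buffer = new ArrayList<>(); // the "batch data buffer"

    BufferBatch(ConnectionPool pool) { this.pool = pool; }
    void add(String doc) { buffer.add(doc); }

    Connection close() {
        Connection conn = pool.claim();
        try {
            for (String doc : buffer) conn.send(doc); // stands in for endBatch
        } finally {
            pool.release(conn);
        }
        buffer.clear();
        return conn;
    }
}
```

With STREAM a connection stays claimed for the whole batch, while with BUFFER it is held only during the close, which is why the documents have to be kept in a buffer until then.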

The Publisher Framework also supports so-called multi-server batches.
- The batch factory creates this kind of batch when more than one URL is provided in the configuration.
- The purpose is to support publishing documents to more than one server.
- Broadcasting and round robin are supported.
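
Broadcasting and round robin differ only in how a target server is chosen for each document. A sketch, assuming a plain list of configured URLs; MultiServerRouter is a hypothetical name and not the PF batch factory API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router illustrating the two multi-server strategies.
class MultiServerRouter {
    private final List<String> urls;
    private int next = 0; // round-robin cursor

    MultiServerRouter(List<String> urls) {
        if (urls.isEmpty()) throw new IllegalArgumentException("no URLs configured");
        this.urls = List.copyOf(urls);
    }

    // Broadcasting: every document goes to every configured server.
    List<String> broadcastTargets() {
        return new ArrayList<>(urls);
    }

    // Round robin: each document goes to the next server in turn.
    String roundRobinTarget() {
        String url = urls.get(next);
        next = (next + 1) % urls.size();
        return url;
    }
}
```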

There is also a BatchAdapter object available in PublisherBatch.
- This object can be used for reporting errors and other messages to the Aspire framework.

Transformers

Transformers convert the AspireObjects that arrive in jobs into the string representation required by the target repository. For example, when publishing to Elasticsearch, the Aspire document must be converted into a JSON structure.

PF supports XML, JSON, and simple String transformers.

Transformers are configured by specifying a transform file: a Groovy script for the JSON transformer or an XSLT template for the XML transformer.
Transform files are typically provided by the developer of the specific publisher. For example, the Elasticsearch publisher bundle is pre-packed with a transform.groovy script. At runtime, the user can configure the publisher with their own transform file.
Transformer functionality can be used by calling the PublisherInfo.transform(AspireObject doc) method, which returns the string result of the transformation.
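
To illustrate what a transformer does, here is a minimal JSON transformer sketch. It assumes a flat Map<String, String> in place of AspireObject and a hypothetical DocTransformer interface; the real PF transformers are driven by a Groovy script or an XSLT template rather than hand-written code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical transformer contract; a flat Map stands in for AspireObject.
interface DocTransformer {
    String transform(Map<String, String> doc);
}

// Minimal JSON transformer: flat string fields only, basic escaping.
class FlatJsonTransformer implements DocTransformer {
    @Override
    public String transform(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes backslashes and double quotes; control characters are
    // not handled in this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```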

For lower-level handling of the transformation process, use the PublisherInfo.getTransformerFactory method to create transformers and pass streams to them as parameters.
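
The stream-based variant writes its result into a caller-supplied output instead of returning a string, so several documents can be appended to one stream. In the sketch below, StreamingTransformer and SimpleTransformerFactory are hypothetical names standing in for what getTransformerFactory provides, and the key=value line format is an arbitrary example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;

// Hypothetical stream-oriented transformer: the result is written to a
// Writer supplied by the caller rather than returned as a String.
interface StreamingTransformer {
    void transform(Map<String, String> doc, Writer out) throws IOException;
}

// Hypothetical factory producing transformers on demand.
class SimpleTransformerFactory {
    // Writes one "key=value;key=value" line per document.
    StreamingTransformer newLineTransformer() {
        return (doc, out) -> {
            boolean first = true;
            for (Map.Entry<String, String> e : doc.entrySet()) {
                if (!first) out.write(';');
                first = false;
                out.write(e.getKey() + "=" + e.getValue());
            }
            out.write('\n');
        };
    }
}
```

For example, two documents transformed into the same StringWriter produce two consecutive lines in one output.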

HttpClient

HttpClient (HC) is provided by the HttpConnection object.

Whenever developing a publisher for REST-based target repositories, consider using this class.

HC was primarily developed for writing AspireObject documents.
If required, HC uses transformers to convert AspireObjects before writing them.

HC supports REST-based APIs and can execute GET, PUT, POST, and DELETE methods.
HC also supports streaming, which can be used in batching. For example, the Elasticsearch publisher first writes single documents to the HC stream and then, on batch close, posts this stream to Elasticsearch.
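
The "write to a stream, post on batch close" pattern can be sketched as follows. The sketch assumes the Elasticsearch _bulk body format (newline-delimited JSON, one action line followed by one document line, with a trailing newline); BulkStream is a hypothetical class, and the poster callback stands in for the actual HC POST.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical bulk buffer: documents accumulate in one byte stream and are
// handed to the poster (standing in for an HC POST to _bulk) on close.
class BulkStream {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private final Consumer<byte[]> poster;
    private int count = 0;

    BulkStream(Consumer<byte[]> poster) { this.poster = poster; }

    // Appends one document; index, id and json are assumed already JSON-safe.
    void add(String index, String id, String json) {
        write("{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}\n");
        write(json + "\n"); // every line, including the last, ends with a newline
        count++;
    }

    // On batch close the whole buffered body is posted in one request.
    void close() {
        if (count > 0) poster.accept(body.toByteArray());
        body.reset();
        count = 0;
    }

    private void write(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        body.write(b, 0, b.length);
    }
}
```

Keeping the body in one buffer means a single HTTP request is sent per batch instead of one per document.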

HC can be configured by the HttpProperties object.
The HC configuration is flexible enough to accept changes even after the object is constructed. This makes it possible to reconfigure HC objects that were already created and possibly already pooled. For example, the URL parameter of an existing HC can be modified because one URL is needed for the normal bulk POST and a different one for some other actions, like index clean.
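
The idea of reconfiguring an already pooled object can be sketched as a client that reads its properties at request time instead of copying them at construction. MutableHttpProperties and PooledClient are hypothetical names, not the real HttpProperties or HttpConnection API, and the URLs are made-up examples.

```java
// Hypothetical properties holder whose URL may change after construction.
class MutableHttpProperties {
    private volatile String url;

    MutableHttpProperties(String url) { this.url = url; }
    String getUrl() { return url; }
    void setUrl(String url) { this.url = url; }
}

// Hypothetical pooled client that keeps a reference to its properties.
class PooledClient {
    final MutableHttpProperties props;

    PooledClient(MutableHttpProperties props) { this.props = props; }

    // Builds the request target from the current properties on every call,
    // so a URL change takes effect without recreating the client.
    String requestTarget(String method) {
        return method + " " + props.getUrl();
    }
}
```
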
HC supports retry logic configured by specific parameters.
HC can be configured with an HttpErrorHandler. If this handler is provided, the developer receives information about possible connection errors or other HTTP errors and can react accordingly, either by throwing an exception or by continuing with the retry logic.
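
The interplay between retry parameters and an error handler can be sketched as follows. RetryErrorHandler is a hypothetical interface modelled on the description of HttpErrorHandler, and maxRetries and waitMillis are assumed parameter names; the real PF parameters may differ.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical handler contract: decide per failure whether to retry.
interface RetryErrorHandler {
    /** Returns true to continue with the retry logic, false to fail now. */
    boolean onError(int attempt, Exception error);
}

class RetryingExecutor {
    private final int maxRetries;  // assumed: retries after the first attempt
    private final long waitMillis; // assumed: pause between attempts
    private final RetryErrorHandler handler;

    RetryingExecutor(int maxRetries, long waitMillis, RetryErrorHandler handler) {
        this.maxRetries = maxRetries;
        this.waitMillis = waitMillis;
        this.handler = handler;
    }

    <T> T execute(Callable<T> request) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                // Give up when retries are exhausted or the handler refuses.
                if (attempt > maxRetries || !handler.onError(attempt, e)) {
                    throw e;
                }
                Thread.sleep(waitMillis);
            }
        }
    }
}
```

For instance, a handler such as (attempt, err) -> err instanceof IOException retries only I/O failures and lets any other error propagate immediately.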

Delete By Query

If a document with the action “deleteByQuery” arrives in the publisher, PF takes an appropriate action.
The query document is first automatically transformed by the configured transformer. The developer must support this transformation in the transform script; for example, in the case of JSON, a dedicated section can be introduced with an “if (action == "deleteByQuery")” condition. If this section is left empty, the document is considered not to be transformed.

The developer must then implement the delete-by-query logic in the processDeleteByQuery method of the PAP class, by interpreting the syntax of the “deleteByQuery” document.
When PAP.processDeleteByQuery(DeleteByQuery) is called, the DeleteByQuery object (the object that wraps the original “deleteByQuery” document) can be translated by the supported Visitor objects into a meaningful string representation. The prepared visitor classes support the delete-by-query format created by the ArchiveExtractor utility (QueryForArchiveDefaultVisitorImpl). For example, the Elasticsearch publisher can create part of an Elasticsearch API command that gets all previously published documents with the same “parentId”, and in this way handle all the documents of the archive.
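
The visitor approach can be illustrated with a much simplified query model. QueryNode, TermNode, AndNode, QueryVisitor and ElasticQueryVisitor are hypothetical names and do not reflect the real DeleteByQuery or QueryForArchiveDefaultVisitorImpl APIs; the output is the query part of an Elasticsearch _delete_by_query request, using the term and bool/filter query syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor over a tiny query tree.
interface QueryVisitor<R> {
    R visitTerm(String field, String value);
    R visitAnd(List<R> parts);
}

abstract class QueryNode {
    abstract <R> R accept(QueryVisitor<R> v);
}

class TermNode extends QueryNode {
    final String field, value;

    TermNode(String field, String value) { this.field = field; this.value = value; }

    @Override
    <R> R accept(QueryVisitor<R> v) { return v.visitTerm(field, value); }
}

class AndNode extends QueryNode {
    final List<QueryNode> children;

    AndNode(List<QueryNode> children) { this.children = children; }

    @Override
    <R> R accept(QueryVisitor<R> v) {
        List<R> parts = new ArrayList<>();
        for (QueryNode c : children) parts.add(c.accept(v)); // visit children first
        return v.visitAnd(parts);
    }
}

// Renders the tree as Elasticsearch query JSON (values assumed JSON-safe).
class ElasticQueryVisitor implements QueryVisitor<String> {
    @Override
    public String visitTerm(String field, String value) {
        return "{\"term\":{\"" + field + "\":\"" + value + "\"}}";
    }

    @Override
    public String visitAnd(List<String> parts) {
        return "{\"bool\":{\"filter\":[" + String.join(",", parts) + "]}}";
    }
}
```

For an archive, a TermNode on parentId yields {"query":{"term":{"parentId":"<archive id>"}}} once wrapped in a query object, which matches all documents published from that archive.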

Simple File

The Simple File (SF) publisher comes as a part of PF.
SF publishes all documents into a single file.
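
The core idea of SF can be sketched in a few lines: each published document is appended as one line to a single output file. SimpleFileSink is a hypothetical name, and the tab-separated line format is an assumption; the actual SF output format may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical single-file sink: every document becomes one appended line.
class SimpleFileSink {
    private final Path out;

    SimpleFileSink(Path out) { this.out = out; }

    void publish(String action, String id, String content) throws IOException {
        String line = action + "\t" + id + "\t" + content + System.lineSeparator();
        // CREATE + APPEND: create the file on first use, then keep appending.
        Files.write(out, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    List<String> readAll() throws IOException {
        return Files.readAllLines(out, StandardCharsets.UTF_8);
    }
}
```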

SF was developed to help developers who are new to PF and want to learn how to develop and deploy a specific publisher.
When learning PF, it is advisable to always build and deploy this publisher first, run a crawl, and then check the result in the publisher output file.
Resources/dxf/publisher.xml is an example of how to create a DXF file with publisher-specific parameters.
Resources/aspire.properties is an example of how to use parameters for merging and hiding the general DXF coming from PF itself with the specific DXF provided by the publisher.
