Publish to ElasticSearch App Bundle
Factory Name	com.searchtechnologies.aspire:app-publish-to-elasticsearch
Type Flags	job-input
Inputs	AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder.
Outputs	A JSON transformation of the AspireObject, sent to the ElasticSearch bulk URL.

The Publish to ElasticSearch application feeds the metadata and content of files extracted by Aspire connectors to an ElasticSearch instance. The feed can be customized by editing the JSON transformation file provided by the user.
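
As a rough illustration of what a bulk feed looks like, the sketch below serializes documents into the newline-delimited JSON body that the ElasticSearch bulk endpoint accepts: one action line followed by one source line per document. It is not the publisher's own transformation (that is done by the Groovy file described later); the function name build_bulk_body and the sample field title are hypothetical.

```python
import json


def build_bulk_body(docs, index_name="index1"):
    """Serialize documents into a bulk (newline-delimited JSON) request body.

    Hypothetical helper for illustration; each doc is a dict with an "id" key.
    """
    lines = []
    for doc in docs:
        # Action line: tells the bulk endpoint which index and _id to write to.
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        # Source line: the document fields themselves, without the id.
        source = {key: value for key, value in doc.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(build_bulk_body([{"id": "file:///docs/a.txt", "title": "A"}]), end="")
```

Each document contributes exactly two lines to the body, which is why the bulk format is sent as plain newline-delimited text rather than as a single JSON array.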

Configuration

This section lists all configuration parameters available for the Publish to ElasticSearch App Bundle component.

Element	Type	Default	Description
ElasticNoUrl	boolean	true	If true, the publisher builds the bulk URL from ElasticHost and ElasticPort. If false, it uses the complete URL given in ElasticUrl.
ElasticUrl	String	-	Complete URL to which the feeds are sent, e.g. http://localhost:9200/_bulk
ElasticPort	int	9200	ElasticSearch port to which the feeds are sent.
ElasticHost	String	-	ElasticSearch hostname or IP address, e.g. server.domain.com
ElasticIndex	String	index1	Index to which the jobs are going to be published.
aspireToElasticGroovy	String	${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy	Location of the Groovy script that transforms the job data into an ElasticSearch feed.
maxResults	int	1000000	Maximum number of documents that can be fetched from the search engine for a single query.
pageSize	int	10000	Number of documents to fetch per page.
idField	String	hits._id	Field used to store the id in the search engine.
urlField	String	hits.fields.url	Field used to store the url in the search engine.
timestampField	String	hits.fields.submitTime	The name of the timestamp field holding the index timestamp of every document.
debug	boolean	false	If true, the component logs debug information.
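
The interplay between ElasticNoUrl, ElasticUrl, ElasticHost and ElasticPort can be sketched as follows. This is an illustration of the documented behavior, not the component's actual code: the function name resolve_bulk_url is hypothetical, and the http scheme and the /_bulk path used when building the URL from host and port are assumptions based on the complete-URL example.

```python
def resolve_bulk_url(props):
    """Return the bulk URL implied by a dict of publisher properties.

    Hypothetical helper; property names match the configuration table.
    """
    # ElasticNoUrl defaults to true: build the URL from host and port.
    no_url = str(props.get("ElasticNoUrl", "true")).lower() == "true"
    if not no_url:
        # A complete URL was supplied; use it unchanged.
        return props["ElasticUrl"]
    host = props["ElasticHost"]
    port = int(props.get("ElasticPort", 9200))  # 9200 is the documented default
    # Assumption: plain http and the standard /_bulk endpoint.
    return f"http://{host}:{port}/_bulk"


if __name__ == "__main__":
    print(resolve_bulk_url({"ElasticHost": "server.domain.com"}))
    print(resolve_bulk_url({"ElasticNoUrl": "false",
                            "ElasticUrl": "http://localhost:9200/_bulk"}))
```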

Example Configuration

With Host and Port

<application config="com.searchtechnologies.aspire:app-publish-to-elasticsearch">
	<properties>
    	<ElasticNoUrl>true</ElasticNoUrl>
    	<ElasticHost>localhost</ElasticHost>
    	<ElasticPort>9200</ElasticPort>
    	<ElasticIndex>index1</ElasticIndex>
    	<aspireToElasticGroovy>${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy</aspireToElasticGroovy>
    	<maxResults>1000000</maxResults>
    	<pageSize>10000</pageSize>
    	<idField>hits._id</idField>
    	<urlField>hits.fields.url</urlField>
    	<timestampField>hits.fields.submitTime</timestampField>
    	<debug>false</debug>
	</properties>
</application>

With Complete Url

<application config="com.searchtechnologies.aspire:app-publish-to-elasticsearch">
	<properties>
    	<ElasticNoUrl>false</ElasticNoUrl>
    	<ElasticUrl>http://localhost:9200/_bulk</ElasticUrl>
    	<ElasticIndex>index1</ElasticIndex>
    	<aspireToElasticGroovy>${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy</aspireToElasticGroovy>
    	<maxResults>1000000</maxResults>
    	<pageSize>10000</pageSize>
    	<idField>hits._id</idField>
    	<urlField>hits.fields.url</urlField>
    	<timestampField>hits.fields.submitTime</timestampField>
    	<debug>false</debug>
	</properties>
</application>

Edit Groovy

The default Groovy transformation file is aspireToElasticsearchBulk.groovy.

The default file provided by the publisher expects metadata in the format described in Connector Metadata.

Add metadata field

To add a new metadata field extracted by an Aspire connector, add a Groovy element inside the builder.$object() call that comes right after builder.flush():

   metadata-name doc.metadatafield

Change the document ID

The ID of an ElasticSearch document uniquely identifies a file in the index. By default, Publish to ElasticSearch takes the ID from the following Aspire document fields, in order of precedence (if one is missing, the next is used):

fetchUrl
url
displayUrl
id
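
The precedence above can be sketched in Python as a first-match lookup. This only illustrates the documented order; the publisher itself does this in its Groovy transformation, and the names ID_FIELDS and pick_document_id are hypothetical. Treating an empty value as missing is also an assumption.

```python
# Fields checked in order of precedence, as listed above.
ID_FIELDS = ("fetchUrl", "url", "displayUrl", "id")


def pick_document_id(doc):
    """Return the first non-empty candidate field of doc, or None if all are missing."""
    for field in ID_FIELDS:
        value = doc.get(field)
        if value:  # assumption: empty strings count as missing
            return value
    return None


if __name__ == "__main__":
    # fetchUrl is absent, so url wins over displayUrl and id.
    print(pick_document_id({"url": "file:///docs/a.txt", "id": "42"}))
```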

To change this behavior, edit the Groovy file (or create a new one) so that it has the following element inside builder.index():

 '_id' value-for-id

For more information on how to create a Groovy transformation file, see JSON Transformation.
