The Publish to ElasticSearch application feeds the metadata and content of files extracted by Aspire connectors to ElasticSearch. The feed can be customized by editing the JSON transformation file provided by the user.

Publish to ElasticSearch App Bundle
Factory Name: com.searchtechnologies.aspire:app-publish-to-elasticsearch
Type Flags: job-input
Inputs: AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder.
Outputs: A JSON transformation of the AspireObject sent to the ElasticSearch bulk URL.
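
For orientation, the body of an ElasticSearch bulk request is newline-delimited JSON: an action line followed by a source line for each document. The following is a minimal Groovy sketch of one such pair, not the publisher's actual transformation. The document fields (url, title) and the use of the URL as the ID are assumptions made for illustration, and depending on the ElasticSearch version the action line may need further fields such as _type.

```groovy
import groovy.json.JsonOutput

// Hypothetical document metadata; real field names come from the connector.
def doc = [url: 'http://server.domain.com/docs/a.txt', title: 'a.txt']

// Action line: tells ElasticSearch which index and ID to write to.
def action = JsonOutput.toJson([index: [_index: 'index1', _id: doc.url]])

// Source line: the document body itself.
def source = JsonOutput.toJson(doc)

// The bulk API expects one JSON object per line, with a trailing newline.
def bulkBody = action + '\n' + source + '\n'
print bulkBody
```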

Configuration


This section lists all configuration parameters available for the Publish to ElasticSearch App Bundle component.

| Element | Type | Default | Description |
|---|---|---|---|
| ElasticNoUrl | boolean | true | Indicates whether the publisher builds the URL from the host and port entered (true) or uses the complete URL given in ElasticUrl (false). |
| ElasticUrl | String | - | Complete URL to which the feeds are sent, e.g. http://localhost:9200/_bulk |
| ElasticPort | int | 9200 | ElasticSearch port to which the feeds are sent. |
| ElasticHost | String | - | ElasticSearch hostname or IP address, e.g. server.domain.com |
| ElasticIndex | String | index1 | Index to which the jobs are published. |
| aspireToElasticGroovy | String | ${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy | |
| maxResults | int | 1000000 | |
| pageSize | int | 10000 | |
| idField | String | hits._id | |
| urlField | String | hits.fields.url | |
| timestampField | String | hits.fields.submitTime | |
| debug | boolean | false | |
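
The ElasticNoUrl switch can be pictured with the short Groovy sketch below. It is a hypothetical helper, not the publisher's code: the http scheme and the /_bulk suffix used when building the URL from the host and port are assumptions based on the complete-URL example on this page.

```groovy
// Hypothetical helper that mirrors the ElasticNoUrl switch (not Aspire code).
String bulkUrl(Map cfg) {
    if (cfg.ElasticNoUrl) {
        // Assumption: scheme and /_bulk suffix follow the complete-URL example.
        return "http://${cfg.ElasticHost}:${cfg.ElasticPort}/_bulk".toString()
    }
    // Otherwise the complete URL is used as entered.
    return cfg.ElasticUrl
}

def fromHostPort = bulkUrl([ElasticNoUrl: true, ElasticHost: 'localhost', ElasticPort: 9200])
def fromUrl = bulkUrl([ElasticNoUrl: false, ElasticUrl: 'http://localhost:9200/_bulk'])

// Under these assumptions both configurations resolve to the same endpoint.
assert fromHostPort == fromUrl
```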

Example Configuration


With Host and Port

<application config="com.searchtechnologies.aspire:app-publish-to-elasticsearch">
	<properties>
    	<ElasticNoUrl>true</ElasticNoUrl>
    	<ElasticHost>localhost</ElasticHost>
    	<ElasticPort>9200</ElasticPort>
    	<ElasticIndex>index1</ElasticIndex>
    	<aspireToElasticGroovy>${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy</aspireToElasticGroovy>
    	<maxResults>1000000</maxResults>
    	<pageSize>10000</pageSize>
    	<idField>hits._id</idField>
    	<urlField>hits.fields.url</urlField>
    	<timestampField>hits.fields.submitTime</timestampField>
    	<debug>false</debug>
	</properties>
</application>

With Complete Url

<application config="com.searchtechnologies.aspire:app-publish-to-elasticsearch">
	<properties>
    	<ElasticNoUrl>false</ElasticNoUrl>
    	<ElasticUrl>http://localhost:9200/_bulk</ElasticUrl>
    	<ElasticIndex>index1</ElasticIndex>
    	<aspireToElasticGroovy>${appbundle.home}/config/groovy/aspireToElasticsearchBulk.groovy</aspireToElasticGroovy>
    	<maxResults>1000000</maxResults>
    	<pageSize>10000</pageSize>
    	<idField>hits._id</idField>
    	<urlField>hits.fields.url</urlField>
    	<timestampField>hits.fields.submitTime</timestampField>
    	<debug>false</debug>
	</properties>
</application>

Edit Groovy


The default Groovy transformation file can be found in aspireToElasticsearchBulk.groovy.


The default Groovy transformation file provided by the publisher expects metadata as described in Connector Metadata.

Add metadata field

To add a new metadata field extracted by an Aspire connector, add a Groovy element inside the builder.$object() call that comes right after builder.flush().

   metadata-name doc.metadatafield
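
To show how a line of the form "metadata-name doc.metadatafield" turns into a JSON field, here is a small self-contained sketch. It uses the standard Groovy StreamingJsonBuilder as a stand-in; the builder supplied by Aspire (including builder.$object()) may behave differently, and the author field is a hypothetical example.

```groovy
import groovy.json.StreamingJsonBuilder

// Stand-in for the Aspire document; the author field is a hypothetical example.
def doc = [title: 'a.txt', author: 'jdoe']

def writer = new StringWriter()
def builder = new StreamingJsonBuilder(writer)

// Each line inside the closure has the form "<JSON field name> <value>".
builder.call {
    title doc.title
    author doc.author   // the newly added metadata field
}

println writer
```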

Change the document ID

The ID of an ElasticSearch document uniquely identifies a file in the index. By default, Publish to ElasticSearch uses the following fields from the Aspire document, in order of precedence (if one is missing, the next is used):

  • fetchUrl
  • url
  • displayUrl
  • id
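
The precedence rule above can be sketched in plain Groovy as follows. This is an illustration only, not the publisher's implementation: pickId is a hypothetical helper, the document is modelled as a simple map, and treating empty strings as missing is an assumption.

```groovy
// Field names in the documented order of precedence.
def ID_FIELDS = ['fetchUrl', 'url', 'displayUrl', 'id']

// Hypothetical helper: return the first field that has a non-empty value.
// Groovy truth skips null and empty strings (an assumption about "missing").
def pickId = { Map doc ->
    ID_FIELDS.collect { doc[it] }.find { it }
}

// fetchUrl is absent, so url wins over id.
assert pickId([url: 'http://server.domain.com/a.txt', id: '42']) == 'http://server.domain.com/a.txt'
// Only id is present, so it is used as the last resort.
assert pickId([id: '42']) == '42'
```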

If you want to change this behavior, edit the Groovy file, or create a new one, so that it has the following element inside builder.index():

 '_id' value-for-id

For more information on how to create a Groovy transformation file, please see JSON Transformation.