The Publish to Azure Search application feeds the metadata and content of files extracted by Aspire connectors to an Azure Search service. The feed can be customized by editing the JSON transformation file provided by the user.

Publish to Azure Search App Bundle
Factory Name	com.searchtechnologies.aspire:app-publish-to-azure-search
Type Flags	job-input
Inputs	AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder.
Outputs	A JSON transformation of the AspireObject sent to the Azure Search's bulk URL.

Configuration

This section lists all configuration parameters available to configure the Publish to Azure Search App Bundle component.

Element	Type	Default	Description
server	String	-	The name of the service endpoint to use.
index	String	-	Azure Search index where the jobs will be stored.
apiVersion	String	-	Azure Search API version of the REST API.
apiKey	String	-	Azure Search API key used to connect to the REST API.
aspireToAzureSearchGroovy	String	${appbundle.home}/config/groovy/aspireToAzureSearchBulk.groovy	Location of the Groovy file that transforms the job data into an Azure Search feed.
debug	boolean	false	If true, the component logs debug information.

Example Configuration

Code Block

language	xml

<application config="com.searchtechnologies.aspire:app-publish-to-azure-search">
	<properties>
		<server>corest.search.windows.net</server>
		<index>test</index>
		<apiVersion>2016-09-01</apiVersion>
		<apiKey>encrypted:9804B36327DAF1E712E4E82301B6A276FCBBA459834EB15F6A94255B6B0BC32B20A3E7262DD2D3D74A6FE5A70A251FCD</apiKey>
		<aspireToAzureSearchGroovy>${appbundle.home}/config/groovy/aspireToAzureSearchBulk.groovy</aspireToAzureSearchGroovy>
		<debug>false</debug>
	</properties>
</application>
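
To show how the server, index, and apiVersion properties relate to each other, here is a minimal sketch that combines them into a request URL. It assumes the publisher targets the standard Azure Search document indexing endpoint (/indexes/<index>/docs/index) over HTTPS. The helper name buildIndexUrl is hypothetical and is not part of the app bundle.

```groovy
// Hypothetical helper, not part of the Publish to Azure Search bundle.
// Assumes the standard Azure Search REST layout for document indexing.
String buildIndexUrl(String server, String index, String apiVersion) {
    return "https://${server}/indexes/${index}/docs/index?api-version=${apiVersion}".toString()
}

// Values taken from the example configuration
println buildIndexUrl('corest.search.windows.net', 'test', '2016-09-01')
```

In the Azure Search REST API the key is sent in the api-key request header. Whether the publisher builds its requests exactly this way is an assumption.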

Edit Groovy

The default Groovy transformation file is aspireToAzureSearchBulk.groovy, located in ${appbundle.home}/config/groovy/.

The default Groovy transformation file provided by the publisher expects metadata as described in Connector Metadata.

Add metadata field

To add a new metadata field extracted by an Aspire connector, add a Groovy element inside the builder.$object() call that comes right after builder.flush():

   metadata-name doc.metadatafield

Change the document ID

The ID of an Azure Search document uniquely identifies a file in the index. By default, Publish to Azure Search uses the MD5 of one of the following fields from the Aspire document, in order of precedence (if one is missing, the next is used):

fetchUrl
url
displayUrl
id
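
The precedence rule above can be sketched as follows. This is only an illustration: the helper name documentId is hypothetical, and the UTF-8 input encoding and hexadecimal output are assumptions, since the page does not say how the publisher encodes the hash.

```groovy
import java.security.MessageDigest

// Hypothetical helper illustrating the field precedence described above.
String documentId(Map doc) {
    // First non-empty value among the candidate fields, in order of precedence
    def source = ['fetchUrl', 'url', 'displayUrl', 'id'].collect { doc[it] }.find { it }
    if (source == null) {
        return null
    }
    // MD5 of the chosen value, hex-encoded (encoding choices are assumptions)
    byte[] digest = MessageDigest.getInstance('MD5').digest(source.toString().getBytes('UTF-8'))
    return digest.encodeHex().toString()
}

// url is used here because fetchUrl is missing
println documentId([url: 'http://server.domain.com/file.txt', id: '42'])
```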

If you want to change this behavior, edit or create a new Groovy file which has the following element inside builder.index():

Code Block

language	groovy
theme	Eclipse

 '_id' value-for-id

Tip
For more information on how to create a Groovy transformation file, see JSON Transformation.

Connector-specific fields

By default, the connector-specific fields of the document are not indexed. To index them, add them to the connectorSpecificMap map at the start of the Groovy file:

Code Block

language	groovy
theme	Eclipse

def connectorSpecificMap = [
    'isContainer':'is_container'
]

The key of each map entry is the name of the connector-specific field as it appears in the document, and the value is the name used for that field in the index. Only the fields specified in this map are indexed.
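
The effect of the map can be sketched as follows. The helper mapConnectorFields and the sample field values are hypothetical. The sketch only illustrates the filter-and-rename behavior described above, not the publisher's actual code.

```groovy
// Hypothetical helper: keeps only the fields listed in fieldMap
// and renames each one to its index name.
Map mapConnectorFields(Map fieldMap, Map connectorFields) {
    return fieldMap.findAll { source, target ->
        connectorFields.containsKey(source)
    }.collectEntries { source, target ->
        [(target): connectorFields[source]]
    }
}

def connectorSpecificMap = [
    'isContainer':'is_container'
]

// Sample connector-specific fields (made up for illustration)
def fields = [isContainer: 'true', owner: 'jdoe']

// owner is dropped because it is not listed in connectorSpecificMap
println mapConnectorFields(connectorSpecificMap, fields)
```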
