The Elasticsearch Connector can be configured using the REST API. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Seed

Below are examples of how to create and update the Credential, the Connection, and the Seed. For the Connector, please check this page.

Create Credential


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| type | Yes | - | No | The value must be "elasticsearch". | "elasticsearch" |
| description | Yes | - | No | Name of the credential object. | "ElasticsearchCredential" |
| properties | Yes | - | No | Configuration object | |
| authentication | Yes | "None" | No | The selected authentication method. | "Basic" |
| username | No | - | No | Only required if "Use Basic Authentication" is selected. The name of the Elasticsearch user to use. | testuser |
| password | No | - | No | Only required if "Use Basic Authentication" is selected. The password of the Elasticsearch user to use. | Password123 |
| region | No | - | No | Only required if "AWS Signature V4 Authentication" is selected. The region of the ES service to use. | us-east-1 |
| defaultAWS | No | true | No | Enable this to use the Default AWS Credentials. | |
| accessKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The access key of the ES service to use. | |
| secretKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The secret key of the ES service to use. | |

Example

POST aspire/_api/credentials
{
    "type": "elasticsearch",
    "description": "Elasticsearch Credential",
    "properties": {
        "authentication": "Basic",
        "username": "testuser",
        "password": "Password123",
        "region": "us-east-1",
        "defaultAWS": true,
        "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
        "secretKey": "xxxxxxxxxxxxxxxxxxxxxxx"
    }
}
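
The request above can also be prepared from a script. The following is a minimal sketch, not an official client: the base URL `http://localhost:50505` and the helper names `basic_credential_payload` and `build_post` are assumptions made for illustration, and any authentication that the Aspire API itself may require is left out. Because the table marks the AWS fields as required only for the AWS options, the sketch sends only the Basic authentication fields.

```python
import json
import urllib.request

# Assumption: Aspire is reachable at this base URL; adjust for your deployment.
ASPIRE_BASE_URL = "http://localhost:50505"


def basic_credential_payload(description, username, password):
    """Build a credential body that carries only the Basic authentication fields."""
    if not username or not password:
        raise ValueError("Basic authentication needs both username and password")
    return {
        "type": "elasticsearch",
        "description": description,
        "properties": {
            "authentication": "Basic",
            "username": username,
            "password": password,
        },
    }


def build_post(path, payload, base_url=ASPIRE_BASE_URL):
    """Prepare (but do not send) a JSON POST request for the given API path."""
    return urllib.request.Request(
        url=f"{base_url}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = basic_credential_payload("Elasticsearch Credential", "testuser", "Password123")
request = build_post("aspire/_api/credentials", payload)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the body before anything reaches the server.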

Update Credential


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| id | Yes | - | No | ID of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
| description | Yes | - | No | Name of the credential object. | "ElasticsearchCredential" |
| properties | Yes | - | No | Configuration object | |
| authentication | Yes | "None" | No | The selected authentication method. | "Basic" |
| username | No | - | No | Only required if "Use Basic Authentication" is selected. The name of the Elasticsearch user to use. | testuser |
| password | No | - | No | Only required if "Use Basic Authentication" is selected. The password of the Elasticsearch user to use. | Password123 |
| region | No | - | No | Only required if "AWS Signature V4 Authentication" is selected. The region of the ES service to use. | us-east-1 |
| defaultAWS | No | true | No | Enable this to use the Default AWS Credentials. | |
| accessKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The access key of the ES service to use. | |
| secretKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The secret key of the ES service to use. | |

Example 

PUT aspire/_api/credentials/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "description": "Elasticsearch Credential",
    "properties": {
        "authentication": "Basic",
        "username": "testuser",
        "password": "Password123",
        "region": "us-east-1",
        "defaultAWS": true,
        "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx",
        "secretKey": "xxxxxxxxxxxxxxxxxxxxxxx"
    }
}
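
In the update example the credential ID appears twice, once in the URL and once in the body. A small helper can derive both from a single argument so they cannot drift apart. This is only a sketch: the base URL and the function name `credential_update_request` are assumptions, and the request is prepared but not sent.

```python
import json
import urllib.request

# Assumption: Aspire is reachable at this base URL; adjust for your deployment.
ASPIRE_BASE_URL = "http://localhost:50505"


def credential_update_request(credential_id, description, properties,
                              base_url=ASPIRE_BASE_URL):
    """Prepare a PUT request whose path and body share the same credential ID."""
    body = {
        "id": credential_id,
        "description": description,
        "properties": properties,
    }
    return urllib.request.Request(
        url=f"{base_url}/aspire/_api/credentials/{credential_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


update = credential_update_request(
    "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "Elasticsearch Credential",
    {"authentication": "Basic", "username": "testuser", "password": "Password123"},
)
```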

Create Connection


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| type | Yes | - | No | The value must be "elasticsearch". | "elasticsearch" |
| description | Yes | - | No | Name of the connection object. | "MyElasticsearchConnection" |
| credential | No | - | No | The ID of the credential to be used with this connection. The credential type must match the connection type. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
| throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| properties | Yes | - | No | Configuration object | |
| hostname | Yes | "localhost" | No | The Elasticsearch server hostname. | localhost |
| port | Yes | 9200 | No | The Elasticsearch server port. | 9200 |
| protocol | No | - | No | The Elasticsearch server URL protocol. | https |
| fetchDocuments | No | true | No | Set to true to fetch the content of the documents. | true |
| useMGET | No | true | No | Set to true to use MGET for fetching the documents. If false, an individual GET request is executed for each document. | true |
| waitBeforeFetching | No | false | No | Set to true to make the fetch process wait until the discovery process is done. | false |
| includeFields | No | - | Yes | The specified fields will be included in the fetch process of the document. | [{"includeField":"field1"}, {"includeField":"field2"}] |
| includeField | No | - | No | The name of the field to include in the fetch process. | field1 |
| excludeFields | No | - | Yes | The specified fields will be excluded from the fetch process of the document. | [{"excludeField":"field3"}, {"excludeField":"field4"}] |
| excludeField | No | - | No | The name of the field to exclude from the fetch process. | field3 |
| verifyFinalCount | No | false | No | Set to true to execute an initial document count query that is used at the end of the crawl to validate the total number of crawled documents. | false |
| slice | Yes | 5 | No | The number of slices to use for the queries. | 5 |
| pageSize | Yes | 1000 | No | The number of documents to get per request. | 1000 |
| scrollTime | Yes | 5m | No | The time to keep each scroll request active. | 5m |
| timeout | Yes | 20000 | No | The timeout to use for the connections to Elasticsearch. | 20000 |
| retries | Yes | 3 | No | The number of retries for each slice processing. | 3 |
| retryWaitTime | Yes | 10000 | No | The time in milliseconds to wait between each slice retry. | 10000 |
| retriesConnection | Yes | 5 | No | The number of retries for each Elasticsearch request. | 5 |
| retryWaitTimeConnection | Yes | 60000 | No | The time in milliseconds to wait between each Elasticsearch request retry. | 60000 |
| useThrottling | No | false | No | Set to true to enable connection throttling. | false |
| throttleRateInMillis | No | 5000 | No | Only required if "useThrottling" is true. The throttle rate in milliseconds. | 5000 |
| throttleConnectionRate | No | 750 | No | Only required if "useThrottling" is true. The number of connections to allow in the specified throttle rate. | 750 |

Example

POST aspire/_api/connections
{
    "type": "elasticsearch",
    "description": "MyElasticsearchConnection",
    "credential": null,
    "properties": {
        "hostname": "localhost",
        "port": 9200,
        "protocol": "https",
        "fetchDocuments": true,
        "useMGET": true,
        "waitBeforeFetching": false,
        "includeFields": [
            {"includeField": "field1"},
            {"includeField": "field2"}
        ],
        "excludeFields": [
            {"excludeField": "field3"},
            {"excludeField": "field4"}
        ],
        "verifyFinalCount": false,
        "slice": 5,
        "pageSize": 1000,
        "scrollTime": "5m",
        "timeout": 20000,
        "retries": 3,
        "retryWaitTime": 10000,
        "retriesConnection": 5,
        "retryWaitTimeConnection": 60000,
        "useThrottling": true,
        "throttleRateInMillis": 5000,
        "throttleConnectionRate": 750
    }
}
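
The connection properties can be assembled from the defaults listed in the table, overriding only what differs. The sketch below assumes that reading of the table; the names `CONNECTION_DEFAULTS`, `connection_properties`, and `max_connections_per_second` are hypothetical helpers, not part of the product. It also reads the two throttle settings as "at most throttleConnectionRate connections per throttleRateInMillis window", which is an interpretation of the Notes column rather than a documented formula.

```python
# Defaults copied from the Default column of the connection table.
CONNECTION_DEFAULTS = {
    "hostname": "localhost",
    "port": 9200,
    "fetchDocuments": True,
    "useMGET": True,
    "waitBeforeFetching": False,
    "verifyFinalCount": False,
    "slice": 5,
    "pageSize": 1000,
    "scrollTime": "5m",
    "timeout": 20000,
    "retries": 3,
    "retryWaitTime": 10000,
    "retriesConnection": 5,
    "retryWaitTimeConnection": 60000,
    "useThrottling": False,
}


def connection_properties(**overrides):
    """Merge overrides onto the defaults; add throttle settings only when enabled."""
    props = {**CONNECTION_DEFAULTS, **overrides}
    if props["useThrottling"]:
        props.setdefault("throttleRateInMillis", 5000)
        props.setdefault("throttleConnectionRate", 750)
    return props


def max_connections_per_second(props):
    """Assumed reading: throttleConnectionRate connections per throttle window."""
    if not props.get("useThrottling"):
        return None
    return props["throttleConnectionRate"] * 1000 / props["throttleRateInMillis"]


props = connection_properties(protocol="https", useThrottling=True)
rate = max_connections_per_second(props)  # 750 per 5000 ms is 150.0 per second
```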

Update Connection


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| id | Yes | - | No | ID of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0" |
| description | No | - | No | Name of the connection object. | "MyElasticsearchConnection" |
| credential | No | - | No | The ID of the credential to be used with this connection. The credential type must match the connection type. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
| throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| properties | Yes | - | No | Configuration object | |
| hostname | Yes | "localhost" | No | The Elasticsearch server hostname. | localhost |
| port | Yes | 9200 | No | The Elasticsearch server port. | 9200 |
| protocol | No | - | No | The Elasticsearch server URL protocol. | https |
| fetchDocuments | No | true | No | Set to true to fetch the content of the documents. | true |
| useMGET | No | true | No | Set to true to use MGET for fetching the documents. If false, an individual GET request is executed for each document. | true |
| waitBeforeFetching | No | false | No | Set to true to make the fetch process wait until the discovery process is done. | false |
| includeFields | No | - | Yes | The specified fields will be included in the fetch process of the document. | [{"includeField":"field1"}, {"includeField":"field2"}] |
| includeField | No | - | No | The name of the field to include in the fetch process. | field1 |
| excludeFields | No | - | Yes | The specified fields will be excluded from the fetch process of the document. | [{"excludeField":"field3"}, {"excludeField":"field4"}] |
| excludeField | No | - | No | The name of the field to exclude from the fetch process. | field3 |
| verifyFinalCount | No | false | No | Set to true to execute an initial document count query that is used at the end of the crawl to validate the total number of crawled documents. | false |
| slice | Yes | 5 | No | The number of slices to use for the queries. | 5 |
| pageSize | Yes | 1000 | No | The number of documents to get per request. | 1000 |
| scrollTime | Yes | 5m | No | The time to keep each scroll request active. | 5m |
| timeout | Yes | 20000 | No | The timeout to use for the connections to Elasticsearch. | 20000 |
| retries | Yes | 3 | No | The number of retries for each slice processing. | 3 |
| retryWaitTime | Yes | 10000 | No | The time in milliseconds to wait between each slice retry. | 10000 |
| retriesConnection | Yes | 5 | No | The number of retries for each Elasticsearch request. | 5 |
| retryWaitTimeConnection | Yes | 60000 | No | The time in milliseconds to wait between each Elasticsearch request retry. | 60000 |
| useThrottling | No | false | No | Set to true to enable connection throttling. | false |
| throttleRateInMillis | No | 5000 | No | Only required if "useThrottling" is true. The throttle rate in milliseconds. | 5000 |
| throttleConnectionRate | No | 750 | No | Only required if "useThrottling" is true. The number of connections to allow in the specified throttle rate. | 750 |

Example

PUT aspire/_api/connections/89d6632a-a296-426c-adb0-d442adcab4b0
{
    "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
    "description": "MyElasticsearchConnection",
    "credential": null,
    "properties": {
        "hostname": "localhost",
        "port": 9200,
        "protocol": "https",
        "fetchDocuments": true,
        "useMGET": true,
        "waitBeforeFetching": false,
        "includeFields": [
            {"includeField": "field1"},
            {"includeField": "field2"}
        ],
        "excludeFields": [
            {"excludeField": "field3"},
            {"excludeField": "field4"}
        ],
        "verifyFinalCount": false,
        "slice": 5,
        "pageSize": 1000,
        "scrollTime": "5m",
        "timeout": 20000,
        "retries": 3,
        "retryWaitTime": 10000,
        "retriesConnection": 5,
        "retryWaitTimeConnection": 60000,
        "useThrottling": true,
        "throttleRateInMillis": 5000,
        "throttleConnectionRate": 750  
    }
}

Create Connector Instance


To create the Connector object using the REST API, check this page.

Update Connector Instance


To update the Connector object using the REST API, check this page.

Create Seed


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| seed | Yes | - | No | The Elasticsearch server hostname. | localhost |
| type | Yes | - | No | The value must be "elasticsearch". | "elasticsearch" |
| description | Yes | - | No | Name of the seed object. | "My Elasticsearch Seed" |
| connector | Yes | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
| connection | Yes | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
| workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"] |
| properties | Yes | - | No | Configuration object | |
| indexes | Yes | - | Yes | The list of Elasticsearch indexes to crawl. It supports multiple indexes and the use of the wildcard "*". | [{"index":"index1"}] |
| index | Yes | - | No | The Elasticsearch index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \, /, ?, ", <, >, the vertical bar, a space, a comma, or #. 3) Cannot start with -, _, or +. 4) Cannot be . or .. | index1 |
| snapshots | Yes | true | No | Selects the crawl mode: true for a snapshot-based crawl that supports deletes, false for a timestamp-based crawl that performs better but does not support deleted documents. | true |
| discoveryFields | No | - | Yes | Only required if "snapshots" is true. List of field names to be used to generate the documents' signature. | [{"discoveryField":"last_modified"}] |
| discoveryField | No | - | No | Only required if "snapshots" is true. Name of the field to be used to generate the documents' signature. | last_modified |
| discoveryQuery | No | - | No | Only required if "snapshots" is true. The query to run for discovering documents. This query is used for full and incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "match_all": {} } } |
| timestampField | No | - | No | Only required if "snapshots" is false. The field that contains the timestamp of the document. | timestamp |
| discoveryQueryInc | No | - | No | Only required if "snapshots" is false. The query to run for discovering documents for incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "range" : { "connectorSpecific.timestamp" : { "gt" : {{timestamp}} } } } } |
| useLimit | No | false | No | Set to true to limit how many items are selected from the index. | false |
| topLimit | No | - | No | Only required if "useLimit" is true. The number of items to be crawled. Because this connector uses slices and scrolls, this number is an approximation, and slightly more items may be returned. | 100 |
| makeIdUnique | No | false | No | Set to true to ensure unique document IDs when crawling multiple indexes; otherwise ID collisions could happen. The index name is appended to the ID with a delimiter. This option is ignored if only a single index without a wildcard (*) is specified. | false |
| idDelimiter | No | - | No | Only required if "makeIdUnique" is true. The delimiter that will be used to append the index name to the document ID. | _ |
| storeSpecific | No | true | No | Set to true to keep the Elasticsearch connector metadata and to store all the fields of the Elasticsearch source as connector-specific fields. If false, the Elasticsearch source is used as the document metadata in the same format in which it was retrieved. | true |
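
The index name limitations listed for the index field can be checked before a seed is submitted. The following is a sketch of such a pre-check, covering only the four rules stated in the table; the function name `index_name_problems` is a hypothetical helper, and the server may enforce additional rules that are not listed here. The wildcard "*" is deliberately allowed because the table says it is supported.

```python
from typing import List

# Characters the table lists as not allowed in an index name.
FORBIDDEN_CHARS = set('\\/?"<>| ,#')
# Prefixes the table lists as not allowed.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the name passed."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + repr(bad))
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    return problems


for candidate in ["index1", "logs-*", "_Bad Name", ".."]:
    print(candidate, index_name_problems(candidate))
```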

Example 

POST aspire/_api/seeds
{
    "type": "elasticsearch",
    "seed": "localhost",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "Elasticsearch_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "indexes": [
            {"index": "index1"},
            {"index": "index2"}
        ],
        "snapshots": true,
        "discoveryFields":  [
            {"discoveryField": "last_modified"}
        ],
        "discoveryQuery": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}}  }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"match_all\": {} } }",
        "timestampField": "timestamp",
        "discoveryQueryInc": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}}  }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"range\" : { \"connectorSpecific.timestamp\" : { \"gt\" : {{timestamp}} } } } }",
        "useLimit": true,
        "topLimit": 100,
        "makeIdUnique": true,
        "idDelimiter": "_",
        "storeSpecific": true 
    }
}
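
In the example above, discoveryQuery is a JSON string nested inside the JSON body, which is why every inner quote is escaped. When the body is built in code, the query can be kept as a plain string and the serializer can do the escaping. The sketch below shows this under stated assumptions: `render_preview` is a hypothetical helper that substitutes sample values for the `{{...}}` placeholders purely to check locally that the template becomes valid JSON; the real substitution is performed by the connector at crawl time.

```python
import json

# The discovery query as a plain string; {{...}} placeholders stay untouched.
DISCOVERY_QUERY = (
    '{ "track_total_hits": true, '
    '"slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, '
    '"size": {{pageSize}}, '
    '"_source": { "includes": ["last_modified"] }, '
    '"query": { "match_all": {} } }'
)


def render_preview(template, values):
    """Replace each {{name}} with a sample value so the result can be parsed."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{{" + name + "}}", str(value))
    return rendered


def seed_properties(index_names, query_template):
    """Build the snapshot-mode properties object used in the seed body."""
    return {
        "indexes": [{"index": name} for name in index_names],
        "snapshots": True,
        "discoveryFields": [{"discoveryField": "last_modified"}],
        "discoveryQuery": query_template,
    }


# Local sanity check: with sample values the template must parse as JSON.
preview = json.loads(render_preview(
    DISCOVERY_QUERY, {"sliceNumber": 0, "sliceTotal": 5, "pageSize": 1000}))

# json.dumps escapes the inner quotes, producing the \" form shown above.
body = json.dumps({"properties": seed_properties(["index1", "index2"], DISCOVERY_QUERY)})
```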

Update Seed


| Field | Required | Default | Multiple | Notes | Example |
|---|---|---|---|---|---|
| id | Yes | - | No | ID of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
| seed | No | - | No | The Elasticsearch server hostname. | localhost |
| description | No | - | No | Name of the seed object. | "MyElasticsearchSeed" |
| connector | No | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
| connection | No | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
| workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| workflows.add | No | [ ] | Yes | The IDs of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| workflows.remove | No | [ ] | Yes | The IDs of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
| throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
| routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| routingPolicies.add | No | [ ] | Yes | The IDs of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
| routingPolicies.remove | No | [ ] | Yes | The IDs of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"] |
| tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"] |
| tags.add | No | [ ] | Yes | The tags to add. | ["tag4"] |
| tags.remove | No | [ ] | Yes | The tags to remove. | ["tag2"] |
| properties | Yes | - | No | Configuration object | |
| indexes | Yes | - | Yes | The list of Elasticsearch indexes to crawl. It supports multiple indexes and the use of the wildcard "*". | [{"index":"index1"}] |
| index | Yes | - | No | The Elasticsearch index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \, /, ?, ", <, >, the vertical bar, a space, a comma, or #. 3) Cannot start with -, _, or +. 4) Cannot be . or .. | index1 |
| snapshots | Yes | true | No | Selects the crawl mode: true for a snapshot-based crawl that supports deletes, false for a timestamp-based crawl that performs better but does not support deleted documents. | true |
| discoveryFields | No | - | Yes | Only required if "snapshots" is true. List of field names to be used to generate the documents' signature. | [{"discoveryField":"last_modified"}] |
| discoveryField | No | - | No | Only required if "snapshots" is true. Name of the field to be used to generate the documents' signature. | last_modified |
| discoveryQuery | No | - | No | Only required if "snapshots" is true. The query to run for discovering documents. This query is used for full and incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "match_all": {} } } |
| timestampField | No | - | No | Only required if "snapshots" is false. The field that contains the timestamp of the document. | timestamp |
| discoveryQueryInc | No | - | No | Only required if "snapshots" is false. The query to run for discovering documents for incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "range" : { "connectorSpecific.timestamp" : { "gt" : {{timestamp}} } } } } |
| useLimit | No | false | No | Set to true to limit how many items are selected from the index. | false |
| topLimit | No | - | No | Only required if "useLimit" is true. The number of items to be crawled. Because this connector uses slices and scrolls, this number is an approximation, and slightly more items may be returned. | 100 |
| makeIdUnique | No | false | No | Set to true to ensure unique document IDs when crawling multiple indexes; otherwise ID collisions could happen. The index name is appended to the ID with a delimiter. This option is ignored if only a single index without a wildcard (*) is specified. | false |
| idDelimiter | No | - | No | Only required if "makeIdUnique" is true. The delimiter that will be used to append the index name to the document ID. | _ |
| storeSpecific | No | true | No | Set to true to keep the Elasticsearch connector metadata and to store all the fields of the Elasticsearch source as connector-specific fields. If false, the Elasticsearch source is used as the document metadata in the same format in which it was retrieved. | true |

Example 

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "seed": "localhost",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "Elasticsearch_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag", "tag2"],
    "properties": {
        "indexes": [
            {"index": "index1"},
            {"index": "index2"}
        ],
        "snapshots": true,
        "discoveryFields":  [
            {"discoveryField": "last_modified"}
        ],
        "discoveryQuery": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}}  }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"match_all\": {} } }",
        "timestampField": "timestamp",
        "discoveryQueryInc": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}}  }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"range\" : { \"connectorSpecific.timestamp\" : { \"gt\" : {{timestamp}} } } } }",
        "useLimit": true,
        "topLimit": 100,
        "makeIdUnique": true,
        "idDelimiter": "_",
        "storeSpecific": true       
    }
}
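
The update table also lists incremental fields (tags.add, tags.remove, and the equivalent fields for workflows and routing policies) that the example above does not use. The sketch below computes the add and remove lists from the current and desired values. How these fields are spelled in the request body is not shown on this page; sending them as literal dotted keys such as "tags.add" is an assumption, as are the helper names `list_changes` and `seed_tag_update`. The required properties object is omitted for brevity and would still need to be included.

```python
def list_changes(current, desired):
    """Return which entries must be added and removed to reach the desired list."""
    current_set, desired_set = set(current), set(desired)
    return {
        "add": sorted(desired_set - current_set),
        "remove": sorted(current_set - desired_set),
    }


def seed_tag_update(seed_id, current_tags, desired_tags):
    """Build a partial update body; the dotted key names are an assumption."""
    changes = list_changes(current_tags, desired_tags)
    body = {"id": seed_id}
    if changes["add"]:
        body["tags.add"] = changes["add"]
    if changes["remove"]:
        body["tags.remove"] = changes["remove"]
    return body


update_body = seed_tag_update(
    "2f287669-d163-4e35-ad17-6bbfe9df3778",
    current_tags=["tag1", "tag2"],
    desired_tags=["tag1", "tag4"],
)
```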