Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | Yes | - | No | The value must be "elasticsearch". | "elasticsearch" |
description | Yes | - | No | Name of the credential object. | "ElasticsearchCredential" |
properties | Yes | - | No | Configuration object | |
authentication | Yes | "None" | No | The selected authentication method | "Basic" |
username | No | - | No | Only required if "Use Basic Authentication" is selected. The name of the Elasticsearch user to use. | testuser |
password | No | - | No | Only required if "Use Basic Authentication" is selected. The password of the Elasticsearch user to use. | Password123 |
region | No | - | No | Only required if "AWS Signature V4 Authentication" is selected. The Region of the ES service to use. | us-east-1 |
defaultAWS | No | TRUE | No | Enable this to use the Default AWS Credentials | |
accessKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The Access key of the ES service to use | |
secretKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The Secret key of the ES service to use | |
{ "type": "elasticsearch", "description": "Elasticsearch Credential", "properties": { "authentication": "Basic", "username": "testuser", "password": "Password123", "region": "us-east-1", "defaultAWS": true, "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx", "secretKey": "xxxxxxxxxxxxxxxxxxxxxxx" } }
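The conditional requirements in the credential table can be checked before a request is sent. The following is a minimal Python sketch, not part of the product: the function name `missing_credential_fields` is hypothetical, and because the table does not give the exact `authentication` string for AWS Signature V4, the sketch assumes that any value other than "None" or "Basic" selects it.

```python
def missing_credential_fields(properties):
    """Return the conditionally required property names that are absent or empty."""
    auth = properties.get("authentication", "None")  # table default is "None"
    required = []
    if auth == "Basic":
        # Basic authentication needs a user name and a password.
        required = ["username", "password"]
    elif auth != "None":
        # Assumption: any other value means AWS Signature V4 authentication.
        required = ["region"]
        # defaultAWS defaults to true; explicit keys are needed only when it is false.
        if not properties.get("defaultAWS", True):
            required += ["accessKey", "secretKey"]
    return [name for name in required if not properties.get(name)]


example = {"authentication": "Basic", "username": "testuser"}
print(missing_credential_fields(example))
```

An empty list means the properties satisfy the conditions described in the Notes column.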
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | Id of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
description | Yes | - | No | Name of the credential object. | "ElasticsearchCredential" |
properties | Yes | - | No | Configuration object | |
authentication | Yes | "None" | No | The selected authentication method | "Basic" |
username | No | - | No | Only required if "Use Basic Authentication" is selected. The name of the Elasticsearch user to use. | testuser |
password | No | - | No | Only required if "Use Basic Authentication" is selected. The password of the Elasticsearch user to use. | Password123 |
region | No | - | No | Only required if "AWS Signature V4 Authentication" is selected. The Region of the ES service to use. | us-east-1 |
defaultAWS | No | TRUE | No | Enable this to use the Default AWS Credentials | |
accessKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The Access key of the ES service to use | |
secretKey | No | - | No | Only required if "Use the Default AWS Credentials" is false. The Secret key of the ES service to use | |
{ "id": "2f287669-d163-4e35-ad17-6bbfe9df3778", "description": "Elasticsearch Credential", "properties": { "authentication": "Basic", "username": "testuser", "password": "Password123", "region": "us-east-1", "defaultAWS": true, "accessKey": "xxxxxxxxxxxxxxxxxxxxxxx", "secretKey": "xxxxxxxxxxxxxxxxxxxxxxx" } }
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | Yes | - | No | The value must be "elasticsearch" | "elasticsearch" |
description | Yes | - | No | Name of the connection object. | "MyElasticsearchConnection" |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
properties | Yes | - | No | Configuration object | |
hostname | Yes | "localhost" | No | The elastic server hostname | localhost |
port | Yes | 9200 | No | The elastic server port | 9200 |
protocol | No | - | No | The elastic server url protocol | https |
fetchDocuments | No | TRUE | No | Check to fetch the documents content | TRUE |
useMGET | No | TRUE | No | Check to use MGET for fetching the documents. If not checked, an individual GET request is executed for each document. | TRUE |
waitBeforeFetching | No | FALSE | No | Check to make the fetch process wait for discovery process to be done | FALSE |
includeFields | No | - | Yes | The specified fields will be included in the fetch process of the document. | [{"includeField":"field1"}, {"includeField":"field2"}] |
includeField | No | - | No | The name of the field to include in the fetch process. | field1 |
excludeFields | No | - | Yes | The specified fields will be excluded in the fetch process of the document. | [{"excludeField":"field3"}, {"excludeField":"field4"}] |
excludeField | No | - | No | The name of the field to exclude in the fetch process. | field3 |
verifyFinalCount | No | FALSE | No | Check to execute an initial document count query that will be used at the end of the crawl to validate the total number of crawled documents. | FALSE |
slice | Yes | 5 | No | The number of slices to use for the queries | 5 |
pageSize | Yes | 1000 | No | The number of documents to get per request | 1000 |
scrollTime | Yes | 5m | No | The time to keep each scroll request active | 5m |
timeout | Yes | 20000 | No | The timeout to use for the connections to elastic | 20000 |
retries | Yes | 3 | No | The number of retries for each slice processing | 3 |
retryWaitTime | Yes | 10000 | No | The time in millis to wait between each slice retry | 10000 |
retriesConnection | Yes | 5 | No | The number of retries for each elasticsearch request | 5 |
retryWaitTimeConnection | Yes | 60000 | No | The time in millis to wait between each elasticsearch request retry | 60000 |
useThrottling | No | FALSE | No | Check to enable connection throttling | FALSE |
throttleRateInMillis | No | 5000 | No | Only required if "Use Throttling" is true. The throttle rate in milliseconds | 5000 |
throttleConnectionRate | No | 750 | No | Only required if "Use Throttling" is true. The number of connections to allow within the specified throttle rate | 750 |
{ "type": "elasticsearch", "description": "MyElasticsearchConnection", "properties": { "hostname": "localhost", "port": 9200, "protocol": "https", "fetchDocuments": true, "useMGET": true, "waitBeforeFetching": false, "includeFields": [ {"includeField": "field1"}, {"includeField": "field2"} ], "excludeFields": [ {"excludeField": "field3"}, {"excludeField": "field4"} ], "verifyFinalCount": false, "slice": 5, "pageSize": 1000, "scrollTime": "5m", "timeout": 20000, "retries": 3, "retryWaitTime": 10000, "retriesConnection": 5, "retryWaitTimeConnection": 60000, "useThrottling": true, "throttleRateInMillis": 5000, "throttleConnectionRate": 750 } }
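A request body like the one above can also be assembled programmatically. The sketch below, in Python, starts from the values in the Default column of the connection table and overlays the caller's overrides. The helper name `build_connection` is hypothetical, and whether the server fills in omitted defaults on its own is not stated here, so the sketch sends them explicitly.

```python
import json

# Defaults copied from the "Default" column of the connection table.
CONNECTION_DEFAULTS = {
    "hostname": "localhost",
    "port": 9200,
    "fetchDocuments": True,
    "useMGET": True,
    "waitBeforeFetching": False,
    "verifyFinalCount": False,
    "slice": 5,
    "pageSize": 1000,
    "scrollTime": "5m",
    "timeout": 20000,
    "retries": 3,
    "retryWaitTime": 10000,
    "retriesConnection": 5,
    "retryWaitTimeConnection": 60000,
    "useThrottling": False,
}


def build_connection(description, **overrides):
    """Return a connection body: table defaults overlaid with caller overrides."""
    properties = {**CONNECTION_DEFAULTS, **overrides}
    if properties["useThrottling"]:
        # The throttle settings only matter when throttling is enabled.
        properties.setdefault("throttleRateInMillis", 5000)
        properties.setdefault("throttleConnectionRate", 750)
    return {
        "type": "elasticsearch",
        "description": description,
        "properties": properties,
    }


body = build_connection("MyElasticsearchConnection", protocol="https", useThrottling=True)
print(json.dumps(body, indent=2))
```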
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | Id of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0" |
description | No | - | No | Name of the connection object. | "MyElasticsearchConnection" |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
properties | Yes | - | No | Configuration object | |
hostname | Yes | "localhost" | No | The elastic server hostname | localhost |
port | Yes | 9200 | No | The elastic server port | 9200 |
protocol | No | - | No | The elastic server url protocol | https |
fetchDocuments | No | TRUE | No | Check to fetch the documents content | TRUE |
useMGET | No | TRUE | No | Check to use MGET for fetching the documents. If not checked, an individual GET request is executed for each document. | TRUE |
waitBeforeFetching | No | FALSE | No | Check to make the fetch process wait for discovery process to be done | FALSE |
includeFields | No | - | Yes | The specified fields will be included in the fetch process of the document. | [{"includeField":"field1"}, {"includeField":"field2"}] |
includeField | No | - | No | The name of the field to include in the fetch process. | field1 |
excludeFields | No | - | Yes | The specified fields will be excluded in the fetch process of the document. | [{"excludeField":"field3"}, {"excludeField":"field4"}] |
excludeField | No | - | No | The name of the field to exclude in the fetch process. | field3 |
verifyFinalCount | No | FALSE | No | Check to execute an initial document count query that will be used at the end of the crawl to validate the total number of crawled documents. | FALSE |
slice | Yes | 5 | No | The number of slices to use for the queries | 5 |
pageSize | Yes | 1000 | No | The number of documents to get per request | 1000 |
scrollTime | Yes | 5m | No | The time to keep each scroll request active | 5m |
timeout | Yes | 20000 | No | The timeout to use for the connections to elastic | 20000 |
retries | Yes | 3 | No | The number of retries for each slice processing | 3 |
retryWaitTime | Yes | 10000 | No | The time in millis to wait between each slice retry | 10000 |
retriesConnection | Yes | 5 | No | The number of retries for each elasticsearch request | 5 |
retryWaitTimeConnection | Yes | 60000 | No | The time in millis to wait between each elasticsearch request retry | 60000 |
useThrottling | No | FALSE | No | Check to enable connection throttling | FALSE |
throttleRateInMillis | No | 5000 | No | Only required if "Use Throttling" is true. The throttle rate in milliseconds | 5000 |
throttleConnectionRate | No | 750 | No | Only required if "Use Throttling" is true. The number of connections to allow within the specified throttle rate | 750 |
{ "id": "89d6632a-a296-426c-adb0-d442adcab4b0", "description": "MyElasticsearchConnection", "properties": { "hostname": "localhost", "port": 9200, "protocol": "https", "fetchDocuments": true, "useMGET": true, "waitBeforeFetching": false, "includeFields": [ {"includeField": "field1"}, {"includeField": "field2"} ], "excludeFields": [ {"excludeField": "field3"}, {"excludeField": "field4"} ], "verifyFinalCount": false, "slice": 5, "pageSize": 1000, "scrollTime": "5m", "timeout": 20000, "retries": 3, "retryWaitTime": 10000, "retriesConnection": 5, "retryWaitTimeConnection": 60000, "useThrottling": true, "throttleRateInMillis": 5000, "throttleConnectionRate": 750 } }
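To get a feel for the throttle and retry settings, the arithmetic can be worked through in a few lines of Python. This is only a sketch under two assumptions that the tables do not confirm: that `throttleConnectionRate` connections are allowed per `throttleRateInMillis` window, and that the connector waits the full retry wait time before every retry. Both function names are hypothetical.

```python
def connections_per_second(throttle_connection_rate, throttle_rate_in_millis):
    """Average connection rate implied by the throttle window (assumed interpretation)."""
    return throttle_connection_rate * 1000 / throttle_rate_in_millis


def total_retry_wait_millis(retries, retry_wait_time):
    """Upper bound on time spent waiting if every retry is used (assumed behaviour)."""
    return retries * retry_wait_time


# Values taken from the example request body.
print(connections_per_second(750, 5000))    # 150.0 connections per second
print(total_retry_wait_millis(3, 10000))    # 30000 ms across slice retries
print(total_retry_wait_millis(5, 60000))    # 300000 ms across request retries
```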
For the creation of the Connector object using the REST API, check this page.
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
seed | Yes | - | No | The elastic server hostname | localhost |
type | Yes | - | No | The value must be "elasticsearch". | "elasticsearch" |
description | Yes | - | No | Name of the seed object. | "My Elasticsearch Seed" |
connector | Yes | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
connection | Yes | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed | ["tag1", "tag2"] |
properties | Yes | - | No | Configuration object | |
indexes | Yes | - | Yes | The list of Elasticsearch indexes to crawl. Multiple indexes and the wildcard "*" are supported. | [{"index":"index1"}] |
index | Yes | - | No | The elastic index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \\, \/, ?, \", <, >, \|, (space character), ,, # 3) Cannot start with -, _, + 4) Cannot be . or .. | index1 |
snapshots | Yes | TRUE | No | Selects the crawl mode: a snapshot-based crawl with support for deletes, or a timestamp-based crawl with better performance but without support for deleted documents. | TRUE |
discoveryFields | No | - | Yes | Only required if "Use Snapshots" is true. List of field names to be used to generate the documents signature. | [{"discoveryField":"last_modified"}] |
discoveryField | No | - | No | Only required if "Use Snapshots" is true. Name of the field to be used to generate the documents signature. | last_modified |
discoveryQuery | No | - | No | Only required if "snapshots" is true. The query to run for discovering documents. This query is used for full and incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "match_all": {} } } |
timestampField | No | - | No | Only required if "snapshots" is false. The field that contains the timestamp of the document | timestamp |
discoveryQueryInc | No | - | No | Only required if "snapshots" is false. The query to run for discovering documents for incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "range" : { "connectorSpecific.timestamp" : { "gt" : {{timestamp}} } } } } |
useLimit | No | FALSE | No | Check to limit how many items are selected from the index | FALSE |
topLimit | No | - | No | Only required if "useLimit" is true. The number of items to crawl. Because this connector uses slices and scrolls, this number is approximate and slightly more items may be crawled. | 100 |
makeIdUnique | No | FALSE | No | Check to ensure unique document ids when crawling multiple indexes; if not checked, id collisions could happen. Uniqueness is achieved by appending the index name to the id with a delimiter. This option is ignored if only a single index without a wildcard (*) is specified. | FALSE |
idDelimiter | No | - | No | Only required if "makeIdUnique" is true. The delimiter that will be used to append the index name to the document id | _ |
storeSpecific | No | TRUE | No | Check to keep the elastic connector metadata and to store all the fields of the elastic source as connector specific fields. If not checked, the elastic source will be used as the document metadata in the same format that it was retrieved | TRUE |
{ "type": "elasticsearch", "seed": "localhost", "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31", "description": "Elasticsearch_Test_Seed", "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b", "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"], "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6", "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"], "tags": ["tag1", "tag2"], "properties": { "indexes": [ {"index": "index1"}, {"index": "index2"} ], "snapshots": true, "discoveryFields": [ {"discoveryField": "last_modified"} ], "discoveryQuery": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}} }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"match_all\": {} } }", "timestampField": "timestamp", "discoveryQueryInc": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}} }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"range\" : { \"connectorSpecific.timestamp\" : { \"gt\" : {{timestamp}} } } } }", "useLimit": true, "topLimit": 100, "makeIdUnique": true, "idDelimiter": "_", "storeSpecific": true } }
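In the request body, `discoveryQuery` and `discoveryQueryInc` are JSON strings, so the quotes inside the query must be escaped. Writing those escapes by hand is error-prone, and the Python sketch below lets `json.dumps` produce them. The `render` helper is hypothetical and exists only to preview locally that the template becomes valid JSON once the `{{...}}` placeholders are filled in; it is assumed that the connector performs the real substitution at crawl time.

```python
import json

# The query template is plain text: the {{...}} placeholders make it invalid JSON
# until they are substituted.
DISCOVERY_QUERY = (
    '{ "track_total_hits": true, '
    '"slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, '
    '"size": {{pageSize}}, '
    '"_source": { "includes": ["last_modified"] }, '
    '"query": { "match_all": {} } }'
)


def render(template, values):
    """Fill {{name}} placeholders with the given values (local preview only)."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", str(value))
    return template


properties = {
    "indexes": [{"index": "index1"}],
    "snapshots": True,
    "discoveryFields": [{"discoveryField": "last_modified"}],
    "discoveryQuery": DISCOVERY_QUERY,
}

# json.dumps escapes the inner quotes of the query string.
body = json.dumps({"properties": properties})
print(body)

# Sanity check: with sample values the template parses as JSON.
preview = json.loads(render(DISCOVERY_QUERY, {"sliceNumber": 0, "sliceTotal": 5, "pageSize": 1000}))
print(preview["slice"])
```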
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | Id of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778" |
seed | No | - | No | The elastic server hostname | localhost |
description | No | - | No | Name of the seed object. | "MyElasticsearchSeed" |
connector | No | - | No | The id of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
connection | No | - | No | The id of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
workflows | No | [ ] | Yes | The ids of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
workflows.add | No | [ ] | Yes | The ids of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
workflows.remove | No | [ ] | Yes | The ids of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
throttlePolicy | No | - | No | Id of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The ids of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
routingPolicies.add | No | [ ] | Yes | The ids of the routingPolicies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
routingPolicies.remove | No | [ ] | Yes | The ids of the routingPolicies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"] |
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed | ["tag1", "tag3"] |
tags.add | No | [ ] | Yes | The tags to add | ["tag4"] |
tags.remove | No | [ ] | Yes | The tags to remove | ["tag2"] |
properties | Yes | - | No | Configuration object | |
indexes | Yes | - | Yes | The list of Elasticsearch indexes to crawl. Multiple indexes and the wildcard "*" are supported. | [{"index":"index1"}] |
index | Yes | - | No | The elastic index to crawl. Index name limitations: 1) Lowercase only. 2) Cannot include \\, \/, ?, \", <, >, \|, (space character), ,, # 3) Cannot start with -, _, + 4) Cannot be . or .. | index1 |
snapshots | Yes | TRUE | No | Selects the crawl mode: a snapshot-based crawl with support for deletes, or a timestamp-based crawl with better performance but without support for deleted documents. | TRUE |
discoveryFields | No | - | Yes | Only required if "Use Snapshots" is true. List of field names to be used to generate the documents signature. | [{"discoveryField":"last_modified"}] |
discoveryField | No | - | No | Only required if "Use Snapshots" is true. Name of the field to be used to generate the documents signature. | last_modified |
discoveryQuery | No | - | No | Only required if "snapshots" is true. The query to run for discovering documents. This query is used for full and incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "match_all": {} } } |
timestampField | No | - | No | Only required if "snapshots" is false. The field that contains the timestamp of the document | timestamp |
discoveryQueryInc | No | - | No | Only required if "snapshots" is false. The query to run for discovering documents for incremental crawls. | { "track_total_hits": true, "slice": { "id": {{sliceNumber}}, "max": {{sliceTotal}} }, "size": {{pageSize}}, "_source": { "includes": ["last_modified"] }, "query": { "range" : { "connectorSpecific.timestamp" : { "gt" : {{timestamp}} } } } } |
useLimit | No | FALSE | No | Check to limit how many items are selected from the index | FALSE |
topLimit | No | - | No | Only required if "useLimit" is true. The number of items to crawl. Because this connector uses slices and scrolls, this number is approximate and slightly more items may be crawled. | 100 |
makeIdUnique | No | FALSE | No | Check to ensure unique document ids when crawling multiple indexes; if not checked, id collisions could happen. Uniqueness is achieved by appending the index name to the id with a delimiter. This option is ignored if only a single index without a wildcard (*) is specified. | FALSE |
idDelimiter | No | - | No | Only required if "makeIdUnique" is true. The delimiter that will be used to append the index name to the document id | _ |
storeSpecific | No | TRUE | No | Check to keep the elastic connector metadata and to store all the fields of the elastic source as connector specific fields. If not checked, the elastic source will be used as the document metadata in the same format that it was retrieved | TRUE |
{ "id": "2f287669-d163-4e35-ad17-6bbfe9df3778", "seed": "localhost", "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31", "description": "Elasticsearch_Test_Seed", "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b", "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"], "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6", "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"], "tags": ["tag", "tag2"], "properties": { "indexes": [ {"index": "index1"}, {"index": "index2"} ], "snapshots": true, "discoveryFields": [ {"discoveryField": "last_modified"} ], "discoveryQuery": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}} }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"match_all\": {} } }", "timestampField": "timestamp", "discoveryQueryInc": "{ \"track_total_hits\": true, \"slice\": { \"id\": {{sliceNumber}}, \"max\": {{sliceTotal}} }, \"size\": {{pageSize}}, \"_source\": { \"includes\": [\"last_modified\"] }, \"query\": { \"range\" : { \"connectorSpecific.timestamp\" : { \"gt\" : {{timestamp}} } } } }", "useLimit": true, "topLimit": 100, "makeIdUnique": true, "idDelimiter": "_", "storeSpecific": true } }
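The index name limitations listed for the `index` field can be checked on the client before a seed is created or updated. The following Python sketch implements only the four rules from the table; the function name `index_name_problems` is hypothetical, and the server may enforce further rules that are not listed here. The wildcard "*" is deliberately allowed, since the table says it is supported.

```python
# Characters the table says an index name cannot include.
FORBIDDEN_CHARS = set('\\/?"<>| ,#')
# Characters the table says an index name cannot start with.
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name):
    """Return a list of rule violations for an index name (empty if none)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(repr(c) for c in bad))
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    return problems


for candidate in ["index1", "logs-*", "_Logs", "my index#"]:
    print(candidate, index_name_problems(candidate))
```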