
The Elasticsearch Connector can be configured using the Aspire Admin UI. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Seed


Create Credential


  1. On the Aspire Admin UI, go to the Credentials page.
  2. All existing credentials are listed. Click the New button.
  3. Enter the new credential description.
  4. Select Elasticsearch from the Type list.
  5. General:
    1. No Authentication: Select this if authentication is not required.
    2. Use Basic Authentication: Select this to enable basic user authentication.
      1. Username: The name of the Elasticsearch user.
      2. Password: The password of the Elasticsearch user.

    3. AWS Signature V4: Select this to enable AWS Signature V4 authentication.
      1. Region: The AWS region of the Elasticsearch service, e.g. us-east-1.
      2. Use Default AWS Credentials: Check this to use the default AWS credentials.
      3. Access Key: The AWS access key to use with the Elasticsearch service. Only required if "Use Default AWS Credentials" is unchecked.
      4. Secret Key: The AWS secret key to use with the Elasticsearch service. Only required if "Use Default AWS Credentials" is unchecked.
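With Use Basic Authentication selected, the username and password are presumably sent as a standard HTTP Basic Authorization header, which is what Elasticsearch accepts for native users. The Python sketch below shows what that header contains, which can help when testing the same credentials with another client before saving them in Aspire. The function name `basic_auth_header` is hypothetical and not part of Aspire. Basic credentials are only base64-encoded, not encrypted, so pair this option with the https protocol on the connection.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617) for the given user.

    Hypothetical helper for illustration; Aspire builds its own requests.
    """
    # "user:password" is UTF-8 encoded, then base64-encoded (not encrypted).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Example values from RFC 7617, not real Elasticsearch credentials.
    print(basic_auth_header("Aladdin", "open sesame"))
```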



Create Connection


  1. On the Aspire Admin UI, go to the Connections page.
  2. All existing connections are listed. Click the New button.
  3. Enter the new connection description.
  4. Select Elasticsearch from the Type list.
  5. General:
    1. Hostname: The Elasticsearch server hostname.
    2. Port: The Elasticsearch server port number.
    3. Protocol: The Elasticsearch server URL protocol (for example, http or https).
    4. Should Fetch Documents: Check to fetch the document content.
    5. Use MGET for fetching: Check to use MGET to fetch the documents. If unchecked, an individual GET request is executed for each document.
    6. Wait for discovery before fetching: Check to make the fetch process wait until the discovery process has finished.
    7. Include Fields: The specified fields will be included in the fetch process of the document.
      1. Include Field: Enter the name of the field to include in the fetch process.
    8. Exclude Fields: The specified fields will be excluded from the fetch process of the document.
      1. Exclude Field: Enter the name of the field to exclude from the fetch process.
  6. Network:
    1. Number of slices: The number of slices to use for the queries.
    2. Page size: The number of documents to get per request.
    3. Scroll timeout: How long each scroll request is kept active between requests.
    4. Connection timeout: The timeout to use for connections to Elasticsearch.
    5. Number of Slice Retries: The number of retries for each slice processing.
    6. Slice Retry Wait Time: The time in milliseconds to wait between each slice retry.
    7. Number of Request Retries: The number of retries for each Elasticsearch request.
    8. Requests Retry Wait Time: The time in milliseconds to wait between each Elasticsearch request retry.
    9. Use Connection Throttling: Check to enable connection throttling.
      1. Throttle Rate in Millis: The throttle rate in milliseconds.
      2. Connections Rate: The number of connections allowed within the specified throttle rate.
  7. Credentials:
    1. Select credential for this connection.
  8. Policies:
    1. Throttle Policy: Select the throttle policy that applies to this connection object.
    2. Routing Policies: Select the routing policies that this connection will use.
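The difference between Use MGET for fetching and per-document GET requests, and the way Include Fields and Exclude Fields narrow what is returned, can be sketched as the Elasticsearch requests they correspond to. This is a minimal Python sketch, assuming Elasticsearch 7 or later (the `_source_includes` and `_source_excludes` parameters); the helper `build_fetch_requests` is hypothetical and the requests Aspire actually issues may differ.

```python
from urllib.parse import quote, urlencode


def build_fetch_requests(index, ids, include_fields=(), exclude_fields=(), use_mget=True):
    """Return (method, path, body) tuples that would fetch the given document ids.

    Hypothetical helper showing Elasticsearch 7+ request shapes, not Aspire's code.
    """
    params = {}
    if include_fields:
        params["_source_includes"] = ",".join(include_fields)
    if exclude_fields:
        params["_source_excludes"] = ",".join(exclude_fields)
    # Keep commas readable in the query string; other characters are escaped.
    query = f"?{urlencode(params, safe=',')}" if params else ""

    if use_mget:
        # MGET: a single round trip carrying every id in the request body.
        return [("POST", f"/{index}/_mget{query}", {"ids": list(ids)})]
    # Without MGET: one GET per document, with the id escaped for the URL path.
    return [("GET", f"/{index}/_doc/{quote(str(doc_id), safe='')}{query}", None) for doc_id in ids]


if __name__ == "__main__":
    for request in build_fetch_requests("products", ["1", "2"], include_fields=["title", "price"]):
        print(request)
```

MGET trades many small round trips for one larger request, which is usually faster when Page size is large.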

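The retry and throttling settings under Network can be read as two small mechanisms: a retry loop that waits a fixed number of milliseconds between attempts, and a limiter that allows at most Connections Rate connections per Throttle Rate in Millis window. The Python sketch below illustrates both under stated assumptions: the fixed-window algorithm, the reading of "retries" as attempts after the first one, and the names `ConnectionThrottle` and `with_retries` are all hypothetical, not Aspire's implementation.

```python
import time


class ConnectionThrottle:
    """Fixed-window limiter: at most `connections_rate` acquisitions per `throttle_rate_ms`.

    Hypothetical illustration of "Use Connection Throttling"; Aspire's algorithm may differ.
    """

    def __init__(self, throttle_rate_ms, connections_rate, clock=time.monotonic, sleep=time.sleep):
        self.window = throttle_rate_ms / 1000.0
        self.limit = connections_rate
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.used = 0

    def acquire(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the connection count.
            self.window_start, self.used = now, 0
        if self.used >= self.limit:
            # This window's budget is spent: wait until the next window begins.
            self.sleep(self.window_start + self.window - now)
            self.window_start, self.used = self.clock(), 0
        self.used += 1


def with_retries(request, retries, wait_ms, sleep=time.sleep):
    """Call `request()`, retrying up to `retries` more times with `wait_ms` between attempts."""
    for attempt in range(retries + 1):
        try:
            return request()
        except Exception:
            # Give up and re-raise once the last allowed attempt has failed.
            if attempt == retries:
                raise
            sleep(wait_ms / 1000.0)
```

Injecting `clock` and `sleep` keeps the sketch testable without real waiting.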







Create Connector Instance


To create the Connector object using the Admin UI, see this page.


Create Seed 


  1. On the Aspire Admin UI, go to the Seeds page.
  2. All existing seeds are listed. Click the New button.
  3. Enter the new seed description.
  4. Select Elasticsearch from the Type list.
  5. Seed:
    1. Hostname: The Elasticsearch server hostname.
    2. Index: The Elasticsearch index to crawl. Multiple indexes and the wildcard "*" are supported.
    3. Crawl Mode: Select the crawl mode: a snapshot-based crawl, which supports deletes, or a timestamp-based crawl, which performs better but does not detect deleted documents.
      1. Use Snapshots: Select this for a snapshot based crawl.
        • Discovery Signature Fields: Enter the names of the fields used to generate the document signature.
        • Discovery Query: The query to run for discovering documents. This query is used for full and incremental crawls.
          The slice and size sections are required and must contain the placeholders {{sliceNumber}}, {{slizeTotal}} and {{pageSize}}.
          For Elasticsearch 7 or later, set "track_total_hits": true.
      2. Use timestamp: Select this for a timestamp based crawl.
        • Timestamp field: The field that contains the timestamp of the document.
        • Discovery Query: The query to run for discovering documents. This query is used for full crawls.
          The slice and size sections are required and must contain the placeholders {{sliceNumber}}, {{slizeTotal}} and {{pageSize}}.
          For Elasticsearch 7 or later, set "track_total_hits": true.
    4. Limit Extracted Items: Check to limit how many items are selected from the index.
      1. Limit: The number of items to crawl. Because this connector uses slices and scrolls, this number is approximate and slightly more items may be crawled.
    5. Ensure Unique Id: Check to ensure unique document IDs when crawling multiple indexes; if unchecked, ID collisions can occur.
      This is done by appending a delimiter and the index name to the document ID.
      This option will be ignored if only a single index without wildcard (*) is specified.
      1. Id Delimiter: The delimiter that will be used to append the index name to the document id.
    6. Store source as connector specific fields: Check to keep the Elasticsearch connector metadata and to store all the fields of the Elasticsearch source as connector-specific fields. If unchecked, the Elasticsearch source is used as the document metadata in the same format in which it was retrieved.
  6. Connector:
    1. The ID of the connector to be used with this seed. The connector type must match the seed type.
  7. Connection:
    1. The ID of the connection to be used with this seed. The connection type must match the seed type.
  8. Workflow:
    1. The IDs of the workflows that will be executed for the crawled documents.
  9. Tags:
    1. The tags of the seed; these can be used to filter seeds.
  10. Policies:
    1. Throttle Policy: Select the throttle policy that applies to this seed.
    2. Routing Policies: Select the routing policies that this seed will use.
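The Discovery Query is a sliced scroll search: each slice is an independent scroll over a disjoint part of the index, with slice ids running from 0 to the slice total minus 1. The Python sketch below shows one way the required placeholders could be filled in for each slice. It assumes plain text substitution of unquoted numeric placeholders, which is an assumption about how Aspire renders the template; the helper `render_discovery_query` and the `match_all` query are hypothetical examples. The placeholder names are copied exactly as listed above. `"track_total_hits": true` matters on Elasticsearch 7 or later because, by default, the reported total hit count is only exact up to 10,000.

```python
import json

# Example discovery query template; placeholder names are copied from the seed settings.
DISCOVERY_QUERY = """
{
  "slice": {"id": {{sliceNumber}}, "max": {{slizeTotal}}},
  "size": {{pageSize}},
  "track_total_hits": true,
  "query": {"match_all": {}}
}
"""


def render_discovery_query(template, slice_number, slice_total, page_size):
    """Substitute the placeholders for one slice and parse the result as JSON.

    Hypothetical helper: plain string substitution is an assumption, not Aspire's code.
    """
    body = (
        template.replace("{{sliceNumber}}", str(slice_number))
        .replace("{{slizeTotal}}", str(slice_total))
        .replace("{{pageSize}}", str(page_size))
    )
    return json.loads(body)


if __name__ == "__main__":
    # One request body per slice: ids 0 .. slice_total - 1.
    for n in range(2):
        print(json.dumps(render_discovery_query(DISCOVERY_QUERY, n, 2, 100)))
```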

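The snapshot crawl mode can detect deletes because it compares what it saw on the previous crawl with what it sees now: a document whose ID disappears from the snapshot must have been deleted, and a document whose signature changed must have been updated. The Python sketch below illustrates that idea. Hashing the Discovery Signature Fields with SHA-256 over canonical JSON, handling only top-level field names, and the helpers `signature` and `diff_snapshots` are all assumptions, not Aspire's implementation.

```python
import hashlib
import json


def signature(source, fields):
    """Hash the values of the signature fields; a missing field counts as null.

    Hypothetical helper; only top-level field names are handled here.
    """
    picked = {f: source.get(f) for f in fields}
    # Sorted keys and fixed separators make the hash independent of field order.
    payload = json.dumps(picked, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current):
    """Compare {doc_id: signature} maps from two crawls into (added, updated, deleted)."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(k for k in current.keys() & previous.keys() if current[k] != previous[k])
    return added, updated, deleted
```

Only the chosen signature fields affect change detection, so a field left out of the list can change without triggering an update.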

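The timestamp crawl mode uses the Discovery Query for full crawls and, presumably, the Timestamp field to select only documents changed since the previous crawl. The Python sketch below shows one plausible form of that incremental query, a `bool` query with a `range` filter. The helper `incremental_query` and the use of `gt` are assumptions, since the connector's actual incremental query is not documented here. Deleted documents never match such a query, which is why this mode cannot report deletes.

```python
def incremental_query(base_query, timestamp_field, last_crawl):
    """Restrict `base_query` to documents whose timestamp is newer than `last_crawl`.

    Hypothetical helper; the connector's real incremental query may differ.
    """
    return {
        "query": {
            "bool": {
                # Keep whatever the full-crawl query matched...
                "must": [base_query],
                # ...but only documents modified after the previous crawl.
                "filter": [{"range": {timestamp_field: {"gt": last_crawl}}}],
            }
        }
    }


if __name__ == "__main__":
    print(incremental_query({"match_all": {}}, "updated_at", "2024-01-01T00:00:00Z"))
```

Using `gte` instead of `gt` would re-fetch documents sharing the boundary timestamp, trading a few duplicates for not missing documents written in the same instant as the last crawl.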

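Ensure Unique Id appends a delimiter and the index name to each document ID, and is ignored when a single index without a wildcard is configured. The Python sketch below follows that description. It assumes the Index setting lists multiple indexes separated by commas (the form Elasticsearch itself accepts in request paths) and that the result is the ID, then the delimiter, then the index name; `unique_id` and `needs_unique_ids` are hypothetical helpers.

```python
def needs_unique_ids(index_setting: str) -> bool:
    """True when the seed can return documents from more than one index."""
    indexes = [i.strip() for i in index_setting.split(",") if i.strip()]
    return len(indexes) > 1 or any("*" in i for i in indexes)


def unique_id(doc_id: str, index_name: str, delimiter: str, index_setting: str) -> str:
    """Append the delimiter and source index name, unless a single concrete index is set."""
    if not needs_unique_ids(index_setting):
        # Single index without a wildcard: the option is ignored.
        return doc_id
    return f"{doc_id}{delimiter}{index_name}"


if __name__ == "__main__":
    print(unique_id("42", "logs-2024", "|", "logs-*"))
```

Choose a delimiter that never occurs in your document IDs or index names so the original ID can still be recovered.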

