
Step 1. Launch Aspire and Open the Content Source Management Page

Launch Aspire (if it's not already running). See:



Step 2. Add or select a Workflow

  • Add a new workflow or open an existing workflow.
  • For this step, please refer to the Workflow Introduction.



Step 3. Add the Elasticsearch Publisher to the Workflow

  • From the Event combo, select the event to which you want to add the Elasticsearch Publisher.
  • To add an Elasticsearch Publisher, drag it from the Rules section on the right side of the screen and drop it below the Workflow Event on the left side of the screen. This automatically opens the Elasticsearch Publisher window, where you configure the publisher.



Step 3a. Specify a description for the Publisher

 In the top section of the Elasticsearch Publisher configuration window, specify the description for the publisher.



Step 3b. Specify Server Configuration

 In the Server section of the Elasticsearch Publisher configuration, specify the information related to the server.

  1. Elasticsearch URL: Select how you want to enter the Elasticsearch URL.
    1. Host and Port
      • Elasticsearch Host: Enter the Elasticsearch host.
      • Elasticsearch Port: Enter the Elasticsearch port (9200 by default).
    2. Complete URL
      • Elasticsearch URL: Enter the URL for the Elasticsearch bulk index endpoint. It must have the format <protocol>://<host>:<port>/_bulk
  2. Elasticsearch Index: Enter the index to which the jobs will be published.
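The two ways of entering the URL can be illustrated with a minimal Python sketch. It is not part of Aspire: the function names `build_bulk_url` and `is_valid_bulk_url` are hypothetical, and the check only mirrors the `<protocol>://<host>:<port>/_bulk` format described above, assuming the protocol is `http` or `https`.

```python
from urllib.parse import urlparse


def build_bulk_url(host: str, port: int = 9200, protocol: str = "http") -> str:
    """Combine the Host and Port fields into a bulk endpoint URL (hypothetical helper)."""
    return f"{protocol}://{host}:{port}/_bulk"


def is_valid_bulk_url(url: str) -> bool:
    """Check that a Complete URL follows <protocol>://<host>:<port>/_bulk."""
    parsed = urlparse(url)
    try:
        port = parsed.port  # raises ValueError for a non-numeric port
    except ValueError:
        return False
    return (
        parsed.scheme in ("http", "https")  # assumption: only these protocols
        and bool(parsed.hostname)
        and port is not None
        and parsed.path == "/_bulk"
    )
```

For example, `build_bulk_url("localhost")` yields `http://localhost:9200/_bulk`, which also passes `is_valid_bulk_url`.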




Step 3c. Specify Authentication Configuration

 In the Authentication section of the Elasticsearch Publisher configuration, specify the authentication information.

  1. None: The server requires no authentication
  2. Basic: Provide credentials for basic authentication
    1. User: Provide the user for basic authentication.
    2. Password: Provide the password for basic authentication.
  3. Amazon Web Service (AWS): Provide the configuration to authenticate using AWS
    1. Region: Specify the AWS region to use.
    2. Use Credentials Provider Chain: Enable to specify a credentials provider chain
      1. Access Key: Provide the access key for authentication with AWS.
      2. Secret Key: Provide the secret key for authentication with AWS.
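As a rough illustration of what the None and Basic options mean at the HTTP level, the sketch below builds the request headers for each. The function name `auth_headers` and the mode strings are hypothetical. The AWS option is left out because request signing depends on the full request and is better left to an AWS SDK.

```python
import base64


def auth_headers(mode: str, user: str = "", password: str = "") -> dict:
    """Return HTTP headers for the chosen authentication mode (hypothetical helper)."""
    if mode == "none":
        # The server requires no authentication, so no header is added.
        return {}
    if mode == "basic":
        # Basic authentication: base64-encode "user:password".
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    raise ValueError(f"unsupported authentication mode in this sketch: {mode}")
```

Basic authentication only encodes the credentials and does not encrypt them, so it is usually combined with an `https` URL.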



Step 3d. Specify Transform Documents

 In the Transform Documents section of the Elasticsearch Publisher configuration, you can either specify a Local Transform File or pick a previously uploaded Resources Transform File:

  1. Local Transform File: The default value is "${component.home}/config/groovy/transform.groovy", the default JSON transformation file provided with Aspire. To use a custom file, follow the instructions in JSON Transformation.

  2. Resources Transform File: pick the appropriate file that was previously uploaded by using Aspire's "Resources" feature.



Step 3e. Specify Pre- / Post-Processing Options

 In the Pre- / Post-Processing section of the Elasticsearch Publisher configuration, specify the Pre- / Post-Processing configuration options.

  1. Clear Index on Full Crawl: Select to clear the index on full crawls.
    1. Clear Index by: Select the approach to clear the index.
      1. Deleting All Documents: Deletes the documents from the index.
      2. Delete index: Deletes the index completely. When deleting the index, you can choose whether to upload mappings. If you do, you can either specify the index configuration or use a file.
        1. Index Configuration: specify the index configuration in the corresponding field.
        2. Index File: either use a local mappings file or pick one from a previously uploaded resources file.
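The difference between the two approaches can be sketched in terms of standard Elasticsearch REST calls. This is an assumption about what each option amounts to, not a description of Aspire's internals. The helper `clear_index_requests` and its mode names are hypothetical, and it only describes the requests without sending them.

```python
import json


def clear_index_requests(index: str, mode: str, index_config: dict = None) -> list:
    """Return (method, path, body) tuples describing how an index could be cleared."""
    if mode == "delete_all_documents":
        # Keep the index and its mappings; remove every document in it.
        body = json.dumps({"query": {"match_all": {}}})
        return [("POST", f"/{index}/_delete_by_query", body)]
    if mode == "delete_index":
        # Drop the index entirely, then optionally recreate it with new mappings.
        requests = [("DELETE", f"/{index}", None)]
        if index_config is not None:
            requests.append(("PUT", f"/{index}", json.dumps(index_config)))
        return requests
    raise ValueError(f"unknown mode: {mode}")
```

Deleting all documents preserves the existing mappings, while deleting the index is the point at which new mappings can be uploaded.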





Step 3f. Specify Connection Settings Values

 In the Connection Settings section of the Elasticsearch Publisher configuration, specify the Connection Settings values for the connection to the server.

  1. Connection Pool: Connection pool settings.

    1. Idle Connection Timeout: Maximum time (in milliseconds) to keep an idle connection open.
    2. Max Connections: Maximum number of connections to be opened.
    3. Connections per Target: Maximum number of connections opened for the same target.
  2. Timeout Settings: Connection pool timeout settings.
    1. Connection Timeout: Maximum time (in milliseconds) to wait for the connection.
    2. Socket Timeout: Maximum time (in milliseconds) to wait for a socket response.
  3. Connection Throttling: Enable to specify Throttling Settings.
    1. Throttling Period: Time period (in milliseconds) to throttle the connection.
    2. Max Connections per Period: Maximum number of connections used during the Throttling Period.
  4. Retries:
    1. Maximum Retries: Maximum number of retries for a failed document.
    2. Retry Delay: Time period (in milliseconds) to wait before a retry.
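To make the interaction between Maximum Retries and Retry Delay concrete, here is a minimal sketch of a retry loop. It assumes that "Maximum Retries" counts the attempts made after the first failure, and the function `send_with_retries` is hypothetical rather than Aspire's implementation.

```python
import time


def send_with_retries(send, max_retries=3, retry_delay_ms=1000, sleep=time.sleep):
    """Call send(); on failure, wait retry_delay_ms and retry up to max_retries times."""
    retries = 0
    while True:
        try:
            return send()
        except Exception:
            if retries >= max_retries:
                raise  # retries exhausted: report the failure to the caller
            retries += 1
            sleep(retry_delay_ms / 1000.0)  # the delay is configured in milliseconds
```

With `max_retries=3`, a document that keeps failing is attempted four times in total before the error is raised. The `sleep` parameter exists only so the delay can be replaced in tests.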



Step 3g. Specify Index Dump and Batching Configuration

 In the Elasticsearch Publisher configuration, specify the Index Dump and batching configuration values.

  1. Index Dump:
    1. Max Results per Request: Maximum number of documents that the search engine can fetch in a single query.
    2. Page Size: Maximum number of documents to fetch per query page.
    3. Id field: Name of the field containing the document ID, relative to the top-level "hits" node in Elasticsearch.
    4. Url field: Name of the field containing the document URL, relative to the top-level "hits" node in Elasticsearch.
    5. Timestamp field: Name of the field holding the document feed timestamp, relative to the top-level "hits" node in Elasticsearch.
  2. Batching:
    1. Scanner Job Batch Size: Maximum size of the batches that will be created.
    2. Simultaneous Batches: Number of batches that will be processed simultaneously.
    3. Batch Timeout: Period (in milliseconds) after which a batch of documents will be closed and executed.
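The way a batch size limit and a batch timeout work together can be sketched as follows. This is an illustrative model only: the `Batcher` class is hypothetical, it assumes a batch closes when either limit is reached, and it does not model Simultaneous Batches.

```python
import time


class Batcher:
    """Collect documents and close the batch on size or on timeout."""

    def __init__(self, batch_size, batch_timeout_ms, clock=time.monotonic):
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout_ms / 1000.0  # configured in milliseconds
        self.clock = clock
        self.docs = []
        self.opened_at = None

    def add(self, doc):
        """Add a document; return the closed batch if the size limit is reached."""
        if not self.docs:
            self.opened_at = self.clock()  # the timeout starts with the first document
        self.docs.append(doc)
        if len(self.docs) >= self.batch_size:
            return self._close()
        return None

    def poll(self):
        """Return the closed batch if the timeout has elapsed, otherwise None."""
        if self.docs and self.clock() - self.opened_at >= self.batch_timeout:
            return self._close()
        return None

    def _close(self):
        batch, self.docs = self.docs, []
        self.opened_at = None
        return batch
```

A larger batch size reduces the number of bulk requests, while the timeout ensures that a partially filled batch is still sent when documents arrive slowly.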



Step 3h. Specify Debug Configuration

 In the Debug section of the Elasticsearch Publisher configuration, specify the Debug flag.

  1. Debug: Check to enable debug mode, which shows debug messages from the publisher.


Step 3i. Click on the Add button

 Once you click the Add button, the Elasticsearch Publisher settings are saved.
