The Job Summarizer Executor is configured in the Aspire Admin UI. Open the Workflow page and click the workflow in which the component will be used.

Step 1. Launch Aspire and Open the Content Source Management Page.

Launch Aspire (if it's not already running). See:


Step 2. Add or select a Workflow.

  • Add a new workflow or open an existing workflow.
  • For this step, please refer to the Workflow Introduction.


Step 3. Add the Job Summarizer Executor to the Workflow.

  • From the Event combo box, select the event to which you want to add the Job Summarizer Executor.
  • To add the component, drag the Job Summarizer Executor from the Rules section on the right side of the screen and drop it below the Workflow Event on the left side of the screen. This automatically opens the Job Summarizer Executor window, where you configure the component.

Step 3a. Specify a description for the application.

In the top section of the Job Summarizer Executor configuration window, specify the description for the application.


Step 3b. Specify the executor configuration.

General

  1. Tables Data Path: The path of the job that contains the tables' data.
  2. Table Object Path: The sub path of the data that contains each table.
  3. Table ID Path: The sub path of table data that contains the table ID.
  4. Seed ID Path: The sub path of table data that contains the seed ID.
  5. Columns Path: The sub path of table objects that contains the columns' information.
  6. Column Name Path: The sub path of column objects that contains the column name.
  7. Column Type Path: The sub path of column objects that contains the column type.
  8. Columns Patterns: The patterns used to detect the type of each column.
    1. Field Type: The data type to use for the specified pattern.
    2. Pattern: The pattern to match.
  9. Processed Rows Log Frequency: How often the number of processed rows is reported.
  10. Use row filter: Check to filter the rows to process.
    1. Use groovy file: Enable to read the filter logic from a Groovy script file.
      1. Groovy Script Path: The path of the Groovy script that contains the filter logic. It must return a boolean value; if true, the row is filtered.
      2. Filter Script: The script used to filter the rows. It must return a boolean value; if true, the row is filtered.
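The path settings above describe how the executor walks the job data: the Tables Data Path leads to a list of table data entries, the Table ID and Seed ID paths are read from each entry, and the column paths are read from the table object. The Python sketch below illustrates that traversal together with pattern-based type detection and row filtering. It is not the component's implementation. The slash-separated path syntax, matching each Pattern as a regular expression against the column name, logging every N rows, and treating a true filter result as "skip this row" are all assumptions, and `resolve_path`, `detect_type`, `summarize_tables`, and `process_rows` are hypothetical names.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def resolve_path(obj: Any, path: str) -> Any:
    """Follow a slash-separated path (assumed syntax) through nested dicts and lists."""
    current = obj
    for part in filter(None, path.split("/")):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


def detect_type(column_name: str, patterns: List[Tuple[str, str]], default: str = "string") -> str:
    """Return the Field Type of the first Pattern whose regex matches the column name (assumed)."""
    for field_type, pattern in patterns:
        if re.search(pattern, column_name):
            return field_type
    return default


def summarize_tables(job: Dict[str, Any], cfg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect table ID, seed ID and column information using the configured paths."""
    summaries = []
    for table_data in resolve_path(job, cfg["tables_data_path"]):
        table = resolve_path(table_data, cfg["table_object_path"])
        columns = []
        for column in resolve_path(table, cfg["columns_path"]):
            name = resolve_path(column, cfg["column_name_path"])
            columns.append({
                "name": name,
                "declared_type": resolve_path(column, cfg["column_type_path"]),
                "detected_type": detect_type(name, cfg["column_patterns"]),
            })
        summaries.append({
            "table_id": resolve_path(table_data, cfg["table_id_path"]),
            "seed_id": resolve_path(table_data, cfg["seed_id_path"]),
            "columns": columns,
        })
    return summaries


def process_rows(rows: Iterable[Dict[str, Any]],
                 row_filter: Optional[Callable[[Dict[str, Any]], bool]] = None,
                 log_every: int = 1000,
                 log: Callable[[str], None] = print) -> List[Dict[str, Any]]:
    """Drop rows for which `row_filter` returns True; log progress every `log_every` rows."""
    kept = []
    for count, row in enumerate(rows, start=1):
        if count % log_every == 0:
            log(f"Processed {count} rows")
        if row_filter is not None and row_filter(row):
            continue  # assumed meaning of "filtered": the row is skipped
        kept.append(row)
    return kept


# Hypothetical job data and configuration values, for illustration only.
job = {"tables": [{"id": "t1", "seed": "s1",
                   "table": {"columns": [{"name": "created_date", "type": "TEXT"},
                                         {"name": "price", "type": "DOUBLE"}]}}]}
cfg = {"tables_data_path": "tables", "table_object_path": "table",
       "table_id_path": "id", "seed_id_path": "seed", "columns_path": "columns",
       "column_name_path": "name", "column_type_path": "type",
       "column_patterns": [("date", r"_date$")]}
print(summarize_tables(job, cfg))
```

A Python callable passed as `row_filter` stands in for the Groovy filter script. With the sample data, `created_date` is detected as `date` and `price` falls back to the hypothetical default type `string`.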

Elasticsearch Settings

  1. Server URL: The Elasticsearch server URL.
  2. Authentication
    1. Basic
      1. Username: A user with permission to read from the specified Elasticsearch index.
      2. Password: The password for the specified user.
    2. AWS
      1. Use credentials provider chain: Enables the AWS Credentials Provider Chain.
      2. Access Key: The AWS access key.
      3. Secret Key: The secret key that belongs to the access key.
      4. Assume another role: Check to assume the specified role to get the credentials.
        1. Role ARN: The ARN of the role to assume.
  3. Index: The Elasticsearch index to use.
  4. Query: The query used to fetch the unique values. The placeholders ${seedId} and ${tableId} can be used in the query.
  5. Use Unique Values: If checked, rows are expected in the unique-values format; if unchecked, the _source content of each hit is used as the row body.
  6. Scroll timeout: How long to keep each scroll request (search context) alive.
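As a rough illustration of the Elasticsearch settings, the sketch below fills the `${seedId}` and `${tableId}` placeholders in a query template, builds the two Elasticsearch scroll requests (`POST /<index>/_search?scroll=<timeout>`, then `POST /_search/scroll`), and takes each hit's `_source` as the row body, as described for the case where Use Unique Values is unchecked. Using Python's `string.Template`, assuming the placeholders sit inside JSON string literals, and the helper names are illustrative choices, not the executor's actual code. The unique-values row format is not documented here, so it is not modeled.

```python
import json
from string import Template
from typing import Any, Dict, List, Tuple


def render_query(query_template: str, seed_id: str, table_id: str) -> Dict[str, Any]:
    """Fill ${seedId} and ${tableId} in the configured Query and parse the result as JSON."""
    def escape(value: str) -> str:
        # Assumes the placeholders sit inside JSON string literals: escape, drop outer quotes.
        return json.dumps(value)[1:-1]
    rendered = Template(query_template).substitute(seedId=escape(seed_id), tableId=escape(table_id))
    return json.loads(rendered)


def initial_search_request(index: str, query: Dict[str, Any],
                           scroll_timeout: str) -> Tuple[str, str, Dict[str, Any]]:
    """First scroll request; the Scroll timeout goes into the `scroll` URL parameter."""
    return "POST", f"/{index}/_search?scroll={scroll_timeout}", query


def next_scroll_request(scroll_id: str, scroll_timeout: str) -> Tuple[str, str, Dict[str, Any]]:
    """Follow-up request; each call keeps the scroll context alive for another `scroll_timeout`."""
    return "POST", "/_search/scroll", {"scroll": scroll_timeout, "scroll_id": scroll_id}


def rows_from_response(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """With Use Unique Values unchecked, each hit's _source is the row body."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical Query value and index name.
template = ('{"query": {"bool": {"filter": ['
            '{"term": {"seedId": "${seedId}"}}, {"term": {"tableId": "${tableId}"}}]}}}')
query = render_query(template, seed_id="s1", table_id="t1")
print(initial_search_request("my-index", query, scroll_timeout="1m"))
```

Escaping the values before substitution keeps the rendered query valid JSON even when an ID contains quotes or backslashes.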

Connection Settings

  1. Idle connection timeout: Maximum time (in milliseconds) to keep an idle connection open.
  2. Max connections: Maximum number of connections to be opened.
  3. Connections per target: Maximum number of connections opened for the same target.
  4. Connection timeout: Maximum time (in milliseconds) to wait for the connection.
  5. Socket timeout: Maximum time (in milliseconds) to wait for a socket response.
  6. Connection throttling: Check to enable connection throttling.
    1. Throttling period: Length (in milliseconds) of each throttling period.
    2. Max connections per period: Maximum number of connections used during the throttling period.
  7. Maximum retries: Maximum number of retries for each request.
  8. Retry delay: Time (in milliseconds) to wait before a retry.
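The retry and throttling settings can be read as in the following sketch. It assumes that Maximum retries counts retries after the first attempt, that Retry delay is a fixed wait between attempts, and that throttling admits at most Max connections per period requests within any sliding window of Throttling period milliseconds. The component may implement these settings differently, and `SlidingWindowThrottle` and `with_retries` are hypothetical names.

```python
import time
from collections import deque
from typing import Callable, Deque, TypeVar

T = TypeVar("T")


class SlidingWindowThrottle:
    """Admit at most `max_per_period` requests in any window of `period_ms` milliseconds (assumed)."""

    def __init__(self, period_ms: int, max_per_period: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.period = period_ms / 1000.0
        self.max_per_period = max_per_period
        self.clock = clock
        self.sleep = sleep
        self.stamps: Deque[float] = deque()  # start times of recently admitted requests

    def acquire(self) -> None:
        while True:
            now = self.clock()
            # Forget requests that started a full period ago or earlier.
            while self.stamps and now - self.stamps[0] >= self.period:
                self.stamps.popleft()
            if len(self.stamps) < self.max_per_period:
                self.stamps.append(now)
                return
            # Wait until the oldest request leaves the window, then re-check.
            self.sleep(self.stamps[0] + self.period - now)


def with_retries(request: Callable[[], T], max_retries: int, retry_delay_ms: int,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`; after a failure, wait `retry_delay_ms` and retry, up to `max_retries` times."""
    retries = 0
    while True:
        try:
            return request()
        except Exception:
            if retries >= max_retries:
                raise  # retries exhausted: surface the last error
            retries += 1
            sleep(retry_delay_ms / 1000.0)


# Usage with hypothetical values: 10 requests per 1000 ms, 3 retries, 500 ms delay.
throttle = SlidingWindowThrottle(period_ms=1000, max_per_period=10)


def send_request() -> str:
    throttle.acquire()  # every attempt, including retries, counts against the throttle
    return "response"   # placeholder for the real HTTP call


print(with_retries(send_request, max_retries=3, retry_delay_ms=500))
```

Injecting `clock` and `sleep` lets the sketch be exercised without real waiting.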