The Job Summarizer Executor is configured in the Aspire Admin UI: open the Workflow page and click the workflow where the component will be used.

Step 3b. Specify the executor configuration.

General

Tables Data Path: The path of the job that contains the tables' data.
Table Object Path: The sub path of the data that contains each table.
Table ID Path: The sub path of table data that contains the table ID.
Seed ID Path: The sub path of table data that contains the seed ID.
Columns Path: The sub path of table objects that contains the columns' information.
Column Name Path: The sub path of column objects that contains the column name.
Column Type Path: The sub path of column objects that contains the column type.
Columns Patterns: The column patterns used to detect each column type.
1. Field Type: The data type to use for the specified pattern.
2. Pattern: The pattern to match.
Processed Rows Log Frequency: The frequency for reporting the processed rows.
Use row filter: Check to filter the rows to process.
1. Use groovy file: Enable to use a Groovy file to filter the rows.
  1. Groovy Script Path: The path of the Groovy script that contains the filter logic. It must return a boolean value. If true, the row will be filtered.
  2. Filter Script: The script used to filter the rows. It must return a boolean value. If true, the row will be filtered.
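
As a rough illustration of the Columns Patterns and row filter settings, the following Python sketch picks a field type from the first matching pattern and then applies a boolean filter to each row. The patterns, the field type names, the helpers `detect_type`, `should_filter`, and `rows_to_process`, the choice to match against the raw column type string, and the reading of "filtered" as "excluded from processing" are all assumptions made for illustration. The executor's actual matching rules and the variables available to the Groovy script are not described on this page.

```python
import re

# Hypothetical Columns Patterns entries: (Pattern, Field Type).
COLUMN_PATTERNS = [
    (re.compile(r"^(var)?char", re.IGNORECASE), "string"),
    (re.compile(r"^(big|small)?int", re.IGNORECASE), "integer"),
    (re.compile(r"^(date|timestamp)", re.IGNORECASE), "date"),
]


def detect_type(raw_type, default="unknown"):
    """Return the field type of the first pattern that matches raw_type."""
    for pattern, field_type in COLUMN_PATTERNS:
        if pattern.search(raw_type):
            return field_type
    return default


def should_filter(row):
    """Stand-in for the filter script: returns True when the row is filtered."""
    return row.get("status") == "deleted"


def rows_to_process(rows):
    """Keep only rows for which the filter returns False (assumed semantics)."""
    return [row for row in rows if not should_filter(row)]


rows = [{"id": 1, "status": "active"}, {"id": 2, "status": "deleted"}]
kept = rows_to_process(rows)
```

In this sketch the order of the pattern list matters, because the first match wins. Whether the executor evaluates patterns in the configured order is not stated here.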

Elasticsearch Settings

Server URL: The Elasticsearch server URL.
Authentication
1. Basic
  1. Username: A user with permission to read from the specified Elasticsearch index.
  2. Password: The password for the specified user.
2. AWS
  1. Use credentials provider chain: Enables the AWS Credentials Provider Chain.
  2. Access Key: The access key used to access AWS.
  3. Secret Key: Secret key for the access key.
  4. Assume another role: Check to assume the specified role to get the credentials.
    1. Role ARN: The Role ARN to assume.
Index: The Elasticsearch index to use.
Query: The query for fetching the unique values. The query can use the placeholders ${seedId} and ${tableId}.
Use Unique Values: If checked, the expected row format will be the one used for unique values; if not, it will use the _source content as the row body.
Scroll timeout: The time to keep each scroll request active.
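
To make the Query placeholders more concrete, here is a minimal Python sketch of how ${seedId} and ${tableId} might be filled in before the query is sent. Only the two placeholder names come from the setting description. The query body, the index field names `seedId` and `tableId`, the `render_query` helper, and the sample values are assumptions, and the executor's own substitution logic may differ.

```python
import json
from string import Template

# Hypothetical Query setting. The index field names are assumptions; only the
# ${seedId} and ${tableId} placeholders come from the setting description.
QUERY_TEMPLATE = Template("""
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"seedId": "${seedId}"}},
        {"term": {"tableId": "${tableId}"}}
      ]
    }
  }
}
""")


def render_query(seed_id, table_id):
    """Fill both placeholders, then parse the result to confirm it is valid JSON."""
    rendered = QUERY_TEMPLATE.substitute(seedId=seed_id, tableId=table_id)
    return json.loads(rendered)


query = render_query("seed-01", "orders")
```

Plain text substitution like this would produce invalid JSON if a value contained a double quote, so such values would need escaping first.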

Connection Settings

Idle connection timeout: Maximum time (in milliseconds) to keep an idle connection open.
Max connections: Maximum number of connections to be opened.
Connections per target: Maximum number of connections opened for the same target.
Connection timeout: Maximum time (in milliseconds) to wait for the connection.
Socket timeout: Maximum time (in milliseconds) to wait for a socket response.
Connection throttling: Check to enable connection throttling.
1. Throttling period: Time period (in milliseconds) to throttle the connection.
2. Max connections per period: Maximum number of connections used during the throttling period.
Maximum retries: Maximum number of retries for each request.
Retry delay: Time (in milliseconds) to wait before a retry.
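
The Maximum retries and Retry delay settings can be pictured with the following Python sketch. It is not the executor's implementation: the `call_with_retries` helper, the use of `ConnectionError` as the retryable failure, and the assumption that "Maximum retries" counts attempts made after the first one are all illustrative choices.

```python
import time


def call_with_retries(request, max_retries, retry_delay_ms, sleep=time.sleep):
    """Call request(); on failure wait retry_delay_ms, retrying up to max_retries times."""
    attempt = 0
    while True:
        try:
            return request()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            attempt += 1
            sleep(retry_delay_ms / 1000.0)  # the setting is in milliseconds


# Usage: a simulated request that fails twice and then succeeds.
calls = []
delays = []


def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = call_with_retries(
    flaky_request, max_retries=3, retry_delay_ms=500, sleep=delays.append
)
```

Passing `sleep` as a parameter lets the usage example record the delays instead of actually waiting.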

Step 1. Launch Aspire and Open the Content Source Management Page.

Step 2. Add or select a Workflow.

Step 3. Add the Parquet Summarizer Executor to the Workflow.

Step 3a. Specify a description for the application.
