UI - Seed - Amazon S3 Connector

Created by Ricardo Fonseca, last modified by Pablo Bonilla on Jan 10, 2022

This section describes how to configure an Amazon S3 Seed and how to run a crawl for it.

Step 1. Open the Aspire Admin UI.

Browse to the Aspire Admin UI. It is typically located at http://localhost:50505.
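
If the page does not load, it can help to confirm that something is answering at that address before troubleshooting the browser. The following is a minimal sketch using only the Python standard library; the helper name is_reachable is hypothetical, and the URL is the typical location mentioned above, which may differ in your installation. It only checks that an HTTP server responds, not that the server is Aspire.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (for example 401 or 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    # Typical location from Step 1; adjust host and port if yours differ.
    admin_ui = "http://localhost:50505"
    print(f"{admin_ui} reachable: {is_reachable(admin_ui)}")
```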

Step 2. Select the Seed option from the left-hand menu.

The "Seed" option, identified by a seed icon, is located on the left side of the application, just above the "Workflows" option. Click on it to navigate to the "Seed" page.

Step 3. Specify Seed Description and Type

Once on the "Seed" page, click on the "+New" option to create a new Seed or select an existing one to modify it.

Description: specify a description for the Seed. Keep it concise and meaningful.
Type: select "S3" as the type for the Seed.

Step 4. Specify Seed Information

Once the type has been selected, you will be presented with the "Seed" section of the "Seed" page. A single parameter is required in this section:

Crawl path: the path to be crawled. It can be a bucket, folder or file.

Step 5. Specify Split Files Configuration (Optional)

The "Split Files" section is located between the "Seed" and "Connector" sections of the "Seed" page. It offers the following options. Any option you do not modify keeps its default value:

Process Split Documents: if enabled, files that are split are treated as a single document instead of multiple documents.
Split Patterns: list of regular expressions to match folders that contain split files.
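
Before saving Split Patterns, it can be useful to check which folder paths a candidate pattern would match. The sketch below uses Python's re module for that purpose. The patterns and folder paths are made-up examples, the helper name folders_matching is hypothetical, and whole-path matching is an assumption; the connector may use a different regular expression engine or matching rule, so confirm the behavior with a real crawl.

```python
import re
from typing import Iterable, List


def folders_matching(patterns: Iterable[str], folders: Iterable[str]) -> List[str]:
    """Return the folders that fully match at least one of the patterns."""
    compiled = [re.compile(p) for p in patterns]
    # Assumption: a pattern must match the entire folder path, not just part of it.
    return [f for f in folders if any(rx.fullmatch(f) for rx in compiled)]


if __name__ == "__main__":
    # Hypothetical patterns and paths, for illustration only.
    split_patterns = [r".*/parts/", r".*\.split/"]
    candidate_folders = [
        "/my-bucket/reports/parts/",
        "/my-bucket/reports/",
        "/my-bucket/video.split/",
    ]
    for folder in folders_matching(split_patterns, candidate_folders):
        print(folder)
```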

Step 6. Specify a Connector

The "Connector" section is located between the "Split Files" section and the "Connection" section of the "Seed" page. Here, you must select a previously created Amazon S3 Connector for the Seed from the Connector combo box.

Step 7. Specify a Connection

The "Connection" section is located between the "Connector" section and the "Workflows" section of the "Seed" page. Here, you must select a previously created Amazon S3 Connection for the Seed from the Connection combo box.

Step 8. Specify Workflows (Optional)

The "Workflows" section is located between the "Connection" section and the "Tag" section of the "Seed" page. Here, you can select previously created Workflows to apply to the Seed. If no workflow is specified, a default workflow is assigned.

Step 9. Specify a Tag (Optional)

The "Tag" section is located between the "Workflows" section and the "Policies" section of the "Seed" page. Here, you can optionally specify a tag that is used to filter seeds.

Step 10. Specify Policies (Optional)

The "Policies" section is the last section of the "Seed" page, located right below the "Tag" section:

Throttle Policy: select a previously created Throttling Policy from the Throttle Policy combo box.
Route Policy: select a previously created Routing Policy from the Route Policy combo box.

Step 11. Save the Seed

Click on the "Complete" button to save the new Seed. When you update an existing Seed, the button reads "Save" instead of "Complete".

Step 12. Run the Crawl

To run a crawl for the Amazon S3 Seed, click on the button for the seed you want to crawl and select either Full or Incremental Crawl. This starts the chosen crawl for your seed.
