Aspire Manager/Worker Architecture
Aspire 5.0 introduces two types of nodes: Manager and Worker.
Crawl Configuration
Crawls are now configured through separate entities, so each entity can be reused across multiple crawls.
- Connector instance
  - Common connector behavior: number of threads, queue sizes, text extraction capabilities, etc.
- Credential
  - Used to authenticate to a specific repository
  - Authentication type: user/password, access/secret keys, etc.
- Connection
  - Properties describing how to connect to the repository
    - Server IP/host/port
    - Connection properties (timeouts, concurrency, etc.)
  - Can be associated with 1 credential (if the connector requires credentials to be set)
  - Must be associated with 1 connector instance
- Workflow
  - Sequence of rules to be executed for each document
- Seed
  - Starting point of a single crawl to execute
  - Can be associated with 0 or more workflows
  - Can be associated with 0 or more routing policies
  - Can be associated with 0 or 1 throttle policy
  - Can be part of 0 or more schedules
  - Must be associated with 1 connection
- Schedule
  - Defines how often to execute crawls for a set of seeds
  - Defines a sequence of crawls (chained schedules)
    - For example: start seeds [ d, e, f ] (chained schedule#2) after seeds [ a, b, c ] (schedule#1) are done.
- Throttle and Routing Policies
  - Define how often and where documents should be processed
  - Allow for geo-located job processing
  - Routing policies can be associated with seeds only
  - Throttle policies can be attached to seeds, connections and credentials
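
The chained-schedule example from the Schedule entry above can be illustrated with a small sketch. This is not the Aspire API: the `Schedule` class, the `chained` field and the `run_chain` helper are hypothetical names, and the sketch assumes each crawl call blocks until that seed is done and that seeds within a schedule run one after another (a simplification).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Schedule:
    """Hypothetical stand-in for a schedule; not the real Aspire API."""
    name: str
    seeds: List[str]
    # Schedule to start once every seed in this one has finished crawling.
    chained: Optional["Schedule"] = None


def run_chain(first: Schedule, crawl: Callable[[str], None]) -> List[str]:
    """Run a schedule, then each chained schedule, returning the crawl order."""
    started: List[str] = []
    current: Optional[Schedule] = first
    while current is not None:
        for seed in current.seeds:
            crawl(seed)  # assumed to block until the crawl of this seed is done
            started.append(seed)
        # Only move on once all seeds of the current schedule have finished.
        current = current.chained
    return started


# schedule#2 (seeds d, e, f) is chained after schedule#1 (seeds a, b, c).
schedule_2 = Schedule("schedule#2", ["d", "e", "f"])
schedule_1 = Schedule("schedule#1", ["a", "b", "c"], chained=schedule_2)
order = run_chain(schedule_1, crawl=lambda seed: None)
```

Here `order` lists seeds a, b, c before d, e, f, which mirrors the rule that the chained schedule starts only after the first one is done.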
Aspire 5.0 config entity model
![](/download/attachments/753009019/Aspire%205.0%20model.png?version=2&modificationDate=1681860025944&api=v2)
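
The relationships between the configuration entities can also be written down as plain data classes. The sketch below is only an illustration of the cardinalities listed on this page; the class names, field names and the `validate` helper are assumptions and do not correspond to actual Aspire classes or REST payloads.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThrottlePolicy:
    name: str


@dataclass
class RoutingPolicy:
    name: str


@dataclass
class Workflow:
    name: str


@dataclass
class ConnectorInstance:
    name: str


@dataclass
class Credential:
    name: str
    auth_type: str  # e.g. "user/password" or "access/secret keys"
    throttle_policy: Optional[ThrottlePolicy] = None


@dataclass
class Connection:
    name: str
    connector: Optional[ConnectorInstance]    # must be exactly 1
    credential: Optional[Credential] = None   # 0 or 1
    throttle_policy: Optional[ThrottlePolicy] = None


@dataclass
class Seed:
    url: str
    connection: Optional[Connection]          # must be exactly 1
    workflows: List[Workflow] = field(default_factory=list)               # 0 or more
    routing_policies: List[RoutingPolicy] = field(default_factory=list)   # 0 or more
    throttle_policy: Optional[ThrottlePolicy] = None                      # 0 or 1


def validate(seed: Seed) -> List[str]:
    """Return the mandatory associations that are missing for a seed."""
    errors: List[str] = []
    if seed.connection is None:
        errors.append("seed must be associated with 1 connection")
    elif seed.connection.connector is None:
        errors.append("connection must be associated with 1 connector instance")
    return errors


# Hypothetical example values, for illustration only.
connector = ConnectorInstance("file-system")
connection = Connection("share-1", connector=connector)
seed = Seed("/data/docs", connection=connection, workflows=[Workflow("publish")])
problems = validate(seed)
```

Because connector instances, credentials and connections are separate objects, the same `connection` could be passed to several `Seed` objects, which is the reuse the entity split is meant to allow.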