Aspire Manager/Worker Architecture

Aspire 5.0 introduces two types of nodes: Manager and Worker.

A Manager is responsible for coordinating both the execution of "jobs" from any given crawl and the crawl state. It prepares batches of jobs for eventual assignment to Worker nodes.

  • An automatically elected main Manager coordinates which Manager gets to manage each crawl.
    • It also takes appropriate action when a Manager or Worker node is detected to be down.

A Worker is responsible for processing batches of "jobs" obtained from the Manager nodes. Its work includes:

  • Executing all rules in any workflows configured for the associated crawls
  • Fetching content from repositories
  • Modifying and extracting content and metadata
  • Indexing documents with Publishers
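
The division of labor described above can be pictured with a minimal Python sketch. It is not Aspire code: the class names `Job`, `Manager`, and `Worker`, the `batch_size` parameter, and the rule functions are all hypothetical and chosen only for illustration. Main Manager election and failure handling are left out.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Job:
    """One unit of crawl work (hypothetical shape, not an Aspire type)."""
    crawl_id: str
    doc_id: str


@dataclass
class Manager:
    """Holds the pending jobs of a crawl and hands them out in batches."""
    batch_size: int = 2  # assumed setting, for illustration only
    queue: Deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def next_batch(self) -> List[Job]:
        # Take up to batch_size jobs off the front of the queue.
        batch: List[Job] = []
        while self.queue and len(batch) < self.batch_size:
            batch.append(self.queue.popleft())
        return batch


@dataclass
class Worker:
    """Processes a batch by running every workflow rule on each job."""
    rules: List[Callable[[Dict], Dict]]

    def process(self, batch: List[Job]) -> List[Dict]:
        results = []
        for job in batch:
            doc = {"id": job.doc_id, "crawl": job.crawl_id}
            for rule in self.rules:  # rules run in their configured order
                doc = rule(doc)
            results.append(doc)
        return results


def fetch(doc: Dict) -> Dict:
    # Stand-in for fetching content from a repository.
    return {**doc, "content": "text of " + doc["id"]}


def publish(doc: Dict) -> Dict:
    # Stand-in for indexing the document with a Publisher.
    return {**doc, "indexed": True}
```

In this sketch a Worker would repeatedly call `next_batch()` on a Manager and pass the result to `process()` until an empty batch comes back.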


Crawl Configuration

Crawls are now configured as separate entities, which allows for maximum reusability.

  • Connector
    • Common connector behavior
  • Credential
    • To authenticate to a specific repository
  • Connection
    • Server IP/host/port
    • Connection properties (timeouts, concurrency, etc.)
  • Throttle and Routing Policies
    • How often documents should be processed
    • Which nodes should process the documents
  • Workflow
    • Sequence of rules to be executed for each document
  • Seed
    • Starting point of a single crawl to execute
  • Schedule
    • Define how often to execute crawls for a set of seeds
    • Define sequence of crawls
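
To show why separate entities help with reuse, here is a minimal Python sketch of the entities listed above. It is not the Aspire configuration format or API: every field name, and the way a seed points at the other entities, is an assumption made for illustration. Throttle and Routing Policies are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Connector:
    name: str  # common connector behavior would live here


@dataclass(frozen=True)
class Credential:
    name: str
    username: str  # hypothetical field; secrets handling not shown


@dataclass(frozen=True)
class Connection:
    name: str
    host: str
    port: int
    timeout_seconds: int = 30  # assumed default, for illustration only


@dataclass(frozen=True)
class Workflow:
    name: str
    rules: Tuple[str, ...] = ()  # ordered rule names, run per document


@dataclass(frozen=True)
class Seed:
    """Starting point of one crawl; it references shared entities."""
    url: str
    connector: Connector
    connection: Connection
    credential: Optional[Credential] = None
    workflow: Optional[Workflow] = None


@dataclass
class Schedule:
    """Runs crawls for a set of seeds at a fixed interval."""
    name: str
    interval_minutes: int
    seeds: List[Seed] = field(default_factory=list)
```

Because each entity is defined once and referenced by name or object, two seeds can share one `Connection` and one `Workflow`, and a change to either applies to both crawls.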

Aspire 5.0 config entity model
