Compared to its predecessors, Aspire 3.x and 4.x, Aspire 5.0 had a major architecture redesign aimed at the most common sources of complexity in managing Aspire deployments: configuration, availability, and coordination of crawl execution.

The biggest change you will notice compared to prior versions is that there is no content source anymore. The configuration of crawls has been split into reusable entities with relationships to one another.

...

  • Connector
    • Common connector behavior
  • Credential
    • To authenticate to a specific repository
  • Connection
    • Server IP/host/port
    • Connection properties (timeouts, concurrency, etc.)
  • Throttle and Routing Policies
    • How often should documents be processed
    • Which nodes should process the documents
  • Workflow
    • Sequence of rules to be executed for each document
  • Seed
    • Starting point of a single crawl to execute
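
As an illustration of how the entities listed above can be reused across crawls, here is a minimal Python sketch. It is not Aspire's actual schema or API: every class, field name, host, and value below is a hypothetical stand-in chosen for the example, and throttle and routing policies are left out for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Connector:
    name: str  # common connector behavior (hypothetical field)


@dataclass(frozen=True)
class Credential:
    name: str
    username: str  # hypothetical: how to authenticate to a repository


@dataclass(frozen=True)
class Connection:
    name: str
    host: str
    port: int
    timeout_seconds: int = 30  # hypothetical connection property


@dataclass(frozen=True)
class Seed:
    url: str  # starting point of a single crawl
    connector: Connector
    credential: Credential
    connection: Connection
    workflow: Tuple[str, ...] = ()  # ordered rule names (hypothetical)


# Define each reusable entity once...
connector = Connector("file-system")
credential = Credential("svc-account", "crawler")
connection = Connection("nas-01", "nas.example.com", 445)

# ...and reference the same objects from several seeds.
seeds = [
    Seed("/share/a", connector, credential, connection, ("extract-text",)),
    Seed("/share/b", connector, credential, connection, ("extract-text",)),
]
```

Because both seeds point at the same connection and credential objects, a change to either would apply to every crawl that references it, which is the kind of reuse the split into entities is meant to allow.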

...

Another big change is the introduction of a manager/worker architecture, where the manager nodes coordinate configuration, crawls, and failure recovery, and the worker nodes only care about executing jobs (representing documents).
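
The division of responsibilities described above can be sketched in a few lines of Python. This is a toy, single-process model and not how Aspire is implemented: the `Manager` and `Worker` classes, the round-robin assignment, and the re-queue-on-failure behavior are all assumptions made for illustration.

```python
from collections import deque


class Worker:
    """Only knows how to execute a job (one job represents one document)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def execute(self, job):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is unavailable")
        return f"{job} processed by {self.name}"


class Manager:
    """Hands out jobs and recovers from worker failures."""

    def __init__(self, workers):
        self.workers = list(workers)

    def run(self, jobs):
        pending = deque(jobs)
        results = []
        turn = 0
        while pending:
            if not self.workers:
                raise RuntimeError("no workers left to execute jobs")
            job = pending.popleft()
            worker = self.workers[turn % len(self.workers)]  # round-robin
            turn += 1
            try:
                results.append(worker.execute(job))
            except RuntimeError:
                # Failure recovery: drop the failed worker, retry the job.
                self.workers.remove(worker)
                pending.appendleft(job)
        return results


manager = Manager([Worker("worker-1"), Worker("worker-2", healthy=False)])
results = manager.run(["doc-1", "doc-2", "doc-3"])
```

In this run, `doc-2` is first handed to the unavailable `worker-2`, then re-queued and completed by `worker-1`, so all three jobs finish even though one worker failed.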

Other features:

  • Chained schedules, allowing for crawls to start only after other crawls have finished.
  • Tag-based crawling: jobs of certain crawls can be delegated to certain worker nodes, allowing for geo-located crawls.
  • Out-of-the-box throttling policies, allowing crawls to throttle the execution of jobs across the cluster for certain crawls or for related crawls (those sharing the same connection or credential objects).
  • Brand-new UI
  • Re-designed REST API
  • Optimized for containerization
    • Official Docker image available for download and use
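
To make the chained schedules and tag-based crawling features listed above more concrete, here is a small Python sketch. It does not use Aspire's scheduler or REST API. The function names, crawl names, and tags are hypothetical, and it assumes that a worker is eligible for a crawl only when it carries every tag the crawl requires, which may differ from Aspire's actual matching rule.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def crawl_order(runs_after):
    """Order crawls so each one starts only after the crawls it waits for.

    `runs_after` maps a crawl name to the set of crawls that must finish first.
    """
    return list(TopologicalSorter(runs_after).static_order())


def eligible_workers(crawl_tags, worker_tags):
    """Return the workers that carry every tag required by the crawl."""
    required = set(crawl_tags)
    return sorted(
        name for name, tags in worker_tags.items() if required <= set(tags)
    )


# Chained schedule: "permissions" waits for "documents",
# and "cleanup" waits for "permissions".
order = crawl_order({
    "permissions": {"documents"},
    "cleanup": {"permissions"},
})

# Tag-based routing: only workers tagged "eu" may run this crawl's jobs.
workers = {
    "worker-1": {"eu"},
    "worker-2": {"us"},
    "worker-3": {"eu", "high-memory"},
}
eu_workers = eligible_workers({"eu"}, workers)
```

Here `order` is `["documents", "permissions", "cleanup"]` and `eu_workers` is `["worker-1", "worker-3"]`.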