Another big change is the introduction of a manager/worker architecture: the manager nodes coordinate configuration, crawls, and failure recovery, while the worker nodes are only concerned with executing jobs (each job represents a document).
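
The split of responsibilities can be illustrated with a minimal Python sketch. This is not the product's implementation: the names Job, Worker, Manager, assign_all and handle_worker_failure are hypothetical, and the round-robin assignment is an assumption made only to keep the example small. It shows a manager that owns the queue and the recovery logic, and workers that do nothing but execute jobs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """One unit of work; in this sketch a job stands for one document."""
    crawl_id: str
    document_id: str


@dataclass
class Worker:
    """Hypothetical worker node: it executes jobs and holds no crawl state."""
    name: str
    processed: list = field(default_factory=list)

    def execute(self, job: Job) -> None:
        self.processed.append(job.document_id)


class Manager:
    """Hypothetical manager node: owns the queue, assignment and recovery."""

    def __init__(self, workers):
        self.workers = {w.name: w for w in workers}
        self.pending = deque()
        self.assigned = {w.name: [] for w in workers}

    def start_crawl(self, crawl_id, document_ids):
        for document_id in document_ids:
            self.pending.append(Job(crawl_id, document_id))

    def assign_all(self):
        # Round-robin assignment (an assumption, chosen for simplicity).
        names = list(self.workers)
        i = 0
        while self.pending:
            self.assigned[names[i % len(names)]].append(self.pending.popleft())
            i += 1

    def handle_worker_failure(self, name):
        # Failure recovery: put the lost worker's unfinished jobs back in the queue.
        self.pending.extend(self.assigned.pop(name))
        del self.workers[name]

    def run_assigned(self):
        for name, jobs in self.assigned.items():
            while jobs:
                self.workers[name].execute(jobs.pop(0))


w1, w2 = Worker("w1"), Worker("w2")
manager = Manager([w1, w2])
manager.start_crawl("crawl-1", ["a", "b", "c", "d"])
manager.assign_all()
manager.handle_worker_failure("w2")  # w2 is lost before running anything
manager.assign_all()                 # its jobs are reassigned to w1
manager.run_assigned()
```

In this run every document ends up processed by w1, because the manager requeued the jobs that had been assigned to the failed worker.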

Other features:

  • Chained schedules, allowing crawls to start only after other crawls have finished.
  • Tag-based crawling: jobs of certain crawls can be delegated to certain worker nodes, allowing for geo-located crawls.
  • Out-of-the-box throttling policies, allowing job execution to be throttled across the cluster for a given crawl or for related crawls (those sharing the same connection or credential objects).
  • Brand-new UI
  • Re-designed REST API
  • Optimized for containerization
    • Official Docker image available for download and use
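
The first three features in the list above can be sketched in a few lines of Python. Everything here is a hypothetical illustration, not the product's API: chained_order, eligible_workers and SharedThrottle are invented names, the rule that a worker must carry all of a crawl's tags is an assumption, and the fixed-window counter is just one simple way to throttle. The throttle is keyed by a connection name so that related crawls share one budget.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def chained_order(chains):
    """Return a start order for crawls.

    `chains` maps each crawl to the set of crawls that must finish first.
    """
    return list(TopologicalSorter(chains).static_order())


def eligible_workers(crawl_tags, workers):
    """Return the workers allowed to run jobs of a crawl.

    `workers` maps a worker name to its set of tags. Assumption: a worker
    is eligible only if it carries every tag of the crawl.
    """
    return sorted(name for name, tags in workers.items() if crawl_tags <= tags)


class SharedThrottle:
    """Fixed-window limiter shared by all crawls that use the same key."""

    def __init__(self, max_jobs, window_seconds):
        self.max_jobs = max_jobs
        self.window_seconds = window_seconds
        self._windows = {}  # key -> (window_start, jobs_started_in_window)

    def try_acquire(self, key, now):
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window has expired
        if count >= self.max_jobs:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


# Chained schedules: "index" waits for "files", which waits for "users".
order = chained_order({"files": {"users"}, "index": {"files"}})

# Tag-based crawling: only the worker tagged "eu" may take the EU crawl.
eu_workers = eligible_workers({"eu"}, {"w1": {"eu", "fast"}, "w2": {"us"}})

# Throttling: two crawls on the same connection share a budget of 2 jobs/second.
throttle = SharedThrottle(max_jobs=2, window_seconds=1.0)
decisions = [throttle.try_acquire("conn-a", t) for t in (0.0, 0.1, 0.2, 1.0)]
```

Passing the current time in explicitly keeps the throttle deterministic for the example; a real deployment would read a clock instead.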