In this new configuration approach, you configure each object only once and reuse it to create multiple seeds for the same source repository. If you need to change the credentials, you don't have to change them on every seed: you update the credentials object only, and all seeds related to it are affected.
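The reuse described above can be pictured with a minimal Python sketch. It is not the product's API: the `Credentials` and `Seed` classes, their fields, and the example URLs are all hypothetical, and only illustrate several seeds holding a reference to one shared credentials object.

```python
from dataclasses import dataclass


@dataclass
class Credentials:
    """Hypothetical credentials object, configured once."""
    username: str
    password: str


@dataclass
class Seed:
    """Hypothetical seed that refers to a shared credentials object."""
    url: str
    credentials: Credentials  # a reference to the shared object, not a copy


# One credentials object reused by several seeds of the same repository.
creds = Credentials(username="crawler", password="old-secret")
seeds = [Seed(f"https://repo.example.com/site/{name}", creds) for name in ("a", "b", "c")]

# Changing the credentials object once is visible through every seed.
creds.password = "new-secret"
passwords = [seed.credentials.password for seed in seeds]
```

Because every seed keeps a reference rather than its own copy, a single update reaches all of them.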

Another big change is the introduction of a manager/worker architecture: the manager nodes coordinate configuration, crawls, and failure recovery, while the worker nodes only execute jobs (each job represents a document).
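As a rough illustration of that split of responsibilities, the sketch below models a manager that hands jobs to workers and re-queues a job when a worker fails. Everything in it is an assumption made for illustration: the `Job`, `Worker`, and `Manager` classes, the round-robin assignment, and the retry limit are not taken from the product.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job, representing one document."""
    doc_id: str
    attempts: int = 0


class Worker:
    """Hypothetical worker: it only executes the jobs it is given."""

    def __init__(self, name, fail_once_on=()):
        self.name = name
        self.fail_once_on = set(fail_once_on)  # simulated transient failures

    def execute(self, job):
        if job.doc_id in self.fail_once_on and job.attempts == 0:
            raise RuntimeError(f"{self.name} failed on {job.doc_id}")
        return f"{self.name} processed {job.doc_id}"


class Manager:
    """Hypothetical manager: assigns jobs and handles failure recovery."""

    def __init__(self, workers, max_attempts=2):
        self.workers = workers
        self.max_attempts = max_attempts

    def run_crawl(self, doc_ids):
        queue = deque(Job(doc_id) for doc_id in doc_ids)
        done, failed = [], []
        turn = 0
        while queue:
            job = queue.popleft()
            worker = self.workers[turn % len(self.workers)]  # round-robin
            turn += 1
            try:
                done.append(worker.execute(job))
            except RuntimeError:
                job.attempts += 1
                if job.attempts < self.max_attempts:
                    queue.append(job)  # recovery: put the job back in the queue
                else:
                    failed.append(job.doc_id)
        return done, failed


workers = [Worker("w1"), Worker("w2", fail_once_on={"doc-2"})]
done, failed = Manager(workers).run_crawl(["doc-1", "doc-2", "doc-3"])
```

The workers hold no crawl state of their own in this sketch, so retry and recovery decisions stay entirely with the manager.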

Other features:

  • Chained schedules, which let a crawl start only after other crawls have finished.
  • Tag-based crawling: jobs of certain crawls can be delegated to specific worker nodes, which allows for geo-located crawls.
  • Out-of-the-box throttling policies, which let you throttle the execution of jobs across the cluster for a given crawl or for related crawls (those that share the same connection or credentials object).
  • Brand new UI
  • Re-designed REST API
  • Optimized for containerization
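
The first three features in the list can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions: the crawl names, worker names, tags, the `eligible_workers` helper, and the `SharedThrottle` class are hypothetical and do not reflect the product's configuration format or API. The chain ordering uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Chained schedules: each crawl maps to the crawls that must finish first.
chain = {
    "crawl-files": set(),
    "crawl-wiki": {"crawl-files"},
    "crawl-archive": {"crawl-wiki"},
}
start_order = list(TopologicalSorter(chain).static_order())


# Tag-based crawling: a crawl's jobs go only to workers carrying its tags.
def eligible_workers(crawl_tags, worker_tags):
    """Return the workers whose tags include every tag the crawl requires."""
    return sorted(name for name, tags in worker_tags.items() if crawl_tags <= tags)


worker_tags = {
    "worker-eu-1": {"eu"},
    "worker-eu-2": {"eu", "secure"},
    "worker-us-1": {"us"},
}
eu_workers = eligible_workers({"eu"}, worker_tags)


# Throttling: crawls that share a connection share one limit on running jobs.
class SharedThrottle:
    def __init__(self, max_running):
        self.max_running = max_running
        self.running = 0

    def try_acquire(self):
        """Reserve a slot if one is free; return whether the job may start."""
        if self.running < self.max_running:
            self.running += 1
            return True
        return False

    def release(self):
        """Free a slot when a job finishes."""
        self.running -= 1


throttle_for_connection = SharedThrottle(max_running=2)
# Three jobs from related crawls ask for a slot; only two may start.
started = [throttle_for_connection.try_acquire() for _ in range(3)]
```

In this sketch `start_order` lists the crawls in the order the chain allows, `eu_workers` contains only the workers tagged `eu`, and the third job is held back until an earlier one calls `release()`.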