What is the crawl state?

The crawl state records the progress of a running crawl. Aspire uses it to distribute processing tasks across multiple Aspire nodes and to pause and resume crawls. To keep the state synchronized between nodes and to preserve it across restarts, it must be stored in an external database.
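To make the idea concrete, here is a minimal sketch of how shared crawl state can support distribution and pause/resume. It is not Aspire's actual implementation or schema: the `CrawlStateStore` class, its method names, and the status values are all hypothetical, and an in-memory dictionary stands in for the external database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CrawlStateStore:
    """Hypothetical stand-in for the external database that holds crawl state."""

    # Maps item id -> status: "pending", "in_progress", or "done".
    items: Dict[str, str] = field(default_factory=dict)
    # Maps item id -> name of the node currently processing it.
    owners: Dict[str, str] = field(default_factory=dict)
    # When True, no new work is handed out, but recorded progress is kept.
    paused: bool = False

    def add(self, item_id: str) -> None:
        # Register a newly discovered item unless it is already tracked.
        self.items.setdefault(item_id, "pending")

    def claim(self, node: str) -> Optional[str]:
        # Hand the next pending item to the requesting node.
        if self.paused:
            return None
        for item_id, status in self.items.items():
            if status == "pending":
                self.items[item_id] = "in_progress"
                self.owners[item_id] = node
                return item_id
        return None

    def complete(self, item_id: str) -> None:
        # Mark an item as finished and release its owner.
        self.items[item_id] = "done"
        self.owners.pop(item_id, None)

    def remaining(self) -> int:
        # Count items that are not yet done.
        return sum(1 for status in self.items.values() if status != "done")
```

Because every node reads and writes the same store, two nodes never receive the same item, and setting `paused` stops new work without losing what has already been recorded. A real shared database would also need the claim step to be atomic, which this single-process sketch does not address.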


Which databases does Aspire use?

As of Aspire 4.0, the following databases are supported for storing the crawl state:

  1. MongoDB 3.0 and up - Open-source, document-oriented database.
  2. Apache HBase - Open-source, distributed, scalable big data store.
  3. Elasticsearch - RESTful search and analytics engine.

