Before migrating your Aspire deployments to Aspire 5.0, we strongly advise that you understand the architectural change described in Aspire 5.0 Architecture.
Migrating to Aspire 5.0 changes not only how crawls are configured; the hardware architecture must also be reconsidered.
This guide describes what a typical migration from Aspire 3/4 looks like.
Aspire 3 and 4 had a horizontally distributed architecture in which every Aspire node ran exactly the same software and configuration. Because all nodes were equal, synchronization was more complex and it was hard to balance throughput against resource utilization.
Aspire 5.0 consists of two distinct types of nodes: managers and workers. More manager nodes allow more simultaneous crawls, and more worker nodes give higher throughput. The set of worker nodes can be heterogeneous, with some workers running certain crawls and others running other types of crawls.
For production deployments where high availability is required, we recommend at least 2 manager nodes: if one fails, the other can take over its work while the failed node recovers and reclaims work.
Node | Minimum nodes | Recommended nodes | Minimum hardware per node | Recommended hardware per node |
---|---|---|---|---|
Manager | 1 | 2 | 4 GB RAM, 2 CPU cores | 4 GB RAM, 4 CPU cores |
Worker | 1 | 2 | 8 GB RAM, 4 CPU cores | 16 GB RAM, 4 CPU cores |
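
As a planning aid, the sizing figures in the table can be encoded and checked against a proposed deployment. The following is a minimal Python sketch and not part of Aspire; the names `Spec`, `MINIMUM`, `RECOMMENDED`, `meets` and `classify` are hypothetical and exist only for this illustration. It assumes that every node of a given type has the same hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Node count and per-node hardware for one node type (hypothetical helper)."""
    nodes: int
    ram_gb: int
    cpu_cores: int


# Figures copied from the sizing table above.
MINIMUM = {"manager": Spec(1, 4, 2), "worker": Spec(1, 8, 4)}
RECOMMENDED = {"manager": Spec(2, 4, 4), "worker": Spec(2, 16, 4)}


def meets(plan: dict, baseline: dict) -> bool:
    """Return True if every node type in the plan reaches the baseline."""
    return all(
        plan[role].nodes >= required.nodes
        and plan[role].ram_gb >= required.ram_gb
        and plan[role].cpu_cores >= required.cpu_cores
        for role, required in baseline.items()
    )


def classify(plan: dict) -> str:
    """Classify a planned deployment against the table."""
    if meets(plan, RECOMMENDED):
        return "recommended"
    if meets(plan, MINIMUM):
        return "minimum"
    return "below minimum"


if __name__ == "__main__":
    # Example plan: two managers, but workers with only 8 GB RAM each.
    plan = {"manager": Spec(2, 4, 4), "worker": Spec(2, 8, 4)}
    print(classify(plan))
```

The example plan reaches the minimum figures but not the recommended ones, because its workers have 8 GB RAM rather than 16 GB.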
Aspire 5 was developed and tested using OpenJDK 11.
Choose a content source in Aspire 3/4 that you want to migrate to Aspire 5. Verify that the connector is available in Aspire 5 at Connectors.
Aspire 5 splits the configuration of a crawl into several interconnected configuration objects which, combined, can run a crawl. We therefore need to take our Aspire 3/4 "content-source" configuration and split it into the required Aspire 5 configuration objects.
Each connector determines what goes where, but roughly speaking this is how they should now be configured: