Before starting to migrate your Aspire deployments to Aspire 5.0, it is strongly advised that you understand the architectural changes described in Aspire 5.0 Architecture.
Migrating to Aspire 5.0 not only changes how crawls are configured; it also requires you to reconsider the hardware architecture.
The following areas must be taken into consideration:
Aspire 5.0 deployments consist of two distinct types of nodes: Managers and Workers.
In prior versions, increasing the number of nodes provided both high availability and horizontal scaling of crawl capacity, even when high availability was desired without increasing crawl throughput. For example, if you had 2 Aspire nodes, you had twice the capacity of a single server.
In Aspire 5.0 you can separate the high availability requirements from the crawl capacity requirements by allocating only the number of worker nodes needed to match your required throughput.
For high availability it is recommended to have at least 2 manager nodes, so that if one fails, the other can take over its work.
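To make this separation concrete, here is a minimal TypeScript sketch; the names, the helper function, and the throughput figures are hypothetical and not part of Aspire. The point it illustrates is that the manager count follows only from the high availability requirement, while the worker count follows only from the throughput you need.

```typescript
// Hypothetical sizing helper: names and numbers are illustrative only.
interface Topology {
  managerNodes: number; // sized for high availability
  workerNodes: number;  // sized for crawl throughput
}

function planTopology(
  targetDocsPerSec: number,
  docsPerSecPerWorker: number,
  highAvailability: boolean
): Topology {
  return {
    // At least 2 managers when high availability is required, 1 otherwise.
    managerNodes: highAvailability ? 2 : 1,
    // Workers scale only with the required throughput.
    workerNodes: Math.max(1, Math.ceil(targetDocsPerSec / docsPerSecPerWorker)),
  };
}

// Example: an HA deployment that must sustain ~300 docs/sec, assuming a single
// worker handles ~100 docs/sec (a made-up figure — measure your own connectors).
console.log(planTopology(300, 100, true)); // { managerNodes: 2, workerNodes: 3 }
```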
Node | Minimum nodes | Recommended nodes | Minimum hardware | Recommended hardware
---|---|---|---|---
Manager | 1 | 2 | 4 GB RAM, 2 CPU cores | 8 GB RAM, 4 CPU cores
Worker | 1 | 2 | 8 GB RAM, 4 CPU cores | 16 GB RAM, 4 CPU cores
These recommendations are based on typical workloads; fine tuning is recommended, especially if the workload consists of large files (over 100 MB average size).
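As a worked example of the table, the sketch below multiplies the recommended per-node figures by a topology of 2 managers and 2 workers; that topology is only an example and should be replaced by your own sizing.

```typescript
// Per-node specs taken from the "Recommended hardware" column of the table above.
const specs = {
  manager: { ramGb: 8, cpuCores: 4 },
  worker: { ramGb: 16, cpuCores: 4 },
};

// Example topology: the recommended 2 managers plus 2 workers.
const topology = { managers: 2, workers: 2 };

const totalRamGb =
  topology.managers * specs.manager.ramGb + topology.workers * specs.worker.ramGb;
const totalCpuCores =
  topology.managers * specs.manager.cpuCores + topology.workers * specs.worker.cpuCores;

console.log(`Cluster total: ${totalRamGb} GB RAM, ${totalCpuCores} CPU cores`);
// Cluster total: 48 GB RAM, 16 CPU cores
```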
Configuring Aspire 5.0 is where most of the migration time is likely to be spent, as the old "content-source" configuration has been split into different sections (what used to be 4 XML files per content source can now be more than 7 entities related to each other), with the exact breakdown depending on each connector.
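Purely to illustrate what "split into related entities" means, the sketch below models one former content source as several small records that reference each other by id. The entity names and fields are invented for illustration and are not the actual Aspire 5.0 schema; the real set of entities and their contents depend on the connector.

```typescript
// Illustrative only: hypothetical entity shapes showing how one monolithic
// content-source configuration can be decomposed into smaller, linked records.
interface Credential { id: string; type: string; properties: Record<string, string>; }
interface Connection { id: string; url: string; credentialId: string; }
interface Connector  { id: string; type: string; properties: Record<string, string>; }
interface Workflow   { id: string; stages: string[]; }

interface Seed {
  id: string;
  connectionId: string;  // which repository to crawl
  connectorId: string;   // which connector implementation to use
  workflowIds: string[]; // processing/publishing pipelines to run
}

// A former "content source" now corresponds to the sum of these related entities.
const seed: Seed = {
  id: "seed-finance-share",
  connectionId: "conn-finance-share",
  connectorId: "connector-fileshare",
  workflowIds: ["wf-extract-text", "wf-publish-index"],
};
```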
Each connector determines what goes where, but roughly speaking this is how they should now be configured: