Migrating to Aspire 5.0 changes not only how crawls are configured; changes to the hardware architecture must also be considered.

The following areas must be taken into consideration:

  • Hardware
  • Crawl configuration

...

The current guide describes what a typical migration from Aspire 3/4 looks like.

Step 1. Resource allocation considerations

Aspire 3 and 4 had a horizontally distributed architecture, where all the Aspire nodes executed the exact same software and configuration. All nodes were equal, which made synchronization more complex and made it hard to balance throughput and resource utilization.

Aspire 5.0 deployments consist of two distinct types of nodes: Managers and Workers.

In prior versions, adding nodes provided high availability but also scaled crawl capacity horizontally, even in cases where high availability was desired without increasing the crawl throughput. For example, with 2 Aspire nodes you had twice the capacity of a single server.

In Aspire 5.0 you can separate the high availability requirements from the crawl capacity requirements, by allocating only the number of worker nodes needed to match your required throughput.

More manager nodes mean more simultaneous crawls, and more worker nodes mean higher throughput. Worker nodes can also form a heterogeneous set, where some run certain crawls and the others run other types of crawls.

For production deployments where high availability is required, at least 2 manager nodes are recommended: if one fails, the other can take over its work while the failed node recovers and re-claims work.
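The separation of availability from capacity described above can be sketched as a small planning calculation. This is only an illustration: the function name `plan_nodes` and the per-worker throughput figure are hypothetical, and the real throughput of a worker has to be measured for your own connectors and content.

```python
import math


def plan_nodes(required_docs_per_hour: float,
               docs_per_hour_per_worker: float,
               high_availability: bool) -> dict:
    """Estimate manager and worker counts (illustrative only).

    docs_per_hour_per_worker is an assumed, measured value;
    it is not a figure published for Aspire.
    """
    if docs_per_hour_per_worker <= 0:
        raise ValueError("docs_per_hour_per_worker must be positive")

    # Capacity: only as many workers as the required throughput needs.
    workers = max(1, math.ceil(required_docs_per_hour / docs_per_hour_per_worker))

    # Availability: at least 2 managers when high availability is required.
    managers = 2 if high_availability else 1

    return {"managers": managers, "workers": workers}


plan = plan_nodes(100_000, 30_000, high_availability=True)
print(plan)
```

Note that the manager count here depends only on the availability requirement, while the worker count depends only on throughput, which is the point of the Aspire 5.0 split.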

Resource requirements:

Node      Minimum nodes   Recommended nodes   Minimum resources        Recommended resources
Manager   1               2                   4 GB RAM, 2 CPU cores    8 GB RAM, 4 CPU cores
Worker    1               2                   8 GB RAM, 4 CPU cores    16 GB RAM, 4 CPU cores

These recommendations are based on usual workloads. Fine tuning is recommended, especially if the workload consists of large files (over 100 MB average size).
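As a rough aid, the minimum per-node resources listed above (4 GB RAM and 2 CPU cores for a manager, 8 GB RAM and 4 CPU cores for a worker) can be turned into a quick pre-flight check. The helper `check_node` and its inputs are hypothetical and not part of Aspire; the sketch only compares numbers you supply against those minimums.

```python
# Minimum resources per node type, taken from the requirements above:
# (RAM in GB, CPU cores)
MINIMUMS = {
    "manager": (4, 2),
    "worker": (8, 4),
}


def check_node(role: str, ram_gb: float, cpu_cores: int) -> list:
    """Return a list of shortfalls; an empty list means the minimums are met."""
    min_ram, min_cores = MINIMUMS[role]
    problems = []
    if ram_gb < min_ram:
        problems.append(f"{role}: {ram_gb} GB RAM is below the minimum of {min_ram} GB")
    if cpu_cores < min_cores:
        problems.append(f"{role}: {cpu_cores} CPU cores is below the minimum of {min_cores}")
    return problems


for line in check_node("worker", ram_gb=4, cpu_cores=4):
    print(line)
```

Meeting the minimums does not replace fine tuning for workloads with large files.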

Crawl Configuration

Java version

Aspire 5 was developed and tested using OpenJDK 11.
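Before migrating, it can help to confirm which Java version each node actually runs. The sketch below parses the output of `java -version`; the function names are hypothetical, and comparing against 11 is an assumption based only on the statement above that Aspire 5 was developed and tested with OpenJDK 11 (whether other versions work is not stated here).

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_301" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major(java_cmd: str = "java") -> int:
    """Run `java -version` (it prints to stderr) and return the major version."""
    result = subprocess.run([java_cmd, "-version"],
                            capture_output=True, text=True, check=True)
    return parse_java_major(result.stderr or result.stdout)


print(parse_java_major('openjdk version "11.0.19" 2023-04-18'))
```

On a node with Java installed, `installed_java_major() == 11` would indicate that the runtime matches the version Aspire 5 was tested with.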

Step 2. Choose a content source to migrate

Choose a content source on Aspire 3/4 you want to migrate to Aspire 5. Verify the availability of the connector in Aspire 5 at Connectors.

Aspire 5 splits the configuration of a crawl into several interconnected configuration objects which, combined, can run a crawl. The old "content-source" configuration therefore has to be split into the required Aspire 5 configuration objects. This is where the most time could be spent during a migration: what used to be 4 XML files per content source can now be more than 7 entities related to each other, depending on the connector.

Each connector determines what goes where, but roughly speaking this is how they should now be configured:

...