...

Although starting manager and worker capabilities in the same JVM is possible for development and testing, it is neither supported nor recommended for production deployments. There should be at least one distribution for the worker and one for the manager, in different VMs.

Manager nodes

The number of manager nodes affects the availability and responsiveness of the cluster, as each manager node handles a set of active seeds (seeds for which there is a running crawl).

The optimum number of manager nodes also depends on the number of worker nodes: the more worker nodes there are, the harder each manager node has to work to keep up with their requests. If the manager/worker ratio is off, either the manager nodes cannot serve worker requests quickly enough, or there are too few workers to consume the work created by the managers, leaving the managers' resources under-utilized.

Minimum nodes    Recommended nodes    Resources
1                2                    4 GB RAM, 2 CPU cores

For each manager node, it is recommended to add one CPU core for every 100 concurrent seeds that node will manage. For instance:

Suppose you have 2 manager nodes and initially planned for 200 concurrent seeds at a time, so each manager handles at most 100 seeds concurrently. If you need to raise this to 400 concurrent seeds, each manager node takes on 100 extra seeds, so it is recommended to add 1 CPU core to each manager node.
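
The sizing rule above can be sketched as a small calculation. This is only an illustrative sketch, not part of the product: the function names are hypothetical, it assumes seeds are split evenly across manager nodes, and it assumes that a partial block of 100 extra seeds is rounded up to a full extra core.

```python
import math

# From the guideline above: one extra CPU core per 100 concurrent seeds per manager.
SEEDS_PER_EXTRA_CORE = 100


def seeds_per_manager(total_concurrent_seeds, manager_nodes):
    """Seeds each manager handles at most, assuming an even split (assumption)."""
    return math.ceil(total_concurrent_seeds / manager_nodes)


def extra_cores_per_manager(current_seeds, target_seeds, manager_nodes):
    """CPU cores to add to each manager when the seed count grows.

    Rounding partial blocks of 100 seeds up to a whole core is an assumption.
    """
    current = seeds_per_manager(current_seeds, manager_nodes)
    target = seeds_per_manager(target_seeds, manager_nodes)
    added_seeds = max(0, target - current)
    return math.ceil(added_seeds / SEEDS_PER_EXTRA_CORE)


# The example above: 2 managers, growing from 200 to 400 concurrent seeds.
print(extra_cores_per_manager(200, 400, 2))  # 1
```

With these assumptions, growing from 200 to 400 concurrent seeds on 2 manager nodes yields 1 extra core per node, matching the example.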

Worker nodes

The number of worker nodes directly affects crawl throughput, as these are the nodes doing the actual work.

Minimum nodes    Recommended nodes    Resources
1                2                    16 GB RAM, 4 CPU cores

Security Settings

Create a customized Encryption Key File

...