When deploying Aspire nodes, it is important to correctly size each VM or container running a node, as different node types have different resource consumption behaviors.
Although running the manager and worker capabilities in the same JVM can be useful for development and testing, it is neither supported nor recommended for production deployments. A production deployment should have at least one distribution for the worker and one for the manager, each on a separate VM.
The number of manager nodes affects both the availability and the responsiveness of the cluster, as each manager node handles a set of active seeds (seeds with a running crawl).
The optimal number of manager nodes also depends on the number of worker nodes: the more workers there are, the harder each manager must work to keep up with their requests. If the manager/worker ratio is off, either the managers cannot serve worker requests quickly enough (too few managers), or there are too few workers to consume the work the managers create, leaving manager resources under-utilized (too many managers).
Minimum nodes | Recommended nodes | Resources per node |
---|---|---|
1 | 2 | 4 GB RAM, 2 CPU cores |
For each manager node, it is recommended to add one CPU core for every 100 concurrent seeds that node will manage. For example:
Suppose you have 2 manager nodes and initially planned for 200 concurrent seeds. Each manager then handles at most 100 seeds concurrently. If you need to increase this to 400 concurrent seeds, each manager handles 100 extra seeds, so it is recommended to add 1 CPU core to each manager node.
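The arithmetic in this example can be captured in a small helper. This is a minimal sketch in Python, not part of Aspire: the function names are hypothetical, it assumes seeds are spread evenly across the managers (as in the example), and it assumes that a partial block of 100 extra seeds is rounded up to a whole core.

```python
import math

# From the guideline: one extra CPU core per 100 concurrent seeds per manager.
SEEDS_PER_EXTRA_CORE = 100


def seeds_per_manager(total_seeds: int, manager_nodes: int) -> int:
    """Worst-case concurrent seeds per manager, assuming an even split (rounded up)."""
    if manager_nodes < 1:
        raise ValueError("at least one manager node is required")
    return math.ceil(total_seeds / manager_nodes)


def resized_manager_cores(current_cores: int, current_total_seeds: int,
                          new_total_seeds: int, manager_nodes: int) -> int:
    """CPU cores per manager after growing from current_total_seeds to new_total_seeds.

    Hypothetical helper: a partial block of 100 extra seeds is rounded up to a
    whole core, and shrinking the seed count never removes cores.
    """
    extra_seeds = (seeds_per_manager(new_total_seeds, manager_nodes)
                   - seeds_per_manager(current_total_seeds, manager_nodes))
    extra_cores = math.ceil(max(0, extra_seeds) / SEEDS_PER_EXTRA_CORE)
    return current_cores + extra_cores
```

Assuming each manager starts at the 2-core minimum from the table, `resized_manager_cores(2, 200, 400, 2)` returns 3, which matches the example above.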
The number of worker nodes directly affects crawl throughput, as workers are the nodes that do the actual crawling work.
Minimum nodes | Recommended nodes | Resources per node |
---|---|---|
1 | 2 | 16 GB RAM, 4 CPU cores |
Aspire encrypts sensitive configuration, such as credentials, with the AES-256 algorithm. The encryption key is read from a file accessible to the Aspire process. If no key is configured, a constant default key is used to encrypt and decrypt.
Using the default key is not secure: anything encrypted with it can be decrypted by any other Aspire deployment that also uses the default key.
It is strongly recommended to create a random 256-bit (32-byte) key file and configure it as the encryption key for all Aspire nodes in the same cluster. See Encryption properties for details on how to set it.
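A key file like this can be produced with any cryptographically secure random generator. The sketch below uses Python's standard `secrets` module; the function name and the file name `aspire.key` are only illustrative, and it assumes Aspire expects the file to contain the raw 32 bytes, as the 256-bit/32-byte size above suggests. The property that points Aspire at the file is described in Encryption properties and is not shown here.

```python
import os
import secrets

KEY_SIZE_BYTES = 32  # 256-bit AES key


def write_encryption_key(path: str) -> None:
    """Write 32 cryptographically random bytes to a new file (hypothetical helper).

    The 0o600 mode (owner read/write only) is honored on POSIX systems.
    O_EXCL makes this fail if the file already exists, so an existing key is
    never silently replaced; values encrypted with the old key could not be
    decrypted with a new one.
    """
    key = secrets.token_bytes(KEY_SIZE_BYTES)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
```

Calling `write_encryption_key("aspire.key")` once and copying that same file to every node keeps the whole cluster on one key, as recommended above.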
If an engineering team will manage Aspire, it is recommended to secure access to the UI with LDAP to control who can perform certain actions. See Security API for information on the security model and roles, and Ldap Configuration for how to configure it.
It is recommended to secure access to the Aspire HTTP endpoints with a TLS/SSL certificate (HTTPS). This is important because some requests contain sensitive information such as credentials. See Enable HTTPS for details.
If Aspire connects to HTTPS services (such as the Elasticsearch provider, or when crawling HTTPS repositories) and you need to trust the certificate authority (CA) of those services, it is recommended to include a Java Keystore containing the custom trusted certificates. See Crawling via HTTPs.