Step 2. The server running the content source fails, so its crawl items are restored on the other servers.
Aspire synchronizes content source configurations and workflow libraries across servers, so if you create a content source or a library on one server, the other servers install that same content source or library.
By default, Aspire starts an embedded ZooKeeper server in stand-alone mode. You can use this embedded server for failover, but failover will only work while that Aspire server is up and running.
We recommend using an external ZooKeeper cluster, so that ZooKeeper's own failover maximizes the uptime of the overall failover functionality.
For ZooKeeper installation go to: http://zookeeper.apache.org/doc/trunk/zookeeperStarted.html
Also check the ZooKeeper machine requirements at: http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#Single+Machine+Requirements
Once you have installed Aspire on each server, edit the config/settings.xml file in each distribution folder.
For each Aspire server, make the following change to the <configAdministration> section of the settings.xml file, from:
<zookeeper enabled="false" libraryFolder="config/workflow-libraries" root="/aspire" updatesEnabled="false">
to:
<zookeeper enabled="true" libraryFolder="config/workflow-libraries" root="/aspire" updatesEnabled="true">
Uncomment the line with:
<!-- <externalServer>127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2181</externalServer> -->
And write the ZooKeeper server that you have installed as follows:
<externalServer> host:port </externalServer>
If you are using a cluster of ZooKeeper servers, separate each server with a comma:
<externalServer> host1:port1, host2:port2, host3:port3, ... </externalServer>
By default, if no external server is specified, Aspire starts an embedded ZooKeeper server on the port specified in the <clientPort> tag, and it is not connected to any other ZooKeeper. This is the default for non-failover installations.
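Because the <externalServer> value is a comma-separated list of host:port entries, it can help to validate the list before pasting it into settings.xml on every server. The following is a minimal sketch, not part of Aspire: the helper names parse_servers and is_reachable are hypothetical, and the reachability check only confirms that a TCP port accepts connections, not that a healthy ZooKeeper is listening there.

```python
import socket


def parse_servers(value):
    """Split an <externalServer> value such as 'host1:2181, host2:2181'
    into a list of (host, port) tuples; raise ValueError on a bad entry."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdecimal():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 0 < port_number < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        servers.append((host, port_number))
    if not servers:
        raise ValueError("no servers listed")
    return servers


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example value copied from the commented-out line in settings.xml.
    example = "127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2181"
    for host, port in parse_servers(example):
        state = "reachable" if is_reachable(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

Running the script from each Aspire server checks that every ZooKeeper entry is well formed and reachable from that machine.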
If you configured a content source to crawl on a schedule, the same schedule configuration is applied to all servers. When the scheduled time arrives, only one server triggers the start of the crawl; the others join the Distributed Crawl once the initial setup is done.
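The "only one server triggers" behavior can be illustrated with a general coordination pattern: every server attempts an atomic create-if-absent on a shared path, and only the first attempt succeeds. The sketch below is an in-memory simulation of that pattern, not Aspire's actual implementation, which the text above does not describe. FakeCoordinator, run_schedule, the server names, and the path are all hypothetical.

```python
import threading


class FakeCoordinator:
    """In-memory stand-in for a coordination service that offers an
    atomic create-if-absent operation (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}

    def create(self, path, owner):
        """Return True only for the first caller that creates `path`."""
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = owner
            return True


def run_schedule(servers, path="/aspire/hypothetical/crawl-trigger"):
    """Let every server react to the same schedule at once and report
    which one triggered the crawl and which ones merely joined it."""
    coordinator = FakeCoordinator()
    roles = {}

    def fire(server):
        won = coordinator.create(path, server)
        roles[server] = "trigger" if won else "join"

    threads = [threading.Thread(target=fire, args=(s,)) for s in servers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return roles


roles = run_schedule(["aspire-1", "aspire-2", "aspire-3"])
print(roles)
```

Which server wins the race varies between runs, but exactly one server ends up with the "trigger" role and the rest with "join".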
The Failover feature does not enforce consistent security settings across Aspire servers. For example, in a cluster of three Aspire servers, two may have security access restrictions while the third does not. The third server is not aware of the restrictions on the others and performs its failover crawls regardless, so you would end up with an unsecured server crawling your sensitive data.
If your project requirements demand security, we recommend configuring it on all Aspire servers. See Aspire Security for details on how to configure security access restrictions for Aspire.