Overview

Since the 3.1 release, Aspire connectors can crawl in distributed mode automatically. Because all crawl control data is stored in MongoDB, adding more Aspire servers configured to use the same MongoDB is enough for the common connectors to distribute the crawl among them.

Each connector is responsible for talking to the repository, scanning through all the items to fetch, and storing their IDs in MongoDB so that they can be processed later by the same server or by any other server.
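
The scan-then-process split described above can be pictured with a small in-memory sketch. This is not Aspire code: SharedQueue, scan, claim_next and process_all are hypothetical names, and a dictionary guarded by a lock stands in for MongoDB. It only illustrates the idea that item IDs are stored once and that each ID is then picked up by exactly one server.

```python
import threading


class SharedQueue:
    """Hypothetical stand-in for the shared crawl control store (MongoDB in Aspire)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}  # item ID -> name of the server that claimed it, or None

    def add(self, item_id):
        # Store the ID once; a repeated scan does not reset an existing claim.
        with self._lock:
            self._items.setdefault(item_id, None)

    def claim_next(self, server):
        """Atomically claim one unclaimed ID for `server`; return None when none is left."""
        with self._lock:
            for item_id, owner in self._items.items():
                if owner is None:
                    self._items[item_id] = server
                    return item_id
        return None


def scan(queue, repository_items):
    """Scanning phase: record every item ID in the shared store."""
    for item_id in repository_items:
        queue.add(item_id)


def process_all(queue, server):
    """Processing phase: keep claiming IDs until the shared store is drained."""
    processed = []
    while (item_id := queue.claim_next(server)) is not None:
        processed.append(item_id)
    return processed


def run_demo():
    """Scan 100 made-up IDs, then let two simulated servers drain the queue."""
    queue = SharedQueue()
    scan(queue, [f"doc-{n}" for n in range(100)])
    results = {}

    def worker(name):
        results[name] = process_all(queue, name)

    threads = [threading.Thread(target=worker, args=(name,))
               for name in ("server-a", "server-b")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_demo() returns the IDs handled by each simulated server. How the IDs are split between the two varies from run to run, but every ID appears exactly once across both lists.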

 

Configuration

To set up an Aspire cluster for distributed processing, follow these steps:

  1. Set up MongoDB

    All Aspire servers must use the same MongoDB installation. To configure this, edit the config/settings.xml file on every Aspire server:

     

    MongoDB Settings
      <!-- noSql database provider for the 3.1 connector framework -->
      <noSQLConnectionProvider connectionsPerHost="10" sslEnabled="false" sslInvalidHostNameAllowed="false">
        <implementation>com.searchtechnologies.aspire:aspire-mongodb-provider</implementation>
        <dropOnClear>false</dropOnClear>
        <servers>mongodb-host:27017</servers>
      </noSQLConnectionProvider>


    If you need to connect to a multi-node MongoDB installation, see: Connect to a Multi-node MongoDB Installation

  2. Install the content sources to distribute


    Next, decide which content sources you want to crawl in distributed mode, and from which Aspire servers, according to your solution architecture.

    To do this, configure the content sources on one of the servers. Once they are correctly configured, export each content source and import it into every Aspire server that should crawl it in parallel.
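
Before starting a distributed crawl, it can be useful to confirm that the config/settings.xml files from step 1 all name the same MongoDB. The following is a minimal sketch using only the Python standard library. It is not part of Aspire: the helper names mongo_servers and same_mongo are hypothetical, and the `<settings>` root element in the sample is an assumption, since only the noSQLConnectionProvider fragment is shown on this page.

```python
import xml.etree.ElementTree as ET

# Sample settings text. The <settings> wrapper is an assumption; the inner
# fragment is the MongoDB configuration shown in step 1.
SAMPLE = """
<settings>
  <!-- noSql database provider for the 3.1 connector framework -->
  <noSQLConnectionProvider connectionsPerHost="10" sslEnabled="false" sslInvalidHostNameAllowed="false">
    <implementation>com.searchtechnologies.aspire:aspire-mongodb-provider</implementation>
    <dropOnClear>false</dropOnClear>
    <servers>mongodb-host:27017</servers>
  </noSQLConnectionProvider>
</settings>
"""


def mongo_servers(settings_xml):
    """Return the <servers> value of the first noSQLConnectionProvider, or None."""
    root = ET.fromstring(settings_xml)
    # iter() also matches the root element itself, so the real nesting depth does not matter.
    for provider in root.iter("noSQLConnectionProvider"):
        servers = provider.findtext("servers")
        if servers is not None:
            return servers.strip() or None
    return None


def same_mongo(settings_by_server):
    """True when every settings text names the same, non-empty <servers> value."""
    values = {mongo_servers(text) for text in settings_by_server.values()}
    return len(values) == 1 and None not in values
```

In practice, the dictionary passed to same_mongo would map each Aspire server name to the text of its own config/settings.xml, read however suits your environment.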