As of Aspire 4.0, Elasticsearch is a supported NoSQL database that can be used to maintain the Crawl State.

The Aspire Elasticsearch Provider is the component that communicates with Elasticsearch on behalf of Aspire. All configuration for the provider is done in the settings.xml file.


The Elasticsearch NoSQL Provider for Aspire requires Elasticsearch 7.x. It does not run with earlier versions.
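
Before pointing Aspire at a cluster, it can be useful to confirm the cluster's major version. The following is a minimal sketch, not part of Aspire: it assumes the cluster is reachable without authentication and relies on the JSON that Elasticsearch returns from its root endpoint, which includes a version.number field. The function names are hypothetical.

```python
import json
import urllib.request


def major_version(info: dict) -> int:
    """Extract the major version from the root-endpoint JSON of a cluster."""
    return int(info["version"]["number"].split(".")[0])


def is_supported(info: dict) -> bool:
    """True when the cluster reports a 7.x version, as the provider requires."""
    return major_version(info) == 7


def fetch_cluster_info(url: str, timeout: float = 5.0) -> dict:
    """GET the root endpoint (e.g. http://localhost:9200) and parse the JSON.

    Assumption: no authentication is needed; a secured cluster would need
    credentials added to the request.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Offline illustration with a hand-written response fragment.
sample = {"version": {"number": "7.10.2"}}
print(is_supported(sample))
```

Against a live cluster, the check would be is_supported(fetch_cluster_info("http://localhost:9200")).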

Basic Example

<!-- noSql database provider for the 4.0 connector framework -->
<noSQLConnectionProvider>
    <implementation>com.searchtechnologies.aspire:aspire-elasticsearch-provider</implementation>
    <url>http://localhost:9200</url>
</noSQLConnectionProvider>

Aspire creates one set of Elasticsearch indexes for each configured content source. When the content source is deleted, its indexes are dropped. Each index name is made up of the following parts, joined with hyphens:

  • prefix "aspire-"
  • cluster id defined in settings.xml - e.g. "dev"
  • normalized value of the content source name - e.g. "aspider_web_crawler"
  • provider object name - e.g. "processqueue"

Examples of index names: aspire-dev-aspider_web_crawler-processqueue, aspire-dev-aspider_web_crawler-snapshot, aspire-dev-group_expansion_manager-usersandgroups
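
The naming scheme above can be sketched in a few lines of Python, for example to predict which indexes to look for when inspecting a cluster. This is an illustration only: the exact normalization Aspire applies to the content source name is not described here, so the rule below (lowercase, runs of non-alphanumeric characters replaced by an underscore) is an assumption chosen to match the examples, and both function names are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Assumed normalization: lowercase, non-alphanumeric runs become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def index_name(cluster_id: str, content_source: str, provider_object: str) -> str:
    """Join prefix, cluster id, normalized source name and provider object."""
    return "-".join(["aspire", cluster_id, normalize(content_source), provider_object])


print(index_name("dev", "Aspider Web Crawler", "processqueue"))
# prints aspire-dev-aspider_web_crawler-processqueue
```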


Configuration Example

<!-- noSql database provider for the 4.0 connector framework -->
<noSQLConnectionProvider>
    <implementation>com.searchtechnologies.aspire:aspire-elasticsearch-provider</implementation>
    <url>http://localhost:9200</url>
    <claimPrefetch>300</claimPrefetch>
    <claim>100</claim>
    <keepSearchContextAlive>5m</keepSearchContextAlive>
    <authentication type="basic">
        <username>admin</username>
        <password>encrypted:password</password>
    </authentication>
    <debugOutFile>/tmp/aspire/profile.txt</debugOutFile>
    <maxRetries>3</maxRetries>
</noSQLConnectionProvider>