Aspire depends on a relational database (RDB) to store administrative and process information for crawled data. The RDB is used to hold:

  • The list of content sources to crawl
  • The properties for each content source

    These properties provide all of the details necessary (URL, username, password, files to skip, etc.) for the connector to crawl the source.

  • Outstanding crawls

    Every crawl is given a unique "Crawl ID."

  • Crawl statistics

    These are updated by connectors as documents are acquired and processed.

  • Crawl errors

Using the Internal RDB



Figure: Internal RDB Architecture

The internal RDB option installs Apache Derby and creates a Derby database on disk in the ${aspire.home}/data/CSManager/db directory.

The internal RDB is the default option and is appropriate for basic installations. It requires no separate RDB installation or configuration and is a "single click and go" option.

Backups

To create backups of your database, periodically back up the ${aspire.home}/data/CSManager/db directory. To create a backup, do the following:

  1. Wait until all outstanding content source crawls are complete.
  2. Stop the CS Manager Aspire application.

    This can be done using the "Stop Button" either from the main System Administration interface (click the server on which the CS Manager application is installed) or from the Debug interface. Alternatively, you can shut down the entire Aspire server.

  3. Copy the "${aspire.home}/data/CSManager/db" directory to a safe place.
  4. Restart the CS Manager Aspire application.
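
The copy in step 3 can be scripted. The following is a minimal sketch, assuming the CS Manager is already stopped and that the Aspire home directory and a backup location are passed as arguments. The class name DerbyDirBackup and the "db-<timestamp>" folder naming are illustrative choices, not part of Aspire.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.stream.Stream;

// Hypothetical helper, not part of Aspire: copies the Derby directory to a
// timestamped folder. Run it only while the CS Manager application is stopped.
final class DerbyDirBackup {

    /** Recursively copies sourceDir into a new, timestamped folder under backupRoot. */
    static Path copyDatabase(Path sourceDir, Path backupRoot) throws IOException {
        String stamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        Path target = backupRoot.resolve("db-" + stamp);
        Files.createDirectories(target);
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            // Files.walk visits a directory before its contents, so parents exist first.
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(sourceDir.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(src, dest);
                }
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the Aspire home directory; args[1]: where backups are kept (both assumed).
        Path db = Paths.get(args[0], "data", "CSManager", "db");
        System.out.println("Backup written to " + copyDatabase(db, Paths.get(args[1])));
    }
}
```

Writing each backup to its own timestamped folder keeps earlier copies intact, so a bad backup never overwrites a good one.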

Note: It is also possible to back up the database without taking the CS Manager application offline by calling the Apache Derby SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure from an Apache Derby client such as "ij". Contact Search Technologies for more details.
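
For readers who prefer JDBC to an interactive client, the same procedure can be called from a small Java program. This is only a sketch: it assumes the running Derby database is reachable over JDBC at a URL you supply and that the matching Derby driver is on the classpath. The class name DerbyOnlineBackup is hypothetical. The backup directory is interpreted by the JVM that hosts the Derby engine, not by the client.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of Aspire.
final class DerbyOnlineBackup {

    // Derby system procedure; its single argument is the directory to back up into.
    static final String BACKUP_CALL = "CALL SYSCS_UTIL.SYSCS_BACKUP_DATABASE(?)";

    /** Asks Derby to write an online backup of the connected database into backupDir. */
    static void backup(Connection conn, String backupDir) throws SQLException {
        try (CallableStatement call = conn.prepareCall(BACKUP_CALL)) {
            call.setString(1, backupDir);
            call.execute();
        }
    }

    public static void main(String[] args) throws SQLException {
        // args[0]: JDBC URL of the running Derby database (assumed);
        // args[1]: backup directory as seen by the JVM hosting Derby.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            backup(conn, args[1]);
        }
    }
}
```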

The Internal RDB has No Failover

The internal RDB is embedded in, and installed with, the CS Manager application. If the CS Manager fails (or the node or JVM in which it runs fails), the Administration RDB is no longer available, crawl statistics can no longer be updated, and crawls cannot be marked as completed.

Therefore, the Internal RDB is not a good solution for large production systems where failover and recovery are required. In these instances, we recommend using an external RDB.

Appropriate uses for the Internal RDB:

  • Single-node installations
  • Proof of Concept installations
  • Installations with a small number of content sources

Using the External RDB



Figure: CS Manager with External RDB

For production environments, Search Technologies recommends configuring your Aspire system for an external RDB. This will ensure proper failover should any Aspire node fail.

What You Will Need

You will need the following items before you begin:

  • The JDBC driver jar for your external relational database
  • The JDBC URL required to access your external relational database
  • The username and password of an account with sufficient privileges on the external relational database

We also recommend that a new, separate SQL database be created just for the Aspire application, so that Aspire tables can be kept separate from other applications that use the same database server.

The SQL database for Aspire to use will be encoded into the JDBC URL that is provided to Aspire when you install the CS Manager application.

MySQL Example

The following is an example of these items for MySQL:

  • The JDBC driver JAR is called "Connector/J" and can be downloaded from http://dev.mysql.com/downloads/connector/j/

    The resulting JAR file is called "mysql-connector-java-5.1.18-bin.jar"

  • The JDBC URL for MySQL has the following format: jdbc:mysql://<server>/<database>

    For example: jdbc:mysql://192.168.40.27/aspiredb

  • The user name and password can be anything but we recommend: aspire_crawl_account/*****
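
Before entering these values into Aspire, it can help to confirm that the URL and credentials actually work. The following is a minimal sketch of such a pre-flight check, assuming the MySQL driver JAR is on the classpath. The class name ExternalRdbCheck and its methods are hypothetical and not part of Aspire.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical pre-flight check, not part of Aspire.
final class ExternalRdbCheck {

    /** Builds a URL in the jdbc:mysql://<server>/<database> format. */
    static String mysqlUrl(String server, String database) {
        return "jdbc:mysql://" + server + "/" + database;
    }

    /** Returns true if a connection opens and responds within five seconds. */
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5);
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // args: server, database, user, password. The driver JAR must be on the classpath.
        String url = mysqlUrl(args[0], args[1]);
        System.out.println(url + " reachable: " + canConnect(url, args[2], args[3]));
    }
}
```

A failed check usually points to a wrong host or database name, a missing driver JAR, or an account without access to the database.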

Oracle Example


Configuring CS Manager to use the External RDB

Use the following steps to configure the CS Manager for using the external RDB:

  1. Copy your JDBC driver JAR to the Aspire "lib" directory.
  2. Go to the Aspire main admin page (the "servers list" page, typically http://localhost:50505/).
  3. Click "Add Application" to add a new CS Manager to your Aspire server.
  4. Select "Use External RDB."
  5. Fill out the information requested:
    • JDBC URL - the format will depend on what type of external RDB you use.
    • JDBC Driver Jar - We recommend using a path such as "${aspire.home}/lib/<DRIVER-JAR-FILE>".

      For example, "${aspire.home}/lib/mysql-connector-java-5.1.18-bin.jar"

    • RDB Username
    • RDB Password

Note that the password will be automatically encrypted by Aspire.

When Aspire first connects to the external RDB, it will automatically create the tables that it needs and then start using them.

Configuring Multiple CS Managers

When using an external RDB, you can configure multiple CS Managers on different Aspire nodes for failover. When one CS Manager is not available, Aspire automatically routes admin requests to the backup.

We do not recommend creating more than two CS Managers.

Connecting Two CS Managers to the same Internal Database

You can connect two CS Managers to the same internal Derby database by configuring the first with an internal database and the second with an external database that points to the first manager's Derby database.

When configuring the external database, use the following parameters:

  • JDBC URL: jdbc:derby://localhost/data/CSManager/db/csManager
  • JDBC driver file: lib/derbyclient.jar
    • You will need to ensure this file exists in the lib directory
  • The JDBC driver class field should be left blank
  • JDBC user: app
  • The JDBC password field should be left blank