Step 1. Launch Aspire and open the Content Source Management Page

Launch Aspire (if it's not already running). See:

Step 2. Add a new Content Source

  • For this step, follow the steps in the Configuration Tutorial for the connector of your choice (see the Connector list).


Step 3. Add a new Publish To File System Staging Repository to the Workflow

To add a Publish To File System Staging Repository publisher, drag the Publish To File System Staging Repository rule from the Workflow Library and drop it into the Workflow Tree where you want to add it. This automatically opens the Publish To File System Staging Repository window, where you configure the publisher.

Step 3a. Specify Publisher Information

 In the Publish To File System Staging Repository window, specify the connection information for the staging repository.

Enter the URL of your repository in the input box. This directory must exist and be writable. Next, check Save stream to ensure the original file is saved into the repository. You can leave the Content source and Owner boxes empty: the content source defaults to the content source the publisher is publishing (for example, FileSysToStagingRepo), and the owner defaults to default (the default owner for the publisher).

At the configuration screen, you configure the repository location. This is the directory on disk that will be the base directory for the repository. All information will be stored under this directory, so you should ensure that the directory is on a disk with sufficient capacity.
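Because the repository directory must exist, be writable, and sit on a disk with enough free space, it can help to check those conditions before saving the publisher configuration. The following is a minimal Python sketch of such a pre-flight check, assuming it runs on the machine hosting Aspire under the same user account; the function name `check_repository_dir` and the `min_free_bytes` threshold are hypothetical and are not part of Aspire.

```python
import os
import shutil
import tempfile


def check_repository_dir(path: str, min_free_bytes: int) -> list:
    """Return a list of problems with a candidate repository directory.

    An empty list means the directory exists, is writable by the current
    process, and its disk has at least min_free_bytes of free space.
    """
    problems = []
    if not os.path.isdir(path):
        # Nothing else can be checked if the directory is missing.
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by this user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, {min_free_bytes} required")
    return problems


if __name__ == "__main__":
    # Demonstrate against a throwaway directory; in practice, point this at
    # the real repository directory as the user Aspire runs under.
    with tempfile.TemporaryDirectory() as repo:
        print(check_repository_dir(repo, min_free_bytes=1))
```

Since `os.access` reflects the permissions of whoever runs the script, the result is only meaningful when run as the account that Aspire itself uses.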

You may choose to compress or encrypt the data. If you turn on encryption, you must choose the encryption algorithm and configure a password.

By default, the publisher publishes only the document metadata to the repository. If the connector crawling the original content source produces a stream (for example, for a file or an attachment), you may choose to publish this stream to the repository as well. If you do, be aware that the stream can be published only if it has not already been consumed by another stage, such as extract text. For most Aspire connectors, this means you need to disable extract text in the advanced configuration if you want to save the stream in the repository.

If more than one Java virtual machine will access the file staging repository at the same time (for example, when using failover or distributed processing), turn on file locking to keep the transaction log consistent.
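The reason a consumed stream cannot be published is that a stream can be read only once: after one stage reads it to the end, a later stage finds nothing left. The sketch below is a Python analogy, not Aspire code (Aspire stages work with Java streams); `extract_text`, `publish_stream`, and `run_pipeline` are hypothetical stand-ins for the extract text stage and the publisher.

```python
import io


def extract_text(stream: io.BytesIO) -> str:
    """Hypothetical stand-in for an extract-text stage: reads the stream to the end."""
    return stream.read().decode("utf-8")


def publish_stream(stream: io.BytesIO) -> bytes:
    """Hypothetical stand-in for the publisher saving whatever remains in the stream."""
    return stream.read()


def run_pipeline(content: bytes, extract_text_enabled: bool) -> bytes:
    """Run the two hypothetical stages in order and return what the publisher saved."""
    stream = io.BytesIO(content)
    if extract_text_enabled:
        extract_text(stream)  # consumes the stream; the read position is now at the end
    return publish_stream(stream)


if __name__ == "__main__":
    original = b"contents of the crawled file"
    print(run_pipeline(original, extract_text_enabled=True))   # b''
    print(run_pipeline(original, extract_text_enabled=False))  # b'contents of the crawled file'
```

With extract text enabled, the publisher receives an empty stream; with it disabled, the publisher receives the full original content, which is why the tutorial recommends disabling extract text when Save stream is on.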

Content Source and Owner

When you configure the publisher, you can optionally set the content source and owner. Together these determine the exact location of the published item in the repository. If you don’t specify the content source, it is taken from the document being published. If you don’t specify the owner, it defaults to the value default.
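The defaulting rules above can be sketched as a small function. This is a hypothetical Python illustration of the rules as described, not the publisher's actual code; the name `resolve_location_keys` is an assumption. How the two values map to a path on disk is not described here, so the sketch stops at resolving them.

```python
from typing import Optional, Tuple

DEFAULT_OWNER = "default"


def resolve_location_keys(configured_source: Optional[str],
                          configured_owner: Optional[str],
                          document_source: str) -> Tuple[str, str]:
    """Return the (content source, owner) pair used to place a published item.

    An empty or missing configured content source falls back to the content
    source of the document being published; an empty or missing owner falls
    back to "default".
    """
    source = configured_source or document_source
    owner = configured_owner or DEFAULT_OWNER
    return source, owner


if __name__ == "__main__":
    # Both boxes left empty in the publisher configuration.
    print(resolve_location_keys(None, "", "FileSysToStagingRepo"))
    # ('FileSysToStagingRepo', 'default')
```

Treating an empty string the same as a missing value mirrors leaving the Content source and Owner boxes blank.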

Real Time Updates

The publisher can send a JMS message whenever a transaction occurs. This lets the publisher be closely coupled to a second crawler: a crawl of the original content source publishes to the staging repository, the publisher submits an event to a JMS queue, and another crawler reads that queue. This separates the crawl and index processes, as described in the introduction.

If you turn on this functionality, you need to configure the JMS server and queue to connect to. Currently the publisher supports only ActiveMQ; you can use an external ActiveMQ broker or install the Aspire JMS Server service.

JMS Message Format

If real time updates are configured, JMS messages are emitted in the following format:

<transactions>
   <transaction id="1234" timestamp="2014/07/13T12:34:56Z" action="[add|update|delete]">
      <item id="????" repositoryLocation="????" contentSource="????" owner="????"/>
   </transaction>
</transactions>
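A consumer of the queue (for example, the second crawler or a monitoring script) has to turn each message into individual item events. The following Python sketch parses the message body shown above with the standard library and flattens it into one record per item. It assumes the message body is delivered as an XML string; receiving it from ActiveMQ is out of scope, and the `ItemEvent` class, `parse_transactions` function, and the item values in `SAMPLE` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class ItemEvent:
    transaction_id: str
    timestamp: str
    action: str  # one of add, update, delete
    item_id: str
    repository_location: str
    content_source: str
    owner: str


def parse_transactions(xml_text: str) -> List[ItemEvent]:
    """Flatten a <transactions> message into one event per <item>."""
    root = ET.fromstring(xml_text)
    events = []
    for tx in root.findall("transaction"):
        for item in tx.findall("item"):
            events.append(ItemEvent(
                transaction_id=tx.get("id", ""),
                timestamp=tx.get("timestamp", ""),
                action=tx.get("action", ""),
                item_id=item.get("id", ""),
                repository_location=item.get("repositoryLocation", ""),
                content_source=item.get("contentSource", ""),
                owner=item.get("owner", ""),
            ))
    return events


# Hypothetical sample message; the item attribute values are invented.
SAMPLE = """<transactions>
   <transaction id="1234" timestamp="2014/07/13T12:34:56Z" action="add">
      <item id="doc-1" repositoryLocation="example-location" contentSource="FileSysToStagingRepo" owner="default"/>
   </transaction>
</transactions>"""

if __name__ == "__main__":
    for event in parse_transactions(SAMPLE):
        # A real consumer would fetch the item on add/update and remove it on delete.
        print(event.action, event.item_id, event.content_source)
```

Keeping the transaction attributes on every flattened event means a consumer can process items independently without losing the action or timestamp.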

Once you've clicked on the Add button, it will take a moment for Aspire to download all of the necessary components (the Jar files) from the Maven repository and load them into Aspire. Once that's done, the publisher will appear in the Workflow Tree.

For details on using the Workflow section, please refer to the Workflow introduction.