Publishing to the File System Staging Repository

The Publish to File System Staging Repository publisher posts documents to the File System Staging Repository.

To publish to a staging repository, replace the usual search engine publisher in the workflow with a publisher for the desired staging repository.

Select the File System Staging Repository from the Publishers section of the workflow, or install a custom publisher with the coordinates com.searchtechnologies.aspire:app-file-repo-publisher.

On the configuration screen, configure the repository location. This is the directory on disk that serves as the base directory for the repository. All information is stored under this directory, so ensure that it is on a disk with sufficient capacity.

You may choose to compress or encrypt the data. If you turn on encryption, you must choose the encryption algorithm and configure a password.

By default, the publisher publishes only the document metadata to the repository. If the connector crawling the original content source repository produces a stream, for example to a file or attachment, you may choose to publish this stream to the repository as well. If you do, be aware that the stream can only be published if it has not already been consumed by another stage, such as extract text. For most Aspire connectors, you will need to disable extract text in the advanced configuration if you wish to save the stream in the repository.

If you plan to use more than one Java virtual machine to access the file staging repository at the same time (for example, if you are using failover or distributed processing), turn on file locking to ensure the transaction log stays consistent.
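To illustrate why file locking matters when several Java virtual machines share one repository, the following is a minimal sketch of appending to a shared log file under an exclusive operating-system lock. It is not the publisher's actual implementation: the class name TransactionLogAppender, the method appendLocked and the one-line-per-entry log layout are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not part of Aspire).
public class TransactionLogAppender {

    // Appends one line to the log while holding an exclusive file lock, so
    // that writers in other JVMs wait rather than interleave their output.
    // Note: the lock is held on behalf of the whole JVM; threads inside one
    // JVM still need their own synchronization.
    public static void appendLocked(Path log, String line) throws IOException {
        try (FileChannel channel = FileChannel.open(log,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {      // blocks until the lock is granted
            channel.position(channel.size());       // seek to the end while locked
            ByteBuffer buf = ByteBuffer.wrap(
                (line + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
            channel.force(true);                     // flush before the lock is released
        }
    }
}
```

The try-with-resources statement releases the lock before closing the channel, so another process can only acquire the lock after the entry has been fully written and flushed.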

Content Source and Owner

When you configure the publisher, you can optionally set the content source and owner. Together these determine the exact location of the published item in the repository. If you don’t specify the content source, it is taken from the document being published. If you don’t specify the owner, the value default is used.
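The fallback rules for content source and owner can be sketched as follows. This is only an illustration of the defaulting behavior described here, not the publisher's code: the class RepositoryCoordinates, the method resolve and the treatment of blank strings as "not specified" are assumptions.

```java
// Hypothetical value class, for illustration only (not part of Aspire).
public class RepositoryCoordinates {
    public static final String DEFAULT_OWNER = "default";

    public final String contentSource;
    public final String owner;

    private RepositoryCoordinates(String contentSource, String owner) {
        this.contentSource = contentSource;
        this.owner = owner;
    }

    // configuredSource / configuredOwner: publisher settings, null or blank when unset.
    // documentSource: the content source carried by the document being published.
    public static RepositoryCoordinates resolve(String configuredSource,
                                                String configuredOwner,
                                                String documentSource) {
        String source = isSet(configuredSource) ? configuredSource.trim() : documentSource;
        String owner = isSet(configuredOwner) ? configuredOwner.trim() : DEFAULT_OWNER;
        return new RepositoryCoordinates(source, owner);
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }
}
```

For example, resolve(null, null, "myCrawl") yields the content source myCrawl and the owner default, while resolve("archive", "teamA", "myCrawl") keeps the configured values.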

Real Time Updates

The publisher can send JMS messages when transactions occur. This lets it be closely coupled to a second crawler: a crawl of the original content source repository publishes to the staging repository, the publisher submits an event to a JMS queue, and another crawler reads that queue. This separates the crawl and index processes, as described in the introduction.

If you turn on this functionality, you will need to configure the JMS server and the queue to connect to. Currently the publisher supports only ActiveMQ. You can use an external broker or install the Aspire JMS Server service.

JMS Message Format

If real time updates are configured, JMS messages are emitted in the following format:

<transactions>
   <transaction id="1234" timestamp="2014/07/13T12:34:56Z" action="[add|update|delete]">
      <item id="????" repositoryLocation="????" contentSource="????" owner="????"/>
   </transaction>
</transactions>
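
A consumer reading these messages needs to pull the attributes out of each transaction and item. The following is a minimal sketch using the XML parser that ships with the JDK; it covers only the element and attribute names shown in the format above. The class TransactionMessageParser and its ItemEvent type are hypothetical names for this example, and how the message text is obtained from the queue is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical parser, for illustration only (not part of Aspire).
public class TransactionMessageParser {

    // One <item>, flattened together with the id and action of its parent <transaction>.
    public static class ItemEvent {
        public final String transactionId, action, itemId;
        public final String repositoryLocation, contentSource, owner;

        ItemEvent(Element tx, Element item) {
            this.transactionId = tx.getAttribute("id");
            this.action = tx.getAttribute("action");
            this.itemId = item.getAttribute("id");
            this.repositoryLocation = item.getAttribute("repositoryLocation");
            this.contentSource = item.getAttribute("contentSource");
            this.owner = item.getAttribute("owner");
        }
    }

    public static List<ItemEvent> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The format shown has no DOCTYPE, so reject any message that carries one.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<ItemEvent> events = new ArrayList<>();
        NodeList transactions = doc.getElementsByTagName("transaction");
        for (int i = 0; i < transactions.getLength(); i++) {
            Element tx = (Element) transactions.item(i);
            NodeList items = tx.getElementsByTagName("item");
            for (int j = 0; j < items.getLength(); j++) {
                events.add(new ItemEvent(tx, (Element) items.item(j)));
            }
        }
        return events;
    }
}
```

Each returned ItemEvent carries the action (add, update or delete) alongside the item's repository location, which is what a downstream crawler would need to decide how to process the item.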