File System Staging Repository Connector App-bundle

The File System Staging Repository Connector reads data from a File System Staging Repository and passes it to Aspire for processing. Like any other connector, it allows submitted jobs to be passed to workflow stages and processed by workflow rules. Optionally, text can be extracted from the items in the repository using Apache Tika.

For information on the File System Staging Repository, see here. For more information on the Staging Repository (Aspire 2) in general, see here.

The bundle uses the following components:

File System Staging Repository Connector App-bundle
AppBundle Name: File System Staging Repository Connector
Maven Coordinates: com.searchtechnologies.aspire:app-file-repo-connector
Versions: 2.2.2
Type Flags: None
Inputs: N/A
Outputs: N/A

Configuration

This section lists all configuration parameters available when installing the File System Staging Repository Connector Application Bundle.

General Application Configuration

| Property | Type | Default | Description |
|---|---|---|---|
| enableTextExtract | boolean | [Required] | If true, any streams returned from the repository will be passed to Apache Tika for text extraction |
| jms | boolean | false | Enable JMS updates |
| broker | string | [Required (JMS)] | The JMS broker to connect to |
| channel | string | [Required (JMS)] | The JMS channel (topic/queue) to connect to |
| useTopic | boolean | false | The value in channel is a topic |
| subJobThreads | long | 10 | The number of threads to process the jobs |
| subJobQueue | long | 30 | The size of the sub job queue |
| subJobTimeout | long | 5m | The period to try to put a job on the queue before failing |
| workflowReloadPeriod | long | 15m | The period after which the workflow will reload |
| workflowErrorTolerant | boolean | false | When set to true, workflows continue even when they encounter an error and complete normally, regardless of the document fields available |
| emitStartJob | boolean | true | Emit a startCrawl job when the crawl starts |
| emitEndJob | boolean | true | Emit an endCrawl job when the crawl stops |
| fullRecovery | full/incremental | | The type of full recovery crawl |
| incrementalRecovery | full/incremental | | The type of incremental recovery crawl |
| batchSize | long | 50 | The maximum number of items submitted to a batch |
| batchTimeout | long | 60000 | The time in ms before batches are timed out |
| enableAuditing | boolean | | Enable auditing |
| snapshotDir | string | snapshots | The directory for snapshot files |
| debug | boolean | false | Controls whether debugging is enabled for the application. Debug messages will be written to the log files. |
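
The JMS rows in the table depend on one another: broker and channel are only required when jms is true, and useTopic indicates whether the value in channel is a topic. As an illustration only, the fragment below sketches just those properties for a queue. The broker address and the queue name stagingUpdates are assumed, hypothetical values, not defaults, and reading useTopic=false as "channel is a queue" is an inference from the table rather than a documented statement.

```xml
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <!-- Marked [Required] in the table -->
    <property name="enableTextExtract">true</property>
    <!-- Enable JMS updates; broker and channel are then required -->
    <property name="jms">true</property>
    <!-- Assumed broker address, replace with your own -->
    <property name="broker">tcp://localhost:61616</property>
    <!-- Hypothetical queue name, for illustration only -->
    <property name="channel">stagingUpdates</property>
    <!-- false is the documented default; assumed here to mean the channel is a queue -->
    <property name="useTopic">false</property>
  </properties>
</application>
```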

Configuration Example

To install the application bundle, add the following configuration to the <autoStart> section of the Aspire settings.xml.

<?xml version="1.0" encoding="UTF-8"?>
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <property name="enableTextExtract">true</property>
    <property name="jms">true</property>
    <property name="broker">tcp://localhost:61616</property>
    <property name="channel">demoQueue</property>
    <property name="useTopic">true</property>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${dist.data.dir}/${app.name}/snapshots</property>
    <property name="subJobThreads">10</property>
    <property name="subJobQueue">30</property>
    <property name="subJobTimeout">10m</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="emitStartJob">false</property>
    <property name="emitEndJob">false</property>
    <property name="fullRecovery">incremental</property>
    <property name="incrementalRecovery">incremental</property>
    <property name="batchSize">50</property>
    <property name="batchTimeout">60000</property>
    <property name="enableAuditing">true</property>
    <property name="debug">false</property>
  </properties>
</application>

Note: Any optional property can be removed from the configuration, in which case the default value given in the table above is used.
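
Following the note above, a stripped-down installation might set only enableTextExtract, the one property the table marks as required when JMS is not used. This is a sketch that assumes every omitted property falls back to its documented default (jms defaults to false, so broker and channel are not needed here).

```xml
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <!-- The only property marked [Required] outside of JMS; false skips Tika text extraction -->
    <property name="enableTextExtract">false</property>
  </properties>
</application>
```

The table lists no default for fullRecovery, incrementalRecovery, or enableAuditing, so it may be safer to set those explicitly rather than rely on an unstated value.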
