File System Staging Repository Connector App-bundle

The File System Staging Repository Connector reads data from a File System Staging Repository and passes it to Aspire for processing. Like other connectors, it allows submitted jobs to be passed to workflow stages and processed by workflow rules. Optionally, text can be extracted from the items in the repository using Apache Tika.

For information on the File System Staging Repository, see here; for more information on the Staging Repository (Aspire 2) in general, see here.

The bundle uses the following components:

File System Staging Repository Connector Component
- To scan the staging repository
Workflow Engine
- Processing workflow rules for processed items from the store
Text Extraction
- Text extraction where configured
Performance Logger
- Performance information
Job Logger
- Debugging information
Groovy
- Configuration extraction

File System Staging Repository Connector App-bundle
AppBundle Name	File System Staging Repository Connector
Maven Coordinates	com.searchtechnologies.aspire:app-file-repo-connector
Versions	2.2.2
Type Flags	None
Inputs	N/A
Outputs	N/A

Configuration

This section lists all configuration parameters available to install the File System Staging Repository Connector Application Bundle.

General Application Configuration

Property	Type	Default	Description
enableTextExtract	boolean	[Required]	If true, any streams returned from the repository are passed to Apache Tika for text extraction
jms	boolean	false	Enable JMS updates
broker	string	[Required (JMS)]	The JMS broker to connect to
channel	string	[Required (JMS)]	The JMS channel (topic/queue) to connect to
useTopic	boolean	false	The value in the channel is a topic
subJobThreads	long	10	The number of threads to process the jobs
subJobQueue	long	30	The size of the sub job queue
subJobTimeout	long	5m	The period to try to put a job on the queue before failing
workflowReloadPeriod	long	15m	The period after which the workflow will reload
workflowErrorTolerant	boolean	false	When true, workflows continue even when they encounter an error and complete normally, regardless of the document fields available
emitStartJob	boolean	true	Emit a startCrawl job when the crawl starts
emitEndJob	boolean	true	Emit an endCrawl job when the crawl stops
fullRecovery	full/incremental		The type of full recovery crawl
incrementalRecovery	full/incremental		The type of incremental recovery crawl
batchSize	long	50	The maximum number of items submitted to a batch
batchTimeout	long	60,000	The time in ms before batches are timed out
enableAuditing	boolean		Enable auditing
snapshotDir	String	snapshots	The directory for snapshot files.
debug	Boolean	false	Controls whether debugging is enabled for the application. Debug messages will be written to the log files.
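The table marks broker and channel as required only when jms is true. As a sketch, assuming every property not shown keeps its table default, the following configuration enables JMS updates and publishes to a queue rather than a topic by leaving useTopic at false. The broker URL and channel name are the placeholder values used in the configuration example and should be replaced with your own.

```xml
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <!-- Required: pass repository streams to Apache Tika for text extraction -->
    <property name="enableTextExtract">true</property>
    <!-- Enable JMS updates; broker and channel then become required -->
    <property name="jms">true</property>
    <property name="broker">tcp://localhost:61616</property>
    <property name="channel">demoQueue</property>
    <!-- false (the default): the channel names a queue, not a topic -->
    <property name="useTopic">false</property>
  </properties>
</application>
```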

Configuration Example

To install the application bundle, add the configuration, as follows, to the <autoStart> section of the Aspire settings.xml.

<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <property name="enableTextExtract">true</property>
    <property name="jms">true</property>
    <property name="broker">tcp://localhost:61616</property>
    <property name="channel">demoQueue</property>
    <property name="useTopic">true</property>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${dist.data.dir}/${app.name}/snapshots</property>
    <property name="subJobThreads">10</property>
    <property name="subJobQueue">30</property>
    <property name="subJobTimeout">10m</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="emitStartJob">false</property>
    <property name="emitEndJob">false</property>
    <property name="fullRecovery">incremental</property>
    <property name="incrementalRecovery">incremental</property>
    <property name="batchSize">50</property>
    <property name="batchTimeout">60000</property>
    <property name="enableAuditing">true</property>
    <property name="debug">false</property>
  </properties>
</application>

Note: Optional properties can be removed from the configuration, in which case the default values listed in the table above are used.
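For instance, a minimal sketch that relies on the defaults for every optional property might set only enableTextExtract. This assumes the configuration table lists every required property and that JMS updates are not needed (jms defaults to false, so broker and channel can be omitted).

```xml
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <!-- The only property marked [Required] when JMS is disabled -->
    <property name="enableTextExtract">false</property>
  </properties>
</application>
```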
