- Created by Johnny Vargas on Jun 28, 2018
File System Staging Repository Connector App-bundle
The File System Staging Repository Connector reads data from a File System Staging Repository and passes it to Aspire for processing. It works much like any other connector: submitted jobs are passed to workflow stages and processed by workflow rules. Optionally, text can be extracted from the items in the repository using Apache Tika.
For information on the File System Staging Repository, see here and for more information on Staging Repository (Aspire 2) in general, see here.
The bundle uses the following components:
- File System Staging Repository Connector Component
  - To scan the staging repository
- Workflow Engine
  - Processing workflow rules for processed items from the store
- Text Extraction
  - Text extraction where configured
- Performance Logger
  - Performance information
- Job Logger
  - Debugging information
- Groovy
  - Configuration extraction
File System Staging Repository Connector App-bundle | |
---|---|
AppBundle Name | File System Staging Repository Connector |
Maven Coordinates | com.searchtechnologies.aspire:app-file-repo-connector |
Versions | 2.2.2 |
Type Flags | None |
Inputs | N/A |
Outputs | N/A |
Configuration
This section lists all configuration parameters available when installing the File System Staging Repository Connector Application Bundle.
General Application Configuration
Property | Type | Default | Description |
---|---|---|---|
enableTextExtract | boolean | [Required] | If true, any streams returned from the repository will be passed to Apache Tika for text extraction |
jms | boolean | false | Enable JMS updates |
broker | string | [Required (JMS)] | The JMS broker to connect to |
channel | string | [Required (JMS)] | The JMS channel (topic/queue) to connect to |
useTopic | boolean | false | The value in the channel is a topic |
subJobThreads | long | 10 | The number of threads to process the jobs |
subJobQueue | long | 30 | The size of the sub job queue |
subJobTimeout | long | 5m | The period to try to put a job on the queue before failing |
workflowReloadPeriod | long | 15m | The period after which the workflow will reload |
workflowErrorTolerant | boolean | false | When set to true, workflows continue when they encounter an error and complete normally, regardless of the document fields available |
emitStartJob | boolean | true | Emit a startCrawl job when the crawl starts |
emitEndJob | boolean | true | Emit an endCrawl job when the crawl stops |
fullRecovery | full/incremental | | The type of full recovery crawl |
incrementalRecovery | full/incremental | | The type of incremental recovery crawl |
batchSize | long | 50 | The maximum number of items submitted to a batch |
batchTimeout | long | 60,000 | The time in ms before batches are timed out |
enableAuditing | boolean | | Enable auditing |
snapshotDir | string | snapshots | The directory for snapshot files. |
debug | boolean | false | Controls whether debugging is enabled for the application. Debug messages will be written to the log files. |
Configuration Example
To install the application bundle, add the following configuration to the <autoStart> section of the Aspire settings.xml. This example enables text extraction and JMS updates.
<?xml version="1.0" encoding="UTF-8"?>
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <property name="enableTextExtract">true</property>
    <property name="jms">true</property>
    <property name="broker">tcp://localhost:61616</property>
    <property name="channel">demoQueue</property>
    <property name="useTopic">true</property>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${dist.data.dir}/${app.name}/snapshots</property>
    <property name="subJobThreads">10</property>
    <property name="subJobQueue">30</property>
    <property name="subJobTimeout">10m</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="emitStartJob">false</property>
    <property name="emitEndJob">false</property>
    <property name="fullRecovery">incremental</property>
    <property name="incrementalRecovery">incremental</property>
    <property name="batchSize">50</property>
    <property name="batchTimeout">60000</property>
    <property name="enableAuditing">true</property>
    <property name="debug">false</property>
  </properties>
</application>
Note: Any optional property can be removed from the configuration to use the default value listed in the configuration table above.
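As an illustration of the note above, a minimal configuration might set only the property marked [Required] and rely on defaults for everything else. This is a sketch under assumptions, not a tested configuration: it assumes that broker and channel are needed only when jms is true, and that the properties with no default listed in the table (fullRecovery, incrementalRecovery, enableAuditing) can safely be omitted, which the table does not state.

```xml
<application config="com.searchtechnologies.aspire:app-file-repo-connector">
  <properties>
    <!-- The only property marked [Required] in the table above -->
    <property name="enableTextExtract">true</property>
    <!-- jms defaults to false, so broker and channel are assumed unnecessary here -->
  </properties>
</application>
```

If the bundle fails to load with this reduced form, restoring properties from the full example one at a time should show which additional values are expected.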