Configuration
This section lists all configuration parameters available to install the File System Application Bundle and to execute crawls using the connector.
Application Configuration
Property | Type | Default | Description |
---|---|---|---|
snapshotDir | string | ${aspire.home}/snapshots | The directory in which snapshot files are stored. |
disableTextExtract | boolean | false | By default, connectors use Apache Tika to extract text from downloaded documents. If you wish to apply special text processing to the downloaded document in the workflow, you should disable text extraction. The downloaded document is then available as a content stream. |
workflowReloadPeriod | int | 15m | The period after which the business rules are reloaded. The value is interpreted as milliseconds unless it is suffixed with ms, s, m, h or d to indicate the required units. |
workflowErrorTolerant | boolean | false | When set, exceptions in workflow rules only affect the execution of the rule in which the exception occurs. Subsequent rules are executed and the job completes the workflow successfully. If not set, exceptions in workflow rules are re-thrown and the job is moved to the error workflow. |
useGE | boolean | false | Set to true to enable group expansion. |
geSchedule | string | 0 0 0 * * ? | The schedule on which group expansion fetches all the security groups. |
URL | string | none | The URL of the Jira instance. |
Redirect Url | string | https://localhost:4000 | A valid URL to which the authorization tokens are redirected during the authorization process. |
User | string | none | Login username of the Jira account. |
Password | string | none | Password for the Jira account. |
PageSize | integer | 100 | The number of documents or folders returned by each API call. |
ExcludeExtensions | string | none | A comma-separated list of file extensions for which text should not be extracted, for instance dll or exe. |
fileNamePatterns/include | regex | none | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is included by the scanner. Multiple include nodes can be added. |
fileNamePatterns/exclude | regex | none | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is excluded by the scanner. Multiple exclude nodes can be added. |
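As an illustration of the unit suffixes accepted by workflowReloadPeriod, the following Python sketch converts a period string to milliseconds. It is not the connector's own parser; the function name parse_period_ms and the rejection of malformed values are assumptions made for this example.

```python
import re

# Milliseconds per unit, for the suffixes listed in the table above.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_period_ms(value: str) -> int:
    """Convert a period such as '15m' or '500' to milliseconds.

    Hypothetical helper: a value without a suffix is treated as
    milliseconds, matching the description of workflowReloadPeriod.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(ms|s|m|h|d)?\s*", value)
    if match is None:
        # Assumption: malformed values are rejected rather than ignored.
        raise ValueError(f"invalid period: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MS[unit or "ms"]
```

Under these assumptions, the default of 15m corresponds to 900000 milliseconds, and a bare value such as 500 is read as 500 milliseconds.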
Configuration Example
To install the application bundle, add the following configuration to the <autoStart> section of the Aspire settings.xml file.
<application config="com.searchtechnologies.aspire:app-filesystem-connector">
  <properties>
    <property name="generalConfiguration">true</property>
    <property name="snapshotDir">${aspire.home}/snapshots</property>
    <property name="disableTextExtract">false</property>
    <property name="workflowReloadPeriod">15s</property>
    <property name="workflowErrorTolerant">false</property>
    <property name="useGE">false</property>
    <property name="geSchedule">0 0 0 * * ?</property>
    <property name="URL">https://jirainstance.atlassian.net/</property>
    <property name="Redirect Url">https://localhost:4000/</property>
    <property name="PageSize">100</property>
    <property name="ExcludeExtensions">jpg</property>
    <property name="fileNamePatterns/include">.*xml.*</property>
    <property name="fileNamePatterns/exclude">.*jpg.*</property>
    <property name="debug">true</property>
  </properties>
</application>
Note: Any optional property can be removed from the configuration, in which case the default value described in the table above is used.
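To show how the fileNamePatterns/include and fileNamePatterns/exclude values in the example might interact, here is a minimal Python sketch. It rests on several assumptions that this page does not confirm: patterns must match the whole URL, an exclude match takes precedence over an include match, and a file is included when no include patterns are configured. The function name is_included is hypothetical.

```python
import re


def is_included(url, include_patterns, exclude_patterns):
    """Decide whether a file URL would be scanned.

    Illustrative only; the precedence rules below are assumptions,
    not documented connector behavior.
    """
    # Assumption: any matching exclude pattern rejects the file.
    if any(re.fullmatch(p, url) for p in exclude_patterns):
        return False
    # Assumption: with no include patterns, every file is accepted.
    if not include_patterns:
        return True
    return any(re.fullmatch(p, url) for p in include_patterns)


# Patterns taken from the configuration example above.
include = [r".*xml.*"]
exclude = [r".*jpg.*"]
```

With these patterns, a hypothetical URL such as https://host/docs/report.xml is accepted, while https://host/img/photo.jpg is rejected by the exclude pattern.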
Group Expansion Configuration
Property | Type | Default | Description |
---|---|---|---|
Group cache refresh | Drop down menu | Every hour | Refresh rate for the group cache. |
URL | string | Jira repository URL | The URL from which security group information is retrieved. |
Username | string | | The username used to connect to the Jira instance. |
Password | string | | The password used to connect to the Jira instance. |
Source Configuration
Property | Type | Default | Description |
---|---|---|---|
URL | string | | The URL to crawl. The protocol must be specified. |
Username | string | | The username to connect with. |
Password | string | | The password for the username to connect with. |