Configuration
This section lists all configuration parameters available for the Subversion component.
General Scanner Component Configuration
Basic Scanner Configuration
Element | Type | Default | Description |
---|
snapshotDir | String | snapshots | The directory for snapshot files. |
numOfSnapshotBackups | int | 2 | The number of snapshots to keep after processing. |
waitForSubJobsTimeout | long | 600000 (=10 mins) | Scanner timeout while waiting for published jobs to complete. |
maxOutstandingTimeStatistics | long | 1m | The maximum amount of time to wait before updating the statistics file. Whichever happens first between this property and maxOutstandingUpdatesStatistics will trigger an update to the statistics file. |
maxOutstandingUpdatesStatistics | long | 1000 | The max number of files to process before updating the statistics file. Whichever happens first between this property and maxOutstandingTimeStatistics will trigger an update to the statistics file. |
usesDomain | boolean | true | Indicates if the group expansion request will use a domain\user format (useful for connectors that do not support domain in the group expander). |
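As a rough illustration of the table above, the fragment below sets each basic scanner element to its documented default. It is only a sketch: it assumes every element is a direct child of the scanner component, in the same way snapshotDir appears in the configuration example on this page. The 1m value is copied verbatim from the table, and the duration syntax the component accepts is not confirmed here.

```xml
<component name="Scanner" subType="scanner" factoryName="aspire-subversion-scanner">
  <!-- Assumption: each element sits directly under the component, like snapshotDir -->
  <snapshotDir>snapshots</snapshotDir>
  <numOfSnapshotBackups>2</numOfSnapshotBackups>
  <!-- 600000 ms = 10 minutes -->
  <waitForSubJobsTimeout>600000</waitForSubJobsTimeout>
  <!-- The statistics file is updated when either limit is reached first -->
  <maxOutstandingTimeStatistics>1m</maxOutstandingTimeStatistics>
  <maxOutstandingUpdatesStatistics>1000</maxOutstandingUpdatesStatistics>
  <usesDomain>true</usesDomain>
</component>
```

With these values, a crawl that processes 1000 files in under a minute updates the statistics file on the file count, while a slow crawl updates it on the time limit instead.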
Branch Handler Configuration
This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of these three events.
Element | Type | Description |
---|
branches/branch/@event | string | The event to configure - onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The max size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The max number of simultaneous batches that will be handled by the branch handler. |
Subversion Specific Configuration
Element | Type | Default | Description |
---|
maxBytes | long | unlimited | The maximum file size in bytes. Files whose size is greater than this parameter will not be sent to the pipeline. |
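A minimal sketch of how maxBytes might be set, assuming it is placed directly under the scanner component like the other scanner elements. The 10 MiB limit is an illustrative value, not a recommendation from the source.

```xml
<component name="Scanner" subType="scanner" factoryName="aspire-subversion-scanner">
  <!-- Assumption: maxBytes is a direct child of the component -->
  <!-- 10485760 bytes = 10 * 1024 * 1024; larger files are not sent to the pipeline -->
  <maxBytes>10485760</maxBytes>
</component>
```

Leaving the element out keeps the documented default of unlimited.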
Configuration Example
<component name="Scanner" subType="scanner" factoryName="aspire-subversion-scanner">
<debug>${debug}</debug>
<metadataMap>
<map from="action" to="action" />
<map from="doc-type" to="docType" />
<map from="last-modified-date" to="lastModified" />
<map from="content-length-bytes" to="dataSize" />
<map from="owner" to="owner" />
</metadataMap>
<snapshotDir>${snapshotDir}</snapshotDir>
<enableAuditing>${enableAuditing}</enableAuditing>
<fileNamePatterns>
<include pattern=".*" />
<exclude pattern=".*tmp$" />
</fileNamePatterns>
<emitCrawlStartJobs>${emitStartJob}</emitCrawlStartJobs>
<emitCrawlEndJobs>${emitEndJob}</emitCrawlEndJobs>
<branches>
<branch event="onAdd" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
<branch event="onUpdate" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
<branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline"
allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
<branch event="onCrawlStart" pipelineManager="../ProcessPipelineManager" pipeline="crawlStartEndPipeline"
allowRemote="true"/>
<branch event="onCrawlEnd" pipelineManager="../ProcessPipelineManager" pipeline="crawlStartEndPipeline"
allowRemote="true"/>
</branches>
</component>
Source Configuration
Scanner Control Configuration
The following table describes the list of attributes that the AspireObject of the incoming scanner job requires to correctly execute and control the flow of a scan process.
Element | Type | Options | Description |
---|
@action | string | start, stop, pause, resume, abort | Control command to tell the scanner which operation to perform. Use the start option to launch a new crawl. |
@actionProperties | string | full, incremental | When a start @action is received, it will tell the scanner to either run a full or an incremental crawl. |
@normalizedCSName | string | | Unique identifier name for the content source that will be crawled. |
displayName | string | | Display or friendly name for the content source that will be crawled. |
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="0" jobNumber="0" normalizedCSName="FeedOne_Connector"
scheduleId="0" scheduler="##AspireSystemScheduler##" sourceName="ContentSourceName">
...
<displayName>testSource</displayName>
...
</doc>
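The other values from the table can be combined in the same way. The two job documents below are a sketch only: they use just the attributes described in the table, and it is an assumption that the scanner accepts them without the additional attributes (crawlId, scheduleId and so on) shown in the example above. Each doc element is a separate job, not part of one file.

```xml
<!-- Job 1: start an incremental crawl instead of a full one -->
<doc action="start" actionProperties="incremental" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>

<!-- Job 2: stop the crawl running for the same content source -->
<!-- Assumption: actionProperties is only read for the start action, so it is omitted -->
<doc action="stop" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>
```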
All configuration properties described in this section are relative to /doc/connectorSource of the AspireObject of the incoming Job.
Element | Type | Default | Description |
---|
url | string | | The URL of the root subversion repository to crawl. |
relativeurl | string | | The path to crawl, relative to the root url of the subversion repository. |
username | string | | The user name to connect to the subversion repository. |
password | string | | The password to connect to the subversion repository. |
indexContainers | boolean | false | true if folders (as well as files) should be indexed. |
scanRecursively | boolean | false | true if subfolders of the given URL should be scanned. |
Scanner Configuration Example
<doc action="start" actionProperties="full" normalizedCSName="testFile" scheduleId="1">
<connectorSource>
<url>https://svn.searchtechnologies.com/svn/aspire</url>
<relativeurl>/trunk-test/svn-connector-test/</relativeurl>
<username>pmartinez</username>
<password>encrypted:7EB516228DE9A26107CA04E735BAC76B</password>
<indexContainers>true</indexContainers>
<scanRecursively>true</scanRecursively>
</connectorSource>
<displayName>testFile</displayName>
</doc>
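For contrast, the sketch below relies on the documented defaults. Because indexContainers and scanRecursively both default to false, omitting them should limit the crawl to the files directly under the given path, without indexing folders or descending into subfolders. The host, path, user name and password are hypothetical placeholders, not values from a real repository.

```xml
<doc action="start" actionProperties="full" normalizedCSName="testFile">
  <connectorSource>
    <!-- Hypothetical repository root and path -->
    <url>https://svn.example.com/svn/project</url>
    <relativeurl>/trunk/docs/</relativeurl>
    <username>svn-reader</username>
    <password>PASSWORD_PLACEHOLDER</password>
    <!-- indexContainers and scanRecursively omitted: both default to false -->
  </connectorSource>
  <displayName>testFile</displayName>
</doc>
```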
Output
<doc>
<url>https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml</url>
<id>https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml</id>
<fetchUrl>https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml</fetchUrl>
<displayUrl>https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml</displayUrl>
<snapshotUrl>001 https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml</snapshotUrl>
<docType>item</docType>
<repItemType>aspire/file</repItemType>
<sourceType>subversion</sourceType>
<sourceName>Subversion</sourceName>
<repositoryUrl>https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/</repositoryUrl>
<createdBy>pmartinez</createdBy>
<lastModified>2014-10-21T17:59:25Z</lastModified>
<title>pom.xml</title>
<fileType>xml</fileType>
<connectorSpecific type="subversion">
<field name="revision">12992</field>
<field name="message">Adding a test comment</field>
</connectorSpecific>
<connectorSource type="subversion">
<url>https://svn.searchtechnologies.com/svn/aspire</url>
<relativeurl>/trunk-test/svn-connector-test/</relativeurl>
<username>pmartinez</username>
<password>encrypted:7EB516228DE9A26107CA04E735BAC76B</password>
<indexContainers>true</indexContainers>
<scanRecursively>true</scanRecursively>
<displayName>Subversion</displayName>
</connectorSource>
<action>add</action>
<hierarchy>
<item id="FCFF2064CAB68EA8ACE2D9E1127B252C" level="1" name="pom.xml"
url="https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/pom.xml">
<ancestors>
<ancestor id="50485699E821C74A58BA6E5989751770" level="0" name="Subversion" parent="true" type="aspire/root"
url="https://svn.searchtechnologies.com/svn/aspire/trunk-test/svn-connector-test/"/>
</ancestors>
</item>
</hierarchy>
<httpResponse code="200" source="SubversionFetcher">OK</httpResponse>
<protocol source="SubversionFetcher/protocol">https</protocol>
<host source="SubversionFetcher/host">svn.searchtechnologies.com</host>
<mimeType source="SubversionFetcher/mimeType">text/plain</mimeType>
<encoding source="SubversionFetcher/encoding">UTF-8</encoding>
<extension source="SubversionFetcher">
<field name="modificationDate">2014-10-21T17:59:25Z</field>
<field name="status">HTTP/1.1 200 OK</field>
<field name="Date">Mon, 05 Jan 2015 20:00:17 GMT</field>
<field name="Server">Apache/2.2.15 (CentOS)</field>
<field name="Last-Modified">Tue, 21 Oct 2014 17:59:25 GMT</field>
<field name="ETag">"12992//trunk-test/svn-connector-test/pom.xml"</field>
<field name="Accept-Ranges">bytes</field>
<field name="Content-Length">6406</field>
<field name="Connection">close</field>
<field name="Content-Type">text/plain; charset=UTF-8</field>
</extension>
</doc>