Configuration
This section lists all parameters available for configuring the CIFS Scanner component.
General Scanner Component Configuration
Basic Scanner Configuration
Element | Type | Default | Description |
---|---|---|---|
snapshotDir | String | snapshots | The directory for snapshot files. |
numOfSnapshotBackups | int | 2 | The number of snapshots to keep after processing. |
waitForSubJobsTimeout | long | 600000 (=10 mins) | Scanner timeout while waiting for published jobs to complete. |
maxOutstandingTimeStatistics | long | 1m | The max amount of time to wait before updating the statistics file. Whichever happens first between this property and maxOutstandingUpdatesStatistics will trigger an update to the statistics file. |
maxOutstandingUpdatesStatistics | long | 1000 | The max number of files to process before updating the statistics file. Whichever happens first between this property and maxOutstandingTimeStatistics will trigger an update to the statistics file. |
usesDomain | boolean | true | Indicates if the group expansion request will use a domain\user format (useful for connectors that do not support domain in the group expander). |
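The basic settings above can be sketched as follows. This is an illustration only: it assumes each element is a direct child of the scanner component, in the same way as snapshotDir in the Configuration Example on this page, and it simply writes out the default values from the table. The branches block is left out for brevity.

```xml
<component name="Scanner" subType="default" factoryName="aspire-cifs-connector">
  <!-- Directory for snapshot files (default: snapshots) -->
  <snapshotDir>snapshots</snapshotDir>
  <!-- Number of snapshots to keep after processing -->
  <numOfSnapshotBackups>2</numOfSnapshotBackups>
  <!-- Timeout while waiting for published jobs: 600000 = 10 minutes -->
  <waitForSubJobsTimeout>600000</waitForSubJobsTimeout>
  <!-- Statistics file is updated after 1m or 1000 files, whichever comes first -->
  <maxOutstandingTimeStatistics>1m</maxOutstandingTimeStatistics>
  <maxOutstandingUpdatesStatistics>1000</maxOutstandingUpdatesStatistics>
  <!-- Use the domain\user format in group expansion requests -->
  <usesDomain>true</usesDomain>
  <!-- branches configuration omitted here; placement of the elements above is an assumption -->
</component>
```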
Branch Handler Configuration
This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of these three events.
Element | Type | Description |
---|---|---|
branches/branch/@event | string | The event to configure - onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The max size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The max number of simultaneous batches that will be handled by the branch handler. |
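A minimal sketch of the three required branches, relying on the documented behaviour that a missing pipeline attribute publishes to the pipeline manager's default pipeline. It assumes that the batching attributes may be left out (the table does not state whether they are mandatory), and the relative pipeline manager path is borrowed from the Configuration Example on this page.

```xml
<branches>
  <!-- No pipeline attribute: jobs go to the default pipeline of the pipeline manager -->
  <branch event="onAdd" pipelineManager="../ProcessPipelineManager" />
  <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" />
  <branch event="onDelete" pipelineManager="../ProcessPipelineManager" />
</branches>
```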
Configuration Example
```xml
<component name="Scanner" subType="default" factoryName="aspire-cifs-connector">
  <debug>true</debug>
  <snapshotDir>${aspire.home}/data/snapshots</snapshotDir>
  <branches>
    <branch event="onAdd" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
      allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
      allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline"
      allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
  </branches>
</component>
```
Source Configuration
Scanner Control Configuration
The following table describes the attributes that the AspireObject of the incoming scanner job requires in order to execute and control a scan.
Element | Type | Options | Description |
---|---|---|---|
@action | string | start, stop, pause, resume, abort | Control command to tell the scanner which operation to perform. Use the start option to launch a new crawl. |
@actionProperties | string | full, incremental | When a start @action is received, it will tell the scanner to either run a full or an incremental crawl. |
@normalizedCSName | string | | Unique identifier name for the content source that will be crawled. |
displayName | string | | Display or friendly name for the content source that will be crawled. |
Header Example
```xml
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="0" jobNumber="0"
  normalizedCSName="FeedOne_Connector" scheduleId="0" scheduler="##AspireSystemScheduler##"
  sourceName="ContentSourceName">
  ...
  <displayName>testSource</displayName>
  ...
</doc>
```
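As a variation on the header above, the control attributes from the table can be combined to request an incremental crawl or to pause a running one. The fragments below are sketches, not confirmed job formats: they assume that only the attributes listed in the table are needed (as in the Scanner Configuration Example on this page) and that a pause command needs no actionProperties.

```xml
<!-- Start an incremental crawl of the same content source -->
<doc action="start" actionProperties="incremental" normalizedCSName="FeedOne_Connector">
  <!-- connectorSource properties for the crawl go here -->
  <displayName>testSource</displayName>
</doc>

<!-- Pause the crawl; assumed to need only the action and the content source name -->
<doc action="pause" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>
```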
All configuration properties described in this section are relative to /doc/connectorSource of the AspireObject of the incoming job.
Element | Type | Default | Description |
---|---|---|---|
url | string | | The SMB URL to crawl (smb:// format). |
partialScan | boolean | false | To run a partial scan – i.e. to only scan a portion of the larger directory. This is useful to re-process portions of your system without having to process the entire content source. |
subDirUrl | string | | Configurable when partialScan is set to true. The sub-directory that contains the documents to be processed for this partial scan. This must be a path relative to the parent directory. Only the documents in this sub-directory will be scanned. |
domain | string | | The domain to which the user connecting to the shared folder belongs. |
username | string | | The username to connect with. |
password | string | | The password of the username to connect with. |
indexContainers | boolean | false | true if folders (as well as files) should be indexed. |
scanRecursively | boolean | false | true if subfolders of the given URL should be scanned. |
fileNamePatterns/include/@pattern | string | | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is included by the scanner. Multiple include nodes can be added. |
fileNamePatterns/exclude/@pattern | string | | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is excluded by the scanner. Multiple exclude nodes can be added. |
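Since several include and exclude nodes can be added, a full (non-partial) recursive scan restricted to certain file types might look like the sketch below. The connection values are reused from the example on this page. The patterns themselves are hypothetical and assume Java-style regular expressions; how include and exclude patterns interact when both match is not described here.

```xml
<connectorSource>
  <url>smb://localhost/AspireTesting/</url>
  <partialScan>false</partialScan>
  <domain>search</domain>
  <username>ralfaro</username>
  <password>encrypted:6A2B871F3F30D3B5BF8D406B9C185FAF</password>
  <indexContainers>false</indexContainers>
  <scanRecursively>true</scanRecursively>
  <fileNamePatterns>
    <!-- Hypothetical patterns: include PDF and Word documents -->
    <include pattern=".*\.pdf"/>
    <include pattern=".*\.docx?"/>
    <!-- Hypothetical patterns: skip temporary and backup files -->
    <exclude pattern=".*tmp.*"/>
    <exclude pattern=".*\.bak"/>
  </fileNamePatterns>
</connectorSource>
```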
Scanner Configuration Example
```xml
<doc action="start" actionProperties="full" normalizedCSName="cifsTest">
  <connectorSource>
    <url>smb://localhost/AspireTesting/</url>
    <partialScan>true</partialScan>
    <subDirUrl>LSA</subDirUrl>
    <domain>search</domain>
    <username>ralfaro</username>
    <password>encrypted:6A2B871F3F30D3B5BF8D406B9C185FAF</password>
    <indexContainers>true</indexContainers>
    <scanRecursively>true</scanRecursively>
    <fileNamePatterns>
      <include pattern=".*LSA.*"/>
      <exclude pattern=".*tmp.*"/>
    </fileNamePatterns>
  </connectorSource>
  <displayName>cifsTest</displayName>
</doc>
```
Output
```xml
<doc>
  <url>smb://localhost/AspireTesting/LSA/Videos youtube.txt</url>
  <snapshotUrl>003 smb://localhost/AspireTesting/LSA/Videos youtube.txt</snapshotUrl>
  <docType>item</docType>
  <repItemType>aspire/file</repItemType>
  <fetchUrl>smb://localhost/AspireTesting/LSA/Videos youtube.txt</fetchUrl>
  <displayUrl>smb://localhost/AspireTesting/LSA/Videos youtube.txt</displayUrl>
  <id>smb://localhost/AspireTesting/LSA/Videos youtube.txt</id>
  <connectorSpecific type="fileshare">
    <field name="smbUrl">smb://localhost/AspireTesting/LSA/Videos youtube.txt</field>
  </connectorSpecific>
  <lastModified>2012-07-25T05:57:30Z</lastModified>
  <dataSize>111</dataSize>
  <acls>
    <acl access="allow" domain="NT AUTHORITY" entity="group" fullname="NT AUTHORITY\SYSTEM" inherited="true" name="SYSTEM" scope="machine" sid="S-1-5-18" sidType="windowsGroup"/>
    <acl access="allow" domain="SEARCH" entity="user" fullname="SEARCH\ralfaro" inherited="true" name="ralfaro" scope="global" sid="S-1-5-21-4065858124-1791371549-2540926932-1286" sidType="user"/>
    <acl access="allow" domain="BUILTIN" entity="group" fullname="BUILTIN\Administrators" inherited="true" name="Administrators" scope="machine" sid="S-1-5-32-544" sidType="localGroup"/>
    <acl access="allow" domain="" entity="group" fullname="\Everyone" inherited="true" name="Everyone" scope="machine" sid="S-1-1-0" sidType="localGroup"/>
    <acl access="allow" domain="S-1-5-21-4065858124-1791371549-2540926932" entity="group" fullname="S-1-5-21-4065858124-1791371549-2540926932\2687" inherited="true" name="2687" scope="machine" sid="S-1-5-21-4065858124-1791371549-2540926932-2687" sidType="windowsGroup"/>
  </acls>
  <owner>BUILTIN\Administrators</owner>
  <sourceName>cifsTest</sourceName>
  <sourceType>fileshare</sourceType>
  <connectorSource>
    <url>smb://localhost/AspireTesting/</url>
    <partialScan>true</partialScan>
    <subDirUrl>LSA</subDirUrl>
    <domain>search</domain>
    <username>ralfaro</username>
    <password>encrypted:6A2B871F3F30D3B5BF8D406B9C185FAF</password>
    <indexContainers>true</indexContainers>
    <scanRecursively>true</scanRecursively>
    <fileNamePatterns>
      <include pattern=".*LSA.*"/>
      <exclude pattern=".*tmp.*"/>
    </fileNamePatterns>
    <displayName>cifsTest</displayName>
    <partialScanUrl>LSA</partialScanUrl>
  </connectorSource>
  <action>add</action>
  <hierarchy>
    <item id="16C82BC703B3D0BD1A11CBCB9136FCE7" level="3" name="Videos youtube.txt" url="smb://localhost/AspireTesting/LSA/Videos youtube.txt">
      <ancestors>
        <ancestor id="4646C909DA9EE32E5C36C8D5843DCEE3" level="2" name="LSA" parent="true" type="aspire/folder" url="smb://localhost/AspireTesting/LSA/"/>
        <ancestor id="D2B7229CA1C9BCC12AA27BB354BC2DD4" level="1" name="cifsTest" type="aspire/fileshare" url="smb://localhost/AspireTesting/"/>
      </ancestors>
    </item>
  </hierarchy>
</doc>
```