Configuration
This section lists all configuration parameters available for the Lotus Notes Scanner component.
General Scanner Component Configuration
Basic Scanner Configuration
Element | Type | Default | Description |
---|---|---|---|
snapshotDir | String | snapshots | The directory for snapshot files. |
numOfSnapshotBackups | int | 2 | The number of snapshots to keep after processing. |
waitForSubJobsTimeout | long | 600000 (=10 mins) | Scanner timeout while waiting for published jobs to complete. |
maxOutstandingTimeStatistics | long | 1m | The maximum amount of time to wait before updating the statistics file. An update to the statistics file is triggered by this property or by maxOutstandingUpdatesStatistics, whichever limit is reached first. |
maxOutstandingUpdatesStatistics | long | 1000 | The max number of files to process before updating the statistics file. Whichever happens first between this property and maxOutstandingTimeStatistics will trigger an update to the statistics file. |
usesDomain | boolean | true | Indicates if the group expansion request will use a domain\user format (useful for connectors that do not support domain in the group expander). |
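As a rough illustration, the elements above might be set as shown here. This sketch assumes they are direct children of the scanner component element, in the same way useLDAPCache is in the configuration example on this page; the values are simply the documented defaults.

```xml
<component name="Scanner" subType="default" factoryName="aspire-lotus-scanner">
  <!-- Assumed placement: direct children of the component element -->
  <snapshotDir>snapshots</snapshotDir>
  <numOfSnapshotBackups>2</numOfSnapshotBackups>
  <!-- 600000 = 10 minutes, per the table above -->
  <waitForSubJobsTimeout>600000</waitForSubJobsTimeout>
  <!-- The statistics file is updated after 1m or after 1000 files, whichever comes first -->
  <maxOutstandingTimeStatistics>1m</maxOutstandingTimeStatistics>
  <maxOutstandingUpdatesStatistics>1000</maxOutstandingUpdatesStatistics>
  <usesDomain>true</usesDomain>
</component>
```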
Branch Handler Configuration
This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of the three.
Element | Type | Description |
---|---|---|
branches/branch/@event | string | The event to configure - onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The maximum size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The maximum number of simultaneous batches that will be handled by the branch handler. |
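The attributes above could be combined along these lines. This is a minimal sketch, not a recommended setup: it assumes the batching attributes are optional and may be omitted, and it reuses the ../ProcessPipelineManager and deletePipeline names from the configuration example on this page.

```xml
<branches>
  <!-- pipeline omitted: jobs are published to the pipeline manager's default pipeline -->
  <branch event="onAdd" pipelineManager="../ProcessPipelineManager" />
  <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" />
  <!-- Deletes go to a named pipeline and are marked for batch processing -->
  <branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline"
          batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
</branches>
```

All three events still have a branch, as the component requires.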
Lotus Notes Specific Configuration
Element | Type | Default | Description |
---|---|---|---|
defaultDisplayName | String | SharePoint | The name of the crawl, if one is not given in the control job. |
waitForSubJobsTimeout | long | 600000 (=10 mins) | Scanner timeout while waiting for published jobs to complete. |
useLDAPCache | boolean | false | Check for an installed "Aspire LDAP Cache" component for group expansion. |
externalGroupServerPath | string | empty | List of installed "Aspire LDAP Cache" components. |
Configuration Example
<component name="Scanner" subType="default" factoryName="aspire-lotus-scanner">
  <debug>false</debug>
  <useLDAPCache>false</useLDAPCache>
  <externalGroupServerPath></externalGroupServerPath>
  <displayName>Doc-Test</displayName>
  <branches>
    <branch event="onAdd" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
            allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline"
            allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline"
            allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
  </branches>
</component>
Source Configuration
Scanner Control Configuration
The following table describes the attributes that the AspireObject of the incoming scanner job must carry for the scanner to execute and control a scan.
Element | Type | Options | Description |
---|---|---|---|
@action | string | start, stop, pause, resume, abort | Control command that tells the scanner which operation to perform. Use the start option to launch a new crawl. |
@actionProperties | string | full, incremental | When a start @action is received, tells the scanner whether to run a full or an incremental crawl. |
@normalizedCSName | string | | Unique identifier name for the content source that will be crawled. |
displayName | string | | Display or friendly name for the content source that will be crawled. |
Header Example
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="0" jobNumber="0"
     normalizedCSName="FeedOne_Connector" scheduleId="0" scheduler="##AspireSystemScheduler##"
     sourceName="ContentSourceName">
  ...
  <displayName>testSource</displayName>
  ...
</doc>
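For comparison, a control job requesting an incremental crawl might look like the sketch below. It shows only the attributes described in the table above; whether the remaining header attributes (crawlId, scheduler and so on) can be omitted is an assumption, and a real job would also carry the connectorSource settings.

```xml
<!-- Hypothetical control job: start an incremental crawl of the same content source -->
<doc action="start" actionProperties="incremental" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>
```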
All configuration properties described in this section are relative to /doc/connectorSource of the AspireObject of the incoming Job.
Element | Type | Default | Description |
---|---|---|---|
url | string | | The host URL to scan (http or https). |
username | string | | The username used to connect to the Lotus Notes server. |
password | string | | The password used to connect to the Lotus Notes server. |
indexContainers | boolean | false | true if folders (as well as files) should be indexed. |
scanRecursively | boolean | false | true if subfolders of the given URL should be scanned. |
indexMailDbs | boolean | false | Select this option when you want to extract all mail databases. |
includeDBs/database | xml | | List of databases to include in the crawl. |
fileNamePatterns/include/@pattern | regex | none | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is included by the scanner. Multiple include nodes can be added. |
fileNamePatterns/exclude/@pattern | regex | none | Optional. A regular expression pattern to evaluate file URLs against; if the file name matches the pattern, the file is excluded by the scanner. Multiple exclude nodes can be added. |
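The database list and file name patterns could be filled in roughly as follows. The second database name and both regular expressions are hypothetical, chosen only to show the element shapes; url, username and password are left out for brevity.

```xml
<connectorSource>
  <includeDBs>
    <database database="docLibra.nsf"/>
    <!-- Hypothetical second database -->
    <database database="archive.nsf"/>
  </includeDBs>
  <fileNamePatterns>
    <!-- Hypothetical patterns; several include and exclude nodes may be listed -->
    <include pattern=".*\.pdf$"/>
    <exclude pattern=".*draft.*"/>
  </fileNamePatterns>
</connectorSource>
```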
Scanner Configuration Example
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="1" jobNumber="1"
     normalizedCSName="Doc_Test" scheduleId="1" scheduler="AspireScheduler" sourceName="Doc_Test">
  <connectorSource>
    <url>10.40.40.114</url>
    <username>Admin Administrator</username>
    <password>encrypted:317C2FA5E804421BA2375D8B9FAE23A6</password>
    <includeDBs>
      <database database="docLibra.nsf"/>
    </includeDBs>
    <indexMailDbs>false</indexMailDbs>
    <scanRecursively>true</scanRecursively>
    <indexContainers>true</indexContainers>
    <fileNamePatterns/>
  </connectorSource>
  <displayName>Doc-Test</displayName>
</doc>
Output
<doc>
  <url>docLibra.nsf:74A1AD06B9A1DC7388257B6600611042</url>
  <id>docLibra.nsf:74A1AD06B9A1DC7388257B6600611042</id>
  <displayUrl>docLibra.nsf:74A1AD06B9A1DC7388257B6600611042</displayUrl>
  <snapshotUrl>003 docLibra.nsf:74A1AD06B9A1DC7388257B6600611042</snapshotUrl>
  <repItemType>aspire/document</repItemType>
  <connectorSpecific>
    <field name="attachmentCount">0</field>
    <field name="parentUNID"/>
    <field name="parentDatabase">docLibra.nsf</field>
    <field name="Form">Document</field>
    <field name="CurrentUser">CN=Admin Administrator/O=search</field>
    <field name="Resubmit">0</field>
    <field name="ReviewType">1</field>
    <field name="ReviewWindow">0</field>
    <field name="NotifyAfter">0</field>
    <field name="Subject">favorite doc</field>
    <field name="SubmitNow">0</field>
    <field name="$UpdatedBy">CN=Admin Administrator/O=search</field>
  </connectorSpecific>
  <created>2013-05-09T17:40:11Z</created>
  <lastAccessed>2013-05-09T17:40:31Z</lastAccessed>
  <lastModified>2013-05-09T17:40:31Z</lastModified>
  <fetchUrl>http://10.40.40.114/__88257AC90071FA4C.nsf/0/74A1AD06B9A1DC7388257B6600611042?OpenDocument</fetchUrl>
  <acls>
    <parentAcl access="allow" domain="domain" entity="group" fullname="CN=lotusserver/O=search|Developers Top Level" name="CN=lotusserver/O=search|Developers Top Level" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="user" fullname="CN=Ernesto Harler/O=search" name="CN=Ernesto Harler/O=search" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="user" fullname="CN=Admin Administrator/O=search" name="CN=Admin Administrator/O=search" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="group" fullname="CN=lotusserver/O=search|docLibra.nsf|-Default-" name="CN=lotusserver/O=search|docLibra.nsf|-Default-" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="group" fullname="CN=lotusserver/O=search|docLibra.nsf|$PublicAccess" name="CN=lotusserver/O=search|docLibra.nsf|$PublicAccess" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="group" fullname="CN=lotusserver/O=search|Human Resources Group" name="CN=lotusserver/O=search|Human Resources Group" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="user" fullname="CN=lotusserver/O=search" name="CN=lotusserver/O=search" scope="global"/>
    <parentAcl access="allow" domain="domain" entity="group" fullname="CN=lotusserver/O=search|Developers Level 2" name="CN=lotusserver/O=search|Developers Level 2" scope="global"/>
    <acl access="allow" domain="domain" entity="user" fullname="CN=Admin Administrator/O=search" name="CN=Admin Administrator/O=search" scope="global"/>
    <acl access="deny" domain="domain" entity="user" fullname="CN=Andres Coto/O=search" name="CN=Andres Coto/O=search" scope="global"/>
    <acl access="deny" domain="domain" entity="user" fullname="Anonymous" name="Anonymous" scope="global"/>
    <intersectionAcl access="allow" domain="domain" entity="user" name="CN=Admin Administrator/O=search+CN=lotusserver/O=search|Developers Top Level" scope="global"/>
    <intersectionAcl access="allow" domain="domain" entity="user" name="CN=Admin Administrator/O=search+CN=lotusserver/O=search|docLibra.nsf|-Default-" scope="global"/>
    <intersectionAcl access="allow" domain="domain" entity="user" name="CN=Admin Administrator/O=search+CN=lotusserver/O=search|Human Resources Group" scope="global"/>
    <intersectionAcl access="allow" domain="domain" entity="user" name="CN=Admin Administrator/O=search+CN=lotusserver/O=search|docLibra.nsf|$PublicAccess" scope="global"/>
    <intersectionAcl access="allow" domain="domain" entity="user" name="CN=Admin Administrator/O=search+CN=lotusserver/O=search|Developers Level 2" scope="global"/>
  </acls>
  <isDeleted>false</isDeleted>
  <size>339</size>
  <docType>container</docType>
  <sourceName>Doc-Test</sourceName>
  <sourceType/>
  <connectorSource>
    <url>10.40.40.114</url>
    <username>Admin Administrator</username>
    <password>encrypted:317C2FA5E804421BA2375D8B9FAE23A6</password>
    <includeDBs>
      <database database="docLibra.nsf"/>
    </includeDBs>
    <indexMailDbs>false</indexMailDbs>
    <scanRecursively>true</scanRecursively>
    <indexContainers>true</indexContainers>
    <fileNamePatterns/>
    <displayName>Doc-Test</displayName>
  </connectorSource>
  <action>add</action>
  <hierarchy>
    <item id="E1E534BCB7F143C1FDBF45B0EF5090CE" level="3" name=":74A1AD06B9A1DC7388257B6600611042" type="aspire/document" url="docLibra.nsf:74A1AD06B9A1DC7388257B6600611042">
      <ancestors>
        <ancestor id="BD81AE08922788F13AEDB857E17940E2" level="2" name="docLibra.nsf" parent="true" type="aspire/applicationDatabase" url="docLibra.nsf"/>
        <ancestor id="6CF9BA5ED4A271C76FDBE6B63DAE2BA1" level="1" name="Doc-Test" type="aspire/server" url="CN=lotusserver/O=search"/>
      </ancestors>
    </item>
  </hierarchy>
  <contentType source="ExtractTextStage/Content-Type">text/plain; charset=windows-1252</contentType>
  <extension source="ExtractTextStage">
    <field name="Content-Encoding">windows-1252</field>
    <field name="resourceName">docLibra.nsf:74A1AD06B9A1DC7388257B6600611042</field>
  </extension>
  <content source="ExtractTextStage"> content of favorite doc </content>
</doc>