Configuration
This section lists all configuration parameters for the Documentum Scanner component.
General Scanner Component Configuration
Basic Scanner Configuration
Element | Type | Default | Description |
---|---|---|---|
snapshotDir | String | snapshots | The directory for snapshot files. |
numOfSnapshotBackups | int | 2 | The number of snapshots to keep after processing. |
waitForSubJobsTimeout | long | 600000 (10 minutes) | Time in milliseconds that the scanner waits for published jobs to complete before timing out. |
maxOutstandingTimeStatistics | long | 1m | The maximum amount of time to wait before updating the statistics file. The statistics file is updated when either this limit or maxOutstandingUpdatesStatistics is reached, whichever happens first. |
maxOutstandingUpdatesStatistics | long | 1000 | The maximum number of files to process before updating the statistics file. The statistics file is updated when either this limit or maxOutstandingTimeStatistics is reached, whichever happens first. |
usesDomain | boolean | true | Indicates whether the group expansion request uses the domain\user format (useful for connectors that do not support domains in the group expander). |
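To make the table concrete, the fragment below sets each basic parameter to its documented default. It is a sketch only: it assumes every element in the table is a direct child of the scanner component, as snapshotDir is in the configuration example on this page. The branch and connection settings that a working component also needs are left out.

```xml
<!-- Sketch: basic scanner settings, each at the default listed in the table above. -->
<!-- Assumption: all of these are direct children of the component element. -->
<component name="scanner" factoryName="aspire-documentum-connector" subType="default">
  <snapshotDir>snapshots</snapshotDir>
  <numOfSnapshotBackups>2</numOfSnapshotBackups>
  <!-- 600000 ms = 10 minutes -->
  <waitForSubJobsTimeout>600000</waitForSubJobsTimeout>
  <!-- The statistics file is updated when either limit is reached first -->
  <maxOutstandingTimeStatistics>1m</maxOutstandingTimeStatistics>
  <maxOutstandingUpdatesStatistics>1000</maxOutstandingUpdatesStatistics>
  <usesDomain>true</usesDomain>
</component>
```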
Branch Handler Configuration
This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of these three events.
Element | Type | Description |
---|---|---|
branches/branch/@event | string | The event to configure: onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The maximum size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The maximum number of simultaneous batches that the branch handler will handle. |
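The following fragment is a sketch of how these attributes might be combined. The pipeline manager path and pipeline name are illustrative. It also assumes that the batching attributes are optional, which the table does not state. The first two branches omit the pipeline attribute, so jobs would go to the default pipeline of the named pipeline manager.

```xml
<branches>
  <!-- No pipeline attribute: publish to the pipeline manager's default pipeline -->
  <branch event="onAdd" pipelineManager="../ProcessPipelineManager" />
  <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" />
  <!-- Deletes are batched: at most 50 jobs per batch, a batch closes after
       a batchTimeout of 60000 if it is not full, and at most 2 batches are open at once -->
  <branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline"
          batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
</branches>
```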
Configuration
The scanner recognizes the following configuration parameters:
Element | Type | Default | Description |
---|---|---|---|
url | String | | The URL to crawl. |
username | String | | The username to use when accessing Documentum. |
password | String | | The password to use when accessing Documentum. |
dfcPropsFilePath | String | | The location of the DFC properties file. You must also copy the dfc.keystore file to the location specified in the dfc.properties file. |
webtopUrl | String | | The URL to access the Webtop interface. This string is prefixed to each object path so it can be accessed through a URL. |
maxFileSize | int | unlimited | The maximum size, in MB, of the content to be crawled, or unlimited to extract the whole content. |
usePrefix | boolean | false | When doing group expansion, the component will return the groups with a predefined prefix in the form of PREFIX@group. |
scanSystemCabinets | boolean | false | true if hidden and private cabinets of Documentum should be scanned. |
Configuration Example
<component name="scanner" factoryName="aspire-documentum-connector" subType="default">
<username>admin</username>
<password>admin</password>
<dfcPropsFilePath>C:/Documentum/config/dfc.properties</dfcPropsFilePath>
<webtopUrl>http://localhost:9080/webtop</webtopUrl>
<debug>true</debug>
<snapshotDir>${aspire.home}/data/snapshots</snapshotDir>
<branches>
<branch event="onAdd" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline" allowRemote="true" batching="true"
batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
<branch event="onUpdate" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline" allowRemote="true" batching="true"
batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
<branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline" allowRemote="true" batching="true"
batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
</branches>
</component>
Source Configuration
Scanner Control Configuration
The following table lists the attributes that the AspireObject of the incoming scanner job must carry so that the scanner can execute and control a scan.
Element | Type | Options | Description |
---|---|---|---|
@action | string | start, stop, pause, resume, abort | Control command that tells the scanner which operation to perform. Use the start option to launch a new crawl. |
@actionProperties | string | full, incremental | When @action is start, tells the scanner whether to run a full or an incremental crawl. |
@normalizedCSName | string | | Unique identifier name for the content source that will be crawled. |
displayName | string | | Display or friendly name for the content source that will be crawled. |
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="0" jobNumber="0" normalizedCSName="FeedOne_Connector"
scheduleId="0" scheduler="##AspireSystemScheduler##" sourceName="ContentSourceName">
...
<displayName>testSource</displayName>
...
</doc>
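As a further illustration of the control attributes, the two job documents below sketch an incremental start and a stop request for the same content source. They are separate documents shown together for comparison. The sketch assumes that the scheduling attributes in the example above (crawlId, scheduleId and so on) can be omitted and that a stop request needs no actionProperties. Neither is confirmed by this page.

```xml
<!-- Job 1 (sketch): start an incremental crawl -->
<doc action="start" actionProperties="incremental" normalizedCSName="FeedOne_Connector">
  <!-- connectorSource settings omitted -->
  <displayName>testSource</displayName>
</doc>

<!-- Job 2 (sketch): stop the crawl for the same content source -->
<doc action="stop" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>
```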
All configuration properties described in this section are relative to /doc/connectorSource of the AspireObject of the incoming Job.
Element | Type | Default | Description |
---|---|---|---|
url | string | | The URL to crawl. |
username | string | | The username to use when connecting to Documentum. |
password | string | | The password to use when connecting to Documentum. |
indexContainers | boolean | false | true if folders (as well as files) should be sent to the pipeline. |
scanRecursively | boolean | false | true if subfolders of the given URL should be scanned. |
scanSystemCabinets | boolean | false | true if private and system cabinets should be scanned. |
maxFileSize | long | | The maximum size, in MB, of the content to be crawled, or unlimited if the whole file should be extracted. |
webtopUrl | string | | The URL to access the Webtop interface. This will be prefixed to each object path so it can be accessed through a URL. |
dfcPropsFilePath | string | | The location of the DFC properties file. |
Scanner Configuration Example
<doc action="start" actionProperties="full" normalizedCSName="cifsTest">
<connectorSource>
<url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB</url>
<username>Administrator</username>
<password>pass1234</password>
<dfcPropsFilePath>config/dfc.properties</dfcPropsFilePath>
<webtopUrl>http://10.10.21.73:9080/webtop/objectId=</webtopUrl>
<maxFileSize>Unlimited</maxFileSize>
<indexContainers>true</indexContainers>
<scanRecursively>true</scanRecursively>
<scanSystemCabinets>false</scanSystemCabinets>
<fileNamePatterns>
<include pattern=".*LSA.*"/>
<exclude pattern=".*tmp.*"/>
</fileNamePatterns>
</connectorSource>
<displayName>documentum</displayName>
</doc>
Output
<doc>
<url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</url>
<fetchUrl>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</fetchUrl>
<snapshotUrl>006 dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</snapshotUrl>
<docType>item</docType>
<id>090010e18001d15c</id>
<connectorSpecific type="documentum">
<field name="object_name">dm_document-0024.txt</field>
<field name="r_object_type">dm_document</field>
<field name="r_creation_date">12/5/2013 2:13:59 PM</field>
<field name="r_modify_date">12/5/2013 2:13:59 PM</field>
<field name="r_modifier">Administrator</field>
<field name="r_access_date">1/22/2014 2:42:11 PM</field>
<field name="a_is_hidden">F</field>
<field name="i_is_deleted">F</field>
<field name="a_retention_date">nulldate</field>
<field name="a_archive">F</field>
<field name="a_link_resolved">F</field>
<field name="i_reference_cnt">1</field>
<field name="i_has_folder">T</field>
<field name="i_folder_id">0b0010e18001d14b</field>
<field name="r_link_cnt">0</field>
<field name="r_link_high_cnt">0</field>
<field name="r_assembled_from_id">0000000000000000</field>
<field name="r_frzn_assembly_cnt">0</field>
<field name="r_has_frzn_assembly">F</field>
<field name="r_is_virtual_doc">0</field>
<field name="i_contents_id">060010e18001c31b</field>
<field name="a_content_type">crtext</field>
<field name="r_page_cnt">1</field>
<field name="r_content_size">511425</field>
<field name="a_full_text">T</field>
<field name="a_storage_type">filestore_01</field>
<field name="i_cabinet_id">0c0010e1800175ca</field>
<field name="owner_name">Administrator</field>
<field name="owner_permit">7</field>
<field name="group_name">docu</field>
<field name="group_permit">5</field>
<field name="world_permit">3</field>
<field name="i_antecedent_id">0000000000000000</field>
<field name="i_chronicle_id">090010e18001d15c</field>
<field name="i_latest_flag">T</field>
<field name="r_lock_date">nulldate</field>
<field name="r_version_label">1.0,CURRENT</field>
<field name="i_branch_cnt">0</field>
<field name="i_direct_dsc">F</field>
<field name="r_immutable_flag">F</field>
<field name="r_frozen_flag">F</field>
<field name="r_has_events">F</field>
<field name="acl_domain">Administrator</field>
<field name="acl_name">dm_450010e180000101</field>
<field name="i_is_reference">F</field>
<field name="r_creator_name">Administrator</field>
<field name="r_is_public">T</field>
<field name="r_policy_id">0000000000000000</field>
<field name="r_resume_state">0</field>
<field name="r_current_state">0</field>
<field name="r_alias_set_id">0000000000000000</field>
<field name="a_is_template">F</field>
<field name="r_full_content_size">511425</field>
<field name="a_is_signed">F</field>
<field name="a_last_review_date">nulldate</field>
<field name="i_retain_until">nulldate</field>
<field name="i_partition">0</field>
<field name="i_is_replica">F</field>
<field name="i_vstamp">0</field>
</connectorSpecific>
<lastModified>2013-12-05T20:13:59Z</lastModified>
<modifiedBy>Administrator</modifiedBy>
<dataSize>511425</dataSize>
<owner>Administrator</owner>
<createdBy>Administrator</createdBy>
<repItemType>aspire/dm_document</repItemType>
<displayUrl>http://10.10.21.73:9080/webtop/objectId=090010e18001d15c</displayUrl>
<acls>
<acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@dm_world" name="dm_world" scope="global"/>
<acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@Administrator" name="Administrator" scope="global"/>
<acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@docu" name="docu" scope="global"/>
</acls>
<sourceName>documentum</sourceName>
<sourceType>documentum</sourceType>
<connectorSource>
<url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB</url>
<username>Administrator</username>
<password>encrypted:562E81591F85B858E5A5D3876F9C9FDB</password>
<dfcPropsFilePath>config/dfc.properties</dfcPropsFilePath>
<webtopUrl>http://10.10.21.73:9080/webtop/objectId=</webtopUrl>
<maxFileSize>Unlimited</maxFileSize>
<indexContainers>true</indexContainers>
<scanRecursively>true</scanRecursively>
<scanSystemCabinets>false</scanSystemCabinets>
<fileNamePatterns/>
<docbase>DocumentumRepository</docbase>
<host>10.10.21.73</host>
<port>1489</port>
<displayName>documentum</displayName>
</connectorSource>
<action>add</action>
<hierarchy>
<item id="246AEB4224DF69E86C83D5AFC357A3FD" level="6" name="dm_document-0024.txt" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt">
<ancestors>
<ancestor id="6846F1C598D288AC85E2DDE6F178B488" level="5" name="folder-1-3" parent="true" type="aspire/dm_folder" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/"/>
<ancestor id="2760A49EEC1E0C469929E66A909BBCAA" level="4" name="folder-1" type="aspire/dm_folder" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/"/>
<ancestor id="2F224B8B2365BE6BFD1554713BFFF190" level="3" name="5000-500KB" type="aspire/dm_cabinet" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/"/>
<ancestor id="A5ECC7C3A8738BB297CC336536AD3B60" level="2" name="DocumentumRepository" type="aspire/docbase" url="dctm://10.10.21.73:1489/DocumentumRepository/"/>
<ancestor id="93EAEBDC4E5AF3FFF212F8E80902AF01" level="1" name="documentum" type="aspire/documentum" url="dctm://10.10.21.73:1489/"/>
</ancestors>
</item>
</hierarchy>
</doc>