Configuration
This section lists all configuration parameters for the Documentum Scanner component.
General Scanner Component Configuration
Basic Scanner Configuration
Element | Type | Default | Description |
---|---|---|---|
snapshotDir | String | snapshots | The directory for snapshot files. |
numOfSnapshotBackups | int | 2 | The number of snapshots to keep after processing. |
waitForSubJobsTimeout | long | 600000 (=10 mins) | Scanner timeout while waiting for published jobs to complete. |
maxOutstandingTimeStatistics | long | 1m | The maximum amount of time to wait before updating the statistics file. Whichever happens first between this property and maxOutstandingUpdatesStatistics will trigger an update to the statistics file. |
maxOutstandingUpdatesStatistics | long | 1000 | The max number of files to process before updating the statistics file. Whichever happens first between this property and maxOutstandingTimeStatistics will trigger an update to the statistics file. |
usesDomain | boolean | true | Indicates if the group expansion request will use a domain\user format (useful for connectors that do not support domain in the group expander). |
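The parameters above can be sketched as a configuration fragment. This is a minimal illustration that assumes these elements sit directly under the component element, as snapshotDir does in the configuration example on this page. Every value shown is the default from the table, so omitting an element should have the same effect.

```xml
<component name="scanner" factoryName="aspire-documentum-connector" subType="default">
  <!-- Directory for snapshot files and how many old snapshots to keep -->
  <snapshotDir>snapshots</snapshotDir>
  <numOfSnapshotBackups>2</numOfSnapshotBackups>
  <!-- Wait up to 600000 ms (10 minutes) for published jobs to complete -->
  <waitForSubJobsTimeout>600000</waitForSubJobsTimeout>
  <!-- Statistics file is updated after 1m or 1000 files, whichever comes first -->
  <maxOutstandingTimeStatistics>1m</maxOutstandingTimeStatistics>
  <maxOutstandingUpdatesStatistics>1000</maxOutstandingUpdatesStatistics>
  <!-- Use the domain\user format in group expansion requests -->
  <usesDomain>true</usesDomain>
</component>
```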
Branch Handler Configuration
This component publishes to the onAdd, onDelete and onUpdate events, so a branch must be configured for each of these three events.
Element | Type | Description |
---|---|---|
branches/branch/@event | string | The event to configure - onAdd, onDelete or onUpdate. |
branches/branch/@pipelineManager | string | The name of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | string | The name of the pipeline to publish to. If missing, publishes to the default pipeline for the pipeline manager. |
branches/branch/@allowRemote | boolean | Indicates if this pipeline can be found on remote servers (see Distributed Processing for details). |
branches/branch/@batching | boolean | Indicates if the jobs processed by this pipeline should be marked for batch processing (useful for publishers or other components that support batch processing). |
branches/branch/@batchSize | int | The max size of the batches that the branch handler will create. |
branches/branch/@batchTimeout | long | Time to wait before the batch is closed if the batchSize hasn't been reached. |
branches/branch/@simultaneousBatches | int | The max number of simultaneous batches that will be handled by the branch handler. |
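As a sketch of the smallest branch setup the table allows, the fragment below configures one branch per event and leaves out the pipeline attribute, so each event publishes to the default pipeline of the pipeline manager. The relative pipeline manager name is only an example. The defaults of the omitted allowRemote and batching attributes are not stated here, so set them explicitly if the behavior matters.

```xml
<branches>
  <!-- No pipeline attribute: jobs go to the pipeline manager's default pipeline -->
  <branch event="onAdd" pipelineManager="../ProcessPipelineManager" />
  <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" />
  <branch event="onDelete" pipelineManager="../ProcessPipelineManager" />
</branches>
```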
Configuration
The scanner recognizes the following configuration parameters:
Element | Type | Default | Description |
---|---|---|---|
url | String | | The URL to crawl. |
username | String | | The username to use when accessing Documentum. |
password | String | | The password to use when accessing Documentum. |
dfcPropsFilePath | String | | The location of the DFC properties file. You must copy the dfc.keystore file to the location specified in the dfc.properties file as well. |
webtopUrl | String | | The URL to access the Webtop interface. This string is prefixed to each object path so it can be accessed through a URL. |
maxFileSize | int | unlimited | The maximum size, in MB, of the content to be crawled, or unlimited to extract the whole content. |
usePrefix | boolean | false | true if group expansion should return the groups with a predefined prefix, in the form PREFIX@group. |
scanSystemCabinets | boolean | false | true if hidden and private cabinets of Documentum should be scanned. |
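The optional elements in this table could be set as in the following sketch. It assumes they are direct children of the component element, like the other elements of this table, and the values are illustrative only.

```xml
<component name="scanner" factoryName="aspire-documentum-connector" subType="default">
  <!-- Limit crawled content to 10 MB (illustrative value; the default is unlimited) -->
  <maxFileSize>10</maxFileSize>
  <!-- Return expanded groups in the form PREFIX@group -->
  <usePrefix>true</usePrefix>
  <!-- Also scan hidden and private cabinets -->
  <scanSystemCabinets>true</scanSystemCabinets>
</component>
```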
Configuration Example
<component name="scanner" factoryName="aspire-documentum-connector" subType="default">
  <username>admin</username>
  <password>admin</password>
  <dfcPropsFilePath>C:/Documentum/config/dfc.properties</dfcPropsFilePath>
  <webtopUrl>http://localhost:9080/webtop</webtopUrl>
  <debug>true</debug>
  <snapshotDir>${aspire.home}/data/snapshots</snapshotDir>
  <branches>
    <branch event="onAdd" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline" allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onUpdate" pipelineManager="../ProcessPipelineManager" pipeline="addUpdatePipeline" allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
    <branch event="onDelete" pipelineManager="../ProcessPipelineManager" pipeline="deletePipeline" allowRemote="true" batching="true" batchSize="50" batchTimeout="60000" simultaneousBatches="2" />
  </branches>
</component>
Source Configuration
Scanner Control Configuration
The following table describes the attributes that the AspireObject of the incoming scanner job requires to execute and control a scan.
Element | Type | Options | Description |
---|---|---|---|
@action | string | start, stop, pause, resume, abort | Control command that tells the scanner which operation to perform. Use the start option to launch a new crawl. |
@actionProperties | string | full, incremental | When a start @action is received, tells the scanner whether to run a full or an incremental crawl. |
@normalizedCSName | string | | Unique identifier name for the content source that will be crawled. |
displayName | string | | Display or friendly name for the content source that will be crawled. |
Header Example
<doc action="start" actionProperties="full" actionType="manual" crawlId="0" dbId="0" jobNumber="0" normalizedCSName="FeedOne_Connector" scheduleId="0" scheduler="##AspireSystemScheduler##" sourceName="ContentSourceName">
  ...
  <displayName>testSource</displayName>
  ...
</doc>
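The other control options follow the same header shape. The two job documents below are a sketch built only from the attributes in the table: the first starts an incremental crawl and the second pauses it. Whether a pause job needs anything beyond @action and @normalizedCSName is an assumption, and the connectorSource content of the start job is left out for brevity.

```xml
<!-- Job 1: start an incremental crawl of the content source -->
<doc action="start" actionProperties="incremental" normalizedCSName="FeedOne_Connector">
  <!-- connectorSource element omitted for brevity -->
  <displayName>testSource</displayName>
</doc>

<!-- Job 2 (a separate document): pause the crawl of the same content source -->
<doc action="pause" normalizedCSName="FeedOne_Connector">
  <displayName>testSource</displayName>
</doc>
```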
All configuration properties described in this section are relative to /doc/connectorSource of the AspireObject of the incoming Job.
Element | Type | Default | Description |
---|---|---|---|
url | string | | The Documentum URL to scan. The dctm URL has the format dctm://server:port/docbase/cabinet/folder, where the port is optional and the URL must include at least the docbase. |
username | string | | The username to use when connecting to Documentum. |
password | string | | The password to use when connecting to Documentum. |
indexContainers | boolean | false | true if folders (as well as files) should be sent to the pipeline. |
scanRecursively | boolean | false | true if subfolders of the given URL should be scanned. |
scanSystemCabinets | boolean | false | true if private and system cabinets should be scanned. |
maxFileSize | long | | The maximum size, in MB, of the content to be crawled, or unlimited if the whole file should be extracted. |
webtopUrl | string | | The URL to access the Webtop interface. This will be prefixed to each object path so it can be accessed through a URL. |
dfcPropsFilePath | string | | The location of the DFC properties file. |
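To illustrate the URL format, the sketch below shows a connectorSource whose URL omits the port and stops at the docbase, which is the shortest form the format allows. The server name, docbase name and credentials are hypothetical placeholders.

```xml
<connectorSource>
  <!-- Port omitted; the URL ends at the docbase, the minimum the format requires -->
  <url>dctm://docserver/MyDocbase</url>
  <username>crawlUser</username>
  <password>changeMe</password>
  <dfcPropsFilePath>config/dfc.properties</dfcPropsFilePath>
  <!-- Descend into subfolders; the default (false) scans only the given URL -->
  <scanRecursively>true</scanRecursively>
</connectorSource>
```

A longer URL such as dctm://docserver:1489/MyDocbase/MyCabinet/MyFolder would narrow the scan to one folder.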
Scanner Configuration Example
<doc action="start" actionProperties="full" normalizedCSName="cifsTest">
  <connectorSource>
    <url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB</url>
    <username>Administrator</username>
    <password>pass1234</password>
    <dfcPropsFilePath>config/dfc.properties</dfcPropsFilePath>
    <webtopUrl>http://10.10.21.73:9080/webtop/objectId=</webtopUrl>
    <maxFileSize>Unlimited</maxFileSize>
    <indexContainers>true</indexContainers>
    <scanRecursively>true</scanRecursively>
    <scanSystemCabinets>false</scanSystemCabinets>
    <fileNamePatterns>
      <include pattern=".*LSA.*"/>
      <exclude pattern=".*tmp.*"/>
    </fileNamePatterns>
  </connectorSource>
  <displayName>documentum</displayName>
</doc>
Output
<doc>
  <url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</url>
  <fetchUrl>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</fetchUrl>
  <snapshotUrl>006 dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt</snapshotUrl>
  <docType>item</docType>
  <id>090010e18001d15c</id>
  <connectorSpecific type="documentum">
    <field name="object_name">dm_document-0024.txt</field>
    <field name="r_object_type">dm_document</field>
    <field name="r_creation_date">12/5/2013 2:13:59 PM</field>
    <field name="r_modify_date">12/5/2013 2:13:59 PM</field>
    <field name="r_modifier">Administrator</field>
    <field name="r_access_date">1/22/2014 2:42:11 PM</field>
    <field name="a_is_hidden">F</field>
    <field name="i_is_deleted">F</field>
    <field name="a_retention_date">nulldate</field>
    <field name="a_archive">F</field>
    <field name="a_link_resolved">F</field>
    <field name="i_reference_cnt">1</field>
    <field name="i_has_folder">T</field>
    <field name="i_folder_id">0b0010e18001d14b</field>
    <field name="r_link_cnt">0</field>
    <field name="r_link_high_cnt">0</field>
    <field name="r_assembled_from_id">0000000000000000</field>
    <field name="r_frzn_assembly_cnt">0</field>
    <field name="r_has_frzn_assembly">F</field>
    <field name="r_is_virtual_doc">0</field>
    <field name="i_contents_id">060010e18001c31b</field>
    <field name="a_content_type">crtext</field>
    <field name="r_page_cnt">1</field>
    <field name="r_content_size">511425</field>
    <field name="a_full_text">T</field>
    <field name="a_storage_type">filestore_01</field>
    <field name="i_cabinet_id">0c0010e1800175ca</field>
    <field name="owner_name">Administrator</field>
    <field name="owner_permit">7</field>
    <field name="group_name">docu</field>
    <field name="group_permit">5</field>
    <field name="world_permit">3</field>
    <field name="i_antecedent_id">0000000000000000</field>
    <field name="i_chronicle_id">090010e18001d15c</field>
    <field name="i_latest_flag">T</field>
    <field name="r_lock_date">nulldate</field>
    <field name="r_version_label">1.0,CURRENT</field>
    <field name="i_branch_cnt">0</field>
    <field name="i_direct_dsc">F</field>
    <field name="r_immutable_flag">F</field>
    <field name="r_frozen_flag">F</field>
    <field name="r_has_events">F</field>
    <field name="acl_domain">Administrator</field>
    <field name="acl_name">dm_450010e180000101</field>
    <field name="i_is_reference">F</field>
    <field name="r_creator_name">Administrator</field>
    <field name="r_is_public">T</field>
    <field name="r_policy_id">0000000000000000</field>
    <field name="r_resume_state">0</field>
    <field name="r_current_state">0</field>
    <field name="r_alias_set_id">0000000000000000</field>
    <field name="a_is_template">F</field>
    <field name="r_full_content_size">511425</field>
    <field name="a_is_signed">F</field>
    <field name="a_last_review_date">nulldate</field>
    <field name="i_retain_until">nulldate</field>
    <field name="i_partition">0</field>
    <field name="i_is_replica">F</field>
    <field name="i_vstamp">0</field>
  </connectorSpecific>
  <lastModified>2013-12-05T20:13:59Z</lastModified>
  <modifiedBy>Administrator</modifiedBy>
  <dataSize>511425</dataSize>
  <owner>Administrator</owner>
  <createdBy>Administrator</createdBy>
  <repItemType>aspire/dm_document</repItemType>
  <displayUrl>http://10.10.21.73:9080/webtop/objectId=090010e18001d15c</displayUrl>
  <acls>
    <acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@dm_world" name="dm_world" scope="global"/>
    <acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@Administrator" name="Administrator" scope="global"/>
    <acl access="allow" domain="dctm://10.10.21.73:1489/DocumentumRepository" entity="group" fullname="dctm://10.10.21.73:1489/DocumentumRepository@docu" name="docu" scope="global"/>
  </acls>
  <sourceName>documentum</sourceName>
  <sourceType>documentum</sourceType>
  <connectorSource>
    <url>dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB</url>
    <username>Administrator</username>
    <password>encrypted:562E81591F85B858E5A5D3876F9C9FDB</password>
    <dfcPropsFilePath>config/dfc.properties</dfcPropsFilePath>
    <webtopUrl>http://10.10.21.73:9080/webtop/objectId=</webtopUrl>
    <maxFileSize>Unlimited</maxFileSize>
    <indexContainers>true</indexContainers>
    <scanRecursively>true</scanRecursively>
    <scanSystemCabinets>false</scanSystemCabinets>
    <fileNamePatterns/>
    <docbase>DocumentumRepository</docbase>
    <host>10.10.21.73</host>
    <port>1489</port>
    <displayName>documentum</displayName>
  </connectorSource>
  <action>add</action>
  <hierarchy>
    <item id="246AEB4224DF69E86C83D5AFC357A3FD" level="6" name="dm_document-0024.txt" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/dm_document-0024.txt">
      <ancestors>
        <ancestor id="6846F1C598D288AC85E2DDE6F178B488" level="5" name="folder-1-3" parent="true" type="aspire/dm_folder" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/folder-1-3/"/>
        <ancestor id="2760A49EEC1E0C469929E66A909BBCAA" level="4" name="folder-1" type="aspire/dm_folder" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/folder-1/"/>
        <ancestor id="2F224B8B2365BE6BFD1554713BFFF190" level="3" name="5000-500KB" type="aspire/dm_cabinet" url="dctm://10.10.21.73:1489/DocumentumRepository/5000-500KB/"/>
        <ancestor id="A5ECC7C3A8738BB297CC336536AD3B60" level="2" name="DocumentumRepository" type="aspire/docbase" url="dctm://10.10.21.73:1489/DocumentumRepository/"/>
        <ancestor id="93EAEBDC4E5AF3FFF212F8E80902AF01" level="1" name="documentum" type="aspire/documentum" url="dctm://10.10.21.73:1489/"/>
      </ancestors>
    </item>
  </hierarchy>
</doc>