FAQs


Specific

What ID is used for indexing?

We use the Documentum chronicle_id as the indexing ID because it stays the same across all versions of a document.

Which version of the document is used for indexing?

Although a document can have many versions in Documentum, only the "current" version of the document is used for indexing.

We are using the {SLICES} parameter in the DQL query. Even though the scanner threads setting is '20', at most 16 scanner threads are ever used. Is that correct?

Yes. With {SLICES}, the current DQL connector uses at most 16 scanner threads, so any higher setting is capped at 16. Without slices, the whole scan phase is handled by a single scanner thread.
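As a minimal illustration only — the exact text Aspire substitutes for {SLICES} is not shown here, and the partitioning scheme below is an assumption — slicing can be pictured as rewriting one DQL statement into several disjoint ones, for example by the last hexadecimal digit of r_object_id, which would naturally cap the useful slice count at 16:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch: split one DQL statement into up to 16 disjoint
    // "slices" by the last hex digit of r_object_id, so each slice can be
    // scanned by its own thread. Aspire's real {SLICES} substitution may differ.
    public class DqlSlicer {
        private static final String HEX = "0123456789abcdef";

        // baseDql is assumed to already contain a WHERE clause.
        public static List<String> slice(String baseDql, int slices) {
            int n = Math.min(slices, 16); // only 16 hex digits exist to partition on
            List<String> out = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                StringBuilder pred = new StringBuilder();
                // distribute the 16 possible last digits evenly across the slices
                for (int d = i; d < 16; d += n) {
                    if (pred.length() > 0) pred.append(" OR ");
                    pred.append("r_object_id LIKE '%").append(HEX.charAt(d)).append("'");
                }
                out.add(baseDql + " AND (" + pred + ")");
            }
            return out;
        }

        public static void main(String[] args) {
            // illustrative base statement
            slice("SELECT r_object_id, i_chronicle_id FROM dm_document"
                    + " WHERE object_name IS NOT NULL", 20)
                .forEach(System.out::println);
        }
    }

With this scheme, asking for 20 slices still yields only 16 statements, matching the observed cap on scanner threads.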

Can you explain the usage of 'scanner threads' vs. 'processing threads'?

Scanner threads in all connectors are used to gather the list of items for further processing. In classical hierarchical connectors, e.g. the File System connector, the scanner threads provide the list of files for each traversed directory. The DQL connector is essentially "flat": all items are produced by the specified DQL statement. "Slices" means that we artificially create multiple DQL statements to achieve some concurrency. Usually, the scanning tasks only run the DQL statement and store the chronicleId and objectId into the MongoDB queue, so the time spent here should not be critical unless some really slow DQL statements are being processed.
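A minimal sketch of what a scan task might look like, assuming DFC's standard query API (IDfQuery/IDfCollection) and abstracting the MongoDB queue as a plain in-memory queue — the class and field names are illustrative, not Aspire's actual code:

    import com.documentum.fc.client.DfQuery;
    import com.documentum.fc.client.IDfCollection;
    import com.documentum.fc.client.IDfQuery;
    import com.documentum.fc.client.IDfSession;
    import com.documentum.fc.common.DfException;
    import java.util.Queue;

    // Hypothetical sketch of a scan task: execute one (sliced) DQL statement
    // and enqueue (chronicleId, objectId) pairs for the processing threads.
    public class ScanTask implements Runnable {
        private final IDfSession session;       // an open Documentum session
        private final String dql;               // one slice of the DQL statement
        private final Queue<String[]> queue;    // stands in for the MongoDB queue

        public ScanTask(IDfSession session, String dql, Queue<String[]> queue) {
            this.session = session;
            this.dql = dql;
            this.queue = queue;
        }

        @Override
        public void run() {
            try {
                IDfQuery query = new DfQuery();
                query.setDQL(dql);
                IDfCollection rows = query.execute(session, IDfQuery.DF_READ_QUERY);
                try {
                    while (rows.next()) {
                        // Scanning stores only the two ids; metadata, content
                        // and text extraction all happen in processing threads.
                        queue.add(new String[] {
                            rows.getString("i_chronicle_id"),
                            rows.getString("r_object_id")
                        });
                    }
                } finally {
                    rows.close();
                }
            } catch (DfException e) {
                throw new RuntimeException("scan slice failed: " + dql, e);
            }
        }
    }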

Processing threads in the DQL connector handle every document as follows:

1. Get the object detail by its ID from Documentum.
2. Populate the Aspire item from the scan phase with attributes from the object detail (the metadata).
3. Fetch the content from Documentum and store it as a stream in the Aspire job (the fetcher).
4. Extract text with Tika for "text" files.
5. Run the workflow components.
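Steps 1-3 could be sketched with plain DFC calls as follows (a hypothetical helper, not the connector's actual code; steps 4 and 5 happen on the Aspire side and are omitted):

    import com.documentum.fc.client.IDfSession;
    import com.documentum.fc.client.IDfSysObject;
    import com.documentum.fc.common.DfException;
    import com.documentum.fc.common.DfId;
    import java.io.InputStream;

    // Hypothetical sketch of steps 1-3 for a single queued item.
    public class ProcessOneItem {
        public static void process(IDfSession session, String objectId) throws DfException {
            // 1. get the object detail by its id from Documentum
            IDfSysObject obj = (IDfSysObject) session.getObject(new DfId(objectId));

            // 2. read metadata attributes for the Aspire item (illustrated
            //    here by plain local variables)
            String name = obj.getObjectName();
            String format = obj.getContentType();

            // 3. fetch the content as a stream for the fetcher stage
            InputStream content = obj.getContent();
            System.out.printf("%s (%s): content stream obtained: %s%n",
                    name, format, content != null);
        }
    }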

Why is FetchUrl consuming so much time?

FetchUrl is implemented in the DQL connector as reading the whole content of the Documentum file into memory as a byte array and exposing that array as a ByteArrayInputStream to the later stages.

Scanner threads are not relevant here.

Increasing the number of processing threads can help, but it must be balanced against the heap size assigned to the JVM, and it also depends on the size of the fetched files. More processing threads also means more memory consumed, since more (possibly large) files are processed in parallel. The whole process can be tuned with the help of a tool such as VisualVM, whose graphs also show garbage collector activity.
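As a rough, purely hypothetical sizing example: with 20 processing threads and documents of up to 50 MB each held fully in memory as byte arrays, the content buffers alone can peak around 20 × 50 MB = 1 GB, so the JVM's -Xmx value needs at least that much headroom on top of Aspire's baseline consumption.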

The most atomic operation here is the actual reading of the content from Documentum by the DFC classes, i.e. something like IDfSysObject.getContent(). If this operation is slow, then no Aspire-related configuration can help.
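A simple way to verify this is to time the raw DFC read outside of Aspire. The sketch below (hypothetical helper names) drains the stream returned by getContent() and reports the effective throughput; if this alone is slow, the bottleneck is on the Documentum or network side:

    import com.documentum.fc.client.IDfSession;
    import com.documentum.fc.client.IDfSysObject;
    import com.documentum.fc.common.DfException;
    import com.documentum.fc.common.DfId;
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical diagnostic helper: time the raw DFC content read for one object.
    public class ContentReadTimer {
        public static void timeRead(IDfSession session, String objectId)
                throws DfException, IOException {
            IDfSysObject obj = (IDfSysObject) session.getObject(new DfId(objectId));
            long start = System.nanoTime();
            long bytes = 0;
            try (InputStream in = obj.getContent()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytes += n;
                }
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            double mbPerSec = ms > 0 ? (bytes / 1_048_576.0) / (ms / 1000.0) : 0.0;
            System.out.printf("read %d bytes in %d ms (%.2f MB/s)%n", bytes, ms, mbPerSec);
        }
    }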



General 

Warning: the question "Why does an incremental crawl last as long as a full crawl?" is not relevant for this connector.

See also: Connectors FAQ & Troubleshooting


Troubleshooting

Info

No other troubleshooting topics are available at the moment beyond the entry below.

Problem

FetchUrl is consuming too much time.

Solution

This is the same issue as the FAQ entry "Why is FetchUrl consuming so much time?" above; see that answer for the details on processing threads, heap sizing, and the underlying IDfSysObject.getContent() call.
