General

Is incremental indexing supported by the connector?

Yes. During a full crawl the connector creates a snapshot file that records the state of every crawled item. On the next incremental crawl it compares the repository against this file to detect additions, updates and deletions. After each incremental crawl a new snapshot file is written to record the changes.
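
The comparison described above can be sketched as follows. This is a minimal illustration only, not the connector's actual implementation: the class name `SnapshotDiff`, the item ids and the "state signature" strings are all hypothetical, and the real snapshot file format is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotDiff {
    /** Kind of change detected for one item. */
    public enum Change { ADD, UPDATE, DELETE }

    /**
     * Compares the previous snapshot with the state seen in the current crawl.
     * Both maps go from a hypothetical item id to a hypothetical state
     * signature (for example a last-modified stamp or a content hash).
     */
    public static Map<String, Change> diff(Map<String, String> previous,
                                           Map<String, String> current) {
        Map<String, Change> changes = new TreeMap<>();
        // Items seen now: new if absent from the snapshot, updated if the state differs.
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldState = previous.get(entry.getKey());
            if (oldState == null) {
                changes.put(entry.getKey(), Change.ADD);
            } else if (!oldState.equals(entry.getValue())) {
                changes.put(entry.getKey(), Change.UPDATE);
            }
        }
        // Items in the snapshot that were not seen in this crawl were deleted.
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                changes.put(id, Change.DELETE);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("doc-1", "v1");
        previous.put("doc-2", "v1");
        previous.put("doc-3", "v1");

        Map<String, String> current = new LinkedHashMap<>();
        current.put("doc-1", "v1"); // unchanged, not reported
        current.put("doc-2", "v2"); // modified
        current.put("doc-4", "v1"); // new

        // prints {doc-2=UPDATE, doc-3=DELETE, doc-4=ADD}
        System.out.println(diff(previous, current));
    }
}
```

Note that building the `current` map requires visiting every item in the repository, even though only the changed items end up being indexed.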

Why does an incremental crawl take as long as a full crawl?

As explained in the previous question, the connector uses snapshot files to perform incremental crawls. An incremental crawl therefore traverses the entire eRoom repository in the same way as a full crawl, but it only indexes the documents that were modified, added or deleted since the last crawl.

For a discussion on crawling, see here

Why do I need to set up Administrator permissions?

Those permissions ("Administrator") are required so the connector can fetch additional information from site collections. This information is required for incremental indexing.

Consider granting "Site Administrator" rights to the crawler account on each site you want to crawl.

Which URLs do I need for the crawl?

Usually you use a specific eRoom URL like:

Can the eRoom Connector crawl archived eRooms?

No, the eRoom Connector only crawls active eRooms on the server. To crawl an archived eRoom, you need to restore it first.

Technical

How to crawl HTTPS sites

The following instructions describe how to import a certificate to crawl HTTPS sites:

Eroom SSL Connector Options

  1. Open the URL in a browser and download a copy of the certificate.
  2. Create a folder in the Aspire installation and save the certificate there. For example: data\ssl\certName.cer
  3. Run %JAVA_HOME%\bin\keytool -import -file data\ssl\certName.cer -keystore data\ssl\keyStore.ks

    The keytool program used in step 3 is in the bin folder under the Java installation path.

  4. Set the complete path of the certificate keystore and its password in the connector configuration, or add the following to bin\startup.bat:
  -Djavax.net.ssl.trustStore=C:\pathToKeyStore\keyStore.ks
  -Djavax.net.ssl.trustStorePassword=password

The password you specify in step 4 is the one keytool asks for in step 3.


Note: To import multiple certificates (for different connectors), use the -alias parameter in step 3.

For example, to import a certificate from an eRoom site:
%JAVA_HOME%\bin\keytool -import -file data\ssl\spCertName.cer -alias eroom -keystore data\ssl\keyStore.ks

And to import a certificate from a SharePoint site:
%JAVA_HOME%\bin\keytool -import -file data\ssl\spCertName.cer -alias sharepoint -keystore data\ssl\keyStore.ks

Important Note (known limitation):

If you set a wrong path or password, you will receive an SSL exception and the crawl will not start.
If you set those values through the connector configuration, enter an incorrect value and start a crawl, you will receive the exception. If you then correct the values and try again, you will still receive the exception.
The problem is that these are global settings, so everything running in your JVM must use that truststore. You cannot alter those system properties at runtime and expect the changes to take effect: once you ask the JVM to make a secure connection, the system property values appear to be cached in the JVM and are used for the rest of the life of the JVM.
You need to restart Aspire for the changes to take effect.


The connector fails with SSL keystore set

In some cases, when you try to crawl an HTTPS eRoom site, you can get a stack trace like this:

Error scanning directory: https://test-server.com/eRoom/Test_Facility/Test_Eroom/
com.searchtechnologies.aspire.services.AspireException: Unable to read the OutputStream of the request
        at com.searchtechnologies.aspire.components.EroomSourceInfo.createRootItem(EroomSourceInfo.java:137)
        at com.searchtechnologies.aspire.scanner.AbstractHierarchicalScanner.performScan(AbstractHierarchicalScanner.java:59)
        at com.searchtechnologies.aspire.scanner.AbstractScanner.scanProcess(AbstractScanner.java:598)
        at com.searchtechnologies.aspire.scanner.AbstractScanner.process(AbstractScanner.java:290)
        at com.searchtechnologies.aspire.application.JobHandlerImpl.runNested(JobHandlerImpl.java:149)
        at com.searchtechnologies.aspire.application.PipelineManagerImpl.process(PipelineManagerImpl.java:256)
        at com.searchtechnologies.aspire.application.JobHandlerImpl.processLocalJobRoute(JobHandlerImpl.java:385)
        at com.searchtechnologies.aspire.application.JobHandlerImpl.runNested(JobHandlerImpl.java:300)
        at com.searchtechnologies.aspire.application.JobHandlerImpl.run(JobHandlerImpl.java:78)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)
Caused by: javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
        at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1747)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1708)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1691)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1222)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1199)
        at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1031)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:230)
        at com.searchtechnologies.aspire.components.EroomDSConnection.readOutputStream(EroomDSConnection.java:220)
        ... 16 more
Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
        at sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:57)
        at sun.security.validator.Validator.getInstance(Validator.java:161)
        at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:108)
        at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:204)
        at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
        at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1188)
        at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:135)
        at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
        at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:943)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1188)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1215)
        ... 22 more

This means that the SSL keystore path or password used for crawling is incorrect. Correct the value and restart Aspire: these values are stored in the JVM system properties and, once set, cannot be changed until the Java virtual machine is restarted.
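
Because a wrong value only shows up after a restart, it can help to check the keystore path and password outside Aspire first. The following is a minimal sketch using the standard `java.security.KeyStore` API; the class name `TrustStoreCheck` and its method are hypothetical and are not part of Aspire or the connector. It assumes the keystore was created by keytool with the default keystore type of the same Java installation.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Collections;

public class TrustStoreCheck {
    /**
     * Loads the keystore at the given path with the given password and
     * returns the number of trusted certificate entries it contains.
     * An IOException is thrown if the file cannot be opened or read.
     */
    public static int countTrustedCerts(String path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = new FileInputStream(path)) {
            store.load(in, password);
        }
        int trusted = 0;
        for (String alias : Collections.list(store.aliases())) {
            // Only certificate entries act as trust anchors.
            if (store.isCertificateEntry(alias)) {
                trusted++;
            }
        }
        return trusted;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java TrustStoreCheck <keystorePath> <password>");
            return;
        }
        int trusted = countTrustedCerts(args[0], args[1].toCharArray());
        System.out.println("Trusted certificate entries: " + trusted);
        if (trusted == 0) {
            System.out.println("The keystore has no trusted certificates, which is "
                    + "consistent with the 'trustAnchors parameter must be non-empty' error.");
        }
    }
}
```

If the check fails with an exception or reports zero entries, fix the path, password or import step before restarting Aspire with the new values.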

Use SSL UI option

In order to correctly enable or disable the "Use SSL" option of the connector, the Aspire instance has to be restarted.

Save your content source before creating or editing another one

Failing to save a content source before creating or editing another content source can result in an error.

ERROR [aspire]: Exception received attempting to get execute component command com.searchtechnologies.aspire.services.AspireException: Unable to find content source

Save the initial content source before creating or working on another.

Multi-threading technical limitation

Aspire 3.0 introduced a multi-threaded platform for performing crawls, but due to architecture and API limitations the eRoom Connector effectively works as a single-threaded connector. It appears that the server does not allow multiple connections to perform queries at the same time.

At some point during a multi-threaded crawl, the connector receives this error from the server: "No target objects were found evaluating the command's select attribute". The query we are trying to execute works correctly on its own (we tested it with a SOAP/XML test program provided by EMC).

Also, see the latest comments in this post on the EMC forums:

https://community.emc.com/message/105700#105700

It appears that eRoom is not thread-safe, so spawning multiple processes leads to unpredictable behavior. For that reason the eRoom Connector is single-threaded, which limits its performance even though Aspire 3.0 is multi-threaded.
