General

Why does an incremental crawl last as long as a full crawl?

In Aspire 2.0 and 2.0.1, the SharePoint 2010 Connector performs incremental crawls based on snapshot files, which are meant to mirror exactly the documents that the connector has sent to the search engine for indexing. On an incremental crawl, the connector traverses all the SharePoint content the same way a full crawl does and compares it against the snapshot, but it only indexes the new, modified, or deleted documents. Because the traversal itself is not shortened, the crawl takes about as long as a full one.

Since Aspire 2.0.2, the SharePoint 2010 Connector uses the SharePoint change token to retrieve only the changes, instead of using the snapshot file.
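
The difference between the two approaches can be illustrated with a minimal sketch. This is not the connector's code: the class name, method names, and data shapes below are hypothetical, the "signature" strings stand in for whatever the connector stores per document, and an integer position in a list stands in for the change token, which in SharePoint is an opaque value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not Aspire or SharePoint classes.
final class IncrementalSketch {

    // Snapshot style: 'current' has to list every item in the repository,
    // so building it costs as much as a full crawl. Only the differences
    // against the previous snapshot are reported for indexing.
    static Map<String, String> diffSnapshots(Map<String, String> previous,
                                             Map<String, String> current) {
        Map<String, String> actions = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldSignature = previous.get(entry.getKey());
            if (oldSignature == null) {
                actions.put(entry.getKey(), "add");
            } else if (!oldSignature.equals(entry.getValue())) {
                actions.put(entry.getKey(), "update");
            }
        }
        for (String url : previous.keySet()) {
            if (!current.containsKey(url)) {
                actions.put(url, "delete");
            }
        }
        return actions;
    }

    // Change-token style: the server keeps an ordered change log and the
    // client asks only for the entries recorded after its saved position.
    static List<String> changesSince(List<String> changeLog, int savedToken) {
        return new ArrayList<>(changeLog.subList(savedToken, changeLog.size()));
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("/docs/a.docx", "v1");
        previous.put("/docs/b.docx", "v1");

        Map<String, String> current = new LinkedHashMap<>();
        current.put("/docs/a.docx", "v2");
        current.put("/docs/c.docx", "v1");

        System.out.println(diffSnapshots(previous, current));

        List<String> changeLog = Arrays.asList(
                "add /docs/a.docx", "add /docs/b.docx", "update /docs/a.docx");
        System.out.println(changesSince(changeLog, 2));
    }
}
```

In the snapshot variant the cost is dominated by building the `current` map, which requires visiting every item. In the change-token variant the work is proportional to the number of changes since the last crawl.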

Technical

The connector fails with 401 unauthorized and this stack trace:

AspireException(aspire-sharepoint-scanner.scanFailed):
com.searchtechnologies.aspire.services.AspireException: An error ocurred on SharePoint Scanner. Detail: Server was unable to process request. ---> Attempted to perform an unauthorized operation..
        at com.searchtechnologies.aspire.components.SharePointScanner.process(SharePointScanner.java:211)
        at com.searchtechnologies.aspire.application.JobHandler.runNested(JobHandler.java:120)
        at com.searchtechnologies.aspire.application.PipelineManagerImpl.process(PipelineManagerImpl.java:196)
        at com.searchtechnologies.aspire.application.JobHandler.localJobRouteProcess(JobHandler.java:354)
        at com.searchtechnologies.aspire.application.JobHandler.runNested(JobHandler.java:249)
        at com.searchtechnologies.aspire.application.JobHandler.run(JobHandler.java:64)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: javax.xml.ws.soap.SOAPFaultException: Server was unable to process request. ---> Attempted to perform an unauthorized operation.
        at com.sun.xml.internal.ws.fault.SOAP11Fault.getProtocolException(Unknown Source)
        at com.sun.xml.internal.ws.fault.SOAPFaultBuilder.createException(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source)
        at $Proxy39.getContent(Unknown Source)
        at com.searchtechnologies.aspire.components.SharePointScanner.getCurrentChangeToken(SharePointScanner.java:508)
        at com.searchtechnologies.aspire.components.SharePointScanner.scan(SharePointScanner.java:436)
        at com.searchtechnologies.aspire.components.SharePointScanner.process(SharePointScanner.java:202)
        ... 8 more

This means that the service account used to access SharePoint content does not have sufficient permissions. In particular, the account needs Full Read permission at the Web Application level. Check the prerequisites section for more details on the permissions required to crawl SharePoint.

The connector keeps failing with 401 unauthorized, even when the user has enough permissions.

Verify that the permissions are actually in effect for the user. You can use the 'Check Permissions' option on the SharePoint site that you want to crawl.

