FAQs


Specific

Is incremental indexing supported by the connector?

Yes. During a full crawl the connector creates a snapshot file that records the state of every crawled item. On the next incremental crawl it compares the current content against this file to detect additions, updates and deletions. After each incremental crawl a new snapshot file is written to capture the changes.
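The comparison step can be illustrated with a minimal sketch. It assumes a snapshot is a simple mapping from item URL to a signature such as a version stamp; the function name, the sample paths and that layout are hypothetical and are not the connector's actual snapshot file format.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (item URL -> signature) and classify the changes."""
    # Items present now but missing from the previous snapshot are adds.
    adds = sorted(url for url in current if url not in previous)
    # Items in the previous snapshot that no longer exist are deletes.
    deletes = sorted(url for url in previous if url not in current)
    # Items present in both with a different signature are updates.
    updates = sorted(
        url for url in current
        if url in previous and current[url] != previous[url]
    )
    return {"add": adds, "update": updates, "delete": deletes}


# Hypothetical sample data: doc1 changed, doc2 was removed, doc3 is new.
previous = {"/sites/a/doc1.docx": "v1", "/sites/a/doc2.docx": "v1"}
current = {"/sites/a/doc1.docx": "v2", "/sites/a/doc3.docx": "v1"}
changes = diff_snapshots(previous, current)
```

After the diff, the current mapping would be saved as the new snapshot so that the next incremental crawl compares against it.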

Why do I need to set up permissions at Web Application level?

Those permissions ("Full Read") are required so the connector can fetch additional information from site collections. This information is required for incremental indexing.

If setting permissions at Web Application level is not suitable for your environment, consider granting the crawler account site collection administrator rights on each site collection you want to crawl. (This works without Web App permissions, but has to be set on every site collection.)

Why is the connector crawling the whole site collection even though the start URL points to a sub-site?

This usually happens when you access your site using an IP address instead of a hostname. IIS or a router redirects the traffic to the top-level site, so the connector receives the wrong information from SharePoint. To fix this, use the hostname of your SharePoint front end in the start URL.
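A quick way to catch this before crawling is to check whether the host part of the start URL is an IP address. The sketch below uses only the Python standard library; the function name and the sample URLs are made up for illustration and are not part of the connector.

```python
import ipaddress
from urllib.parse import urlsplit


def start_url_uses_ip(url):
    """Return True if the host part of the URL is an IPv4 or IPv6 address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # Not a valid IP address, so treat it as a hostname.
        return False


# Hypothetical start URLs for the same sub-site.
by_ip = start_url_uses_ip("http://192.168.1.10/sites/team/subsite")
by_name = start_url_uses_ip("http://sharepoint.example.com/sites/team/subsite")
```

If the check returns True, replace the IP address with the hostname of the SharePoint front end before starting the crawl.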

Why does an incremental crawl last as long as a full crawl?

In Aspire 2.0 and 2.0.1, the SharePoint 2010 Connector performs incremental crawls based on snapshot files, which are meant to match the exact documents that the connector has sent to the search engine. On an incremental crawl, the connector crawls the SharePoint content the same way as on a full crawl, but it only indexes the documents that were modified, added or deleted.

Since Aspire 2.0.2, the SharePoint 2010 Connector uses the change token to retrieve only the changes instead of using the snapshot file.

For a discussion on crawling, see Full & Incremental Crawls.
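To show why a token-based crawl is cheaper than a snapshot comparison, here is a minimal sketch of the idea. The change log, the `seq` field and the `changes_since` function are hypothetical stand-ins; this is not the SharePoint API or the connector's code.

```python
def changes_since(change_log, token):
    """Return the entries recorded after `token` and the token to store next."""
    new_entries = [entry for entry in change_log if entry["seq"] > token]
    # If nothing changed, keep the old token for the next crawl.
    next_token = max((entry["seq"] for entry in new_entries), default=token)
    return new_entries, next_token


# Hypothetical change log kept by the server, ordered by sequence number.
change_log = [
    {"seq": 1, "action": "add", "url": "/sites/a/doc1.docx"},
    {"seq": 2, "action": "update", "url": "/sites/a/doc1.docx"},
    {"seq": 3, "action": "delete", "url": "/sites/a/doc2.docx"},
]

# The previous crawl stored token 1, so only entries 2 and 3 are returned.
pending, next_token = changes_since(change_log, 1)
```

Only the entries after the stored token are processed, so the cost depends on the number of changes rather than on the size of the site.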

Why am I seeing a duplicated ACL for an item when crawling? And why is the ACL not removed even after I deleted the permission from the list?

The web service extensions for SP2007/2010 take into account not only the permissions on the item, but also the Web App and Zone Policies. When a user or group is part of those policies, information such as the SID is not available and so it is not returned in the ACL. Seeing a duplicate ACL, in this case, means that the account is part of the item permissions and also part of the Web App Policies. After you delete the permission on the item, the Web App Policy is still in place, which is why one of the ACL entries remains.

Why is the user ACL marked as "Deny" even when the user has Limited Access and Full Control permissions?

This is a scenario similar to the one above, except that the account has two permissions on the item in addition to the Web App Policy: Limited Access and Full Control. This is not a normal combination of permissions. With the current handling, Limited Access takes priority over Full Control when the "Allow Limited Access" option is false in the connector configuration. The item permission is therefore set to "Deny", which then removes any "allow" permission given to the same account. If the "Allow Limited Access" option is set to true, the ACL becomes an allow and the ACL is then duplicated.
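The behavior described in the last two answers can be modeled with a small sketch. It is a simplified illustration under stated assumptions, not the connector's implementation: the function name, the "allow"/"deny" strings and the list-of-entries result are all hypothetical.

```python
def account_acl_entries(item_levels, has_web_app_policy, allow_limited_access):
    """Model the ACL entries reported for one account on one item."""
    entries = []
    # A Web App Policy always contributes its own allow entry.
    if has_web_app_policy:
        entries.append("allow")
    # Item permissions contribute a second entry when any are present.
    if item_levels:
        limited = "Limited Access" in item_levels
        if limited and not allow_limited_access:
            # Limited Access takes priority over the other levels.
            entries.append("deny")
        else:
            entries.append("allow")
    # A deny removes every allow given to the same account.
    if "deny" in entries:
        return ["deny"]
    return entries


levels = ["Limited Access", "Full Control"]
option_off = account_acl_entries(levels, True, False)
option_on = account_acl_entries(levels, True, True)
policy_only = account_acl_entries([], True, False)
```

With the option off the account ends up with a single deny. With it on, the item allow and the policy allow both appear, which is the duplicate. Removing the item permission leaves only the policy entry.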

General 

Why does an incremental crawl last as long as a full crawl?

Some connectors perform incremental crawls based on snapshot files, which are meant to match the exact documents that have been indexed by the connector to the search engine. On an incremental crawl, the connector fully crawls the file system the same way as a full crawl, but it only indexes the modified, new or deleted documents during that crawl.

For a discussion on crawling, see Full & Incremental Crawls.

Save your content source before creating or editing another one

Failing to save a content source before creating or editing another content source can result in an error.

ERROR [aspire]: Exception received attempting to get execute component command com.searchtechnologies.aspire.services.AspireException: Unable to find content source

Save the initial content source before creating or working on another.

Troubleshooting


The connector fails with 401 unauthorized and this stack trace:

AspireException(aspire-sharepoint-scanner.scanFailed):
com.searchtechnologies.aspire.services.AspireException: An error ocurred on SharePoint Scanner. Detail: Server was unable to process request. ---> Attempted to perform an unauthorized operation..
        at com.searchtechnologies.aspire.components.SharePointScanner.process(SharePointScanner.java:211)
        at com.searchtechnologies.aspire.application.JobHandler.runNested(JobHandler.java:120)
        at com.searchtechnologies.aspire.application.PipelineManagerImpl.process(PipelineManagerImpl.java:196)
        at com.searchtechnologies.aspire.application.JobHandler.localJobRouteProcess(JobHandler.java:354)
        at com.searchtechnologies.aspire.application.JobHandler.runNested(JobHandler.java:249)
        at com.searchtechnologies.aspire.application.JobHandler.run(JobHandler.java:64)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: javax.xml.ws.soap.SOAPFaultException: Server was unable to process request. ---> Attempted to perform an unauthorized operation.
        at com.sun.xml.internal.ws.fault.SOAP11Fault.getProtocolException(Unknown Source)
        at com.sun.xml.internal.ws.fault.SOAPFaultBuilder.createException(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source)
        at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source)
        at $Proxy39.getContent(Unknown Source)
        at com.searchtechnologies.aspire.components.SharePointScanner.getCurrentChangeToken(SharePointScanner.java:508)
        at com.searchtechnologies.aspire.components.SharePointScanner.scan(SharePointScanner.java:436)
        at com.searchtechnologies.aspire.components.SharePointScanner.process(SharePointScanner.java:202)
        ... 8 more

This means that the service account used to access SharePoint content does not have sufficient permissions. In particular, the account must be granted "Full Read" permissions at Web Application level. See the permissions question in the FAQs above for what is required to crawl SharePoint.

See the prerequisites section for more details.

The connector keeps failing with 401 unauthorized, even though the user appears to have enough permissions.

Verify the permissions that are actually in effect for the user. You can use the 'Check Permissions' option on the SharePoint site that you want to crawl.