Features

The SharePoint 2013 connector crawls content from any SharePoint 2013 site collection URL that you specify. It retrieves Sites, Lists, Folders, List Items, and Attachments, as well as other pages (in .aspx format).

The connector uses the REST API to read content from SharePoint directly rather than by web crawling. Some of the features of the SharePoint 2013 connector include:

  • Performs incremental crawling (so that only new or updated documents are indexed) using SharePoint's change log timestamp
  • Fetches access control lists (ACLs) for document-level security
  • Is search engine independent
  • Runs from any machine with access to the given SharePoint URLs
  • Supports NTLM and HTTPS
  • Supports BCS external lists
  • Is designed to support early binding mechanisms
  • Runs without installing anything on SharePoint
  • Supports regular expression patterns for files to include or exclude
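
As an illustration of include/exclude filtering by regular expression, here is a minimal Python sketch. It is not the connector's own code: the function name `make_filter`, the sample patterns, and the rule that an exclude match takes precedence over an include match are all assumptions made for the example.

```python
import re

def make_filter(include_patterns, exclude_patterns):
    """Return a predicate that decides whether a URL should be crawled.

    Hypothetical helper: the precedence rule (exclude wins) is an
    assumption, not documented connector behavior.
    """
    includes = [re.compile(p) for p in include_patterns]
    excludes = [re.compile(p) for p in exclude_patterns]

    def should_crawl(url):
        # Any exclude match rejects the URL outright.
        if any(rx.search(url) for rx in excludes):
            return False
        # With no include patterns, everything not excluded is accepted.
        if not includes:
            return True
        return any(rx.search(url) for rx in includes)

    return should_crawl

# Example patterns (made up): keep PDF and Word files, skip an Archive folder.
should_crawl = make_filter(
    include_patterns=[r"\.(pdf|docx?)$"],
    exclude_patterns=[r"/Archive/"],
)
```

With these sample patterns, a URL ending in `report.pdf` is accepted unless its path contains `/Archive/`, and an `.aspx` page is rejected because it matches no include pattern.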

Future Development Plan

The following features are not currently implemented, but are on the development plan:

  • Index and support people search

Anything we should add? Please let us know.

SharePoint Architecture

Detailed information is available in the MSDN article.

Summary of SharePoint organization

This is the hierarchy of processes, applications, sites, sub-sites, libraries, folders, and documents within SharePoint.

  • SharePoint Server
    • SharePoint Web Application Pool
      • SharePoint Web Application (single web application)
        • Main Site Collection (the primary or main site created for the web application, associated with the primary http://xyz.server.com URL)
          • Sub Sites
            • Document Libraries
              • Folders
                • Documents
                  • Attachments
        • Other Site Collections
          • Sub Sites
            • Document Libraries
              • Folders
                • Documents
                  • Attachments
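
The hierarchy above can be thought of as a tree. The following Python sketch models one branch of it as nested dictionaries and walks it depth-first. The dictionary layout, the `walk` function, and the sample names (`team`, `Shared Documents`, `Reports`, `q1.docx`) are hypothetical and only illustrate the nesting; they do not reflect how SharePoint or the connector stores this information.

```python
def walk(node, path=()):
    """Yield (path, type) for a node and all of its descendants, depth-first."""
    current = path + (node["name"],)
    yield current, node["type"]
    for child in node.get("children", []):
        yield from walk(child, current)

# A made-up site collection with a single branch, for illustration only.
site_collection = {
    "type": "site collection",
    "name": "http://xyz.server.com",
    "children": [{
        "type": "sub site",
        "name": "team",
        "children": [{
            "type": "document library",
            "name": "Shared Documents",
            "children": [{
                "type": "folder",
                "name": "Reports",
                "children": [{"type": "document", "name": "q1.docx"}],
            }],
        }],
    }],
}

items = list(walk(site_collection))
for path, kind in items:
    print(kind, "/".join(path[1:]) or path[0])
```

Each yielded path records the chain of parents, so a document can always be traced back to its folder, library, sub site, and site collection.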

Content Retrieved by the Connector

The SharePoint connector will retrieve the following objects:

  • Sites
  • Lists
  • External Lists (BCS)
  • Folders
  • Documents or List Items
  • Attachments

ListItems can take a number of different formats, for example documents (PDF, DOC, PPT, etc.), calendar events, or announcements. For more information on how ListItems content types work, see the MSDN article.


Operation Mode

The connector uses the REST API over HTTP or HTTPS to acquire information about SharePoint 2013 content.
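
To give a sense of what SharePoint 2013 REST requests look like, the Python sketch below builds a few endpoint URLs under a site's `_api` path. The helper names `api_url` and `list_items_url` and the sample site URL are hypothetical, and this page does not state which endpoints the connector actually calls, so treat the selection as an assumption. No request is sent; the code only constructs strings.

```python
from urllib.parse import quote

# Header commonly sent to ask the SharePoint 2013 REST service for JSON.
HEADERS = {"Accept": "application/json;odata=verbose"}

def api_url(site_url, resource):
    """Join a site URL and a REST resource path under the _api endpoint."""
    return site_url.rstrip("/") + "/_api/" + resource.lstrip("/")

def list_items_url(site_url, list_title):
    """Build the URL for the items of a list identified by its title."""
    # Single quotes inside an OData string literal are doubled.
    escaped = list_title.replace("'", "''")
    # Percent-encode spaces and similar characters, but keep the quotes.
    encoded = quote(escaped, safe="'")
    return api_url(site_url, "web/lists/GetByTitle('%s')/items" % encoded)

site = "http://xyz.server.com/sites/team"  # sample URL, not a real server
print(api_url(site, "web/webs"))    # sub sites of the site
print(api_url(site, "web/lists"))   # lists and libraries of the site
print(list_items_url(site, "Shared Documents"))
```

An HTTP client would issue GET requests against these URLs with `HEADERS` and whatever authentication the server requires (for example NTLM).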

The connector acquires content by doing the following:

  • Goes recursively through all sites, subsites, lists, folders, and documents, and creates a sub-job for each object discovered. Each sub-job contains all available metadata, including ACLs.
  • Saves a snapshot file so that it can compare against previous item states and perform incremental crawls that pick up added, updated, and deleted items. The snapshot file also contains the last saved SharePoint change log timestamp, which the next incremental crawl uses to get only modified items.
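
The snapshot comparison described above can be sketched as follows. This is a simplified illustration, not the connector's implementation: the real snapshot file format is not described here, so the sketch assumes a snapshot is a plain mapping from an item URL to a last-modified marker, and `diff_snapshots` is a hypothetical name. The change log timestamp stored in the snapshot is not modeled.

```python
def diff_snapshots(previous, current):
    """Classify items as added, updated, or deleted between two crawls.

    Both arguments are assumed to map an item URL to a last-modified
    marker (hypothetical format chosen for this example).
    """
    added = sorted(url for url in current if url not in previous)
    deleted = sorted(url for url in previous if url not in current)
    updated = sorted(
        url for url in current
        if url in previous and current[url] != previous[url]
    )
    return {"added": added, "updated": updated, "deleted": deleted}

# Made-up snapshots from two consecutive crawls.
previous = {
    "/Docs/a.pdf": "2014-01-01T10:00:00Z",
    "/Docs/b.docx": "2014-01-01T10:00:00Z",
    "/Docs/c.pptx": "2014-01-01T10:00:00Z",
}
current = {
    "/Docs/a.pdf": "2014-01-01T10:00:00Z",   # unchanged
    "/Docs/b.docx": "2014-02-15T08:30:00Z",  # modified since last crawl
    "/Docs/d.pdf": "2014-02-16T09:00:00Z",   # new item
}

changes = diff_snapshots(previous, current)
```

Only the items listed in `changes` would need to be sent on for indexing or removal; unchanged items such as `/Docs/a.pdf` are skipped.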