Some of the features of the SharePoint Online connector include:
- Performs incremental crawling (so that only new/updated documents are indexed) using SharePoint's change log timestamp
- Fetches access control lists (ACLs) for document level security
- Is search engine independent
- Runs from any machine with access to the given SharePoint URLs
- Supports ADFS and HTTPS
- Supports BCS external lists
- Is designed to support early binding mechanisms
- Runs without installing anything on SharePoint
- Supports regular expression patterns for including or excluding files
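As an illustration of the include/exclude patterns in the feature list above, here is a minimal sketch in Python. It is not the connector's own implementation: the function name `is_allowed`, the sample patterns, the sample URLs, and the rule that exclusions take precedence over inclusions are all assumptions made for this example.

```python
import re

def is_allowed(url, include_patterns, exclude_patterns):
    """Return True if url passes the include/exclude filters (hypothetical helper)."""
    # Assumption: an exclude match always wins over an include match.
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    # Assumption: an empty include list means "include everything".
    if not include_patterns:
        return True
    return any(re.search(p, url) for p in include_patterns)

# Hypothetical patterns: keep common office documents, skip anything under /Archive/.
include = [r"\.(pdf|docx?|pptx?)$"]
exclude = [r"/Archive/"]

urls = [
    "https://contoso.sharepoint.com/sites/hr/Docs/policy.pdf",
    "https://contoso.sharepoint.com/sites/hr/Archive/old.pdf",
    "https://contoso.sharepoint.com/sites/hr/Docs/logo.png",
]
allowed = [u for u in urls if is_allowed(u, include, exclude)]
```

With these sample patterns, only the first URL is kept: the second is under `/Archive/` and the third does not match any include pattern.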
The SharePoint Online connector retrieves several types of documents; the inclusions and exclusions are listed below.
- External Lists (BCS)
- Documents or List Items
ListItems can take a number of different formats, for example documents (PDF, DOC, PPT, etc.), calendar events, or announcements. For more information on how ListItems content types work, see the MSDN article.
Due to API limitations, the SharePoint Online connector has the following restrictions:
- The connector uses the REST API to access SharePoint database(s) directly; it doesn't use web crawling
Future Development Plan
The following features are not currently implemented, but are on the development plan:
- Index and support people search
Anything we should add? Please let us know.
The connector uses the REST API over HTTP or HTTPS to retrieve information about SharePoint Online content.
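To make the REST access more concrete, the sketch below builds two SharePoint REST endpoint URLs in Python without making any network call. The `/_api/web/lists` and `getbytitle('...')/items` routes are standard SharePoint REST routes, but which endpoints the connector itself calls is not stated here, so treat the selection as an assumption. The helper names and the `contoso` site URL are hypothetical.

```python
from urllib.parse import quote

def lists_endpoint(site_url):
    """URL that enumerates the lists of a site (hypothetical helper)."""
    return site_url.rstrip("/") + "/_api/web/lists"

def list_items_endpoint(site_url, list_title):
    """URL that enumerates the items of a list selected by its title (hypothetical helper)."""
    # Percent-encode the title so spaces and similar characters are URL-safe.
    encoded = quote(list_title)
    return f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{encoded}')/items"

# Hypothetical site; a real request would also need authentication, which is omitted here.
site = "https://contoso.sharepoint.com/sites/hr/"
print(lists_endpoint(site))
print(list_items_endpoint(site, "Shared Documents"))
```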
The connector acquires content by doing the following:
- Recursively traverses all sites, subsites, lists, folders and documents, and creates a sub-job for each object discovered. Each sub-job contains all metadata available, including ACLs.
- Saves a snapshot file so that item states can be compared with the previous crawl, which lets incremental crawls detect added, updated and deleted items. This snapshot file also contains the last saved SharePoint change log timestamp, which is used on the next incremental crawl to get only modified items.
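The two steps above can be sketched in Python as a recursive walk that emits one sub-job per object, followed by a comparison of two snapshots. This is only an illustration under stated assumptions: the connector's real snapshot file format is not described here, so the in-memory dictionary (URL mapped to a last-modified value), the node layout, and the names `walk`, `snapshot` and `diff_snapshots` are all hypothetical.

```python
def walk(node, path=""):
    """Yield one sub-job per object under node, depth first (hypothetical node layout)."""
    url = f"{path}/{node['name']}"
    yield {"url": url, "type": node["type"], "modified": node["modified"]}
    for child in node.get("children", []):
        yield from walk(child, url)

def snapshot(root):
    """Map each discovered URL to its last-modified value."""
    return {job["url"]: job["modified"] for job in walk(root)}

def diff_snapshots(previous, current):
    """Return (added, updated, deleted) URL lists between two snapshots."""
    added = sorted(u for u in current if u not in previous)
    deleted = sorted(u for u in previous if u not in current)
    updated = sorted(u for u in current if u in previous and current[u] != previous[u])
    return added, updated, deleted

# Hypothetical content tree for the current crawl.
tree = {
    "name": "hr", "type": "site", "modified": "2024-01-01",
    "children": [{
        "name": "Docs", "type": "list", "modified": "2024-01-01",
        "children": [
            {"name": "a.pdf", "type": "document", "modified": "2024-02-01"},
            {"name": "c.pdf", "type": "document", "modified": "2024-02-01"},
        ],
    }],
}

# Hypothetical snapshot saved by the previous crawl.
previous = {
    "/hr": "2024-01-01",
    "/hr/Docs": "2024-01-01",
    "/hr/Docs/a.pdf": "2024-01-01",
    "/hr/Docs/b.pdf": "2024-01-01",
}

added, updated, deleted = diff_snapshots(previous, snapshot(tree))
```

In this sample, `c.pdf` is reported as added, `a.pdf` as updated, and `b.pdf` as deleted. A crawler that also stores the change log timestamp could skip the full walk and request only the changes since that timestamp.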
Detailed information is available in the MSDN article.
Summary of SharePoint organization
This is the hierarchy of processes, applications, sites, sub-sites, libraries, folders, and documents within SharePoint.
SharePoint Web Application Pool
SharePoint Web Application (single web application)
Main Site Collection (the primary or main site created for the web application, associated with the primary http://xyz.server.com URL)
Other Site Collections