
Introduction


The SharePoint On Premise connector crawls content from any SharePoint On Premise site collection URL. It retrieves Sites, Lists, Folders, List Items, and Attachments, as well as other pages (in .aspx format).


Environment and Access Requirements


Repository Support

The SharePoint On Premise connector supports crawling the following repositories:

Repository | Version | Connector Version
SharePoint | 2019 | 5.1.1
SharePoint | 2016 | 5.1.1
SharePoint | 2013 | 5.1.1

Account Privileges

The connector offers one authentication option to access the SharePoint REST API: a user account.

User Account

To configure a user crawl account, see SharePoint On Premise - Crawl Account Access.

Info

To use a user crawl account on multiple site collections, repeat the configuration steps on each site collection where access is needed.


Environment Requirements

The connector uses SharePoint's REST API, so the Aspire Worker nodes must be able to connect to the SharePoint On Premise environment.
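
As a rough pre-flight check, connectivity from a worker node to the REST API could be probed with a sketch like the one below. It is not part of Aspire: the function names `rest_endpoint` and `can_reach` and the host `sp.example.com` are hypothetical, and the check only assumes that the standard SharePoint REST entry point `/_api/web` is exposed under the site URL. An HTTP error response (for example 401, because no credentials are sent) is still treated as "reachable", since it shows the network path works.

```python
import urllib.error
import urllib.request


def rest_endpoint(site_url):
    """Build the REST entry point for a site URL (hypothetical helper)."""
    return site_url.rstrip("/") + "/_api/web"


def can_reach(site_url, timeout=10):
    """Return True if the server behind site_url answers at all."""
    request = urllib.request.Request(
        rest_endpoint(site_url),
        headers={"Accept": "application/json;odata=verbose"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 401 without
        # credentials), so the worker node can reach it over the network.
        return True
    except OSError:
        # DNS failure, refused connection, timeout, and similar problems.
        return False


# Example URL construction; sp.example.com is a placeholder host.
endpoint = rest_endpoint("https://sp.example.com/sites/hr/")
```

Running `can_reach("https://sp.example.com/sites/hr")` from each worker node would then indicate whether a firewall or DNS issue blocks the connector before a crawl is started.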


Framework and Connector Features


Framework Features

Name | Supported
Content Crawling | yes
Identity Crawling | yes
Snapshot-based Incrementals | yes
Non-snapshot-based Incrementals | yes
Document Hierarchy | yes

Connector Features

The SharePoint On Premise connector has the following features:

  • Item filtering using include and exclude regex patterns.  This is based on the item's display URL.
  • Access Control Lists (ACLs) fetching, for document level security.
  • Support for BCS external lists.
  • Performs non-snapshot-based incremental crawling (so that only new/updated documents are indexed) using SharePoint's change log tokens.
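
The include/exclude filtering on display URLs can be pictured with the following minimal sketch. It is an illustration, not the connector's implementation: the function name `is_included`, the sample URLs, and the patterns are hypothetical. It also assumes that an exclude match takes precedence over an include match, that an empty include list admits everything, and that patterns are matched anywhere in the URL; the connector's actual semantics may differ.

```python
import re


def is_included(display_url, include_patterns, exclude_patterns):
    """Return True if display_url passes the include/exclude filters."""
    # Assumption: any exclude match rejects the item outright.
    if any(re.search(p, display_url) for p in exclude_patterns):
        return False
    # Assumption: with no include patterns, every item is accepted.
    if not include_patterns:
        return True
    return any(re.search(p, display_url) for p in include_patterns)


# Placeholder display URLs for illustration only.
urls = [
    "https://sp.example.com/sites/hr/Shared Documents/policy.pdf",
    "https://sp.example.com/sites/hr/Lists/Archive/old.docx",
    "https://sp.example.com/sites/it/SitePages/Home.aspx",
]
include = [r"/sites/hr/"]
exclude = [r"/Archive/"]

kept = [u for u in urls if is_included(u, include, exclude)]
```

With these sample patterns, only the HR policy document is kept: the archived item is excluded, and the IT page matches no include pattern.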


Content Crawled


The SharePoint On Premise connector can crawl the following objects:

Name | Type | Relevant Metadata | Content Fetch & Extraction | Description
Sites | container | Last Modified Date | N/A | Any site or subsite underneath a seed. Not the same as the .aspx page for a SharePoint Site.
Lists | container | Last Modified Date, Data size | N/A | Any type of SharePoint list including (but not limited to): Document Libraries, External Lists, Calendars, Task Lists, etc.
Folders | container | | N/A | List Item Folders found on lists like Document Libraries or Link Lists.
ListItems | document | | Yes | ListItems can take a number of different formats. For example, documents (PDF, doc, ppt, etc.), calendar events or announcements. For more info on how ListItems content types work, go to the MSDN article.
Attachments | document | | Yes | A document attached to a SharePoint List Item.




Limitations


Due to API limitations, the SharePoint On Premise connector has the following limitations:

  • The connector accesses SharePoint content through the REST API; it doesn't use web crawling.
  • Crawling is supported only when the root URL is a Site or a List.