
Introduction

The SharePoint Online connector crawls content from any SharePoint Online site collection URL. It retrieves Sites, Lists, Folders, List Items and Attachments, as well as other pages (in .aspx format). This connector supports SharePoint running in the Microsoft 365 offering.
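The hierarchy above (sites, lists, list items, attachments) corresponds to standard SharePoint REST endpoints under `/_api/web`. The sketch below builds the URLs one might call at each level. It assumes the documented `webs`, `lists`, `getbytitle(...)/items` and `AttachmentFiles` endpoints; the helper `rest_urls`, its parameters and the `contoso` site URL are hypothetical, and the connector's actual request sequence is not described on this page.

```python
from urllib.parse import quote


def rest_urls(site_url: str, list_title: str = "", item_id: int = 0) -> dict:
    """Hypothetical helper: SharePoint REST URLs for each level of the crawl hierarchy."""
    base = site_url.rstrip("/") + "/_api/web"
    urls = {
        "subsites": f"{base}/webs",  # child sites of this site
        "lists": f"{base}/lists",    # lists and document libraries
    }
    if list_title:
        # OData string literals escape a single quote by doubling it;
        # spaces and other unsafe characters are then percent-encoded.
        title = quote(list_title.replace("'", "''"), safe="'")
        list_url = f"{base}/lists/getbytitle('{title}')"
        urls["items"] = f"{list_url}/items"
        if item_id:
            urls["attachments"] = f"{list_url}/items({item_id})/AttachmentFiles"
    return urls


urls = rest_urls("https://contoso.sharepoint.com/sites/hr/", "Shared Documents", 7)
print(urls["items"])
```

Each URL would then be requested with an authenticated GET; the response listing at one level supplies the titles and IDs needed to build the URLs for the next.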

This is not an O365 connector; the individual repository offerings within O365, such as OneDrive, Calendar, Tasks and Yammer, have their own connectors.

Environment and Access Requirements

Repository Support

The SharePoint Online connector supports crawling the following repositories:

Repository	Version	Connector Version
SharePoint	Microsoft 365	5.0

Account Privileges

The connector offers two authentication options to access the SharePoint REST API: user account or Azure AD application.

User Account

To configure a user crawl account, use the following guide.

To use a user crawl account on multiple site collections, you must repeat these steps on each site collection that requires access.

Azure AD Application

To configure an Azure AD application for crawling, see Azure AD Access for SharePoint Online.

An Azure AD application grants access to all site collections under the tenant.
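Once an Azure AD access token is available, each REST call carries it as a bearer token. The sketch below only shows how such a request might be assembled with Python's standard library; it assumes the token was already acquired from Azure AD (token acquisition is not shown), and the helper name `build_request` and the example URL are hypothetical.

```python
import urllib.request


def build_request(url: str, access_token: str) -> urllib.request.Request:
    """Hypothetical helper: wrap a REST URL in a GET request carrying a bearer token."""
    return urllib.request.Request(
        url,
        headers={
            # Token assumed to be obtained from Azure AD beforehand (not shown).
            "Authorization": f"Bearer {access_token}",
            # Ask SharePoint for plain JSON without OData metadata.
            "Accept": "application/json;odata=nometadata",
        },
        method="GET",
    )


req = build_request("https://contoso.sharepoint.com/sites/hr/_api/web/lists", "<token>")
# urllib.request.urlopen(req) would send it; omitted here.
```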

Environment Requirements

The connector uses SharePoint's REST API, so the Aspire Worker nodes must have internet access to connect to the Microsoft 365 environment. Optionally, you can configure a proxy on the connector to enable internet access.
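At the HTTP level, a proxy simply means that every request to the Microsoft 365 endpoints is routed through a forward proxy. The sketch below illustrates this with Python's `urllib.request.ProxyHandler`; it is not the connector's own proxy setting, and the helper name `build_proxied_opener` and the proxy address are hypothetical.

```python
import urllib.request


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Hypothetical helper: send HTTP and HTTPS traffic through one forward proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener skips its default ProxyHandler when one is supplied explicitly.
    return urllib.request.build_opener(proxy)


opener = build_proxied_opener("http://proxy.example.com:8080")
# opener.open(request) would now send traffic through the proxy.
```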

Framework and Connector Features

Framework Features

Name	Supported
Content Crawling	yes
Identity Crawling	yes
Snapshot-based Incrementals	yes
Non-snapshot-based Incrementals	yes
Document Hierarchy	yes
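Snapshot-based incrementals generally work by recording each item's last-modified value during one crawl and comparing it with the next crawl. The sketch below shows that comparison on two in-memory snapshots; how the framework actually stores its snapshots is not described here, and the function name `diff_snapshots` and the `{item_id: last_modified}` layout are assumptions.

```python
from __future__ import annotations


def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, set[str]]:
    """Compare {item_id: last_modified} snapshots taken by two consecutive crawls."""
    added = set(current) - set(previous)
    deleted = set(previous) - set(current)
    # Items present in both crawls count as updated only if their timestamp changed.
    updated = {k for k in set(current) & set(previous) if current[k] != previous[k]}
    return {"added": added, "updated": updated, "deleted": deleted}


prev = {"a": "2024-01-01", "b": "2024-01-01", "c": "2024-01-01"}
cur = {"a": "2024-01-01", "b": "2024-02-01", "d": "2024-02-01"}
changes = diff_snapshots(prev, cur)
```

Only the added and updated items need to be fetched again; deleted items are removed from the index.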

Connector Features

The SharePoint connector has the following features:

Item filtering using include and exclude regex patterns, matched against the item's display URL.
Access Control List (ACL) fetching, for document-level security.
Support for BCS external lists.
Performs non-snapshot-based incremental crawling (so that only new/updated documents are indexed) using SharePoint's change log tokens.
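The include/exclude filtering on display URLs can be sketched as follows. This assumes that, when include patterns are configured, an item must match at least one of them, and that any matching exclude pattern rejects the item; whether the connector uses substring search or full-string matching, and the function name `is_included`, are assumptions.

```python
import re
from typing import Iterable


def is_included(display_url: str, includes: Iterable[str], excludes: Iterable[str]) -> bool:
    """Hypothetical filter: include patterns must match (if any), exclude patterns must not."""
    include_patterns = [re.compile(p) for p in includes]
    if include_patterns and not any(p.search(display_url) for p in include_patterns):
        return False  # include patterns exist but none matched
    # An exclude match always wins over an include match.
    return not any(re.search(p, display_url) for p in excludes)


url = "https://contoso.sharepoint.com/sites/hr/Shared Documents/policy.pdf"
print(is_included(url, [r"\.pdf$"], [r"/Archive/"]))
```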
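Change-token incremental crawling replays only the changes recorded after the token saved by the previous crawl, then stores the newest token for next time. SharePoint exposes its change log through the REST `GetChanges` method, which takes a `ChangeQuery` whose `ChangeTokenStart` marks where to resume. The sketch below simulates that loop on an in-memory change log; integer tokens are a simplification (real change tokens are strings), and the `Change` class and `incremental_crawl` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Change:
    token: int    # position in the change log (simplified; real tokens are strings)
    item_id: str
    kind: str     # "add", "update" or "delete"


def incremental_crawl(change_log: list[Change], last_token: int) -> tuple[set[str], set[str], int]:
    """Replay changes newer than last_token; return (to_index, to_delete, new_token)."""
    to_index: set[str] = set()
    to_delete: set[str] = set()
    new_token = last_token
    for change in sorted(change_log, key=lambda c: c.token):
        if change.token <= last_token:
            continue  # already handled by a previous crawl
        if change.kind == "delete":
            to_index.discard(change.item_id)
            to_delete.add(change.item_id)
        else:  # adds and updates both mean "(re)index this item"
            to_delete.discard(change.item_id)
            to_index.add(change.item_id)
        new_token = change.token
    return to_index, to_delete, new_token


log = [Change(1, "a", "add"), Change(2, "b", "add"), Change(3, "a", "update"),
       Change(4, "b", "delete"), Change(5, "c", "add")]
to_index, to_delete, token = incremental_crawl(log, last_token=2)
```

The returned token would be persisted and passed as the starting point of the next incremental crawl.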

Content Crawled

The SharePoint Online connector crawls the following objects:

Name	Type	Relevant Metadata	Content Fetch & Extraction	Description
Folder	container	Last Modified Date	NA	The folders of the crawled site. Each folder is scanned to retrieve its subfolders and files.
File	document	Last Modified Date, Data size	yes	The files contained in the crawled folders.
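Scanning each folder for subfolders and files amounts to a breadth-first traversal. In SharePoint's REST API a folder's children can be listed through `GetFolderByServerRelativeUrl('...')/Folders` and `/Files`; the sketch below abstracts those calls behind a hypothetical `fetch_children` callback so it can run against an in-memory tree. The function name `walk_folders` and the sample paths are illustrative only.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def walk_folders(root: str, fetch_children: Callable[[str], tuple[list[str], list[str]]]) -> list[str]:
    """Breadth-first scan: fetch_children(folder) returns (subfolders, files)."""
    files: list[str] = []
    pending = deque([root])  # folders still to be scanned
    while pending:
        folder = pending.popleft()
        subfolders, folder_files = fetch_children(folder)
        pending.extend(subfolders)  # queue subfolders for later scanning
        files.extend(folder_files)
    return files


tree = {
    "/Docs": (["/Docs/A", "/Docs/B"], ["/Docs/root.txt"]),
    "/Docs/A": ([], ["/Docs/A/a.pdf"]),
    "/Docs/B": (["/Docs/B/C"], []),
    "/Docs/B/C": ([], ["/Docs/B/C/c.docx"]),
}
found = walk_folders("/Docs", tree.__getitem__)
```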

Limitations

The connector has the following limitations:

The connector does not retrieve the ACLs of the crawled documents.

Contact Us: [email protected]