The SharePoint Online Connector can be configured using the REST API. It requires the following entities to be created: Credentials, Connection, Connector, and Seed.
Below are examples of how to create and update the Credentials, the Connection, and the Seed. For the Connector, please refer to this page.
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online" |
description | Yes | - | No | Short description or name for the credential. | "SPO_Prod_Credentials" |
throttlePolicy | No | - | No | Throttle policy ID that will affect all seeds that use this credential. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f" |
properties.useAzureAuthentication | Yes | - | No | Whether credentials are Azure AD Application (true) credentials or User account credentials (false). | true |
properties.tenantDomain | Yes | - | No | Only required if useAzureAuthentication is true. Tenant domain the Azure AD Application is part of. | "contoso.onmicrosoft.com" |
properties.clientId | Yes | - | No | Only required if useAzureAuthentication is true. Azure AD Application ID. | "1a1a-2b2b2-2cc22-aaa456" |
properties.certificatePath | Yes | - | No | Only required if useAzureAuthentication is true. Path to the certificate file. This path has to be accessible by all worker nodes that will use these credentials. | "${dist.data.dir}/${app.name}/certificates/certificate.cer" |
properties.privateKeyPath | Yes | - | No | Only required if useAzureAuthentication is true. Path to the certificate private key file. This path has to be accessible by all worker nodes that will use these credentials. | "${dist.data.dir}/${app.name}/keys/key.key" |
properties.username | Yes | - | No | Only required if useAzureAuthentication is false. Username for the crawl account when using user/password credentials. | "aspire_crawl_account@contoso.onmicrosoft.com" |
properties.password | Yes | - | No | Only required if useAzureAuthentication is false. Password for the crawl account when using user/password credentials. This field can also be sent as an encrypted string. Check the encryption API for more information on how to encrypt. | "password" |
{ "type": "sharepoint-online", "description": "SPO_Cred", "properties": { "useAzureAuthentication": true, "tenantDomain": "cao365.onmicrosoft.com", "clientId": "12345-abcd", "certificatePath": "${dist.data.dir}/${app.name}/certificates/certificate.cer", "privateKeyPath": "${dist.data.dir}/${app.name}/keys/key.key" } }
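The conditional requirements in the table above can be checked before a request is sent. The following is a minimal Python sketch, not part of the product: the helper name `build_credential_payload` and its validation logic are assumptions made for illustration, and only the field names come from the table. Sending the body to the server is left out because the endpoint is not stated here.

```python
import json

# Properties the table marks as required for each authentication mode.
AZURE_FIELDS = ("tenantDomain", "clientId", "certificatePath", "privateKeyPath")
USER_FIELDS = ("username", "password")


def build_credential_payload(description, use_azure, throttle_policy=None, **properties):
    """Build the JSON body for a "sharepoint-online" credential (hypothetical helper)."""
    required = AZURE_FIELDS if use_azure else USER_FIELDS
    missing = [name for name in required if not properties.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    payload = {
        "type": "sharepoint-online",
        "description": description,
        "properties": {"useAzureAuthentication": use_azure},
    }
    # Copy only the properties that belong to the selected authentication mode.
    for name in required:
        payload["properties"][name] = properties[name]
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    return payload


body = build_credential_payload(
    "SPO_Cred",
    use_azure=True,
    tenantDomain="cao365.onmicrosoft.com",
    clientId="12345-abcd",
    certificatePath="${dist.data.dir}/${app.name}/certificates/certificate.cer",
    privateKeyPath="${dist.data.dir}/${app.name}/keys/key.key",
)
print(json.dumps(body, indent=2))
```

Calling the helper with `use_azure=False` and no `password` raises a `ValueError`, which surfaces a missing field before the server has to reject the request.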
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | ID of the credential to update | "d42e1872-02c8-4a90-a714-44f15577389a" |
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online" |
description | No | - | No | Short description or name for the credential. | "SPO_Prod_Credentials" |
throttlePolicy | No | - | No | Throttle policy ID that will affect all seeds that use this credential. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f" |
properties.useAzureAuthentication | No | - | No | Whether credentials are Azure AD Application (true) credentials or User account credentials (false). | true |
properties.tenantDomain | No | - | No | Only required if useAzureAuthentication is true. Tenant domain the Azure AD Application is part of. | "contoso.onmicrosoft.com" |
properties.clientId | No | - | No | Only required if useAzureAuthentication is true. Azure AD Application ID. | "1a1a-2b2b2-2cc22-aaa456" |
properties.certificatePath | No | - | No | Only required if useAzureAuthentication is true. Path to the certificate file. This path has to be accessible by all worker nodes that will use these credentials. | "${dist.data.dir}/${app.name}/certificates/certificate.cer" |
properties.privateKeyPath | No | - | No | Only required if useAzureAuthentication is true. Path to the certificate private key file. This path has to be accessible by all worker nodes that will use these credentials. | "${dist.data.dir}/${app.name}/keys/key.key" |
properties.username | No | - | No | Only required if useAzureAuthentication is false. Username for the crawl account when using user/password credentials. | "aspire_crawl_account@contoso.onmicrosoft.com" |
properties.password | No | - | No | Only required if useAzureAuthentication is false. Password for the crawl account when using user/password credentials. This field can also be sent as an encrypted string. Check the encryption API for more information on how to encrypt. | "password" |
{ "id": "d42e1872-02c8-4a90-a714-44f15577389a", "type": "sharepoint-online", "description": "SPO_Cred_new", "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f", "properties": { "useAzureAuthentication": false, "username": "aspire_crawl_account", "password": "encrypted:F8EC95B8007E645EA15E2D5F717EB370DAA4274D73D767DA0AF0F3AB94B8430C81AF774D2A272833C3AA77C4776B75B6" } }
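Because only `id` and `type` are marked as required in the update table, an update body can carry just the fields that change. The sketch below illustrates this in Python. The helper name `build_credential_update` is hypothetical, and whether omitted fields keep their stored values on the server is an assumption that should be verified against the actual API behavior.

```python
def build_credential_update(credential_id, description=None, throttle_policy=None, **properties):
    """Build an update body containing only the supplied fields (hypothetical helper)."""
    if not credential_id:
        raise ValueError("id is required to update a credential")
    payload = {"id": credential_id, "type": "sharepoint-online"}
    if description is not None:
        payload["description"] = description
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    # Keep explicit False values (e.g. useAzureAuthentication=False); drop only None.
    changed = {name: value for name, value in properties.items() if value is not None}
    if changed:
        payload["properties"] = changed
    return payload


update = build_credential_update(
    "d42e1872-02c8-4a90-a714-44f15577389a",
    description="SPO_Cred_new",
    useAzureAuthentication=False,
    username="aspire_crawl_account",
    # Placeholder: substitute the string returned by the encryption API.
    password="encrypted:<value from the encryption API>",
)
```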
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
type | yes | - | no | The value must be "sharepoint-online". | "sharepoint-online" |
description | yes | - | no | Name of the connection object. | "SPO_Connection" |
credential | yes | - | no | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a" |
throttlePolicy | no | - | no | ID of the throttle policy that applies to this connection object. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f" |
routingPolicies | no | [ ] | yes | The IDs of the routing policies that this connection will use. | ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"] |
properties.serverUrl | yes | - | no | Hostname where SharePoint is located. | "https://contoso.sharepoint.com" |
properties.indexContainers | no | false | no | Indicates if we should index the SharePoint containers. | false |
properties.useSnapshots | no | false | no | Indicates if we should process incremental crawls using Aspire Snapshots instead of SharePoint's Change Log. | false |
properties.stopCrawlOnScannerError | no | true | no | Indicates if we should stop the crawl if an error occurs during the scan phase. | true |
properties.filterNoCrawl | no | false | no | Indicates if the crawl will exclude sites and lists using SharePoint's NoCrawl property. | false |
properties.crawlAttachments | no | true | no | Indicates if we should crawl List Item attachments. | true |
properties.logRequestAndResponses | no | false | no | Enable to add debug log information about the REST requests and their responses. | false |
properties.scanExcludedItems | no | false | no | Indicates if we should force the scan of excluded directories, so child items within the scope can be found. | false |
properties.replaceContent | no | false | no | Indicates if the content field of the .aspx items will be filled with the contents of the CanvasContent1/WikiField fields. If CanvasContent1 and WikiField fields are empty, no contents will be assigned to the content field. | false |
properties.downloadLargeFiles | no | true | no | Indicates if we should download files that exceed the Data Size Threshold to disk instead of leaving the connection open. | true |
properties.dataSizeThreshold | no | "100mb" | no | Size limit below which a document is kept in memory. For larger documents, the connection to SharePoint stays open until the content is consumed, unless downloadLargeFiles is enabled, in which case the content is downloaded to a temporary file. | "100mb" |
properties.includes | no | - | yes | The document will be processed by the connector if it matches one of the following patterns. | [".*\\.pdf",".*\\.pptx"] |
properties.excludes | no | - | yes | The document will not be processed by the connector if it matches one of the following patterns. | ".*\\.xml" |
properties.groupPrefixSeparator | no | "\|" | no | Prefix used to separate users and groups in ACLs. | "\|" |
properties.lowercaseGE | no | false | no | Indicates if entries extracted from the cache groups process will be in lowercase. | false |
properties.userGroupPageSize | no | 1000 | no | Page size for fetching users and groups. | 1000 |
properties.useAzureGroups | no | false | no | Indicates if Azure AD group expansion will be used when caching groups for expansion. | true |
properties.azureADSeed | no | - | no | Azure AD seed to be used for group expansion purposes. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
properties.useProxy | no | false | no | Indicates if a proxy is required for connecting to SharePoint. | true |
properties.proxyHost | no | - | no | The proxy hostname. | "proxy.com" |
properties.proxyPort | no | - | no | The proxy port. | 8080 |
properties.useProxyAuthentication | no | false | no | Check if the proxy requires authentication. | true |
properties.proxyDomain | yes, if using proxy authentication | - | no | The domain used to authenticate to the proxy. | "DIR" |
properties.proxyUser | yes, if using proxy authentication | - | no | The username used to authenticate to the proxy. | "proxy_user" |
properties.proxyPassword | yes, if using proxy authentication | - | no | The password used to authenticate to the proxy. | "thePassword" |
properties.requestProperty | no | - | no | Extra HTTP headers to be included with the requests. | [{"name": "user-agent","value": "agent"}] |
properties.retryCount | no | 2 | no | Number of retries for failed requests. | 2 |
properties.retrySleep | no | "500ms" | no | Time period to wait in between failed request retries. | "500ms" |
properties.socketTimeout | no | "60s" | no | Time period of inactivity to wait for packets to arrive. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.connectTimeout | no | - | no | Time period to wait to establish a connection with the remote host. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.connectionRequestTimeout | no | - | no | Time period to wait to fetch a connection from the connection pool. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.idleConnectionTimeout | no | - | no | Time period to wait to close an idle connection. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "5m" |
properties.maxConnections | no | 100 | no | Maximum number of open connections. | 100 |
properties.maxConnectionsPerRoute | no | 10 | no | Maximum number of open connections per route. | 10 |
{ "type": "sharepoint-online", "description": "SPO_Connection", "credential": "d42e1872-02c8-4a90-a714-44f15577389a", "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f", "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"], "properties": { "serverUrl": "https://contoso.sharepoint.com", "indexContainers": false, "useSnapshots": false, "stopCrawlOnScannerError": true, "filterNoCrawl": false, "crawlAttachments": true, "logRequestAndResponses": false, "scanExcludedItems": false, "replaceContent": false, "downloadLargeFiles": true, "dataSizeThreshold": "100mb", "includes": [ ".*\\.pdf", ".*\\.pptx" ], "excludes": ".*\\.xml", "groupPrefixSeparator": "|", "lowercaseGE": false, "userGroupPageSize": 1000, "useAzureGroups": true, "azureADSeed": "f5587cee-9116-4011-b3a9-6b235b333a1b", "useProxy": true, "proxyHost": "proxy.com", "proxyPort": 8080, "useProxyAuthentication": true, "proxyDomain": "DIR", "proxyUser": "proxy_user", "proxyPassword": "thePassword", "requestProperty": [ { "name": "user-agent", "value": "agent" } ], "retryCount": 2, "retrySleep": "500ms", "socketTimeout": "60s", "connectTimeout": "60s", "connectionRequestTimeout": "60s", "idleConnectionTimeout": "5m", "maxConnections": 100, "maxConnectionsPerRoute": 10 } }
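Most connection properties have defaults, so a request body only needs the required fields plus any overrides. The Python sketch below shows one way to assemble such a body and to enforce the proxy authentication rule from the table. The helper name `build_connection_payload` is hypothetical, and relying on the server to apply defaults for omitted properties is an assumption based on the Default column.

```python
# Properties the table marks as required when proxy authentication is enabled.
PROXY_AUTH_FIELDS = ("proxyDomain", "proxyUser", "proxyPassword")


def build_connection_payload(description, credential, server_url,
                             throttle_policy=None, routing_policies=None,
                             **properties):
    """Build the JSON body for a "sharepoint-online" connection (hypothetical helper)."""
    props = {"serverUrl": server_url, **properties}
    if props.get("useProxyAuthentication"):
        missing = [name for name in PROXY_AUTH_FIELDS if not props.get(name)]
        if missing:
            raise ValueError("proxy authentication needs: " + ", ".join(missing))
    payload = {
        "type": "sharepoint-online",
        "description": description,
        "credential": credential,
        "properties": props,
    }
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    if routing_policies:
        payload["routingPolicies"] = list(routing_policies)
    return payload


connection = build_connection_payload(
    "SPO_Connection",
    credential="d42e1872-02c8-4a90-a714-44f15577389a",
    server_url="https://contoso.sharepoint.com",
    useProxy=True,
    proxyHost="proxy.com",
    proxyPort=8080,
    includes=[".*\\.pdf", ".*\\.pptx"],
)
```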
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | yes | - | no | ID of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0" |
type | yes | - | no | The value must be "sharepoint-online". | "sharepoint-online" |
description | yes | - | no | Name of the connection object. | "SPO_Connection" |
credential | yes | - | no | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a" |
throttlePolicy | no | - | no | ID of the throttle policy that applies to this connection object. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f" |
routingPolicies | no | [ ] | yes | The IDs of the routing policies that this connection will use. | ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"] |
properties.serverUrl | yes | - | no | Hostname where SharePoint is located. | "https://contoso.sharepoint.com" |
properties.indexContainers | no | false | no | Indicates if we should index the SharePoint containers. | false |
properties.useSnapshots | no | false | no | Indicates if we should process incremental crawls using Aspire Snapshots instead of SharePoint's Change Log. | false |
properties.stopCrawlOnScannerError | no | true | no | Indicates if we should stop the crawl if an error occurs during the scan phase. | true |
properties.filterNoCrawl | no | false | no | Indicates if the crawl will exclude sites and lists using SharePoint's NoCrawl property. | false |
properties.crawlAttachments | no | true | no | Indicates if we should crawl List Item attachments. | true |
properties.logRequestAndResponses | no | false | no | Enable to add debug log information about the REST requests and their responses. | false |
properties.scanExcludedItems | no | false | no | Indicates if we should force the scan of excluded directories, so child items within the scope can be found. | false |
properties.replaceContent | no | false | no | Indicates if the content field of the .aspx items will be filled with the contents of the CanvasContent1/WikiField fields. If CanvasContent1 and WikiField fields are empty, no contents will be assigned to the content field. | false |
properties.downloadLargeFiles | no | true | no | Indicates if we should download files that exceed the Data Size Threshold to disk instead of leaving the connection open. | true |
properties.dataSizeThreshold | no | "100mb" | no | Size limit below which a document is kept in memory. For larger documents, the connection to SharePoint stays open until the content is consumed, unless downloadLargeFiles is enabled, in which case the content is downloaded to a temporary file. | "100mb" |
properties.includes | no | - | yes | The document will be processed by the connector if it matches one of the following patterns. | [".*\\.pdf",".*\\.pptx"] |
properties.excludes | no | - | yes | The document will not be processed by the connector if it matches one of the following patterns. | ".*\\.xml" |
properties.groupPrefixSeparator | no | "\|" | no | Prefix used to separate users and groups in ACLs. | "\|" |
properties.lowercaseGE | no | false | no | Indicates if entries extracted from the cache groups process will be in lowercase. | false |
properties.userGroupPageSize | no | 1000 | no | Page size for fetching users and groups. | 1000 |
properties.useAzureGroups | no | false | no | Indicates if Azure AD group expansion will be used when caching groups for expansion. | true |
properties.azureADSeed | no | - | no | Azure AD seed to be used for group expansion purposes. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
properties.useProxy | no | false | no | Indicates if a proxy is required for connecting to SharePoint. | true |
properties.proxyHost | no | - | no | The proxy hostname. | "proxy.com" |
properties.proxyPort | no | - | no | The proxy port. | 8080 |
properties.useProxyAuthentication | no | false | no | Check if the proxy requires authentication. | true |
properties.proxyDomain | yes, if using proxy authentication | - | no | The domain used to authenticate to the proxy. | "DIR" |
properties.proxyUser | yes, if using proxy authentication | - | no | The username used to authenticate to the proxy. | "proxy_user" |
properties.proxyPassword | yes, if using proxy authentication | - | no | The password used to authenticate to the proxy. | "thePassword" |
properties.requestProperty | no | - | no | Extra HTTP headers to be included with the requests. | [{"name": "user-agent","value": "agent"}] |
properties.retryCount | no | 2 | no | Number of retries for failed requests. | 2 |
properties.retrySleep | no | "500ms" | no | Time period to wait in between failed request retries. | "500ms" |
properties.socketTimeout | no | "60s" | no | Time period of inactivity to wait for packets to arrive. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.connectTimeout | no | - | no | Time period to wait to establish a connection with the remote host. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.connectionRequestTimeout | no | - | no | Time period to wait to fetch a connection from the connection pool. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "60s" |
properties.idleConnectionTimeout | no | - | no | Time period to wait to close an idle connection. Defaults to ms. e.g. 5ms, 5s, 5m, 5h or 5d | "5m" |
properties.maxConnections | no | 100 | no | Maximum number of open connections. | 100 |
properties.maxConnectionsPerRoute | no | 10 | no | Maximum number of open connections per route. | 10 |
{ "id": "89d6632a-a296-426c-adb0-d442adcab4b0", "type": "sharepoint-online", "description": "SPO_Connection", "credential": "d42e1872-02c8-4a90-a714-44f15577389a", "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f", "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"], "properties": { "serverUrl": "https://contoso.sharepoint.com", "indexContainers": false, "useSnapshots": false, "stopCrawlOnScannerError": true, "filterNoCrawl": false, "crawlAttachments": true, "logRequestAndResponses": false, "scanExcludedItems": false, "replaceContent": false, "downloadLargeFiles": true, "dataSizeThreshold": "100mb", "includes": [ ".*\\.pdf", ".*\\.pptx" ], "excludes": ".*\\.xml", "groupPrefixSeparator": "|", "lowercaseGE": false, "userGroupPageSize": 1000, "useAzureGroups": true, "azureADSeed": "f5587cee-9116-4011-b3a9-6b235b333a1b", "useProxy": true, "proxyHost": "proxy.com", "proxyPort": 8080, "useProxyAuthentication": true, "proxyDomain": "DIR", "proxyUser": "proxy_user", "proxyPassword": "thePassword", "requestProperty": [ { "name": "user-agent", "value": "agent" } ], "retryCount": 2, "retrySleep": "500ms", "socketTimeout": "60s", "connectTimeout": "60s", "connectionRequestTimeout": "60s", "idleConnectionTimeout": "5m", "maxConnections": 100, "maxConnectionsPerRoute": 10 } }
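The timeout and retry properties take duration strings such as "500ms", "60s", or "5m". The sketch below shows how such values could be interpreted on the client side, for example to sanity-check a configuration before sending it. It is an illustration only: the function name `duration_to_ms` is hypothetical, and reading "Defaults to ms" as "a value without a unit is treated as milliseconds" is an assumption about the server's parsing.

```python
import re

# Milliseconds per unit, for the units listed in the table notes.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_DURATION = re.compile(r"^(\d+)(ms|s|m|h|d)?$")


def duration_to_ms(value):
    """Convert a duration such as "60s" to milliseconds (hypothetical helper)."""
    match = _DURATION.match(str(value).strip())
    if match is None:
        raise ValueError(f"not a valid duration: {value!r}")
    amount, unit = match.groups()
    # Assumption: a missing unit means milliseconds.
    return int(amount) * _UNIT_MS[unit or "ms"]


print(duration_to_ms("500ms"))  # 500
print(duration_to_ms("60s"))    # 60000
print(duration_to_ms("5m"))     # 300000
```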
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
seed | Yes | - | No | <seed description> | |
type | Yes | - | No | The value must be "sharepoint-online" | "sharepoint-online" |
description | Yes | - | No | Name of the seed object. | "My SharePoint Seed" |
connector | Yes | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
connection | Yes | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"] |
properties.seed | Yes | - | No | The value must be a SharePoint seed. | "/sites/DemoSite" |
{ "seed": "/sites/DemoSite", "type": "sharepoint-online", "description": "sp_seed_api_test", "connector": "a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80", "connection": "883f7dbe-d965-46ae-92d4-518757788b17", "workflows": [], "tags": [], "properties": { "seed": "/sites/DemoSite" } }
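In the example above the seed value appears twice, once at the top level and once under `properties.seed`. A small helper can keep the two in sync and check the required fields. This is a minimal Python sketch under that reading of the example. The name `build_seed_payload` is hypothetical, and the assumption that both `seed` values should always match comes only from the example.

```python
def build_seed_payload(seed, description, connector, connection,
                       workflows=None, tags=None,
                       routing_policies=None, throttle_policy=None):
    """Build the JSON body for a "sharepoint-online" seed (hypothetical helper)."""
    required = {"seed": seed, "description": description,
                "connector": connector, "connection": connection}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    payload = {
        "seed": seed,
        "type": "sharepoint-online",
        "description": description,
        "connector": connector,
        "connection": connection,
        "workflows": list(workflows or []),
        "tags": list(tags or []),
        # Mirrors the top-level seed, as in the example request.
        "properties": {"seed": seed},
    }
    if routing_policies:
        payload["routingPolicies"] = list(routing_policies)
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    return payload


seed_body = build_seed_payload(
    "/sites/DemoSite",
    "sp_seed_api_test",
    connector="a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80",
    connection="883f7dbe-d965-46ae-92d4-518757788b17",
)
```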
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
id | Yes | - | No | ID of the seed to update. | "dc2a920a-33fd-4c4f-be17-cc53c5238cbe" |
seed | Yes | - | No | <seed description> | |
type | Yes | - | No | The value must be "sharepoint-online" | "sharepoint-online" |
description | Yes | - | No | Name of the seed object. | "My SharePoint Seed" |
connector | Yes | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31" |
connection | Yes | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6" |
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"] |
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b" |
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"] |
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"] |
properties.seed | Yes | - | No | The value must be a SharePoint seed. | "/sites/DemoSite" |
{ "id": "dc2a920a-33fd-4c4f-be17-cc53c5238cbe", "seed": "/sites/DemoSite", "type": "sharepoint-online", "description": "sp_seed_api_test_updated", "connector": "a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80", "connection": "883f7dbe-d965-46ae-92d4-518757788b17", "workflows": [], "tags": [], "properties": { "seed": "/sites/DemoSite" } }