The SharePoint Online Connector can be configured using the REST API. The following entities must be created:

  • Credentials
  • Connector
  • Connection
  • Seed

Below are examples of how to create and update the Credentials, the Connection and the Seed. For the Connector, refer to this page.


Create Credentials


Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | Yes | - | No | Short description or name for the credential. | "SPO_Prod_Credentials"
throttlePolicy | No | - | No | ID of the throttle policy that affects all seeds that use this credential. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f"
properties.useAzureAuthentication | Yes | - | No | Whether the credentials are Azure AD application credentials (true) or user account credentials (false). | true
properties.tenantDomain | Yes, if useAzureAuthentication is true | - | No | Tenant domain that the Azure AD application belongs to. | "contoso.onmicrosoft.com"
properties.clientId | Yes, if useAzureAuthentication is true | - | No | Azure AD application ID. | "1a1a-2b2b2-2cc22-aaa456"
properties.certificatePath | Yes, if useAzureAuthentication is true | - | No | Path to the certificate file. This path must be accessible by all worker nodes that use these credentials. | "${dist.data.dir}/${app.name}/certificates/certificate.cer"
properties.privateKeyPath | Yes, if useAzureAuthentication is true | - | No | Path to the certificate's private key file. This path must be accessible by all worker nodes that use these credentials. | "${dist.data.dir}/${app.name}/keys/key.key"
properties.username | Yes, if useAzureAuthentication is false | - | No | Username of the crawl account when using user/password credentials. | "aspire_crawl_account@contoso.onmicrosoft.com"
properties.password | Yes, if useAzureAuthentication is false | - | No | Password of the crawl account when using user/password credentials. It can also be sent as an encrypted string; see the encryption API for how to encrypt it. | "password"

Example

POST /aspire/_api/credentials
{
    "type": "sharepoint-online",
    "description": "SPO_Cred",
    "properties": {
        "useAzureAuthentication": true,
        "tenantDomain": "cao365.onmicrosoft.com",
        "clientId": "12345-abcd",
        "certificatePath": "${dist.data.dir}/${app.name}/certificates/certificate.cer",
        "privateKeyPath": "${dist.data.dir}/${app.name}/keys/key.key"
    }
}
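
The conditional requirements in the table above (Azure AD fields when useAzureAuthentication is true, username and password when it is false) can be checked on the client before posting. The following is a minimal Python sketch using only the standard library. The helper names, the base URL http://localhost:50505 and the absence of any authentication headers are assumptions for illustration, not part of the Aspire API.

```python
import json
import urllib.request

# Hypothetical base URL of the Aspire instance; adjust to your deployment.
ASPIRE_URL = "http://localhost:50505"

AZURE_FIELDS = ("tenantDomain", "clientId", "certificatePath", "privateKeyPath")
USER_FIELDS = ("username", "password")


def build_credential_payload(description, properties, throttle_policy=None):
    """Build a sharepoint-online credential body, checking the conditional fields."""
    if "useAzureAuthentication" not in properties:
        raise ValueError("properties.useAzureAuthentication is required")
    # Pick the field set that the chosen authentication mode requires.
    required = AZURE_FIELDS if properties["useAzureAuthentication"] else USER_FIELDS
    missing = [name for name in required if not properties.get(name)]
    if missing:
        raise ValueError("missing properties: " + ", ".join(missing))
    payload = {
        "type": "sharepoint-online",
        "description": description,
        "properties": dict(properties),
    }
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    return payload


def credential_request(payload):
    """Prepare (but do not send) the POST request to /aspire/_api/credentials."""
    return urllib.request.Request(
        ASPIRE_URL + "/aspire/_api/credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it; any authentication your Aspire deployment requires would have to be added to the headers.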

Update Credentials


Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
id | Yes | - | No | ID of the credential to update. | "d42e1872-02c8-4a90-a714-44f15577389a"
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | No | - | No | Short description or name for the credential. | "SPO_Prod_Credentials"
throttlePolicy | No | - | No | ID of the throttle policy that affects all seeds that use this credential. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f"
properties.useAzureAuthentication | No | - | No | Whether the credentials are Azure AD application credentials (true) or user account credentials (false). | true
properties.tenantDomain | No | - | No | Only required if useAzureAuthentication is true. Tenant domain that the Azure AD application belongs to. | "contoso.onmicrosoft.com"
properties.clientId | No | - | No | Only required if useAzureAuthentication is true. Azure AD application ID. | "1a1a-2b2b2-2cc22-aaa456"
properties.certificatePath | No | - | No | Only required if useAzureAuthentication is true. Path to the certificate file. This path must be accessible by all worker nodes that use these credentials. | "${dist.data.dir}/${app.name}/certificates/certificate.cer"
properties.privateKeyPath | No | - | No | Only required if useAzureAuthentication is true. Path to the certificate's private key file. This path must be accessible by all worker nodes that use these credentials. | "${dist.data.dir}/${app.name}/keys/key.key"
properties.username | No | - | No | Only required if useAzureAuthentication is false. Username of the crawl account when using user/password credentials. | "aspire_crawl_account@contoso.onmicrosoft.com"
properties.password | No | - | No | Only required if useAzureAuthentication is false. Password of the crawl account when using user/password credentials. It can also be sent as an encrypted string; see the encryption API for how to encrypt it. | "password"


Example

PUT /aspire/_api/credentials/d42e1872-02c8-4a90-a714-44f15577389a
{
    "id": "d42e1872-02c8-4a90-a714-44f15577389a",
    "type": "sharepoint-online",
    "description": "SPO_Cred_new",
    "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f",
    "properties": {
        "useAzureAuthentication": false,
        "username": "aspire_crawl_account",
        "password": "encrypted:F8EC95B8007E645EA15E2D5F717EB370DAA4274D73D767DA0AF0F3AB94B8430C81AF774D2A272833C3AA77C4776B75B6"
    }
}
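
An update is a PUT to the object's own URL, and the id in the body matches the id in the path. The sketch below derives the URL from the body so the two cannot drift apart. It is a hypothetical standard-library helper: the base URL and the function name are assumptions, and the "encrypted:..." password is assumed to have been produced beforehand with the encryption API rather than computed here.

```python
import json
import urllib.request

# Hypothetical base URL of the Aspire instance; adjust to your deployment.
ASPIRE_URL = "http://localhost:50505"


def update_request(collection, payload):
    """Prepare (but do not send) a PUT to /aspire/_api/<collection>/<id>."""
    object_id = payload.get("id")
    if not object_id:
        raise ValueError("the update body must contain the id of the object")
    # The path id is taken from the body, so both always agree.
    return urllib.request.Request(
        f"{ASPIRE_URL}/aspire/_api/{collection}/{object_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

The same helper works for the connection and seed updates later on this page by passing "connections" or "seeds" as the collection.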

Create Connector


To create the Connector object using the REST API, refer to this page.

Update Connector


To update the Connector object using the REST API, refer to this page.

Create Connection


Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | Yes | - | No | Name of the connection object. | "SPO_Connection"
credential | Yes | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this connection uses. | ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"]
properties.serverUrl | Yes | - | No | URL of the host where SharePoint is located. | "https://contoso.sharepoint.com"
properties.indexContainers | No | false | No | Indicates whether SharePoint containers are indexed. | false
properties.useSnapshots | No | false | No | Indicates whether incremental crawls use Aspire snapshots instead of the SharePoint change log. | false
properties.stopCrawlOnScannerError | No | true | No | Indicates whether the crawl stops if an error occurs during the scan phase. | true
properties.filterNoCrawl | No | false | No | Indicates whether the crawl excludes sites and lists that have SharePoint's NoCrawl property set. | false
properties.crawlAttachments | No | true | No | Indicates whether list item attachments are crawled. | true
properties.logRequestAndResponses | No | false | No | Indicates whether debug log information about the REST requests and their responses is written. | false
properties.scanExcludedItems | No | false | No | Indicates whether excluded directories are still scanned, so that child items within the scope can be found. | false
properties.replaceContent | No | false | No | Indicates whether the content field of .aspx items is filled with the contents of the CanvasContent1/WikiField fields. If both fields are empty, no content is assigned to the content field. | false
properties.downloadLargeFiles | No | true | No | Indicates whether files that exceed dataSizeThreshold are downloaded to disk instead of keeping the connection open. | true
properties.dataSizeThreshold | No | "100mb" | No | Documents smaller than this size are kept in memory. For larger documents, the connection to SharePoint stays open until the content is consumed, unless downloadLargeFiles is true, in which case the content is downloaded to a temporary file. | "100mb"
properties.includes | No | - | Yes | A document is processed by the connector only if it matches one of these patterns. | [".*\\.pdf",".*\\.pptx"]
properties.excludes | No | - | Yes | A document is not processed by the connector if it matches one of these patterns. | ".*\\.xml"
properties.groupPrefixSeparator | No | "\|" | No | Prefix separator used for users and groups in ACLs. | "\|"
properties.lowercaseGE | No | false | No | Indicates whether entries extracted by the group caching process are converted to lowercase. | false
properties.userGroupPageSize | No | 1000 | No | Page size for fetching users and groups. | 1000
properties.useAzureGroups | No | false | No | Indicates whether Azure AD group expansion is used when caching groups for expansion. | true
properties.azureADSeed | No | - | No | Azure AD seed used for group expansion. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
properties.useProxy | No | false | No | Indicates whether a proxy is required to connect to SharePoint. | true
properties.proxyHost | No | - | No | Proxy hostname. | "proxy.com"
properties.proxyPort | No | - | No | Proxy port. | 8080
properties.useProxyAuthentication | No | false | No | Indicates whether the proxy requires authentication. | true
properties.proxyDomain | Yes, if useProxyAuthentication is true | - | No | Domain used to authenticate to the proxy. | "DIR"
properties.proxyUser | Yes, if useProxyAuthentication is true | - | No | Username used to authenticate to the proxy. | "proxy_user"
properties.proxyPassword | Yes, if useProxyAuthentication is true | - | No | Password used to authenticate to the proxy. | "thePassword"
properties.requestProperty | No | - | No | Extra HTTP headers to include with the requests. | [{"name": "user-agent","value": "agent"}]
properties.retryCount | No | 2 | No | Number of retries for failed requests. | 2
properties.retrySleep | No | "500ms" | No | Time to wait between retries of a failed request. | "500ms"
properties.socketTimeout | No | "60s" | No | Maximum period of inactivity to wait for packets to arrive. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.connectTimeout | No | - | No | Time to wait to establish a connection with the remote host. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.connectionRequestTimeout | No | - | No | Time to wait to obtain a connection from the connection pool. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.idleConnectionTimeout | No | - | No | Time after which an idle connection is closed. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "5m"
properties.maxConnections | No | 100 | No | Maximum number of open connections. | 100
properties.maxConnectionsPerRoute | No | 10 | No | Maximum number of open connections per route. | 10
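
The timeout fields take a number with an optional unit suffix, and a bare number is read as milliseconds. To sanity-check these values before sending them, a small parser could look like the following. It is a hypothetical client-side helper that mirrors only the units listed in the table; the connector's own parser may accept other forms.

```python
import re

# Milliseconds per unit for the suffixes listed in the timeout notes.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
# "ms" is tried before "m" and "s" so that "500ms" is not misread.
_DURATION = re.compile(r"^\s*(\d+)\s*(ms|s|m|h|d)?\s*$")


def duration_to_ms(value):
    """Convert "60s", "5m", "500ms" or a bare number (milliseconds) to milliseconds."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    match = _DURATION.match(str(value))
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_MS[unit or "ms"]
```

For example, duration_to_ms("60s") returns 60000 and duration_to_ms("5m") returns 300000.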

Example

POST /aspire/_api/connections
{
    "type": "sharepoint-online",
    "description": "SPO_Connection",
    "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
    "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f",
    "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
    "properties": {
        "serverUrl": "https://contoso.sharepoint.com",
        "indexContainers": false,
        "useSnapshots": false,
        "stopCrawlOnScannerError": true,
        "filterNoCrawl": false,
        "crawlAttachments": true,
        "logRequestAndResponses": false,
        "scanExcludedItems": false,
        "replaceContent": false,
        "downloadLargeFiles": true,
        "dataSizeThreshold": "100mb",
        "includes": [
            ".*\\.pdf",
            ".*\\.pptx"
        ],
        "excludes": ".*\\.xml",
        "groupPrefixSeparator": "|",
        "lowercaseGE": false,
        "userGroupPageSize": 1000,
        "useAzureGroups": true,
        "azureADSeed": "f5587cee-9116-4011-b3a9-6b235b333a1b",
        "useProxy": true,
        "proxyHost": "proxy.com",
        "proxyPort": 8080,
        "useProxyAuthentication": true,
        "proxyDomain": "DIR",
        "proxyUser": "proxy_user",
        "proxyPassword": "thePassword",
        "requestProperty": [
            {
                "name": "user-agent",
                "value": "agent"
            }
        ],
        "retryCount": 2,
        "retrySleep": "500ms",
        "socketTimeout": "60s",
        "connectTimeout": "60s",
        "connectionRequestTimeout": "60s",
        "idleConnectionTimeout": "5m",
        "maxConnections": 100,
        "maxConnectionsPerRoute": 10
    }
}
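
The includes and excludes values are regular expressions written inside JSON, so every backslash is doubled: ".*\\.pdf" in the request body is the regex .*\.pdf, which matches a literal dot. The sketch below shows that decoding step and one plausible way the filters combine. Its semantics are assumptions, not documented connector behavior: patterns are matched against the whole document URL, excludes take precedence over includes, and an empty includes list lets everything through. It also accepts a single string, as in the excludes example above.

```python
import json
import re

# The includes/excludes values exactly as they appear in the JSON body.
# json.loads turns "\\." into "\.", i.e. a regex that matches a literal dot.
config = json.loads(r'{"includes": [".*\\.pdf", ".*\\.pptx"], "excludes": ".*\\.xml"}')


def as_list(value):
    """Accept either a single pattern string or a list of patterns."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def is_processed(url, includes, excludes):
    """Assumed semantics: excludes win; with no includes, everything else passes."""
    if any(re.fullmatch(p, url) for p in as_list(excludes)):
        return False
    patterns = as_list(includes)
    return not patterns or any(re.fullmatch(p, url) for p in patterns)
```

With the example configuration, a URL ending in report.pdf is processed, while one ending in data.xml or notes.docx is not.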

Update Connection

Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
id | Yes | - | No | ID of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0"
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | Yes | - | No | Name of the connection object. | "SPO_Connection"
credential | Yes | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "4f07e6c5-d2e9-48d8-a55d-094917508c0f"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this connection uses. | ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"]
properties.serverUrl | Yes | - | No | URL of the host where SharePoint is located. | "https://contoso.sharepoint.com"
properties.indexContainers | No | false | No | Indicates whether SharePoint containers are indexed. | false
properties.useSnapshots | No | false | No | Indicates whether incremental crawls use Aspire snapshots instead of the SharePoint change log. | false
properties.stopCrawlOnScannerError | No | true | No | Indicates whether the crawl stops if an error occurs during the scan phase. | true
properties.filterNoCrawl | No | false | No | Indicates whether the crawl excludes sites and lists that have SharePoint's NoCrawl property set. | false
properties.crawlAttachments | No | true | No | Indicates whether list item attachments are crawled. | true
properties.logRequestAndResponses | No | false | No | Indicates whether debug log information about the REST requests and their responses is written. | false
properties.scanExcludedItems | No | false | No | Indicates whether excluded directories are still scanned, so that child items within the scope can be found. | false
properties.replaceContent | No | false | No | Indicates whether the content field of .aspx items is filled with the contents of the CanvasContent1/WikiField fields. If both fields are empty, no content is assigned to the content field. | false
properties.downloadLargeFiles | No | true | No | Indicates whether files that exceed dataSizeThreshold are downloaded to disk instead of keeping the connection open. | true
properties.dataSizeThreshold | No | "100mb" | No | Documents smaller than this size are kept in memory. For larger documents, the connection to SharePoint stays open until the content is consumed, unless downloadLargeFiles is true, in which case the content is downloaded to a temporary file. | "100mb"
properties.includes | No | - | Yes | A document is processed by the connector only if it matches one of these patterns. | [".*\\.pdf",".*\\.pptx"]
properties.excludes | No | - | Yes | A document is not processed by the connector if it matches one of these patterns. | ".*\\.xml"
properties.groupPrefixSeparator | No | "\|" | No | Prefix separator used for users and groups in ACLs. | "\|"
properties.lowercaseGE | No | false | No | Indicates whether entries extracted by the group caching process are converted to lowercase. | false
properties.userGroupPageSize | No | 1000 | No | Page size for fetching users and groups. | 1000
properties.useAzureGroups | No | false | No | Indicates whether Azure AD group expansion is used when caching groups for expansion. | true
properties.azureADSeed | No | - | No | Azure AD seed used for group expansion. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
properties.useProxy | No | false | No | Indicates whether a proxy is required to connect to SharePoint. | true
properties.proxyHost | No | - | No | Proxy hostname. | "proxy.com"
properties.proxyPort | No | - | No | Proxy port. | 8080
properties.useProxyAuthentication | No | false | No | Indicates whether the proxy requires authentication. | true
properties.proxyDomain | Yes, if useProxyAuthentication is true | - | No | Domain used to authenticate to the proxy. | "DIR"
properties.proxyUser | Yes, if useProxyAuthentication is true | - | No | Username used to authenticate to the proxy. | "proxy_user"
properties.proxyPassword | Yes, if useProxyAuthentication is true | - | No | Password used to authenticate to the proxy. | "thePassword"
properties.requestProperty | No | - | No | Extra HTTP headers to include with the requests. | [{"name": "user-agent","value": "agent"}]
properties.retryCount | No | 2 | No | Number of retries for failed requests. | 2
properties.retrySleep | No | "500ms" | No | Time to wait between retries of a failed request. | "500ms"
properties.socketTimeout | No | "60s" | No | Maximum period of inactivity to wait for packets to arrive. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.connectTimeout | No | - | No | Time to wait to establish a connection with the remote host. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.connectionRequestTimeout | No | - | No | Time to wait to obtain a connection from the connection pool. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "60s"
properties.idleConnectionTimeout | No | - | No | Time after which an idle connection is closed. Values without a unit are in milliseconds; supported units are ms, s, m, h and d (e.g. 5ms, 5s, 5m, 5h or 5d). | "5m"
properties.maxConnections | No | 100 | No | Maximum number of open connections. | 100
properties.maxConnectionsPerRoute | No | 10 | No | Maximum number of open connections per route. | 10

Example

PUT /aspire/_api/connections/89d6632a-a296-426c-adb0-d442adcab4b0
{
    "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
    "type": "sharepoint-online",
    "description": "SPO_Connection",
    "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
    "throttlePolicy": "4f07e6c5-d2e9-48d8-a55d-094917508c0f",
    "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
    "properties": {
        "serverUrl": "https://contoso.sharepoint.com",
        "indexContainers": false,
        "useSnapshots": false,
        "stopCrawlOnScannerError": true,
        "filterNoCrawl": false,
        "crawlAttachments": true,
        "logRequestAndResponses": false,
        "scanExcludedItems": false,
        "replaceContent": false,
        "downloadLargeFiles": true,
        "dataSizeThreshold": "100mb",
        "includes": [
            ".*\\.pdf",
            ".*\\.pptx"
        ],
        "excludes": ".*\\.xml",
        "groupPrefixSeparator": "|",
        "lowercaseGE": false,
        "userGroupPageSize": 1000,
        "useAzureGroups": true,
        "azureADSeed": "f5587cee-9116-4011-b3a9-6b235b333a1b",
        "useProxy": true,
        "proxyHost": "proxy.com",
        "proxyPort": 8080,
        "useProxyAuthentication": true,
        "proxyDomain": "DIR",
        "proxyUser": "proxy_user",
        "proxyPassword": "thePassword",
        "requestProperty": [
            {
                "name": "user-agent",
                "value": "agent"
            }
        ],
        "retryCount": 2,
        "retrySleep": "500ms",
        "socketTimeout": "60s",
        "connectTimeout": "60s",
        "connectionRequestTimeout": "60s",
        "idleConnectionTimeout": "5m",
        "maxConnections": 100,
        "maxConnectionsPerRoute": 10
    }
}
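
The connection body mixes unconditional requirements (type, description, credential, properties.serverUrl) with conditional ones (the proxy credentials once useProxyAuthentication is true). A pre-flight check for those documented rules might look like this. It is a hypothetical helper, not an Aspire function; treating id as required only for updates is an assumption based on the PUT example above.

```python
REQUIRED_TOP = ("type", "description", "credential")
PROXY_AUTH_FIELDS = ("proxyDomain", "proxyUser", "proxyPassword")


def validate_connection(body, is_update=False):
    """Return a list of problems found in a sharepoint-online connection body."""
    problems = []
    # id is assumed to be needed only when updating an existing connection.
    for name in REQUIRED_TOP + (("id",) if is_update else ()):
        if not body.get(name):
            problems.append(f"missing {name}")
    if body.get("type", "sharepoint-online") != "sharepoint-online":
        problems.append('type must be "sharepoint-online"')
    props = body.get("properties", {})
    if not props.get("serverUrl"):
        problems.append("missing properties.serverUrl")
    # The proxy credentials are only required when proxy authentication is on.
    if props.get("useProxyAuthentication"):
        for name in PROXY_AUTH_FIELDS:
            if not props.get(name):
                problems.append(f"missing properties.{name} (useProxyAuthentication is true)")
    return problems
```

An empty list means the body satisfies the rules from the table; anything else can be reported before the request is sent.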

Create Seed

Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
seed | Yes | - | No | The seed value (the same value as properties.seed). | "/sites/DemoSite"
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | Yes | - | No | Name of the seed object. | "My SharePoint Seed"
connector | Yes | - | No | ID of the connector to use with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | Yes | - | No | ID of the connection to use with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | IDs of the workflows that are executed for the crawled documents. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this seed uses. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | Tags of the seed. They can be used to filter seeds. | ["tag1", "tag2"]
properties.seed | Yes | - | No | The SharePoint seed to crawl, such as a server-relative site path. | "/sites/DemoSite"

Example

POST /aspire/_api/seeds
{
    "seed": "/sites/DemoSite",
    "type": "sharepoint-online",
    "description": "sp_seed_api_test",
    "connector": "a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80",
    "connection": "883f7dbe-d965-46ae-92d4-518757788b17",
    "workflows": [],
    "tags": [],
    "properties": {
        "seed": "/sites/DemoSite"
    }
}
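
In the example above the seed value appears twice, at the top level and in properties.seed. A small builder can keep the two in sync and fill the list fields with their [ ] defaults. The function below is a hypothetical sketch that only assembles the documented fields; it does not contact Aspire.

```python
import json


def build_seed_payload(seed, description, connector, connection,
                       workflows=None, throttle_policy=None,
                       routing_policies=None, tags=None):
    """Build a sharepoint-online seed body; seed is repeated in properties.seed."""
    for name, value in (("seed", seed), ("description", description),
                        ("connector", connector), ("connection", connection)):
        if not value:
            raise ValueError(f"{name} is required")
    payload = {
        "seed": seed,
        "type": "sharepoint-online",
        "description": description,
        "connector": connector,
        "connection": connection,
        # List fields default to empty lists, as in the table.
        "workflows": list(workflows or []),
        "routingPolicies": list(routing_policies or []),
        "tags": list(tags or []),
        "properties": {"seed": seed},
    }
    if throttle_policy is not None:
        payload["throttlePolicy"] = throttle_policy
    return payload


if __name__ == "__main__":
    body = build_seed_payload("/sites/DemoSite", "sp_seed_api_test",
                              "a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80",
                              "883f7dbe-d965-46ae-92d4-518757788b17")
    print(json.dumps(body, indent=4))
```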

Update Seed

Field | Required | Default | Multiple | Notes | Example
--- | --- | --- | --- | --- | ---
id | Yes | - | No | ID of the seed to update. | "dc2a920a-33fd-4c4f-be17-cc53c5238cbe"
seed | Yes | - | No | The seed value (the same value as properties.seed). | "/sites/DemoSite"
type | Yes | - | No | The value must be "sharepoint-online". | "sharepoint-online"
description | Yes | - | No | Name of the seed object. | "My SharePoint Seed"
connector | Yes | - | No | ID of the connector to use with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | Yes | - | No | ID of the connection to use with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | IDs of the workflows that are executed for the crawled documents. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this seed uses. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | Tags of the seed. They can be used to filter seeds. | ["tag1", "tag2"]
properties.seed | Yes | - | No | The SharePoint seed to crawl, such as a server-relative site path. | "/sites/DemoSite"

Example

PUT /aspire/_api/seeds/dc2a920a-33fd-4c4f-be17-cc53c5238cbe
{
    "id": "dc2a920a-33fd-4c4f-be17-cc53c5238cbe",
    "seed": "/sites/DemoSite",
    "type": "sharepoint-online",
    "description": "sp_seed_api_test_updated",
    "connector": "a3e9e1aa-a0dd-4d12-ae2c-9085e6832f80",
    "connection": "883f7dbe-d965-46ae-92d4-518757788b17",
    "workflows": [],
    "tags": [],
    "properties": {
        "seed": "/sites/DemoSite"
    }
}
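
Because the update body carries the full seed object, one approach is to start from a copy of the current seed and change only what is needed. The helper below is a hypothetical sketch of that read-modify-write step. It assumes id and type should stay fixed during an update, and it mirrors any new seed value into properties.seed so the two stay in sync.

```python
import copy


def updated_seed(current, **changes):
    """Return a new seed body with changes applied; id and type are kept as-is."""
    body = copy.deepcopy(current)  # never mutate the caller's object
    for key, value in changes.items():
        if key in ("id", "type"):
            raise ValueError(f"{key} cannot be changed in an update")
        body[key] = value
    if "seed" in changes:
        body.setdefault("properties", {})["seed"] = changes["seed"]
    return body
```

For instance, updated_seed(current, description="sp_seed_api_test_updated") produces the body of the example above from the one that created the seed, plus its id.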