The Azure Data Lake Connector can be configured using the REST API. It requires the following entities to be created:

  • Credential
  • Connection
  • Connector
  • Seed

Below are examples of how to create and update the Credential, the Connection, and the Seed. For the Connector, please refer to this page.
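The entities depend on each other: the Connection references the Credential by ID, and the Seed references both the Connector and the Connection by ID. The following is a minimal sketch of that ordering, not an official client. The `post` callable is a hypothetical stand-in for an HTTP call that returns the ID of the created object; this page does not document the response format, so the sketch hides it behind that callable.

```python
def create_entities(post, credential, connection, seed, connector_id):
    """Create credential, connection and seed in dependency order.

    `post(path, payload)` is a hypothetical callable that sends the payload
    and returns the ID of the created object.
    """
    credential_id = post("aspire/_api/credentials", credential)
    # The connection references the credential by ID.
    connection_id = post("aspire/_api/connections",
                         {**connection, "credential": credential_id})
    # The seed references both the connector and the connection by ID.
    seed_id = post("aspire/_api/seeds",
                   {**seed, "connector": connector_id, "connection": connection_id})
    return {"credential": credential_id, "connection": connection_id, "seed": seed_id}


# Stand-in for a real HTTP call: records each request and returns a fake ID.
calls = []


def fake_post(path, payload):
    calls.append((path, payload))
    return f"id-{len(calls)}"


ids = create_entities(
    fake_post,
    credential={"type": "azure-data-lake"},
    connection={"type": "azure-data-lake"},
    seed={"type": "azure-data-lake"},
    connector_id="82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
)
print(ids)
```

The connector ID is passed in because creating the Connector is covered on a separate page.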


Create Credential


Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be "azure-data-lake". | "azure-data-lake"
description | Yes | - | No | Name of the credential object. | "Azure Data Lake Credential"
properties | Yes | - | No | Configuration object |
accountName | Yes | - | No | Storage Account name | samplestorageaccountname
appID | Yes | - | No | Azure application ID registered | sampleapplicationid
appSecret | Yes | - | No | Azure application secret | xxxxxxxxxxxxxxxxxxxxxxxxxx
tenantId | Yes | - | No | Tenant ID | sampletenantid

Example

POST aspire/_api/credentials
{
    "type": "azure-data-lake",
    "description": "<Connector Name> Credential",
    "properties": {
        "accountName": "samplestorageaccountname",
        "appID": "sampleapplicationid",
        "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
        "tenantId": "sampletenantid"
    }
}
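The request above could be assembled in Python roughly as follows. This is a minimal sketch under stated assumptions: the base URL `http://localhost:50505`, the `Content-Type` header, and the helper name `build_credential_request` are illustrative and not taken from this page. Authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: hypothetical base URL of the Aspire server; adjust to your deployment.
BASE_URL = "http://localhost:50505"


def build_credential_request(account_name, app_id, app_secret, tenant_id,
                             description="Azure Data Lake Credential"):
    """Build (but do not send) a POST request for aspire/_api/credentials."""
    payload = {
        "type": "azure-data-lake",  # fixed value required by the table above
        "description": description,
        "properties": {
            "accountName": account_name,
            "appID": app_id,
            "appSecret": app_secret,
            "tenantId": tenant_id,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/aspire/_api/credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_credential_request(
    "samplestorageaccountname", "sampleapplicationid",
    "xxxxxxxxxxxxxxxxxxxxxxxxxx", "sampletenantid",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.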

Update Credential


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
description | Yes | - | No | Name of the credential object. | "Azure Data Lake Credential"
properties | Yes | - | No | Configuration object |
accountName | Yes | - | No | Storage Account name | samplestorageaccountname
appID | Yes | - | No | Azure application ID registered | sampleapplicationid
appSecret | Yes | - | No | Azure application secret | xxxxxxxxxxxxxxxxxxxxxxxxxx
tenantId | Yes | - | No | Tenant ID | sampletenantid

Example 

PUT aspire/_api/credentials/2a5ca234-e328-4d40-bb2a-2df3e550b065
{
    "type": "azure-data-lake",
    "description": "<Connector Name> Credential",
    "properties": {
        "accountName": "samplestorageaccountname",
        "appID": "sampleapplicationid",
        "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
        "tenantId": "sampletenantid"
    }
}
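In the example above the credential ID appears in the URL path and the method is PUT. A minimal sketch of building such a request follows. It assumes IDs are UUID-formatted, as in the examples on this page, and uses a hypothetical base URL and helper name; the request is built but not sent.

```python
import json
import urllib.request
import uuid

# Assumption: hypothetical base URL of the Aspire server.
BASE_URL = "http://localhost:50505"


def build_credential_update(credential_id, description, properties):
    """Build (but do not send) a PUT request for an existing credential."""
    # uuid.UUID raises ValueError for malformed IDs, catching typos early.
    canonical_id = str(uuid.UUID(credential_id))
    payload = {
        "type": "azure-data-lake",
        "description": description,
        "properties": properties,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/aspire/_api/credentials/{canonical_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="PUT",
    )


update = build_credential_update(
    "2a5ca234-e328-4d40-bb2a-2df3e550b065",
    "Azure Data Lake Credential",
    {
        "accountName": "samplestorageaccountname",
        "appID": "sampleapplicationid",
        "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
        "tenantId": "sampletenantid",
    },
)
print(update.get_method(), update.full_url)
```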

Create Connection


Field | Required | Default | Multiple | Notes | Example
type | Yes | - | No | The value must be azure-data-lake. | azure-data-lake
description | Yes | - | No | Name of the connection object. | "My Azure Data Lake Connection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
credential | Yes | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object |
scanAllFileSystems | Yes | TRUE | No | Select if all file systems are to be scanned. | TRUE
fileSystem | No | - | No | Only required if "scanAllFileSystems" is disabled. The name of the file system. | fileSystemName1
indexContainers | No | TRUE | No | Select if containers are to be indexed. Clear to index files only. | TRUE
scanRecursively | No | TRUE | No | Select if subfolders are to be scanned. | TRUE
scanExcludedItems | No | FALSE | No | Select so that the scanner will scan sub-items of container items excluded by a pattern. | FALSE
includes | No | - | Yes | List of regex URL patterns to include. | [{"include":".*tmp[^/]$"}]
include | No | - | No | Regex URL pattern to include. | ".*tmp[^/]$"
excludes | No | - | Yes | List of regex URL patterns to exclude. | [{"exclude":".*tmp[^/]$"}]
exclude | No | - | No | Regex URL pattern to exclude. | ".*tmp[^/]$"

Example

POST aspire/_api/connections
{
    "type": "azure-data-lake",
    "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
    "throttlePolicy": "",
    "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
    "description": "<Connector Name> Test Connector",
    "properties": {
        "scanAllFileSystems": false,
        "fileSystem": "fileSystemName1",
        "indexContainers": true,
        "scanRecursively": true,
        "scanExcludedItems": false,
        "includes": [
            {"include": ".*tmp[^/]$"}
        ],
        "excludes": [
            {"exclude": ".*tmp[^/]$"}
        ]
    }
}
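The rules in the table (defaults, and "fileSystem" being required only when "scanAllFileSystems" is disabled) can be captured in a small builder. This is a sketch for illustration; the function name `build_connection_properties` is hypothetical, and the check is done client-side only, independent of whatever validation the server performs.

```python
import json


def build_connection_properties(scan_all_file_systems=True, file_system=None,
                                index_containers=True, scan_recursively=True,
                                scan_excluded_items=False,
                                includes=(), excludes=()):
    """Build the "properties" object of a connection, using the table's defaults."""
    if not scan_all_file_systems and not file_system:
        raise ValueError('"fileSystem" is required when "scanAllFileSystems" is false')
    properties = {
        "scanAllFileSystems": scan_all_file_systems,
        "indexContainers": index_containers,
        "scanRecursively": scan_recursively,
        "scanExcludedItems": scan_excluded_items,
    }
    if not scan_all_file_systems:
        properties["fileSystem"] = file_system
    # Each pattern is wrapped in its own object, as in the example above.
    if includes:
        properties["includes"] = [{"include": pattern} for pattern in includes]
    if excludes:
        properties["excludes"] = [{"exclude": pattern} for pattern in excludes]
    return properties


props = build_connection_properties(
    scan_all_file_systems=False,
    file_system="fileSystemName1",
    includes=[".*tmp[^/]$"],
    excludes=[".*tmp[^/]$"],
)
print(json.dumps(props, indent=4))
```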

Update Connection


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0"
description | No | - | No | Name of the connection object. | "MyConnection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
credential | No | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object |
scanAllFileSystems | Yes | TRUE | No | Select if all file systems are to be scanned. | TRUE
fileSystem | No | - | No | Only required if "scanAllFileSystems" is disabled. The name of the file system. | fileSystemName1
indexContainers | No | TRUE | No | Select if containers are to be indexed. Clear to index files only. | TRUE
scanRecursively | No | TRUE | No | Select if subfolders are to be scanned. | TRUE
scanExcludedItems | No | FALSE | No | Select so that the scanner will scan sub-items of container items excluded by a pattern. | FALSE
includes | No | - | Yes | List of regex URL patterns to include. | [{"include":".*tmp[^/]$"}]
include | No | - | No | Regex URL pattern to include. | ".*tmp[^/]$"
excludes | No | - | Yes | List of regex URL patterns to exclude. | [{"exclude":".*tmp[^/]$"}]
exclude | No | - | No | Regex URL pattern to exclude. | ".*tmp[^/]$"

Example

{
    "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
    "type": "azure-data-lake",
    "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
    "throttlePolicy": "",
    "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
    "description": "<Connector Name> Test Connector",
    "properties": {
        "scanAllFileSystems": false,
        "fileSystem": "fileSystemName1",
        "indexContainers": true,
        "scanRecursively": true,
        "scanExcludedItems": false,
        "includes": [
            {"include": ".*tmp[^/]$"}
        ],
        "excludes": [
            {"exclude": ".*tmp[^/]$"}
        ]
    }
}
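To get a feel for what the sample pattern `.*tmp[^/]$` selects, it can be tried against a few URLs. The sketch below uses Python's `re.search`; this page does not state which regex engine or matching mode the connector uses, so treat the results as an approximation. The item URLs are hypothetical.

```python
import re


def matches_any(patterns, url):
    """Return True if any regex in `patterns` matches somewhere in `url`."""
    return any(re.search(pattern, url) for pattern in patterns)


patterns = [".*tmp[^/]$"]
# Hypothetical item URLs, for illustration only.
base = "https://samplestorageaccountname.dfs.core.windows.net/fileSystemName1"
for url in (f"{base}/data/tmp1", f"{base}/data/tmp/", f"{base}/data/report.csv"):
    print(url, matches_any(patterns, url))
```

The pattern requires "tmp" followed by exactly one character other than "/" at the end of the URL, so only the first URL matches.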

Create Connector Instance


To create the Connector object using the REST API, refer to this page.

Update Connector Instance


To update the Connector object using the REST API, refer to this page.

Create Seed


Field | Required | Default | Multiple | Notes | Example
seed | Yes | - | No | <seed description> |
type | Yes | - | No | The value must be azure-data-lake. | azure-data-lake
description | Yes | - | No | Name of the seed object. | "My Azure Data Lake Seed"
connector | Yes | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | Yes | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag2"]
properties | Yes | - | No | Configuration object |
seed | Yes | - | No | This value must be azure_data_lake_seed. | azure_data_lake_seed
scanAllFileSystems | Yes | true | No | Select if all file systems are to be scanned. |
fileSystems | No | - | Yes | Only required if "scanAllFileSystems" is disabled. List of file system names and configurations. |
fileSystem | No | - | No | Only required if "scanAllFileSystems" is disabled. The name of the file system. | fileSystemName1
sourceType | No | "scanAllPaths" | No | Source type ("scanAllPaths", "useSeedsFile", "useSpecificPaths"). | "scanAllPaths"
seedsFilePath | No | - | No | Only required if sourceType "useSeedsFile" is selected. Seeds file path. | "/path/to/file"
pathCollectionsToCrawl | No | - | Yes | Only required if sourceType "useSpecificPaths" is selected. List of paths to crawl. | [{"pathCollection": "/path/to/file1"},{"pathCollection": "/path/to/file2"}]
pathCollection | No | - | No | Only required if sourceType "useSpecificPaths" is selected. Path to crawl. | {"pathCollection": "/path/to/file1"}
specificPath | No | - | No | Path to crawl. Not required. If "Scan all Filesystems" in the Connection was checked, this path will be ignored. | /sample/path

Example

POST aspire/_api/seeds
# SCAN ALL FILE SYSTEMS
{
    "type": "azure-data-lake",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "scanAllFileSystems": true
    }
}

# SCAN ALL PATHS OF THE SPECIFIC FS
{
    "type": "azure-data-lake",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "scanAllFileSystems": false,
        "fileSystems": [
            {
                "fileSystem": "fileSystem1",
                "sourceType": "scanAllPaths"
            }
        ]
    }
}

# USE SEEDS FILE
{
    "type": "azure-data-lake",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "scanAllFileSystems": false,
        "fileSystems": [
            {
                "fileSystem": "fileSystem1",
                "sourceType": "useSeedsFile",
                "seedsFilePath": "/path/to/file.txt"
            }
        ]
    }
}

# USE SPECIFIC PATHS
{
    "type": "azure-data-lake",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "scanAllFileSystems": false,
        "fileSystems": [
            {
                "fileSystem": "fileSystem1",
                "sourceType": "useSpecificPaths",
                "pathCollectionsToCrawl": [
                    {"pathCollection": "/path/to/file1"},
                    {"pathCollection": "/path/to/file2"}
                ]
            }
        ]
    }
}

Example

POST aspire/_api/seeds
{
    "type": "azure-data-lake",
    "seed": "directory",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag1", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "specificPath": "/sample/path"
    }
}

Update Seed


Field | Required | Default | Multiple | Notes | Example
id | Yes | - | No | ID of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
seed | No | - | No | <seed description> |
description | No | - | No | Name of the seed object. | "My Azure Data Lake Seed"
connector | No | - | No | The ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | No | - | No | The ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | The IDs of the workflows that will be executed for the documents crawled. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.add | No | [ ] | Yes | The IDs of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.remove | No | [ ] | Yes | The IDs of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | The IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.add | No | [ ] | Yes | The IDs of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.remove | No | [ ] | Yes | The IDs of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"]
tags | No | [ ] | Yes | The tags of the seed. These can be used to filter the seed. | ["tag1", "tag3"]
tags.add | No | [ ] | Yes | The tags to add. | ["tag4"]
tags.remove | No | [ ] | Yes | The tags to remove. | ["tag2"]
properties | Yes | - | No | Configuration object |
seed | Yes | - | No | This value must be azure_data_lake_seed. | azure_data_lake_seed
scanAllFileSystems | Yes | true | No | Select if all file systems are to be scanned. |
fileSystems | No | - | Yes | Only required if "scanAllFileSystems" is disabled. List of file system names and configurations. |
fileSystem | No | - | No | Only required if "scanAllFileSystems" is disabled. The name of the file system. | fileSystemName1
sourceType | No | "scanAllPaths" | No | Source type ("scanAllPaths", "useSeedsFile", "useSpecificPaths"). | "scanAllPaths"
seedsFilePath | No | - | No | Only required if sourceType "useSeedsFile" is selected. Seeds file path. | "/path/to/file"
pathCollectionsToCrawl | No | - | Yes | Only required if sourceType "useSpecificPaths" is selected. List of paths to crawl. | [{"pathCollection": "/path/to/file1"},{"pathCollection": "/path/to/file2"}]
pathCollection | No | - | No | Only required if sourceType "useSpecificPaths" is selected. Path to crawl. | {"pathCollection": "/path/to/file1"}
specificPath | No | - | No | Path to crawl. Not required. If "Scan all Filesystems" in the Connection was checked, this path will be ignored. | /sample/path

Example

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "seed": "<seed example>",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "specificPath": "/sample/path"
    }
}
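Since most fields of an update are marked as not required, a client might send only the ID, the mandatory "properties" object, and the fields that changed. The sketch below illustrates that idea; it assumes the API accepts such a reduced body, which the Required column suggests but this page does not state explicitly. The helper name `build_seed_update` is hypothetical.

```python
import json


def build_seed_update(seed_id, specific_path=None, **changed_fields):
    """Build an update body with the required fields plus any changed ones."""
    body = {"id": seed_id, **changed_fields}
    # "properties" and its "seed" value are marked as required in the table.
    properties = {"seed": "azure_data_lake_seed"}
    if specific_path is not None:
        properties["specificPath"] = specific_path
    body["properties"] = properties
    return body


body = build_seed_update(
    "2f287669-d163-4e35-ad17-6bbfe9df3778",
    specific_path="/sample/path",
    description="My Azure Data Lake Seed",
)
print(json.dumps(body, indent=4))
```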

Example 

PUT aspire/_api/seeds/2f287669-d163-4e35-ad17-6bbfe9df3778
{
    "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
    "seed": "<seed example>",
    "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
    "description": "<connector>_Test_Seed",
    "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
    "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
    "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
    "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
    "tags": ["tag", "tag2"],
    "properties": {
        "seed": "azure_data_lake_seed",
        "scanAllFileSystems": false,
        "fileSystems": [
            {
                "fileSystem": "fileSystem1",
                "sourceType": "useSpecificPaths",
                "pathCollectionsToCrawl": [
                    {"pathCollection": "/path/to/file1"},
                    {"pathCollection": "/path/to/file2"}
                ]
            }
        ]
    }
}
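Each entry in "fileSystems" carries different fields depending on its "sourceType". A minimal sketch of a builder that enforces the conditions listed in the table follows. The function name `build_file_system_entry` is hypothetical, and the checks are client-side only.

```python
import json

ALLOWED_SOURCE_TYPES = ("scanAllPaths", "useSeedsFile", "useSpecificPaths")


def build_file_system_entry(file_system, source_type="scanAllPaths",
                            seeds_file_path=None, paths=()):
    """Build one entry of the "fileSystems" list for a seed."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"unknown sourceType: {source_type!r}")
    entry = {"fileSystem": file_system, "sourceType": source_type}
    if source_type == "useSeedsFile":
        # seedsFilePath is only required for this source type.
        if not seeds_file_path:
            raise ValueError('"seedsFilePath" is required for "useSeedsFile"')
        entry["seedsFilePath"] = seeds_file_path
    elif source_type == "useSpecificPaths":
        # pathCollectionsToCrawl is only required for this source type.
        if not paths:
            raise ValueError('"pathCollectionsToCrawl" is required for "useSpecificPaths"')
        entry["pathCollectionsToCrawl"] = [{"pathCollection": p} for p in paths]
    return entry


entry = build_file_system_entry(
    "fileSystem1",
    source_type="useSpecificPaths",
    paths=["/path/to/file1", "/path/to/file2"],
)
print(json.dumps(entry, indent=4))
```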