Field | Required | Default | Multiple | Notes | Example
---|---|---|---|---|---
type | Yes | - | No | The value must be "azure-data-lake". | "azure-data-lake"
description | Yes | - | No | Name of the credential object. | "Azure Data Lake Credential"
properties | Yes | - | No | Configuration object. |
accountName | Yes | - | No | Storage account name. | "samplestorageaccountname"
appID | Yes | - | No | ID of the registered Azure application. | "sampleapplicationid"
appSecret | Yes | - | No | Secret of the registered Azure application. | "xxxxxxxxxxxxxxxxxxxxxxxxxx"
tenantId | Yes | - | No | Azure tenant ID. | "sampletenantid"
```json
{
  "type": "azure-data-lake",
  "description": "<Connector Name> Credential",
  "properties": {
    "accountName": "samplestorageaccountname",
    "appID": "sampleapplicationid",
    "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "tenantId": "sampletenantid"
  }
}
```
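As a rough illustration, the Python sketch below assembles the credential body shown above. It refuses to build the body if any of the four required properties is missing or empty. The function name `build_credential_payload` is hypothetical. The sketch only prepares the JSON and does not call the REST API, whose endpoint paths are not given on this page.

```python
import json

# Required keys inside "properties", as listed in the credential table above.
REQUIRED_CREDENTIAL_PROPERTIES = ("accountName", "appID", "appSecret", "tenantId")


def build_credential_payload(description: str, properties: dict) -> dict:
    """Return a credential-creation body (hypothetical helper).

    Raises ValueError if a required property is missing or empty.
    Any extra keys in `properties` are dropped.
    """
    missing = [key for key in REQUIRED_CREDENTIAL_PROPERTIES if not properties.get(key)]
    if missing:
        raise ValueError(f"missing required credential properties: {missing}")
    return {
        "type": "azure-data-lake",  # fixed value required for this connector
        "description": description,
        "properties": {key: properties[key] for key in REQUIRED_CREDENTIAL_PROPERTIES},
    }


if __name__ == "__main__":
    payload = build_credential_payload(
        "Azure Data Lake Credential",
        {
            "accountName": "samplestorageaccountname",
            "appID": "sampleapplicationid",
            "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
            "tenantId": "sampletenantid",
        },
    )
    print(json.dumps(payload, indent=2))
```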
Field | Required | Default | Multiple | Notes | Example
---|---|---|---|---|---
id | Yes | - | No | ID of the credential to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
description | Yes | - | No | Name of the credential object. | "Azure Data Lake Credential"
properties | Yes | - | No | Configuration object. |
accountName | Yes | - | No | Storage account name. | "samplestorageaccountname"
appID | Yes | - | No | ID of the registered Azure application. | "sampleapplicationid"
appSecret | Yes | - | No | Secret of the registered Azure application. | "xxxxxxxxxxxxxxxxxxxxxxxxxx"
tenantId | Yes | - | No | Azure tenant ID. | "sampletenantid"
```json
{
  "id": "2a5ca234-e328-4d40-bb2a-2df3e550b065",
  "description": "<Connector Name> Credential",
  "properties": {
    "accountName": "samplestorageaccountname",
    "appID": "sampleapplicationid",
    "appSecret": "xxxxxxxxxxxxxxxxxxxxxxxxxx",
    "tenantId": "sampletenantid"
  }
}
```
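The update body reuses the creation fields and adds `id`. The table above marks `description` and all four properties as required on update as well. For that reason, this hypothetical helper (`build_credential_update`) resends the full set. It also checks the `id` format with Python's `uuid` module before anything is sent. This assumes IDs are UUIDs, as they are in every example on this page.

```python
import uuid

# Required keys inside "properties", as listed in the update table above.
REQUIRED_CREDENTIAL_PROPERTIES = ("accountName", "appID", "appSecret", "tenantId")


def build_credential_update(credential_id: str, description: str, properties: dict) -> dict:
    """Return an update body for an existing credential (hypothetical helper)."""
    # uuid.UUID raises ValueError for a malformed string, so a mistyped ID
    # fails early. str() normalizes it to lowercase hyphenated form.
    canonical_id = str(uuid.UUID(credential_id))
    missing = [key for key in REQUIRED_CREDENTIAL_PROPERTIES if not properties.get(key)]
    if missing:
        raise ValueError(f"missing required credential properties: {missing}")
    return {
        "id": canonical_id,
        "description": description,
        "properties": {key: properties[key] for key in REQUIRED_CREDENTIAL_PROPERTIES},
    }
```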
Field | Required | Default | Multiple | Notes | Example
---|---|---|---|---|---
type | Yes | - | No | The value must be "azure-data-lake". | "azure-data-lake"
description | Yes | - | No | Name of the connection object. | "Azure Data Lake Connection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
credential | Yes | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object. |
scanAllFileSystems | Yes | TRUE | No | Select to scan all file systems. | TRUE
fileSystem | No | - | No | Name of the file system to scan. Required only if scanAllFileSystems is disabled. | "fileSystemName1"
indexContainers | No | TRUE | No | Select to index containers. Clear to index files only. | TRUE
scanRecursively | No | TRUE | No | Select to scan subfolders. | TRUE
scanExcludedItems | No | FALSE | No | Select to scan the sub-items of container items that are excluded by a pattern. | FALSE
includes | No | - | Yes | List of regex URL patterns to include. | [{"include": ".*tmp[^/]$"}]
include | No | - | No | Regex URL pattern to include. | ".*tmp[^/]$"
excludes | No | - | Yes | List of regex URL patterns to exclude. | [{"exclude": ".*tmp[^/]$"}]
exclude | No | - | No | Regex URL pattern to exclude. | ".*tmp[^/]$"
```json
{
  "type": "azure-data-lake",
  "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
  "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
  "description": "<Connector Name> Test Connector",
  "properties": {
    "scanAllFileSystems": false,
    "fileSystem": "fileSystemName1",
    "indexContainers": true,
    "scanRecursively": true,
    "scanExcludedItems": false,
    "includes": [{"include": ".*tmp[^/]$"}],
    "excludes": [{"exclude": ".*tmp[^/]$"}]
  }
}
```
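The one conditional rule in the connection table is that `fileSystem` is needed only when `scanAllFileSystems` is disabled. Below is a minimal Python sketch of a client-side check that applies the documented defaults first. `build_connection_payload` and `CONNECTION_DEFAULTS` are hypothetical names, and the server may perform its own, stricter validation.

```python
# Defaults taken from the "Default" column of the connection table above.
CONNECTION_DEFAULTS = {
    "scanAllFileSystems": True,
    "indexContainers": True,
    "scanRecursively": True,
    "scanExcludedItems": False,
}


def build_connection_payload(credential_id: str, description: str,
                             properties: dict, routing_policies=()) -> dict:
    """Return a connection-creation body (hypothetical helper)."""
    # Caller-supplied values override the documented defaults.
    merged = {**CONNECTION_DEFAULTS, **properties}
    # fileSystem is required only when not every file system is scanned.
    if not merged["scanAllFileSystems"] and not merged.get("fileSystem"):
        raise ValueError("fileSystem is required when scanAllFileSystems is false")
    return {
        "type": "azure-data-lake",
        "credential": credential_id,
        "routingPolicies": list(routing_policies),
        "description": description,
        "properties": merged,
    }
```

Making the defaults explicit is a design choice for readability. It shows exactly which settings the request carries, at the cost of sending fields the server would otherwise fill in itself.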
Field | Required | Default | Multiple | Notes | Example
---|---|---|---|---|---
id | Yes | - | No | ID of the connection to update. | "89d6632a-a296-426c-adb0-d442adcab4b0"
description | No | - | No | Name of the connection object. | "MyConnection"
throttlePolicy | No | - | No | ID of the throttle policy that applies to this connection object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
credential | No | - | No | ID of the credential that applies to this connection object. | "d42e1872-02c8-4a90-a714-44f15577389a"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this connection will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
properties | Yes | - | No | Configuration object. |
scanAllFileSystems | Yes | TRUE | No | Select to scan all file systems. | TRUE
fileSystem | No | - | No | Name of the file system to scan. Required only if scanAllFileSystems is disabled. | "fileSystemName1"
indexContainers | No | TRUE | No | Select to index containers. Clear to index files only. | TRUE
scanRecursively | No | TRUE | No | Select to scan subfolders. | TRUE
scanExcludedItems | No | FALSE | No | Select to scan the sub-items of container items that are excluded by a pattern. | FALSE
includes | No | - | Yes | List of regex URL patterns to include. | [{"include": ".*tmp[^/]$"}]
include | No | - | No | Regex URL pattern to include. | ".*tmp[^/]$"
excludes | No | - | Yes | List of regex URL patterns to exclude. | [{"exclude": ".*tmp[^/]$"}]
exclude | No | - | No | Regex URL pattern to exclude. | ".*tmp[^/]$"
```json
{
  "id": "89d6632a-a296-426c-adb0-d442adcab4b0",
  "type": "azure-data-lake",
  "credential": "d42e1872-02c8-4a90-a714-44f15577389a",
  "throttlePolicy": "",
  "routingPolicies": ["5c7274ef-429b-46ef-8f73-f010e479a467", "9dee4fba-14f2-4afc-a74d-297bcbbd359a"],
  "description": "<Connector Name> Test Connector",
  "properties": {
    "scanAllFileSystems": false,
    "fileSystem": "fileSystemName1",
    "indexContainers": true,
    "scanRecursively": true,
    "scanExcludedItems": false,
    "includes": [{"include": ".*tmp[^/]$"}],
    "excludes": [{"exclude": ".*tmp[^/]$"}]
  }
}
```
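The `includes` and `excludes` fields take regex URL patterns. The Python sketch below shows one way such patterns could be evaluated. Note that `.*tmp[^/]$` matches a URL ending in "tmp" followed by exactly one more non-slash character (for example `/dir/tmp1`), not `/dir/file.tmp`. The matching rules are assumptions, since this page does not state the connector's actual semantics:

- patterns must match the whole URL;
- excludes take precedence over includes;
- an empty `includes` list admits everything.

`is_crawled` is a hypothetical name.

```python
import re


def is_crawled(url: str, includes: list, excludes: list) -> bool:
    """Hypothetical include/exclude filter using the JSON shapes shown above.

    Assumptions: whole-URL matching (re.fullmatch), excludes win,
    and an empty includes list means "include everything".
    """
    if any(re.fullmatch(p["exclude"], url) for p in excludes):
        return False
    if not includes:
        return True
    return any(re.fullmatch(p["include"], url) for p in includes)


if __name__ == "__main__":
    excludes = [{"exclude": ".*tmp[^/]$"}]
    print(is_crawled("/fs/dir/tmp1", [], excludes))      # excluded by the pattern
    print(is_crawled("/fs/dir/file.tmp", [], excludes))  # not matched, so kept
```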
To create the Connector object using the REST API, see this page.
Field | Required | Default | Multiple | Notes | Example |
---|---|---|---|---|---|
seed | Yes | - | No | <seed description> | |
type | Yes | - | No | The value must be "azure-data-lake". | "azure-data-lake"
description | Yes | - | No | Name of the seed object. | "My Azure Data Lake Seed"
connector | Yes | - | No | ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | Yes | - | No | ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | IDs of the workflows that will be executed for the crawled documents. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
tags | No | [ ] | Yes | Tags of the seed. These can be used to filter seeds. | ["tag1", "tag2"]
properties | Yes | - | No | Configuration object. |
seed | Yes | - | No | The value must be "azure_data_lake_seed". | "azure_data_lake_seed"
specificPath | No | - | No | Path to crawl. Ignored if "Scan all Filesystems" (scanAllFileSystems) is enabled in the connection. | "/sample/path"
```json
{
  "type": "azure-data-lake",
  "seed": "directory",
  "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
  "description": "<connector>_Test_Seed",
  "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
  "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
  "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
  "workflows": ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
  "tags": ["tag1", "tag2"],
  "properties": {
    "seed": "azure_data_lake_seed",
    "specificPath": "/sample/path"
  }
}
```
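The seed's `specificPath` depends on the connection: it is ignored when `scanAllFileSystems` is enabled. The Python sketch below models that rule and is illustrative only. `describe_seed_scope` and the strings it returns are hypothetical. Its behavior when `specificPath` is omitted on a single-file-system connection (crawl the whole file system) is an assumption.

```python
def describe_seed_scope(connection_props: dict, seed_props: dict) -> str:
    """Summarize what a seed would crawl (hypothetical helper)."""
    if seed_props.get("seed") != "azure_data_lake_seed":
        raise ValueError('properties.seed must be "azure_data_lake_seed"')
    # scanAllFileSystems defaults to TRUE in the connection table.
    if connection_props.get("scanAllFileSystems", True):
        # Per the seed table, specificPath is ignored in this case.
        return "all file systems"
    file_system = connection_props["fileSystem"]
    path = seed_props.get("specificPath")
    # Assumption: without specificPath, the whole file system is crawled.
    return f"{file_system}:{path}" if path else file_system
```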
Field | Required | Default | Multiple | Notes | Example
---|---|---|---|---|---
id | Yes | - | No | ID of the seed to update. | "2f287669-d163-4e35-ad17-6bbfe9df3778"
seed | No | - | No | <seed description> |
description | No | - | No | Name of the seed object. | "Azure Data Lake Seed"
connector | No | - | No | ID of the connector to be used with this seed. The connector type must match the seed type. | "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31"
connection | No | - | No | ID of the connection to be used with this seed. The connection type must match the seed type. | "602d3700-28dd-4a6a-8b51-e4a663fe9ee6"
workflows | No | [ ] | Yes | IDs of the workflows that will be executed for the crawled documents. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.add | No | [ ] | Yes | IDs of the workflows to add. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
workflows.remove | No | [ ] | Yes | IDs of the workflows to remove. | ["f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"]
throttlePolicy | No | - | No | ID of the throttle policy that applies to this seed object. | "f5587cee-9116-4011-b3a9-6b235b333a1b"
routingPolicies | No | [ ] | Yes | IDs of the routing policies that this seed will use. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.add | No | [ ] | Yes | IDs of the routing policies to add. | ["b4d2579f-1a0a-4a8b-9fd4-d42780003b36"]
routingPolicies.remove | No | [ ] | Yes | IDs of the routing policies to remove. | ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7"]
tags | No | [ ] | Yes | Tags of the seed. These can be used to filter seeds. | ["tag1", "tag3"]
tags.add | No | [ ] | Yes | Tags to add. | ["tag4"]
tags.remove | No | [ ] | Yes | Tags to remove. | ["tag2"]
properties | Yes | - | No | Configuration object. |
seed | Yes | - | No | The value must be "azure_data_lake_seed". | "azure_data_lake_seed"
specificPath | No | - | No | Path to crawl. Ignored if "Scan all Filesystems" (scanAllFileSystems) is enabled in the connection. | "/sample/path"
```json
{
  "id": "2f287669-d163-4e35-ad17-6bbfe9df3778",
  "seed": "<seed example>",
  "connector": "82f7f0a4-8d28-47ce-8c9d-e3ca414b0d31",
  "description": "<connector>_Test_Seed",
  "throttlePolicy": "6b8b5f23-fc77-47a1-9b58-106577162e7b",
  "routingPolicies": ["313de87c-3cb9-4fe0-a2cb-17f75ce7d0c7", "b4d2579f-1a0a-4a8b-9fd4-d42780003b36"],
  "connection": "602d3700-28dd-4a6a-8b51-e4a663fe9ee6",
  "workflows": ["b255e950-1dac-46dc-8f86-1238b2fbdf27", "f8c414cb-1f5d-42ef-9cc9-5696c3f0bda4"],
  "tags": ["tag", "tag2"],
  "properties": {
    "seed": "azure_data_lake_seed",
    "specificPath": "/sample/path"
  }
}
```
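The update table offers three ways to change a list field: the full list (`tags`), `tags.add`, and `tags.remove`. The same applies to `workflows` and `routingPolicies`. The Python sketch below is a hypothetical model of how these could combine. The order it applies (replace, then add, then remove) is an assumption not confirmed by this page, and `apply_list_update` is an illustrative name.

```python
def apply_list_update(current, replace=None, add=(), remove=()):
    """Hypothetical model of a list field plus its .add/.remove variants.

    Assumed order: a full list replaces the current one, then .add entries
    are appended without duplicates, then .remove entries are dropped.
    """
    result = list(replace) if replace is not None else list(current)
    for item in add:
        if item not in result:
            result.append(item)
    to_remove = set(remove)
    return [item for item in result if item not in to_remove]


if __name__ == "__main__":
    # Existing tags ["tag1", "tag2"], updated with tags.add and tags.remove
    print(apply_list_update(["tag1", "tag2"], add=["tag4"], remove=["tag2"]))
    # prints ['tag1', 'tag4']
```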