The Azure Data Lake Connector can be configured using the Aspire Admin UI. It requires the following entities to be created:
Credential
Connection
Connector
Seed
Create Credential
In the Aspire Admin UI, go to the credentials page.
All existing credentials will be listed. Click the New button.
Enter the new credential description.
Select Azure Data Lake from the Type list.
Account Name: Storage Account name
Application ID: Client ID for your application
Application Secret: Key supplied by Azure
Tenant ID: Tenant ID for your application
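As a rough illustration of how the four credential fields relate to Azure, the sketch below checks that none of them is empty and derives the two endpoints they conventionally map to: the Microsoft identity platform token endpoint for the tenant, and the storage account's DFS endpoint. It assumes Azure Data Lake Storage Gen2 on the public Azure cloud. The AdlsCredential class and its field names are hypothetical and are not part of Aspire, and the code makes no network calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AdlsCredential:
    """Hypothetical holder mirroring the four credential fields (not an Aspire class)."""
    account_name: str        # Account Name: storage account name
    application_id: str      # Application ID: client ID of the application
    application_secret: str  # Application Secret: key supplied by Azure
    tenant_id: str           # Tenant ID: tenant of the application

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are empty or whitespace-only."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def token_endpoint(self) -> str:
        """OAuth 2.0 token endpoint for the tenant (assumes the public Azure cloud)."""
        return f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/v2.0/token"

    def dfs_endpoint(self) -> str:
        """Data Lake Storage Gen2 endpoint for the account (assumes Gen2, public cloud)."""
        return f"https://{self.account_name}.dfs.core.windows.net"
```

Checking missing_fields() before saving the credential catches the most common mistake, a blank field, without contacting Azure.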
Create Connection
In the Aspire Admin UI, go to the connections page.
All existing connections will be listed. Click the New button.
Enter the new connection description.
Select Azure Data Lake from the Type list.
Index Containers: Select if folders are to be indexed
Scan Recursively: Select if sub-folders are to be scanned
Scan Excluded Items: If selected, the scanner will scan the sub-items of container items that have been excluded by a pattern (either because they match an exclude pattern or because they do not match an include pattern)
Include patterns: Specify regex display URL patterns to include
Exclude patterns: Specify regex display URL patterns to exclude
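The interaction between the include patterns, the exclude patterns and Scan Excluded Items can be sketched as follows. This illustrates the behaviour described above and is not the connector's actual code: the function names are hypothetical, Python's re syntax stands in for whatever regex engine the connector uses, and the sketch assumes that a pattern may match anywhere in the display URL and that an empty include list includes everything.

```python
import re
from typing import Sequence


def is_excluded(display_url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Return True if the item is excluded by the configured patterns."""
    # Excluded because it matches an exclude pattern.
    if any(re.search(pattern, display_url) for pattern in exclude):
        return True
    # Excluded because include patterns exist and none of them matches.
    if include and not any(re.search(pattern, display_url) for pattern in include):
        return True
    return False


def should_scan_children(display_url: str, include: Sequence[str],
                         exclude: Sequence[str], scan_excluded_items: bool) -> bool:
    """Sub-items of a container are scanned if the container is not excluded,
    or if Scan Excluded Items is selected."""
    return scan_excluded_items or not is_excluded(display_url, include, exclude)
```

With Scan Excluded Items selected, an excluded folder is still traversed, so files beneath it that match an include pattern can still be found.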
Create Connector Instance
For details on creating the Connector object using the Admin UI, see this page.
Create Seed
In the Aspire Admin UI, go to the seeds page.
All existing seeds will be listed. Click the New button.
Enter the new seed description.
Select Azure Data Lake from the Type list.
Select if all file systems are to be scanned
File System Name: Specify the name of the specific file system to scan
Scan All Paths: With this option, the connector crawls from the root directory of the specified file system.
Use Seeds File: This option collects paths from a supplied file location, which is useful if the paths change constantly and are controlled by a third-party process. Paths should be listed one per line in the form /folder/sub-folder. Example file locations:
For Windows: D:\folder\folder1\paths.txt
For Linux: /home/user/folder/folder1/paths.txt
Specific Paths: This option allows any number of paths to be submitted. The admin can supply as many paths as needed in the format /folder/sub-folder
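Because a seeds file is plain text with one path per line, the third-party process that maintains it can check it with a few lines of script before the connector reads it. The sketch below loads such a file and accepts only lines in the /folder/sub-folder form. The function name is hypothetical, and skipping blank lines and rejecting paths without a leading slash are assumptions about sensible validation rather than documented connector behaviour.

```python
from pathlib import Path
from typing import List


def read_seed_paths(seeds_file: str) -> List[str]:
    """Read a seeds file and return its paths, one per non-blank line."""
    paths: List[str] = []
    lines = Path(seeds_file).read_text(encoding="utf-8").splitlines()
    for line_no, raw in enumerate(lines, start=1):
        path = raw.strip()
        if not path:
            continue  # assumption: blank lines are ignored
        if not path.startswith("/"):
            # assumption: every path is absolute within the file system
            raise ValueError(
                f"line {line_no}: expected a path like /folder/sub-folder, got {path!r}"
            )
        paths.append(path)
    return paths
```

For example, calling read_seed_paths on /home/user/folder/folder1/paths.txt returns the list of paths in that file, or raises ValueError naming the first malformed line.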