The Azure Data Lake Connector crawls content from Azure Data Lake Storage Gen2, either for all file systems or for specified file systems and paths.
Azure Data Lake makes it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, for all types of processing and analytics across platforms. It removes the complexities of storing data while making it faster to get up and running with batch, streaming, and interactive analytics. It integrates seamlessly with existing operational stores and data warehouses, so you can extend current data applications.

Azure Data Lake Storage Gen2 is a comprehensive, scalable, and efficient data lake solution designed for big data analysis, and it provides a hierarchical file system. It brings the capabilities of Azure Data Lake Storage Gen1 together with Azure Blob storage.
For more information about Azure Data Lake Storage Gen2, see the official Microsoft Overview of Azure Data Lake Storage Gen2 documentation.
The Azure Data Lake Connector supports crawling the following repositories:
Repository | Version | Connector Version |
---|---|---|
Azure Data Lake Storage | Gen 2 | 5.1 |
Before installing the Azure Data Lake connector, make sure that:
To access Azure Data Lake, an Application Account with sufficient privileges must be supplied. The following fields must be configured in order to set up a new Data Lake connection:
- Authorization Token End Point: e.g. https://login.microsoftonline.com/[yourkey]/oauth2/token
- Fully Qualified Domain Name (FQDN): e.g. [yourdomain].azuredatalakestore.net. No HTTP prefix is required.
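As a rough illustration of how these two fields are shaped, the following Python sketch builds the token endpoint from a key and strips any HTTP prefix from an FQDN. The helper names `token_endpoint` and `normalize_fqdn`, and the sample values, are hypothetical and are not part of the connector.

```python
from urllib.parse import urlparse


def token_endpoint(key: str) -> str:
    """Build the token endpoint URL for the value that replaces [yourkey]."""
    if not key or "/" in key:
        raise ValueError("key must be a non-empty value without slashes")
    return f"https://login.microsoftonline.com/{key}/oauth2/token"


def normalize_fqdn(value: str) -> str:
    """Return the bare host name, dropping any http(s):// prefix and path."""
    value = value.strip()
    if "://" in value:
        # A scheme was supplied; keep only the host part.
        host = urlparse(value).hostname
    else:
        # No scheme; drop anything after the first slash.
        host = value.split("/", 1)[0]
    if not host:
        raise ValueError("no host name found")
    return host.lower()


# Hypothetical sample values, for illustration only.
endpoint = token_endpoint("my-key")
fqdn = normalize_fqdn("https://mydomain.azuredatalakestore.net/")
```

Normalizing the FQDN before saving it avoids a common configuration mistake, namely pasting a full URL where only the host name is expected.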
The steps to get the required credentials are as follows:
The Azure Data Lake connector has the following features:
Name | Supported |
---|---|
Content Crawling | Yes |
Identity Crawling | Yes (use the Azure Identity Connector) |
Snapshot-based Incrementals | Yes |
Non-snapshot-based Incrementals | No |
Document Hierarchy | Yes |
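The general idea behind the Snapshot-based Incrementals feature can be sketched as a comparison between the snapshot saved by the previous crawl and the current listing. This is a minimal sketch, assuming a snapshot is a mapping from item path to a signature such as a last-modified value or a hash. The function name `diff_snapshots` and the sample data are hypothetical, and this is not the connector's actual implementation.

```python
from typing import Dict, List


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots (path -> signature) and classify the changes."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    # An item is updated when it exists in both snapshots with a different signature.
    updated = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"add": added, "update": updated, "delete": deleted}


# Hypothetical snapshots from two consecutive crawls.
old = {"/fs/a.txt": "sig1", "/fs/b.txt": "sig2", "/fs/c.txt": "sig3"}
new = {"/fs/a.txt": "sig1", "/fs/b.txt": "sig9", "/fs/d.txt": "sig4"}
changes = diff_snapshots(old, new)
```

Only the items listed under `add`, `update`, and `delete` would need processing on an incremental crawl, while unchanged items such as `/fs/a.txt` are skipped.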
The Azure Data Lake connector can crawl the following objects:
Name | Type | Relevant Metadata | Content Fetch and Extraction | Description |
---|---|---|---|---|
File System | container | | N/A | Contains folders and files. |
Folders | container | | N/A | The directories of the files. Each directory is scanned to retrieve more subfolders or documents. |
Documents | document | | Yes | Files stored in folders/subfolders. |
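The hierarchy in the table above can be sketched as a recursive scan in which the file system and each folder are emitted as containers and each file as a document. This is a minimal sketch over an in-memory structure, assuming nested dictionaries stand in for folders and any other value stands in for a file. The names `crawl` and `_scan` and the sample tree are hypothetical, and no Azure API is called.

```python
from typing import Any, Dict, Iterator, Tuple


def _scan(parent: str, children: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Walk one level of the tree, descending into folders."""
    for name in sorted(children):
        path = f"{parent}/{name}"
        child = children[name]
        if isinstance(child, dict):
            # A folder: emit it, then scan it for more subfolders or documents.
            yield ("container", path)
            yield from _scan(path, child)
        else:
            # Anything else is treated as a file, i.e. a document.
            yield ("document", path)


def crawl(file_system: str, tree: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (type, path) pairs, starting with the file system itself."""
    root = f"/{file_system}"
    yield ("container", root)
    yield from _scan(root, tree)


# Hypothetical file system contents.
sample = {
    "reports": {"2023": {"q1.csv": b"data"}, "readme.txt": b"data"},
    "notes.txt": b"data",
}
items = list(crawl("myfs", sample))
```

Emitting each container before its children keeps parent paths available when the documents below them are processed, which is what a document hierarchy relies on.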