
The Azure Data Lake Connector will crawl content from an Azure Data Lake Store, starting at either the root or at specified paths.


Introduction


An Azure Data Lake makes it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to perform all types of processing and analytics across platforms. It removes the complexities of storing data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments and integrates seamlessly with operational stores and data warehouses, so you can extend current data applications.

For more information about the Azure Data Lake Store, see the official Microsoft Overview of Azure Data Lake Store documentation.

Environment and Access Requirements


Repository Support

The Azure Data Lake Connector supports crawling the following repositories:

Repository                 Version    Connector Version
Azure Data Lake Storage               5.1

Environment Requirements

Before installing the Azure Data Lake connector, make sure that:

  • You have created the necessary Service-to-Service application account with the pertinent access to your Data Lake.
  • The Azure Data Lake is up and running.
  • You have Admin rights, so that Read and Execute permissions can be granted on the folders to crawl.


User Account Requirements

In order to access the Azure Data Lake, an Application Account with sufficient privileges must be supplied. Complete the following steps to set up a new Data Lake connection:

Get an Application Account
  1. See Microsoft's Use portal to create an Azure Active Directory application and service principal that can access resources for the steps to create an Application ID and its key. Make sure to write down your Application Key at the time of creation; it will not be shown again after you exit the portal. Important: make sure to grant the necessary Reader access to your application.
  2. This connector uses OAuth 2.0 authorization via a token endpoint, which Azure supplies. See Microsoft's Step 4: Get the OAuth 2.0 token endpoint (only for Java-based applications). After these steps are completed, you will have a valid Application; a connection sketch using these credentials appears after this list.
  3. Grant at least Read and Execute access to the files and folders to crawl. See Microsoft's Step 3: Assign the Azure AD application to the Azure Data Lake Store account file or folder.
  4. Follow the recommended Advanced Features of the Data Lake File Explorer to recursively apply the parent folder's permissions to sub-folders, using the "Apply folder permissions to sub-folders" option. If the Application does not have access to a specific folder, Aspire will log a warning during the crawl.
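
For reference, the following is a minimal sketch of the Service-to-Service connection, assuming Microsoft's azure-data-lake-store-sdk for Java. The token endpoint, Application ID, Application Key, and account name are placeholders you obtain from the steps above, not values defined by this connector.

    import com.microsoft.azure.datalake.store.ADLStoreClient;
    import com.microsoft.azure.datalake.store.oauth2.AccessTokenProvider;
    import com.microsoft.azure.datalake.store.oauth2.ClientCredsTokenProvider;

    public class AdlConnectSketch {
        public static void main(String[] args) throws Exception {
            // Placeholders: use your tenant's token endpoint plus the
            // Application ID and Application Key created in step 1.
            String tokenEndpoint  = "https://login.microsoftonline.com/<tenant-id>/oauth2/token";
            String applicationId  = "<application-id>";
            String applicationKey = "<application-key>";
            String accountFQDN    = "<account-name>.azuredatalakestore.net";

            // Service-to-Service authentication: exchanges the client
            // credentials for an OAuth 2.0 bearer token at the endpoint.
            AccessTokenProvider tokenProvider =
                    new ClientCredsTokenProvider(tokenEndpoint, applicationId, applicationKey);

            // Client bound to the Data Lake Store account; every request
            // carries the token obtained above.
            ADLStoreClient client = ADLStoreClient.createClient(accountFQDN, tokenProvider);
            System.out.println("Connected, root is: " + client.getDirectoryEntry("/").fullName);
        }
    }

Running a small check like this before configuring the connector is a quick way to validate the endpoint, ID, and key: if any of them is wrong, the first call fails with an authentication error.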


Framework and Connector Features


Framework Features

Name                               Supported
Content Crawling                   Yes
Identity Crawling                  Yes
Snapshot-based Incrementals        Yes
Non-snapshot-based Incrementals    No
Document Hierarchy                 Yes

Connector Features

The Azure Data Lake connector has the following features:

  • Performs incremental crawling (so that only new/updated documents are indexed)
  • Fetches Object ACLs (Access Control Lists) for Azure document-level security (see the sketch after this list)
  • Runs from any machine with access to the given Data Lake source
  • Service-to-Service Authentication via OAuth 2.0 token
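
As a companion to the ACL feature above, here is a sketch of how object ACLs can be read with the same SDK; the connector's internal mechanics may differ, and the printAcls helper and client variable are illustrative names.

    import com.microsoft.azure.datalake.store.ADLStoreClient;
    import com.microsoft.azure.datalake.store.acl.AclEntry;
    import com.microsoft.azure.datalake.store.acl.AclStatus;

    public class AclSketch {
        // Reads the owner, group, and full ACL spec of a file or folder.
        static void printAcls(ADLStoreClient client, String path) throws Exception {
            AclStatus status = client.getAclStatus(path);
            System.out.println("owner: " + status.owner + "  group: " + status.group);
            for (AclEntry entry : status.aclSpec) {
                // Each entry is a POSIX-style ACL item, e.g. user:<id>:r-x
                System.out.println(entry.toString());
            }
        }
    }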


Content Crawled


The Azure Data Lake connector is able to crawl the following objects:

Name       Type         Relevant Metadata    Content Fetch and Extraction    Description
Folders    container                         N/A                             The directories of the files. Each directory is scanned to retrieve further sub-folders and documents.
Files      document                          Yes                             Files stored in folders and sub-folders.
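
To illustrate the traversal described in the table, below is a minimal sketch, again assuming Microsoft's azure-data-lake-store-sdk for Java, that enumerates a directory and recurses into sub-folders; it is an illustration, not the connector's actual crawl code.

    import java.io.IOException;
    import java.util.List;
    import com.microsoft.azure.datalake.store.ADLStoreClient;
    import com.microsoft.azure.datalake.store.DirectoryEntry;
    import com.microsoft.azure.datalake.store.DirectoryEntryType;

    public class CrawlSketch {
        // Each directory is scanned for sub-folders (containers) and
        // files (documents), mirroring the objects listed above.
        static void crawl(ADLStoreClient client, String path) throws IOException {
            List<DirectoryEntry> entries = client.enumerateDirectory(path);
            for (DirectoryEntry entry : entries) {
                if (entry.type == DirectoryEntryType.DIRECTORY) {
                    System.out.println("folder: " + entry.fullName);
                    crawl(client, entry.fullName);   // recurse into the sub-folder
                } else {
                    System.out.println("file:   " + entry.fullName);
                }
            }
        }
    }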

