Azure Data Lake Source 

The connector crawls folders and files (depending on configuration). Each crawled item is populated with the following fields:


| Property | Type | Description |
|---|---|---|
| Fullname | String | Full path of the directory or file (from root path "/") |
| Name | String | File name (minus the path) of the directory or file |
| Length | Long | Length of a file (does not apply to directories) |
| Group | String | ID of the group that owns this file/directory |
| User | String | ID of the user that owns this file/directory |
| Permission | String | Unix-style permission string for this file or directory |
| Last Access Time | Date | Date and time the file was last accessed |
| AclBit | Boolean | Flag indicating whether the file has ACLs set on it |
| Block Size | Long | Block size reported by the server |
| Expiry Time | Date | Date and time at which the file expires, in UTC |
| ReplicationFactor | Int | Replication factor reported by the server |
| isContainer | Boolean | "true" if the item is a directory, otherwise it is a file |
| Fetch Url | String | Azure Data Lake full absolute path including the FQDN: adl://[yourdomain].azuredatalakestore.net/full/path/to.file |
| Last Modified Date | Date | Date and time the file was last modified |
| Acls | ACL Array | List of access entries for the file or folder |
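
As an illustration of how the Fullname, Name and Fetch Url fields relate to each other, the following is a minimal Python sketch. The function names `build_fetch_url` and `name_from_fullname` are hypothetical helpers written for this example; they are not part of the connector, and the connector's actual implementation may differ.

```python
import posixpath


def build_fetch_url(store_fqdn: str, fullname: str) -> str:
    """Combine the store FQDN and a root-relative path into an adl:// URL.

    Hypothetical helper: mirrors the documented Fetch Url shape
    adl://[yourdomain].azuredatalakestore.net/full/path/to.file
    """
    return "adl://" + store_fqdn.strip("/") + "/" + fullname.lstrip("/")


def name_from_fullname(fullname: str) -> str:
    """Return the last path segment, i.e. the name minus the path."""
    return posixpath.basename(fullname.rstrip("/"))


if __name__ == "__main__":
    # Values taken from the sample crawl on this page.
    print(build_fetch_url("dlsjose.azuredatalakestore.net", "/test/test4.txt"))
    print(name_from_fullname("/test/test4.txt"))
```

Under these assumptions, a Fullname of /test/test4.txt in the store dlsjose.azuredatalakestore.net corresponds to the Fetch Url adl://dlsjose.azuredatalakestore.net/test/test4.txt and the Name test4.txt.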


The following console output shows an example crawl of a folder named /test, located at the root of the test Data Lake Store adl://dlsjose.azuredatalakestore.net:


2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Received job - action: start
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Initializing crawl: 1528133426127
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Initializing statistics for crawl: 1528133426127
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Clearing queues, snapshot, hierarchy and intersection acls - please wait...
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Clearing queues, snapshot, hierarchy and intersection acls took 200 ms
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Offering crawl root
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Sending start job for crawl: 1528133426127 (status: I)
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) crawl status checker thread started
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) item claim thread started
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) item claim thread started
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) crawl status checker thread started
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Published crawl start job
2018-06-04T17:30:26Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: [/test]
2018-06-04T17:30:28Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test
2018-06-04T17:30:28Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager/ProcessCrawlRoot]: Added root item: /test
2018-06-04T17:30:28Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test
2018-06-04T17:30:29Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test
2018-06-04T17:30:30Z INFO [/aspire_azuredatalakestore/RAP]: >>> Scan Item - Azure DataLake Store: /test
2018-06-04T17:30:31Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test/NOACCESS
2018-06-04T17:30:31Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test/subtest
2018-06-04T17:30:31Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/NOACCESS
2018-06-04T17:30:31Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/subtest
2018-06-04T17:30:31Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test/test4.txt
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test/test5.txt
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: adl://dlsjose.azuredatalakestore.net/test/test6.txt
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Item /test scanned 5 subitems
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/test4.txt
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/test5.txt
2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/test6.txt
2018-06-04T17:30:34Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) crawl status thread stopped
2018-06-04T17:30:34Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) crawl status thread stopped
2018-06-04T17:30:34Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Published crawl end job
2018-06-04T17:30:34Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Crawl ended with status: S
2018-06-04T17:30:34Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) item claim thread stopped
2018-06-04T17:30:35Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) item claim thread stopped
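
To check which items a crawl actually processed, output like the above can be filtered with a short script. The following Python sketch assumes the line layout seen in this sample (timestamp, level, component in square brackets, then the message); the regular expression and the names `parse_line` and `processed_items` are illustrative and not part of Aspire.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the sample output:
# <timestamp> <LEVEL> [<component>]: <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) \[(?P<component>[^\]]+)\]: (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one console line into its parts, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def processed_items(lines: Iterable[str]) -> List[str]:
    """Collect the URLs reported by ProcessPipelineManager 'Processing:' lines."""
    prefix = "Processing: "
    items = []
    for line in lines:
        entry = parse_line(line)
        if (
            entry is not None
            and entry.component.endswith("/ProcessPipelineManager")
            and entry.message.startswith(prefix)
        ):
            items.append(entry.message[len(prefix):])
    return items


if __name__ == "__main__":
    sample = [
        "2018-06-04T17:30:29Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test",
        "2018-06-04T17:30:32Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: adl://dlsjose.azuredatalakestore.net/test/test4.txt",
    ]
    for url in processed_items(sample):
        print(url)
```

Applied to the full output above, this approach would list the root folder, the two subfolders and the three text files reported by ProcessPipelineManager.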


