The connector will crawl folders and files (depending on its configuration) and will pull the following metadata.

Here is a run example of a crawl on the folder "/test":
2018-06-01T17:18:12Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Received job - action: start
2018-06-01T17:18:12Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Initializing crawl: 1527873492972
2018-06-01T17:18:12Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Initializing statistics for crawl: 1527873492972
2018-06-01T17:18:12Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Clearing queues, snapshot, hierarchy and intersection acls - please wait...
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Sending start job for crawl: 1527873492972 (status: INI)
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) crawl status checker thread started
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) item claim thread started
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) item claim thread started
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) crawl status checker thread started
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Clearing queues, snapshot, hierarchy and intersection acls took 200 ms
2018-06-01T17:18:13Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Offering crawl root
2018-06-01T17:18:14Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Published crawl start job
2018-06-01T17:18:14Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: [/test]
2018-06-01T17:18:15Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test
2018-06-01T17:18:15Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager/ProcessCrawlRoot]: Added root item: /test
2018-06-01T17:18:16Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test
2018-06-01T17:18:17Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test
2018-06-01T17:18:17Z INFO [/aspire_azuredatalakestore/RAP]: >>> Scan Item - Azure DataLake Store: /test
2018-06-01T17:18:17Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/NOACCESS
2018-06-01T17:18:17Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest
2018-06-01T17:18:17Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/test4.txt
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/NOACCESS
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/test4.txt
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/test5.txt
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/test6.txt
2018-06-01T17:18:18Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Item /test scanned 5 subitems
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test/NOACCESS
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test/subtest
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/test5.txt
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/test6.txt
2018-06-01T17:18:19Z WARN [/aspire_azuredatalakestore/RAP]: Unable to access path: '/test/NOACCESS'. Missing READ and EXECUTE access. Please check your application created. Skipped
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/RAP]: >>> Scan Item - Azure DataLake Store: /test/NOACCESS
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Item /test/NOACCESS scanned 0 subitems
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/RAP]: >>> Scan Item - Azure DataLake Store: /test/subtest
2018-06-01T17:18:19Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest/sub-sub-test
2018-06-01T17:18:20Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest/sub-sub-test
2018-06-01T17:18:20Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest/test1.txt
2018-06-01T17:18:20Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest/test7.txt
2018-06-01T17:18:20Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Item /test/subtest scanned 3 subitems
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Scanning: /test/subtest/sub-sub-test
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest/test1.txt
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest/test7.txt
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/RAP]: >>> Scan Item - Azure DataLake Store: /test/subtest/sub-sub-test
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest/sub-sub-test/test2.txt
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/RAP]: >>> Processing crawl - Azure DataLake Store: /test/subtest/sub-sub-test/test8.txt
2018-06-01T17:18:21Z INFO [/aspire_azuredatalakestore/ScanPipelineManager/Scan]: Item /test/subtest/sub-sub-test scanned 2 subitems
2018-06-01T17:18:22Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest/sub-sub-test/test2.txt
2018-06-01T17:18:22Z INFO [/aspire_azuredatalakestore/ProcessPipelineManager]: Processing: /test/subtest/sub-sub-test/test8.txt
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Published crawl end job
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) crawl status thread stopped
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) crawl status thread stopped
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Crawl ended with status: S
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ScanQueueLoader]: QueueLoader (scan) item claim thread stopped
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/QueuePipelineManager/ProcessQueueLoader]: QueueLoader (process) item claim thread stopped
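Every entry in the crawl log above follows the same `timestamp level [component]: message` shape, so the log can be turned back into structured records for troubleshooting (for example, to isolate the WARN about `/test/NOACCESS`). A minimal sketch in Python; the regex and the `parse_log` helper are illustrative, not part of the connector:

```python
import re

# An Aspire log entry starts with an ISO-8601 UTC timestamp, a level,
# and a bracketed component path, followed by the message text.
LOG_ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<level>INFO|WARN|ERROR)\s+"
    r"\[(?P<component>[^\]]+)\]:\s+"
    r"(?P<message>.*)"
)

def parse_log(text):
    """Return a list of (timestamp, level, component, message) tuples."""
    entries = []
    for line in text.splitlines():
        m = LOG_ENTRY.match(line.strip())
        if m:
            entries.append((m.group("ts"), m.group("level"),
                            m.group("component"), m.group("message")))
    return entries

sample = """\
2018-06-01T17:18:19Z WARN [/aspire_azuredatalakestore/RAP]: Unable to access path: '/test/NOACCESS'. Missing READ and EXECUTE access.
2018-06-01T17:18:23Z INFO [/aspire_azuredatalakestore/Main/CrawlController]: Crawl ended with status: S
"""

for ts, level, component, message in parse_log(sample):
    print(level, component, message, sep=" | ")
```

Filtering the parsed tuples on `level == "WARN"` is a quick way to find items the crawl skipped, such as paths missing READ and EXECUTE access.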
Example of the metadata output for a file:
<job id="192.168.56.1:50505/2018-06-01T15:56:21Z/1/18" time="2018-06-01T17:02:45Z">
  <doc>
    <qid>/test/test6.txt</qid>
    <id>/test/test6.txt</id>
    <connectorSpecific>
      <field name="fullname">/test/test6.txt</field>
      <field name="length">0</field>
      <field name="group">3b891abc-b0d4-4c57-8231-b5b48ff8f912</field>
      <field name="user">3b891abc-b0d4-4c57-8231-b5b48ff8f912</field>
      <field name="permission">770</field>
      <field name="lastAccessTime">Mon May 28 15:31:16 CST 2018</field>
      <field name="aclBit">true</field>
      <field name="blocksize">268435456</field>
      <field name="expiryTime"/>
      <field name="replicationFactor">1</field>
      <field name="isContainer">false</field>
    </connectorSpecific>
    <fetchUrl>/test/test6.txt</fetchUrl>
    <url>/test/test6.txt</url>
    <lastModified>Mon May 28 15:31:16 CST 2018</lastModified>
    <acls>
      <acl access="allow" domain="xxx.azuredatalakestore.net" entity="user" fullname="xxx.azuredatalakestore.net\41599999-13e0-4431-9b35-d2da6e9ccee8" name="user:41599999-13e0-4431-9b35-d2da6e9ccee8:rwx" scope="global"/>
      <acl access="allow" domain="xxx.azuredatalakestore.net" entity="user" fullname="xx.azuredatalakestore.net\" name="group::rwx" scope="global"/>
    </acls>
    <displayUrl>/test/test6.txt</displayUrl>
    <action>add</action>
    <docType>item</docType>
    <sourceName>aspire-azuredatalakestore</sourceName>
    <sourceType>azureDataLakeStore</sourceType>
    <sourceId>aspire_azuredatalakestore</sourceId>
    <repItemType>aspire/file</repItemType>
    <hierarchy>
      <item id="2FED9DB88E9569860C5F71054971EC21" level="3" name="test6.txt" url="/test/test6.txt">
        <ancestors>
          <ancestor id="4539330648B80F94EF3BF911F6D77AC9" level="2" name="test" parent="true" type="aspire/folder" url="/test"/>
          <ancestor id="6666CD76F96956469E7BE39D750CC7D9" level="1" type="aspire/folder" url="/"/>
        </ancestors>
      </item>
    </hierarchy>
    <contentType source="ExtractTextStage/Content-Type">text/plain; charset=UTF-8</contentType>
    <extension source="ExtractTextStage">
      <field name="X-Parsed-By">org.apache.tika.parser.DefaultParser</field>
      <field name="Content-Encoding">UTF-8</field>
      <field name="resourceName">/test/test6.txt</field>
    </extension>
    <content source="ExtractTextStage"><![CDATA[ ]]></content>
    <contentLength source="ExtractTextStage">1</contentLength>
  </doc>
</job>
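The connector-specific fields can be read back out of such a job document with any XML parser. A minimal sketch using Python's standard library; the document is abbreviated to a few fields of interest, and the `connector_fields` helper is illustrative, not part of the connector:

```python
import xml.etree.ElementTree as ET

# Abbreviated job document, keeping only the parts this sketch reads.
JOB_XML = """\
<job id="192.168.56.1:50505/2018-06-01T15:56:21Z/1/18" time="2018-06-01T17:02:45Z">
  <doc>
    <id>/test/test6.txt</id>
    <connectorSpecific>
      <field name="fullname">/test/test6.txt</field>
      <field name="permission">770</field>
      <field name="isContainer">false</field>
    </connectorSpecific>
    <lastModified>Mon May 28 15:31:16 CST 2018</lastModified>
  </doc>
</job>
"""

def connector_fields(xml_text):
    """Map the connectorSpecific <field> elements to a name -> value dict."""
    root = ET.fromstring(xml_text)  # root is the <job> element
    return {f.get("name"): f.text
            for f in root.iterfind("doc/connectorSpecific/field")}

fields = connector_fields(JOB_XML)
print(fields["permission"])  # the POSIX-style permission value from ADLS
```

The same `iterfind` pattern extends to the other parts of the document, e.g. `doc/acls/acl` for the ACL entries or `doc/hierarchy/item` for the folder hierarchy.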