The WebHDFS feature must be enabled in order to use this connector.
...
Granting READ permissions is a must since the connector won't be able to get any data if the Path to be crawled is restricted.
For Kerberized Clusters, a delegation token is required in order to crawl any path within the HDFS.
...
To obtain this token you must:
Run:
Code Block |
---|
$ kinit <your-user-principal>
$ curl -i --negotiate -u : "http://<host>:<port>/webhdfs/v1/?op=GETDELEGATIONTOKEN"
...
{"Token":{"urlString":"<A-VERY-LONG-TOKEN>"}} |