WebHDFS Configuration

...

WebHDFS must be enabled on the cluster to use this connector. It is controlled by the dfs.webhdfs.enabled property in hdfs-site.xml.
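A minimal sketch for confirming that WebHDFS answers before configuring the connector. The helper names webhdfs_url and webhdfs_status are hypothetical, and nn.example.com and 9870 are placeholders: 9870 is the default NameNode HTTP port in Hadoop 3 (50070 in Hadoop 2). GETFILESTATUS is a standard WebHDFS operation.

```shell
#!/usr/bin/env bash
# Sketch: check that WebHDFS is reachable on the NameNode HTTP address.
# Host and port arguments are placeholders for your own cluster values.

# Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1<path>?op=<OP>
webhdfs_url() {
  local host="$1" port="$2" path="$3" op="$4"
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$host" "$port" "$path" "$op"
}

# Print the HTTP status code of GETFILESTATUS on the root directory.
# 200 suggests WebHDFS is enabled; 404 typically means it is disabled or the
# port is wrong; 000 means curl could not connect at all. On a Kerberized
# cluster expect 401 unless you add --negotiate -u : after running kinit.
webhdfs_status() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    "$(webhdfs_url "$1" "$2" / GETFILESTATUS)"
}
```

For example, webhdfs_status nn.example.com 9870 prints a single status code.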


Grant Read Permissions on the Crawl Path

...

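The HDFS user the connector authenticates as must have READ permission on the path to be crawled; if that path is restricted, the connector cannot retrieve any data from it. Because HDFS checks execute (x) permission when traversing directories, that user also needs execute permission on the crawl path and on each of its parent directories.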
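One way to grant that access is with HDFS ACLs. This is a minimal sketch, assuming ACLs are enabled on the NameNode (dfs.namenode.acls.enabled=true), that the crawl path is a directory, and that you run it as the path's owner or an HDFS superuser. The function names and the RUN=echo dry-run switch are hypothetical conveniences; parent directories still need execute permission granted separately.

```shell
#!/usr/bin/env bash
# Sketch: give the connector's HDFS user read access to a crawl directory.
# Set RUN=echo to print the commands instead of executing them.

grant_crawl_read() {
  local user="$1" path="$2"
  # r-x: read files, list directories, and descend into subdirectories.
  # -R applies the ACL entry to everything already under the path.
  ${RUN:-} hdfs dfs -setfacl -R -m "user:${user}:r-x" "$path"
  # Default ACL entry (directories only) so new children inherit access.
  ${RUN:-} hdfs dfs -setfacl -m "default:user:${user}:r-x" "$path"
}

# Quick check, run as the connector's user: a permission error here means
# the connector will not be able to crawl the path either.
check_crawl_read() {
  ${RUN:-} hdfs dfs -ls "$1"
}
```

For example, RUN=echo grant_crawl_read crawler /data/docs prints the two setfacl commands without touching the cluster.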


Kerberized Clusters

...

On Kerberized clusters, a delegation token is required to crawl any path in HDFS. To obtain this token:

  1. SSH into your cluster
  2. Run:

    $ kinit <your-user-principal>
    $ curl -i --negotiate -u : "http://<host>:<port>/webhdfs/v1/?op=GETDELEGATIONTOKEN"
    ...
    {"Token":{"urlString":"<A-VERY-LONG-TOKEN>"}}
  3. Copy the token from the "Token" field (the urlString value) into the connector's configuration.
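The three steps above can be sketched as a small script. This is a minimal sketch, assuming a valid Kerberos ticket from kinit, a curl build with SPNEGO (GSS-API) support, and placeholder host and port values; the function names extract_token and get_delegation_token are hypothetical. It drops -i so that only the JSON body reaches the sed filter, and uses sed rather than a JSON tool to stay dependency-free.

```shell
#!/usr/bin/env bash
# Sketch: request a WebHDFS delegation token and print only the token string.

# Pull the urlString value out of {"Token":{"urlString":"..."}}.
extract_token() {
  sed -n 's/.*"urlString" *: *"\([^"]*\)".*/\1/p'
}

# Requires a Kerberos ticket: run kinit <your-user-principal> first.
get_delegation_token() {
  local host="$1" port="$2"
  curl -s --negotiate -u : \
    "http://${host}:${port}/webhdfs/v1/?op=GETDELEGATIONTOKEN" | extract_token
}
```

For example, after kinit, get_delegation_token nn.example.com 9870 prints the value to paste into the connector configuration. Delegation tokens are not permanent: with the default NameNode settings (dfs.namenode.delegation.token.renew-interval and dfs.namenode.delegation.token.max-lifetime) a token expires unless renewed within 24 hours and cannot live longer than 7 days, so the configured token may need to be refreshed periodically.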