Version of Web HDFS required

The Aspire Web HDFS publisher was created and tested using version XX.

Before installing the Web HDFS publisher, make sure that:

  • Web HDFS is up and running
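
One quick way to confirm that Web HDFS is answering is to request the status of the root directory over its REST API. The sketch below is illustrative only: `webhdfs_url` and `webhdfs_is_up` are hypothetical helper names, and the host and port arguments stand for your NameNode's HTTP address. It assumes an unsecured cluster; a Kerberized cluster will typically answer 401 to this unauthenticated request.

```shell
# Hypothetical helper: build a WebHDFS REST URL.
# $1 = NameNode host, $2 = HTTP port, $3 = absolute HDFS path, $4 = operation
webhdfs_url() {
  printf 'http://%s:%s/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: succeed only if the status call on "/" returns HTTP 200.
webhdfs_is_up() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$(webhdfs_url "$1" "$2" / GETFILESTATUS)")
  [ "$code" = "200" ]
}
```

For example, `webhdfs_is_up <host> <port> && echo "Web HDFS is reachable"`.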

User Account Requirements 

To access Web HDFS, a user account with sufficient privileges must be supplied. We recommend using the site administrator account.

Get a User Account

Windows or Linux

...

WebHDFS Configuration

...

The WebHDFS feature must be enabled in order to use this publisher.

Grant Read Permissions to Crawl Path

...

The connector account must have READ permission on the path to be crawled. If the path is restricted, the connector cannot retrieve any data from it.
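
As one possible way to grant that access, an HDFS ACL entry can give a single account read access to the crawl path without changing its owner or group. This is a sketch under assumptions: `grant_read` is a hypothetical wrapper name, the user and path are placeholders, and ACLs must be enabled on the NameNode (`dfs.namenode.acls.enabled`). If ACLs are not enabled, `hdfs dfs -chmod` is the usual alternative.

```shell
# Hypothetical helper: recursively grant read access on an HDFS path to one user.
# $1 = account used by the connector, $2 = HDFS path to be crawled
grant_read() {
  # r-x: read on files, plus execute on directories so they can be traversed
  hdfs dfs -setfacl -R -m "user:$1:r-x" "$2"
}
```

For example, `grant_read <connector-user> <crawl-path>`, followed by `hdfs dfs -getfacl <crawl-path>` to review the result.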

Kerberized Clusters

...

For Kerberized clusters, a delegation token is required to crawl any path within HDFS. To obtain this token:

  1. SSH into your cluster.
  2. Run:

    Code Block
    $ kinit <your-user-principal>
    $ curl -i --negotiate -u : "http://<host>:<port>/webhdfs/v1/?op=GETDELEGATIONTOKEN"
    ...
    {"Token":{"urlString":"<A-VERY-LONG-TOKEN>"}}
  3. Copy the token value (the "urlString" inside the "Token" field) and set it in the connector configuration.
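
The token can also be pulled out of the JSON response and tried against the REST API before it is pasted into the connector. The following is a minimal sketch, not part of the connector: `extract_token` and `delegation_url` are hypothetical helper names, and the host, port and path are placeholders. It relies on the `delegation` query parameter that WebHDFS accepts for token authentication.

```shell
# Hypothetical helper: read the GETDELEGATIONTOKEN JSON on stdin, print "urlString".
extract_token() {
  sed -n 's/.*"urlString"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: build a WebHDFS URL authenticated with a delegation token.
# $1 = host, $2 = port, $3 = absolute HDFS path, $4 = token, $5 = operation
delegation_url() {
  printf 'http://%s:%s/webhdfs/v1%s?delegation=%s&op=%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, after `kinit`, running `curl -s --negotiate -u : "http://<host>:<port>/webhdfs/v1/?op=GETDELEGATIONTOKEN" | extract_token` prints only the token, and `curl -i "$(delegation_url <host> <port> <crawl-path> <token> LISTSTATUS)"` checks that the token can list the crawl path.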

...
