When Aspire runs inside Cloudera as part of the Hadoop ecosystem, it almost always needs to interact with Kerberized Hadoop components such as HBase and HDFS. This page explains how to prepare the credentials and configuration that Aspire needs to talk to those components.
It is recommended that Aspire has its own credentials in Kerberos or Active Directory.
To create an account on an MIT Kerberos server, run the following command in the kadmin.local or kadmin shell:
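As a minimal sketch, assuming you are logged in as root on the KDC host, the kadmin command can also be issued from a regular shell with the `-q` option. The `aspire` principal name follows the text above; the guard around the call is only there so the snippet does nothing on a machine without the Kerberos admin tools. With the remote `kadmin` client you would also need to name an admin principal with `-p`.

```shell
# Assumption: REALM.COM is a placeholder realm name.
REALM="REALM.COM"
PRINCIPAL="aspire@${REALM}"

if command -v kadmin.local >/dev/null 2>&1; then
  # addprinc prompts for the new principal's password.
  kadmin.local -q "addprinc ${PRINCIPAL}"
else
  echo "kadmin.local not found; run on the KDC host: addprinc ${PRINCIPAL}"
fi
```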
Replace REALM.COM with your own realm.
Then create a keytab for the aspire user:
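One way to do this, sketched under the assumption that you are still on the KDC host and that the keytab file is called `aspire.keytab` (a name chosen here for illustration), is the `ktadd` command:

```shell
PRINCIPAL="aspire@REALM.COM"   # replace REALM.COM with your own realm
KEYTAB="aspire.keytab"         # assumed file name, written to the current directory

if command -v kadmin.local >/dev/null 2>&1; then
  # ktadd (alias: xst) writes the principal's keys to the keytab.
  # By default it also re-randomizes the keys, so a password set
  # earlier for this principal stops working.
  kadmin.local -q "ktadd -k ${KEYTAB} ${PRINCIPAL}"
else
  echo "kadmin.local not found; run on the KDC host: ktadd -k ${KEYTAB} ${PRINCIPAL}"
fi
```

Because the keytab is equivalent to the account's password, it is worth restricting it to the operating system user that runs Aspire, for example with `chmod 600`.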
Test your newly created account
First destroy any Kerberos tickets in the credential cache:
Then authenticate using the aspire account and keytab:
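A small sketch of this step, assuming the MIT Kerberos client tools are installed. `TICKET_STATE` is a variable introduced here only to report the outcome.

```shell
TICKET_STATE="unknown"

if command -v kdestroy >/dev/null 2>&1; then
  # Remove the current user's credential cache; ignore the error
  # kdestroy reports when there is no cache to remove.
  kdestroy 2>/dev/null || true

  # klist -s prints nothing and exits non-zero when no valid ticket is cached.
  if klist -s; then
    TICKET_STATE="present"
  else
    TICKET_STATE="empty"
  fi
fi

echo "ticket cache state: ${TICKET_STATE}"
```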
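A minimal sketch, assuming the keytab created earlier is available as `aspire.keytab` in the current directory (an illustrative path):

```shell
PRINCIPAL="aspire@REALM.COM"   # replace REALM.COM with your own realm
KEYTAB="aspire.keytab"         # assumed path to the keytab

if command -v kinit >/dev/null 2>&1 && [ -f "${KEYTAB}" ]; then
  # -k authenticates with a keytab instead of a password; -t names the keytab file.
  # klist then shows the ticket that was obtained.
  kinit -k -t "${KEYTAB}" "${PRINCIPAL}" && klist
else
  echo "kinit or ${KEYTAB} not available; would run: kinit -k -t ${KEYTAB} ${PRINCIPAL}"
fi
```

If the authentication succeeds, `klist` lists a ticket-granting ticket whose default principal is the aspire account.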
Set the necessary permissions on HDFS and/or HBase
You might need to execute the following commands from an existing account that has sufficient permissions.
HDFS user directory
If you want Aspire to be able to write to HDFS, you may want to create a user directory for aspire in HDFS. First make sure you have authenticated with Kerberos using the kinit command above. Then create the /user/aspire directory by executing the following commands:
If you want Aspire to be able to read from a specific HDFS directory, make sure the aspire user can read it by checking the permissions on the directory:
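A sketch of those commands, assuming they are run with a ticket for an account that has HDFS superuser rights (changing ownership requires them):

```shell
HDFS_USER="aspire"
USER_DIR="/user/${HDFS_USER}"

if command -v hdfs >/dev/null 2>&1; then
  # Create the directory (and any missing parents).
  hdfs dfs -mkdir -p "${USER_DIR}"
  # Hand ownership to the aspire user so Aspire can write there.
  hdfs dfs -chown "${HDFS_USER}" "${USER_DIR}"
else
  echo "hdfs not found; would run: hdfs dfs -mkdir -p ${USER_DIR} && hdfs dfs -chown ${HDFS_USER} ${USER_DIR}"
fi
```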
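For example, assuming the directory to crawl is `/data/example` (a hypothetical path used only for illustration), the check could look like this:

```shell
TARGET_DIR="/data/example"   # hypothetical path; use the directory Aspire should read

if command -v hdfs >/dev/null 2>&1; then
  # -d lists the directory entry itself rather than its contents,
  # showing its permission string, owner and group.
  hdfs dfs -ls -d "${TARGET_DIR}"
else
  echo "hdfs not found; would run: hdfs dfs -ls -d ${TARGET_DIR}"
fi
```

The first column of the output is the permission string (for example `drwxr-xr-x`), followed by the owner and group. To list and read the directory, the aspire user needs both `r` and `x` on it, either as owner, through its group, or through the "other" bits.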
To test the connection to HBase after authenticating with the kinit command, open the HBase shell:
And execute the list command to test the aspire user permissions:
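The two steps above can also be run in one go from a regular shell. This is a sketch that assumes the `hbase` launcher is on the PATH and that a valid Kerberos ticket for the aspire user is already cached:

```shell
HBASE_CMD="list"

if command -v hbase >/dev/null 2>&1; then
  # -n runs the HBase shell non-interactively, reading commands from standard input.
  echo "${HBASE_CMD}" | hbase shell -n
else
  echo "hbase not found; inside 'hbase shell' run: ${HBASE_CMD}"
fi
```

If the command prints the tables visible to the aspire user without an access error, authentication and basic permissions are working.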
If you run into problems with HBase permissions for the aspire user, see Cloudera HBase Authorization for step-by-step instructions on how to set the appropriate permissions.
If you use HBase for crawl metadata, note that Aspire creates a new namespace for each new content source. Either grant the aspire user Admin permissions so that Aspire can create those namespaces itself, or create the namespaces in advance and grant the aspire user Create, Read and Write permissions on them.
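As a sketch of the second option, the following pre-creates a namespace and grants Read, Write and Create on it. The namespace name `aspire_example` is purely hypothetical (the name Aspire expects for a given content source is not specified here), and the commands are assumed to run with a ticket for an HBase account that has admin rights.

```shell
HBASE_USER="aspire"
NAMESPACE="aspire_example"   # hypothetical namespace name, for illustration only

# HBase shell commands: create the namespace, then grant R(ead), W(rite), C(reate) on it.
CREATE_CMD="create_namespace '${NAMESPACE}'"
GRANT_CMD="grant '${HBASE_USER}', 'RWC', '@${NAMESPACE}'"

if command -v hbase >/dev/null 2>&1; then
  printf '%s\n' "${CREATE_CMD}" "${GRANT_CMD}" | hbase shell -n
else
  printf 'hbase not found; inside hbase shell run:\n%s\n%s\n' "${CREATE_CMD}" "${GRANT_CMD}"
fi
```

For the first option, a global grant that includes the `A` (Admin) permission would take the place of the namespace-scoped grant.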