...
...
...
...
...
...
```xml
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
```
Aspire creates one namespace per content source. All of the necessary tables are created under that namespace. Each namespace is named after its content source, with "aspire_" as the default prefix.
For example, a content source named "Sharepoint Documents" gets the namespace "aspire_Sharepoint_Documents". The prefix can be changed by adding the "namespacePrefix" field to the configuration.
The provider automatically retries operations that could not be completed because of connection errors. The maximum number of retries is set with the "maxRetries" option; if it is not provided, up to 5 retries are attempted.
By default, the provider tries to create any namespace that does not yet exist in HBase. Some HBase systems, however, are configured so that users cannot create or delete namespaces and may only create tables in particular pre-existing namespaces. To prevent Aspire from trying to create namespaces, set the "createNamespaces" option to "false". If this option is turned off, you must make sure the namespaces are created before starting the Aspire nodes. The following configuration sets all three options:
```xml
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <namespacePrefix>aspire_crawl_</namespacePrefix>
  <maxRetries>10</maxRetries>
  <createNamespaces>false</createNamespaces>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
```
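For comparison, the sketch below spells out the default values mentioned in the text. The "aspire_" prefix and the 5 retries are the documented defaults; the literal value "true" for "createNamespaces" is an assumption inferred from the documented "false" setting. Under that assumption, it should behave the same as a configuration that omits all three options.

```xml
<!-- Sketch only: the three options written out with their default values -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <!-- Documented default prefix -->
  <namespacePrefix>aspire_</namespacePrefix>
  <!-- Documented default retry limit -->
  <maxRetries>5</maxRetries>
  <!-- Assumed spelling of the default (namespace creation enabled) -->
  <createNamespaces>true</createNamespaces>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
```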
If your Hadoop cluster uses non-default settings for things such as the ZooKeeper root path, the HDFS root directory, or the HBase ports, you can configure all of those properties in the "properties" section of the settings file. This section mimics the Hadoop configuration file properties, so you can add the same properties here.
```xml
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
    <property name="hbase.rootdir">hdfs://example0:9025</property>
  </properties>
</noSQLConnectionProvider>
```
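As a further sketch, a non-default ZooKeeper root path and client port could be set in the same way. The property names "zookeeper.znode.parent" and "hbase.zookeeper.property.clientPort" are standard HBase client settings; the values shown are hypothetical placeholders, and it is assumed that the provider passes these properties through to the HBase client unchanged.

```xml
<!-- Sketch only: property values below are hypothetical placeholders -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
    <!-- ZooKeeper root path (znode) used by HBase; the HBase default is /hbase -->
    <property name="zookeeper.znode.parent">/hbase-custom</property>
    <!-- Port the ZooKeeper quorum listens on; the HBase default is 2181 -->
    <property name="hbase.zookeeper.property.clientPort">2182</property>
  </properties>
</noSQLConnectionProvider>
```

Use the values from your cluster's own HBase configuration file rather than the placeholders above.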
...
If you are running Aspire inside a Hadoop cluster, you can use the Hadoop configuration files to connect to HBase with the same configuration properties as the rest of the cluster. To do so, find where the following files are located:
The Hadoop core configuration file (typically "core-site.xml"): contains information about where the NameNode runs in the cluster, along with the Hadoop Core configuration settings, such as I/O settings, that are common to HDFS and MapReduce.
The HBase configuration file (typically "hbase-site.xml"): contains the ZooKeeper quorum to be used, the root directory on HDFS, and the ZooKeeper root directory.
These files must be in a folder that is readable by the user running the Aspire process.
Example: if the files are located under "/etc/hbase/conf.cloudera.hbase/":
```xml
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <configDir>/etc/hbase/conf.cloudera.hbase</configDir>
</noSQLConnectionProvider>
```
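For orientation, the HBase configuration file in that folder (conventionally named "hbase-site.xml") uses the standard Hadoop configuration format of name/value pairs. The following is a minimal sketch of what such a file might contain; the host name, HDFS address, and znode path are hypothetical placeholders, and a real cluster file will usually hold many more entries.

```xml
<?xml version="1.0"?>
<!-- Sketch only: illustrative hbase-site.xml with placeholder values -->
<configuration>
  <!-- ZooKeeper quorum used by HBase -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper-server</value>
  </property>
  <!-- Root directory of HBase on HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example0:9025</value>
  </property>
  <!-- ZooKeeper root directory (znode) for HBase -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
</configuration>
```

Because these are the same property names accepted in the "properties" section, pointing "configDir" at this folder avoids duplicating them in the Aspire settings file.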
The HBase provider can connect to secured HBase databases using Kerberos. It only needs a user principal and a keytab file to authenticate with; in the configuration below, these are supplied in the "user" and "path" elements.
```xml
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <configDir>/etc/hbase/conf.cloudera.hbase</configDir>
  <security>
    <kerberos>
      <user>[email protected]</user>
      <path>/path/to/clusteruser.keytab</path>
    </kerberos>
  </security>
</noSQLConnectionProvider>
```