<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
Aspire creates one HBase namespace per content source and creates all of the tables it needs under that namespace. Each namespace is named after its content source, with a default "aspire_" prefix.
For example, a content source named "Sharepoint Documents" is stored in the namespace "aspire_Sharepoint_Documents". To use a different prefix, add the "namespacePrefix" option to the provider configuration.
The provider automatically retries operations that could not be completed because of connection errors. The maximum number of retries is set with the "maxRetries" option; if it is not provided, up to five retries are attempted.
By default, the provider tries to create any namespaces that do not yet exist on HBase. Some HBase systems are configured so that users cannot create or delete namespaces, but are allowed to create tables in specific pre-existing namespaces. To prevent Aspire from trying to create namespaces, set the "createNamespaces" option to "false".
If this option is set to "false", make sure the namespaces exist before starting the Aspire nodes.
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <namespacePrefix>aspire_crawl_</namespacePrefix>
  <maxRetries>10</maxRetries>
  <createNamespaces>true</createNamespaces>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
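For a cluster where Aspire is only allowed to create tables inside existing namespaces, a configuration along the following lines should work. It is a sketch that reuses the elements from the example above with "createNamespaces" set to "false". The namespace name in the comment is illustrative only, assuming the prefix-plus-content-source naming described earlier.

```xml
<!-- noSql database provider for the 3.X connector framework -->
<!-- Namespaces are NOT created by Aspire; they must already exist on HBase.
     Illustrative example: "aspire_crawl_Sharepoint_Documents" for a content
     source named "Sharepoint Documents" (assumed naming, see above). -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <namespacePrefix>aspire_crawl_</namespacePrefix>
  <createNamespaces>false</createNamespaces>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
  </properties>
</noSQLConnectionProvider>
```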
If your Hadoop cluster uses non-default settings for things like the ZooKeeper root path, the HDFS root directory, or the HBase ports, you can configure them in the "properties" section of the settings file. This section mirrors the properties of the Hadoop configuration files, so the same property names can be used here.
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <properties>
    <property name="hbase.zookeeper.quorum">zookeeper-server</property>
    <property name="hbase.rootdir">hdfs://example0:9025</property>
  </properties>
</noSQLConnectionProvider>
If you are running Aspire inside a Hadoop cluster, you can use the cluster's Hadoop configuration files so that Aspire connects to HBase with the same configuration properties as the rest of the cluster. To do this, find where the following files are located:
- core-site.xml: contains information about where the NameNode runs in the cluster, and the configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
- hbase-site.xml: contains the ZooKeeper quorum to use, the HBase root directory on HDFS, and the ZooKeeper root directory.
These files must be in a single folder that is readable by the user running the Aspire process. Point the provider at that folder with the "configDir" option.
For example, if the files are located under "/etc/hbase/conf.cloudera.hbase/":
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <configDir>/etc/hbase/conf.cloudera.hbase</configDir>
</noSQLConnectionProvider>
The HBase provider can connect to secured HBase databases using Kerberos. It only needs a user principal and a keytab file to authenticate.
<!-- noSql database provider for the 3.X connector framework -->
<noSQLConnectionProvider>
  <implementation>com.searchtechnologies.aspire:aspire-hbase-provider</implementation>
  <configDir>/etc/hbase/conf.cloudera.hbase</configDir>
  <security>
    <kerberos>
      <user>[email protected]</user>
      <path>/path/to/clusteruser.keytab</path>
    </kerberos>
  </security>
</noSQLConnectionProvider>
If Kerberos authentication is going to be used, edit the felix.properties file and add the following lines before launching Aspire.
# To append packages to the default set of exported system packages,
# set this value.
org.osgi.framework.system.packages.extra=\
 ...
 sun.security.krb5, \
 com.sun.security.auth.callback

# The following property makes specified packages from the class path
# available to all bundles. You should avoid using this property.
org.osgi.framework.bootdelegation=\
 ...
 javax.security.sasl, \
 sun.security.krb5