The Publish to SolrCloud CDH application sends document feeds to the SolrCloud index through SolrJ. SolrJ provides the CloudSolrClient class (or, alternatively, HttpSolrClient) for communicating with SolrCloud.

Instances of CloudSolrClient communicate with ZooKeeper to discover the Solr endpoints for a SolrCloud collection and then use LBHttpSolrClient to issue requests; HttpSolrClient connects directly to a Solr endpoint instead. The feed sent to Solr can be customized by editing the XSL transformation file provided by the user.

Publish to SolrCloud
Factory Name: com.searchtechnologies.aspire:app-publish-to-solrj
subType: default
Inputs: AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder.
Outputs: An XML transformation of the AspireObject sent to Solr's xmlfeed URL.
Versions: 3.1, 3.2
Type Flags: job-input

Configuration


This section lists the configuration parameters of the Publish to SolrCloud component.

 

| Property | Type | Default | Description |
|---|---|---|---|
| connectMethod | string | zk | Connection method used to communicate with Solr. |
| zkHosts | string | none | Comma-separated ZooKeeper hosts and ports, or the Solr host and port, depending on the connection method selected. |
| chroot | string | /solr | ZooKeeper path of the Solr znode tree. |
| collection | string | none | Name of the Solr collection to post documents to. |
| postXsl | string | ${appbundle.home}/config/xsl/aspireToSolr.xsl | Location of the XSL used to transform the job data into a Solr feed. See Edit Xsl. |
| maxTries | int | 3 | Maximum number of retry attempts when indexing. |
| retryWait | int | 3 | Seconds to wait before retrying to index. |
| fieldListPath | string | /add/doc/fieldList | XPath expression used to process the field list. |
| idField | string | url | Field used to identify jobs in SolrCloud. |
| commit | boolean | false | Force a commit before sending a batch. |
| commitMS | int | 30000 | Milliseconds to wait before sending the batch to the index. |
| zkClientTimeout | int | 10000 | SolrCloud ZooKeeper client timeout. |
| zkConnectTimeout | int | 10000 | SolrCloud ZooKeeper connect timeout. |
| useKerberos | boolean | false | Use Kerberos authentication. |
| coreSite | string | none | Path to the Hadoop core-site.xml file. |
| jaasOptions | string | none | Path to the configuration properties file. |
| debug | boolean | false | Enable debug messages. |

Example Configuration



<!-- Post to Solr stage: posts each new document to a running Solr instance -->
<component name="SolrJPublisher" subType="default" factoryName="aspire-solrj-publisher-cdh">
	<debug>${debug}</debug>
	<connectMethod>${connectMethod}</connectMethod>
	<zkHosts>${zKHosts}</zkHosts>
	<chroot>${chroot}</chroot>
	<collection>${SolrCollection}</collection>
	<postXsl>${aspireToSolrXsl}</postXsl>
	<maxTries>${maxTries}</maxTries>
	<retryWait>${retryWait}</retryWait>
	<fieldListPath>${fieldListPath}</fieldListPath>
	<idField>${idField}</idField>
	<useKerberos>${useKerberos}</useKerberos>
	<commit>${commit}</commit> 
	<commitMS>${commitMS}</commitMS>
	<zkClientTimeout>${zkClientTimeout}</zkClientTimeout>
	<zkConnectTimeout>${zkConnectTimeout}</zkConnectTimeout>
	<jaasOptions>${jaasOptions}</jaasOptions>
	<coreSite>${coreSite}</coreSite>
	<trustAllCertificates>${trustAllCertificates}</trustAllCertificates>
</component>
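
For illustration, the same component can be written with literal values instead of property placeholders. This is only a sketch: the ZooKeeper host names and the collection name "documents" are hypothetical, the remaining values are the defaults from the configuration table, and it assumes that coreSite and jaasOptions can be left out when useKerberos is false.

```xml
<component name="SolrJPublisher" subType="default" factoryName="aspire-solrj-publisher-cdh">
	<!-- "zk" is the default connection method listed in the table -->
	<connectMethod>zk</connectMethod>
	<!-- Hypothetical ZooKeeper ensemble: comma-separated host:port pairs -->
	<zkHosts>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</zkHosts>
	<chroot>/solr</chroot>
	<!-- Hypothetical collection name -->
	<collection>documents</collection>
	<postXsl>${appbundle.home}/config/xsl/aspireToSolr.xsl</postXsl>
	<maxTries>3</maxTries>
	<retryWait>3</retryWait>
	<fieldListPath>/add/doc/fieldList</fieldListPath>
	<idField>url</idField>
	<commit>false</commit>
	<commitMS>30000</commitMS>
	<zkClientTimeout>10000</zkClientTimeout>
	<zkConnectTimeout>10000</zkConnectTimeout>
	<!-- Kerberos disabled; coreSite and jaasOptions omitted on that assumption -->
	<useKerberos>false</useKerberos>
	<debug>false</debug>
</component>
```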

Edit Xsl


The default XSL transformation file can be found in AspireToSolr.xsl.

The default transformation XSL file provided by the publisher expects metadata as described in Connector AspireObject Metadata.

Add metadata field

To add a new metadata field extracted by an Aspire connector, add a field element under the <doc> tag of the XSL file.

<field name="metafieldNameInSolr_t">
  <xsl:value-of select="metafieldNameFromAspireObject" />
</field>


Note that the example uses the dynamic field suffix _t. If your Solr schema defines the field explicitly, use the field name as defined in the schema instead.
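
As a sketch of both cases, the fragment below maps one metadata value to a schema-defined field and another to a dynamic field that is only emitted when the value exists. The names "author" and "department" are hypothetical and are not taken from the default XSL; the relative XPath expressions assume the same context node as the snippet above.

```xml
<!-- Hypothetical: "author" is assumed to be defined in the Solr schema
     and present in the AspireObject under the same name -->
<field name="author">
  <xsl:value-of select="author" />
</field>

<!-- Hypothetical: emit the dynamic field only when the "department"
     element is present, so that no empty field is sent to Solr -->
<xsl:if test="department">
  <field name="department_t">
    <xsl:value-of select="department" />
  </field>
</xsl:if>
```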

Change the document ID

The id of a Solr document uniquely identifies a file in the index. By default, Publish to SolrCloud uses the following fields from the Aspire document, in order of precedence (if one is missing, the next is used):

  • fetchUrl
  • url
  • displayUrl
  • id

To change this behavior, edit the XSL file, or create a new one, so that it contains the following element:

<field name="id">
  <xsl:value-of select="idFieldNameFromAspireObject" />
</field>
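
If the chosen field may be missing from some documents, a fallback can be expressed with xsl:choose. The following is a minimal sketch, not the logic of the default XSL: "customId" is a hypothetical AspireObject field, and the relative XPath expressions assume the same context node as the snippet above.

```xml
<field name="id">
  <xsl:choose>
    <!-- Prefer the hypothetical custom field when it is present and non-empty -->
    <xsl:when test="normalize-space(customId) != ''">
      <xsl:value-of select="customId" />
    </xsl:when>
    <!-- Otherwise fall back to the document URL -->
    <xsl:otherwise>
      <xsl:value-of select="url" />
    </xsl:otherwise>
  </xsl:choose>
</field>
```

Whichever field is used, its value should be unique per document, because Solr replaces an existing document that has the same id.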

Advanced Edit

For more advanced changes, see the Solr Update XML Messages wiki.
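
For orientation, a Solr XML update message that adds a single document has the general shape shown below. The field names and values are hypothetical, and the feed actually produced by the default XSL may contain additional elements (the fieldListPath default of /add/doc/fieldList suggests so), so treat this as a sketch of the target format rather than the publisher's exact output.

```xml
<add>
  <doc>
    <!-- Unique key of the document (hypothetical value) -->
    <field name="id">http://example.com/docs/report.pdf</field>
    <!-- Hypothetical metadata field using the _t dynamic field suffix -->
    <field name="title_t">Quarterly report</field>
  </doc>
</add>
```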