The Publish to SolrCloud application sends document feeds to a SolrCloud index through SolrJ. SolrJ provides the CloudSolrClient (or HttpSolrClient) class to communicate with SolrCloud.
Instances of this class communicate with ZooKeeper to discover Solr endpoints for SolrCloud collections (or connect directly to a Solr endpoint), and then use the LBHttpSolrClient to issue requests. The feeds carry the metadata and content of files extracted by Aspire connectors. The feed to Solr can be customized by editing the XSL transformation file provided by the user.
Publish to SolrCloud | |
---|---|
Factory Name | com.searchtechnologies.aspire:app-publish-to-solrj |
subType | default |
Inputs | AspireObject from a connector's subjob with metadata and content extracted from a specific file/folder. |
Outputs | An XML transformation of the AspireObject, sent to Solr's xmlfeed URL. |
Versions | 3.1, 4.0 |
Type Flags | job-input |
This section lists all configuration parameters available to configure the Publish to SolrCloud component.
Property | Type | Default | Description |
---|---|---|---|
connectMethod | string | zk | Connection method used to communicate with Solr. |
zkHosts / zkHost | string | none | ZooKeeper hostname or IP address. Accepts comma-separated ZooKeeper hosts and ports, or a Solr host and port, depending on the connection method selected. |
zkPort | integer | 8983 | ZooKeeper port number to send the feeds to. |
chroot | string | /solr | ZooKeeper path of the Solr znode tree. |
collection / solrCollection | string | none | Name of the Solr collection to post documents to, using the binary request handler for increased indexing performance. |
aspireToSolrXsl / postXsl | string | ${appbundle.home}/config/xsl/aspireToSolr.xsl | Location of the XSL used to transform the job data into a Solr feed. See Edit Xsl. |
batchSize | int | 50 | How many documents to fetch per batch. |
maxTries | int | 3 | Maximum number of retry attempts when indexing. |
retryWait | int | 3 | Seconds to wait before retrying to index. |
fieldListPath | string | /add/doc/fieldList | XPath expression used to process the field list. |
idField | string | url | Field used to identify the jobs in SolrCloud. |
commit | boolean | false | Force a commit before sending a batch. |
commitMS | int | 30000 | Milliseconds to wait before sending the batch to the index. |
zkClientTimeout | int | 10000 | ZooKeeper SolrCloud client timeout. |
zkConnectTimeout | int | 10000 | ZooKeeper SolrCloud connect timeout. |
useKerberos | boolean | false | Whether to use Kerberos authentication. |
jaasOptions | string | none | Path to the configuration properties file. |
debug | boolean | false | Enable debug messages. |
<application config="com.searchtechnologies.aspire:app-publish-to-solrj">
  <properties>
    <property name="BatchSize">50</property>
    <property name="RetryWait">3</property>
    <property name="FieldListPath">/add/doc/fieldList</property>
    <property name="AspireToSolrXsl">${appbundle.home}/config/xsl/aspireToSolr.xsl</property>
    <property name="debug">true</property>
    <property name="MaxTries">3</property>
    <property name="ZKPort">9983</property>
    <property name="SolrCollection">collection1</property>
    <property name="ZKHost">localhost</property>
  </properties>
</application>
<!-- Post to Solr stage. This will post to a running Solr instance a new document -->
<component name="SolrJPublisher" subType="default" factoryName="aspire-solrj-publisher">
  <debug>${debug}</debug>
  <connectMethod>${connectMethod}</connectMethod>
  <zkHosts>${zKHosts}</zkHosts>
  <chroot>${chroot}</chroot>
  <collection>${SolrCollection}</collection>
  <postXsl>${aspireToSolrXsl}</postXsl>
  <maxTries>${maxTries}</maxTries>
  <retryWait>${retryWait}</retryWait>
  <fieldListPath>${fieldListPath}</fieldListPath>
  <idField>${idField}</idField>
  <useKerberos>${useKerberos}</useKerberos>
  <commit>${commit}</commit>
  <commitMS>${commitMS}</commitMS>
  <zkClientTimeout>${zkClientTimeout}</zkClientTimeout>
  <zkConnectTimeout>${zkConnectTimeout}</zkConnectTimeout>
  <jaasOptions>${jaasOptions}</jaasOptions>
</component>
Any optional property can be removed from the configuration, in which case the default value listed in the table above is used.
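As a rough illustration of relying on defaults, a trimmed component configuration might look like the sketch below. It assumes that only the connection method, the ZooKeeper hosts, the collection, and the XSL location need to be set explicitly, and that `zkHosts` accepts a `host:port` value; neither assumption is confirmed here, so check which properties are actually optional in your version.

```xml
<!-- Sketch only: assumes every omitted element falls back to its documented default -->
<component name="SolrJPublisher" subType="default" factoryName="aspire-solrj-publisher">
  <!-- zk is the documented default connection method; shown here for clarity -->
  <connectMethod>zk</connectMethod>
  <!-- Assumed host:port format; the values are taken from the sample application properties -->
  <zkHosts>localhost:9983</zkHosts>
  <collection>collection1</collection>
  <postXsl>${appbundle.home}/config/xsl/aspireToSolr.xsl</postXsl>
  <!-- maxTries, retryWait, commit, commitMS, timeouts, and so on are left out on purpose -->
</component>
```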
The default XSL transformation file can be found in AspireToSolr.xsl.
The default transformation XSL file provided by the publisher expects metadata as described in Connector AspireObject Metadata.
To add a new metadata field extracted by an Aspire connector, add an XSL element under the <doc> tag.
<field name="metafieldNameInSolr_t"> <xsl:value-of select="metafieldNameFromAspireObject" /> </field>
Notice that the dynamic field _t is being used by default. If you have a Solr schema that supports your field, then just enter the field name as defined in the schema.
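To show where such a field element sits, here is a minimal standalone XSLT 1.0 sketch. It is not the shipped aspireToSolr.xsl: the input root element `doc` and the child elements `url` and `author` are hypothetical, as is the Solr field name `author_t`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the AspireObject is serialized with a <doc> root element -->
  <xsl:template match="/doc">
    <add>
      <doc>
        <!-- Document id taken from a hypothetical <url> child element -->
        <field name="id"><xsl:value-of select="url"/></field>

        <!-- New metadata field: emitted only when the source element exists.
             The _t suffix targets a dynamic text field in the Solr schema. -->
        <xsl:if test="author">
          <field name="author_t"><xsl:value-of select="author"/></field>
        </xsl:if>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

Wrapping the field in `xsl:if` avoids sending empty values to Solr when a connector does not supply that metadata.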
The id of a Solr document is used to uniquely identify a file in the index. By default, Publish to SolrCloud will use the following fields from the Aspire document in order of precedence (if one is missing, then the next will be used):
If you want to change this behavior, edit or create a new XSL file which has the following element:
<field name="id"> <xsl:value-of select="idFieldNameFromAspireObject" /> </field>
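If you want to keep a fallback order in a custom stylesheet, one possible approach is `xsl:choose`. The fragment below is a sketch: `primaryId` and `secondaryId` are hypothetical AspireObject element names used only for illustration, and `url` is used as the last resort because it is the documented default for `idField`.

```xml
<field name="id">
  <xsl:choose>
    <!-- First choice: hypothetical element, used only when it is non-empty -->
    <xsl:when test="normalize-space(primaryId) != ''">
      <xsl:value-of select="primaryId"/>
    </xsl:when>
    <!-- Second choice: another hypothetical element -->
    <xsl:when test="normalize-space(secondaryId) != ''">
      <xsl:value-of select="secondaryId"/>
    </xsl:when>
    <!-- Fallback when neither of the elements above is present -->
    <xsl:otherwise>
      <xsl:value-of select="url"/>
    </xsl:otherwise>
  </xsl:choose>
</field>
```

The fragment belongs inside the `<doc>` element of the stylesheet, in place of the single-source id field.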
More advanced changes can be made by following the Solr Update XML Messages wiki.
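For orientation, the messages in Solr's update XML format generally take the shape sketched below. The id and field values are made-up examples, and the two messages are separate documents shown together only for brevity.

```xml
<!-- Message 1: add (or overwrite) a document; values are illustrative -->
<add>
  <doc>
    <field name="id">file:///data/docs/report.pdf</field>
    <field name="title_t">Quarterly report</field>
  </doc>
</add>

<!-- Message 2 (sent separately): delete a document by its id -->
<delete>
  <id>file:///data/docs/report.pdf</id>
</delete>
```

The output of the XSL transformation is expected to follow the `add` form, which matches the default `fieldListPath` of `/add/doc/fieldList`.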