The Aspire Lucene component makes the Lucene classes available to other bundles and provides methods for some commonly used Lucene functionality.

This component exists as a holder for the Lucene libraries and exports the Lucene classes for use in other components.

It also provides convenient methods for indexing and searching in an index controlled by the component, although configuration of this index is optional. The services are disabled if the index is not configured.

Lucene Services
Factory Name	com.searchtechnologies.aspire:aspire-lucene
subType	default
Inputs	Method calls
Outputs	Lucene index (optional)

Configuration

Element	Type	Default	Description
indexDirectory	string	<none>	The directory on disk of a Lucene index. The index will be created if it does not exist. If this parameter is not given, the indexing and searching methods will not be available.
documentID	string	<none>	The Lucene field to be used as the document id for deletes and updates. If not specified, documents may be added to the index, but updates and deletes will not be available.
luceneMaxFieldLength	int	10000	The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory. This setting refers to the number of running terms, not to the number of different terms. Note: this silently truncates large documents, excluding from the index all terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError. By default, no more than 10,000 terms will be indexed for a field.
luceneMaxBufferedDocuments	string	-1 = disabled	Determines the minimum number of documents required before the buffered in-memory documents are flushed as a new segment. Large values generally give faster indexing. When this is set, the writer will flush after every luceneMaxBufferedDocuments added documents. Pass in -1 to prevent triggering a flush due to the number of buffered documents. Note that if flushing by RAM usage is also enabled, the flush will be triggered by whichever comes first. Disabled by default (the writer flushes by RAM usage).
luceneMergeFactor	int	2	Sets the index writer merge factor.
luceneRAMBufferSizeMB	int	2048	Sets the index writer RAM buffer size in MB.
autoCommitMS	long	0 = disabled	The time (in ms) between commits of the index. If set to 0, auto-commit based on time is disabled. The index is only committed if documents have been added since the last commit.
autoCommitDocs	long	0 = disabled	The maximum number of documents that can be added between commits of the index. If set to 0, auto-commit based on document submission is disabled.
autoCommitSpinWait	long	1000 ms = 1 s	The spin wait time for the thread performing auto-commits (if enabled). The thread wakes this often to check whether the time or document threshold has been passed, and commits if required.
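
The interaction of the auto-commit settings can be sketched as follows. This is a minimal illustration only, not the component's actual code: the class AutoCommitPolicy and its methods are hypothetical names, and it assumes that a commit becomes due as soon as either enabled threshold is reached (the time threshold autoCommitMS, or the document threshold named autoCommitDocs in the example configuration) and that nothing is committed when no documents have been added.

```java
/** Hypothetical sketch of the auto-commit decision; not taken from the Aspire source. */
final class AutoCommitPolicy {
    private final long autoCommitMs;   // 0 disables time-based commits
    private final long autoCommitDocs; // 0 disables document-count-based commits
    private long lastCommitMs;         // time of the last commit
    private long docsSinceCommit;      // documents added since the last commit

    AutoCommitPolicy(long autoCommitMs, long autoCommitDocs, long nowMs) {
        this.autoCommitMs = autoCommitMs;
        this.autoCommitDocs = autoCommitDocs;
        this.lastCommitMs = nowMs;
    }

    /** Called whenever a document is added to the index. */
    void documentAdded() {
        docsSinceCommit++;
    }

    /**
     * Called by the auto-commit thread each time it wakes
     * (every autoCommitSpinWait milliseconds in this sketch).
     */
    boolean shouldCommit(long nowMs) {
        if (docsSinceCommit == 0) {
            return false; // nothing new to commit
        }
        boolean timeDue = autoCommitMs > 0 && nowMs - lastCommitMs >= autoCommitMs;
        boolean docsDue = autoCommitDocs > 0 && docsSinceCommit >= autoCommitDocs;
        return timeDue || docsDue;
    }

    /** Called after a successful commit to reset both counters. */
    void committed(long nowMs) {
        lastCommitMs = nowMs;
        docsSinceCommit = 0;
    }
}
```

Under these assumptions, a smaller autoCommitSpinWait only makes the check more frequent; it does not change when a commit becomes due.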

Example Configuration

Simple

    <component name="LuceneService" subType="default" factoryName="aspire-lucene"/>

Complex

    <component name="LuceneIndexer" subType="default" factoryName="aspire-lucene">
      <indexDirectory>data/index/lucene-index</indexDirectory>
      <documentID>url</documentID>
      <luceneMaxFieldLength>10000</luceneMaxFieldLength>
      <luceneMaxBufferedDocuments>100</luceneMaxBufferedDocuments>
      <autoCommitSpinWait>5000</autoCommitSpinWait>
      <autoCommitMS>1800000</autoCommitMS>
      <autoCommitDocs>10000</autoCommitDocs>
    </component>
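
The luceneMaxFieldLength setting in the example above caps the number of running terms indexed per field. The sketch below illustrates what that truncation means, under the simplifying assumption that terms are whitespace-separated tokens (real Lucene analyzers tokenize differently). FieldLengthLimit and indexedTerms are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of per-field term truncation; not Lucene's implementation. */
final class FieldLengthLimit {
    /**
     * Returns the terms that would be kept for a field: the first
     * maxFieldLength running terms. Later terms are silently dropped.
     */
    static List<String> indexedTerms(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        String[] terms = trimmed.split("\\s+");
        int kept = Math.min(terms.length, maxFieldLength);
        return Arrays.asList(terms).subList(0, kept);
    }
}
```

Because the limit counts running terms, repeated words use up the budget too: with a limit of 4, the text "to be or not to be" keeps only "to be or not", and a search for a term that appears only after the cutoff would not match the document.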

Accessing from External Components

To use the indexing and searching capabilities of this component, you must configure the <indexDirectory> parameter. Services are then provided through the AspireLucene.java interface.

Components wishing to access this functionality should maintain a service tracker to this component, get an instance, and then call the appropriate method. See here for further details.
