Performance and Reliability

    • Distributed processing and automatic threading
    • The ability to split document processing jobs into sub-jobs that can run in parallel
    • Standard process-management technology for monitoring and restarting server processes, supporting high availability
    • Can be deployed within a backup/failover architecture

Ease of Administration

    • Making dynamic (on-the-fly) configuration changes
    • Dynamically adding new components
    • Dynamic refresh of component code
    • Rich built-in XML processing methods, including XPath and XSLT
    • Hierarchical component configuration
    • Rich and comprehensive web-based administration and control interface

Strong Developer Environment

    • Intuitive workflow interface
    • Supports processing content in diverse languages
    • Easy mapping of document fields to search fields
    • Rich built-in JSON and XML processing methods, including XPath and XSLT (see the XPath sketch after this list)
    • Use of scripting to build complex processing components
    • Hierarchical component configuration
    • Tightly integrated with Maven repositories for sharing and loading component code
    • Processing of token streams for text analytics, including:
      • Entity extraction
      • Latent Semantic Analysis
      • Document vector creation and comparison
      • Topic Analysis
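
As an illustration of the XML processing support listed above, the following Java sketch uses the standard javax.xml.xpath API to pull a value out of a source document and map it to a search field. The element names, field name, and standalone main method are illustrative assumptions, not Aspire's own component API.

    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;

    public class XPathFieldMapping {
        public static void main(String[] args) throws Exception {
            // Hypothetical source document; the element names are illustrative only.
            String xml = "<doc><meta><author>Jane Doe</author></meta></doc>";
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Map a source element to a search field by evaluating an XPath expression.
            XPath xpath = XPathFactory.newInstance().newXPath();
            String author = xpath.evaluate("/doc/meta/author", doc);
            System.out.println("author -> " + author);
        }
    }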

Support for Security

    • Handle proxied LDAP requests (see the sketch after this list), including:
      • Authenticating users
      • Determining user group membership across multiple systems
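
As a rough illustration of the group-membership lookup mentioned above, the sketch below uses the standard Java JNDI LDAP API. The host name, base DN, search filter, and user DN are illustrative assumptions that vary by directory; this is not Aspire's own LDAP component.

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class LdapGroupLookup {
        public static void main(String[] args) throws Exception {
            // Illustrative connection settings; a real deployment supplies its own host and credentials.
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");
            DirContext ctx = new InitialDirContext(env);

            // Find groups that list the user as a member (attribute names vary by directory).
            SearchControls controls = new SearchControls();
            controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
            NamingEnumeration<SearchResult> groups = ctx.search(
                    "ou=groups,dc=example,dc=com",
                    "(member=uid=jdoe,ou=people,dc=example,dc=com)",
                    controls);
            while (groups.hasMore()) {
                System.out.println("group: " + groups.next().getNameInNamespace());
            }
            ctx.close();
        }
    }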

Support to Federate Search Requests

    • Distribute queries to multiple search engines
    • Merge search results into a single ranked list (see the sketch below)
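
Merging can be pictured as combining the ranked hit lists returned by each engine into one list ordered by score. The sketch below merges by raw score only; a real federator would typically also normalize scores and remove duplicate documents. The Result record and its fields are illustrative assumptions.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class ResultMerger {
        // Minimal hit representation; field names are illustrative.
        record Result(String title, double score, String engine) {}

        // Combine hits from several engines and order them by descending score.
        static List<Result> merge(List<List<Result>> perEngineHits) {
            List<Result> merged = new ArrayList<>();
            perEngineHits.forEach(merged::addAll);
            merged.sort(Comparator.comparingDouble(Result::score).reversed());
            return merged;
        }

        public static void main(String[] args) {
            List<Result> engineA = List.of(new Result("Doc 1", 0.92, "A"), new Result("Doc 2", 0.40, "A"));
            List<Result> engineB = List.of(new Result("Doc 3", 0.75, "B"));
            merge(List.of(engineA, engineB)).forEach(System.out::println);
        }
    }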

Support for Hadoop

    • Ability to write documents to HDFS (see the sketch after this list)
    • Ability to include Aspire within Map/Reduce jobs
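
Writing to HDFS is normally done through the Hadoop FileSystem API, as in the minimal sketch below. The NameNode address and output path are illustrative assumptions, and the snippet does not show Aspire's own HDFS publisher component.

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriter {
        public static void main(String[] args) throws Exception {
            // Illustrative NameNode address; substitute the real cluster settings.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            // Write one processed document to an illustrative HDFS path.
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/aspire/output/doc-1.json"))) {
                out.write("{\"id\":\"doc-1\",\"title\":\"Example\"}".getBytes(StandardCharsets.UTF_8));
            }
        }
    }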


Diverse Languages

Aspire can crawl content in many languages from content repositories, process it through pipelines built from content processing components, and publish it to target applications, typically search engines.

Most of Aspire is agnostic to encoding and language: it is designed as a content processor that handles all documents as UTF-8 throughout the entire stack. Language and encoding matter most in the components that actually inspect and manipulate the internal text stream.


  • Aspire's default text tokenization service uses the Lucene language analyzers, providing simple tokenization but not stemming, lemmatization, or other forms of text processing (a minimal analyzer sketch follows this list).
  • The tokenization pipeline stages within the Tokenization Library typically rely on externalized files, which should work the same regardless of the language of the content.
  • In addition, there is a packaged Aspire Basis Tokenizer (requires a license from Basis) that also supports many languages.
  • Aspire's open architecture is flexible enough to integrate many other language processors, both open source and commercial.
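
To illustrate the Lucene-based tokenization mentioned above, the sketch below runs a sentence through Lucene's StandardAnalyzer and prints the resulting tokens. The field name and sample text are illustrative; a language-specific Lucene analyzer can be dropped in the same way.

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizeExample {
        public static void main(String[] args) throws Exception {
            // StandardAnalyzer lower-cases and splits on Unicode word boundaries;
            // it tokenizes but does not stem or lemmatize.
            try (StandardAnalyzer analyzer = new StandardAnalyzer();
                 TokenStream stream = analyzer.tokenStream("content",
                         new StringReader("Aspire processes UTF-8 content in many languages"))) {
                CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
                stream.reset();
                while (stream.incrementToken()) {
                    System.out.println(term.toString());
                }
                stream.end();
            }
        }
    }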