- Distributed processing and automatic threading
- The ability to split document processing jobs into sub jobs that can run in parallel
- Standard technology for managing and restarting processes on servers for high availability
- Can be deployed within a backup and failover architecture
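The idea of splitting a document processing job into sub jobs that run in parallel can be sketched in a few lines. This is a minimal illustration only, not Aspire's API: the function names (`split_job`, `process_sub_job`, `run_job`), the batch size, and the placeholder processing step are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(documents, batch_size):
    """Split one job (a list of documents) into sub jobs of at most batch_size items."""
    return [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]


def process_sub_job(batch):
    """Hypothetical processing step: collapse whitespace and upper-case each document."""
    return [" ".join(doc.split()).upper() for doc in batch]


def run_job(documents, batch_size=2, workers=4):
    """Run the sub jobs on a thread pool and return results in the original order."""
    sub_jobs = split_job(documents, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, even if sub jobs finish out of order.
        results = list(pool.map(process_sub_job, sub_jobs))
    # Flatten the per-batch results back into a single list.
    return [doc for batch in results for doc in batch]


if __name__ == "__main__":
    print(run_job(["first  document", "second", "third   one", "fourth"]))
```

A thread pool is used here only because it keeps the sketch self-contained; a real deployment would distribute sub jobs across processes or servers.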
Ease of administration
- Making dynamic (on-the-fly) configuration changes
- Dynamically adding new components
- Dynamic refresh of component code
- Rich built-in XML processing methods including XPath and XSLT
- Hierarchical component configuration
- Rich and comprehensive web-based administration and control interface
A strong developer environment
- Intuitive workflow interface
- Supports processing content in diverse languages
- Easy mapping of document fields to search fields
- Rich built-in JSON and XML processing methods, including XPath and XSLT
- Use of scripting to build complex processing components
- Hierarchical component configuration
- Tightly integrated with Maven repositories for sharing and loading component code
Processing streams of tokens for text analytics
- Entity extraction
- Latent Semantic Analysis
- Document vector creation and comparison
- Topic Analysis
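As a rough illustration of document vector creation and comparison, the sketch below builds term-frequency vectors from a token stream and compares them with cosine similarity. It is an assumption-laden example rather than Aspire's implementation: the tokenizer, the function names, and the choice of raw term counts as vector weights are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lower-case alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_vector(text):
    """Create a sparse term-frequency vector (token -> count)."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Compare two sparse vectors; returns 0.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    v1 = document_vector("Search engines index documents")
    v2 = document_vector("Documents are indexed by search engines")
    print(round(cosine_similarity(v1, v2), 3))
```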
Support for security
- Handling proxy LDAP requests, including determining user group membership across multiple systems
Support for federating search requests
- Distribute queries to multiple search engines
- Merge search results
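The two steps above, distributing a query and merging the results, can be sketched as follows. This is a simplified example under stated assumptions, not Aspire's federation API: each engine is modelled as a hypothetical callable returning `(doc_id, score)` pairs, scores are normalised by each engine's top score so they are comparable, and duplicates keep their best normalised score.

```python
def federate(query, engines, limit=10):
    """Send query to every engine and merge the hits into one ranked list.

    engines maps an engine name to a callable returning [(doc_id, score), ...].
    Both the mapping and the callables are hypothetical stand-ins for real engines.
    """
    merged = {}
    for name, search in engines.items():
        hits = search(query)
        if not hits:
            continue
        top = max(score for _, score in hits)
        for doc_id, score in hits:
            # Normalise by the engine's best score so engines are comparable.
            normalised = score / top if top > 0 else 0.0
            if doc_id not in merged or normalised > merged[doc_id][0]:
                merged[doc_id] = (normalised, name)
    # Highest score first; ties broken by document id for a stable order.
    ranked = sorted(merged.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc_id, score, name) for doc_id, (score, name) in ranked[:limit]]


if __name__ == "__main__":
    engines = {
        "engine_a": lambda q: [("doc1", 10.0), ("doc2", 5.0)],
        "engine_b": lambda q: [("doc2", 0.9), ("doc3", 0.3)],
    }
    for hit in federate("aspire", engines):
        print(hit)
```

The engines are queried sequentially here to keep the sketch short; normalising by the top score is only one of several possible merge strategies.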
Support for Hadoop
- Ability to write to HDFS
- Ability to include Aspire within Map/Reduce jobs
Structure of an Aspire Solution
Aspire deployments can be divided into three high-level functional areas: content access, content processing, and publishing.
- Content access fetches the documents and associated metadata from the content repositories. The applications that perform this function are called Aspire Connectors. These use the supported application programming interfaces (APIs) of target repositories to access content, metadata, and security credentials. Where available, Aspire connectors capture the full directory structure from the repository, to support browsable enterprise site maps.
- Content processing analyses, augments, and transforms content. Depending on the needs of the application, this can range from simple use of regular expressions to a wide variety of complex semantic and statistical processing techniques. Content processing can spawn Hadoop Map/Reduce jobs for large processing tasks.
- Publishing covers the components in an Aspire deployment that push the processed text from the content processing pipeline(s) to the target system, typically a search engine or file directory. Output is delivered in the correct form and, where available, through the search engine’s ingestion API. The applications that perform this function are called Aspire Publishers. XML and JSON output is also available.
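The three functional areas can be pictured as stages of a simple pipeline. The sketch below is illustrative only and does not use Aspire's actual interfaces: the function names (`connector`, `extract_emails`, `publisher`), the in-memory repository, and the email regular expression are all assumptions chosen to keep the example self-contained.

```python
import json
import re

# Hypothetical, deliberately simple pattern for the regular-expression processing step.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def connector(repository):
    """Content access: yield each document with its metadata."""
    for path, text in repository.items():
        yield {"id": path, "content": text}


def extract_emails(doc):
    """Content processing: augment a copy of the document with extracted emails."""
    doc = dict(doc)
    doc["emails"] = EMAIL.findall(doc["content"])
    return doc


def publisher(docs):
    """Publishing: serialise processed documents as JSON strings for a target system."""
    return [json.dumps(doc, sort_keys=True) for doc in docs]


def run(repository):
    """Chain the three stages: access, processing, publishing."""
    return publisher(extract_emails(doc) for doc in connector(repository))


if __name__ == "__main__":
    repo = {"/docs/a.txt": "Contact bob@example.com today"}
    for line in run(repo):
        print(line)
```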
Functional Component Hierarchy
- Component - Atomic piece of Aspire logic.
- Configurable Component - Single component wrapped with a dxf so it can be used with the Admin UI.
- Application or Application Bundle - Multiple components wrapped with a dxf and possibly configuration files.