Performance and reliability

  • Distributed processing and automatic threading
  • The ability to split document processing jobs into sub jobs that can run in parallel
  • Standard technology for managing and restarting processes on servers for high availability
  • Can be deployed within a backup failover architecture

Ease of administration

  • Making dynamic (on-the-fly) configuration changes
  • Dynamically adding new components
    • Dynamic refresh of component code
    • Rich built-in XML processing methods including XPath and XSLT
    • Hierarchical component configuration
  • Rich and comprehensive web-based administration and control interface

A strong developer environment

  • Intuitive workflow interface

  • Supports processing content in diverse languages 

  • Easy mapping of document fields to search fields

  • Rich built-in JSON and XML processing methods, including XPath and XSLT

  • Use of scripting to build complex processing components

  • Hierarchical component configuration

  • Tightly integrated with Maven repositories for sharing and loading component code

  • Sharing and loading component code

  • Process streams of tokens, for performing text analytics

  • Entity extraction

  • Latent Semantic Analysis

  • Document vector creation and comparison

  • Topic Analysis

Support for security

  • Handling proxy LDAP requests, including:
    • Authenticating users
    • Determining user group membership across a multitude of systems

Support for federated search requests

  • Distribute queries to multiple search engines
  • Merge search results
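
The two steps above (distributing a query, then merging the results) can be illustrated with a minimal sketch. It is not Aspire code: the `Engine` type, the `federated_search` function, and the merge rule (normalize each engine's scores by its top score, keep the best score per document) are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical engine type: takes a query string and returns
# (document id, non-negative relevance score) pairs.
Engine = Callable[[str], List[Tuple[str, float]]]


def federated_search(
    query: str, engines: Dict[str, Engine], limit: int = 10
) -> List[Tuple[str, float, str]]:
    """Send one query to several engines in parallel and merge the hits."""
    # Distribute: submit the same query to every engine.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(engine, query) for name, engine in engines.items()}
        per_engine = {name: future.result() for name, future in futures.items()}

    # Merge: scores from different engines are not comparable, so each
    # engine's scores are divided by that engine's top score (an assumption).
    merged: Dict[str, Tuple[float, str]] = {}
    for name, hits in per_engine.items():
        if not hits:
            continue
        top = max(score for _, score in hits) or 1.0
        for doc_id, score in hits:
            normalized = score / top
            # Keep the best normalized score when engines return the same document.
            if doc_id not in merged or normalized > merged[doc_id][0]:
                merged[doc_id] = (normalized, name)

    # Highest score first; ties are broken by document id for stable output.
    ranked = sorted(merged.items(), key=lambda item: (-item[1][0], item[0]))
    return [(doc_id, score, name) for doc_id, (score, name) in ranked[:limit]]
```

Each entry in the result carries the document id, its normalized score, and the name of the engine that supplied the winning score. A real deployment would likely also need timeouts and per-engine error handling, which this sketch omits.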

Support for Hadoop

  • Ability to write to HDFS
  • Ability to include Aspire within Map/Reduce jobs

Structure of an Aspire Solution


Aspire deployments can be divided into three high-level functional areas: content access, content processing, and publishing.

  • Content access fetches the documents and associated metadata from the content repositories. The applications that perform this function are called Aspire Connectors. These use the supported application programming interfaces (APIs) of target repositories to access content, metadata, and security credentials. Where available, Aspire connectors capture the full directory structure from the repository, to support browsable enterprise site maps.
  • Content processing analyses, augments, and transforms content. Depending on the needs of the application, this can range from simple regular expressions to complex semantic and statistical processing techniques. Content processing can spawn Hadoop Map/Reduce jobs for large processing tasks.
  • Publishing refers to the components in an Aspire deployment that push the processed text from the content processing pipeline(s) to the target system, typically a search engine or file directory. Content is delivered in the correct form and, where available, through the search engine’s ingestion API. The applications that perform this function are called Aspire Publishers. XML and JSON output is also available.
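
The three functional areas can be pictured as stages that hand documents to one another. The sketch below is a toy model, not the Aspire API: `connector`, `extract_emails`, `publisher`, and `run` are hypothetical names, the repository is an in-memory dictionary, and the publisher simply emits JSON lines.

```python
import json
import re
from typing import Callable, Dict, Iterable, Iterator, List

Document = Dict[str, str]

# Simplified e-mail pattern, used only to illustrate a regex-based stage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def connector(repository: Dict[str, str]) -> Iterator[Document]:
    """Content access: fetch each document and its metadata from a repository."""
    for path, text in repository.items():
        # The folder is kept so a directory structure could be rebuilt later.
        yield {"id": path, "folder": path.rsplit("/", 1)[0], "content": text}


def extract_emails(doc: Document) -> Document:
    """Content processing: augment a document using a regular expression."""
    enriched = dict(doc)  # copy, so the incoming document is left untouched
    enriched["emails"] = ",".join(sorted(set(EMAIL.findall(doc["content"]))))
    return enriched


def publisher(docs: Iterable[Document]) -> List[str]:
    """Publishing: serialize processed documents for the target system (JSON here)."""
    return [json.dumps(doc, sort_keys=True) for doc in docs]


def run(repository: Dict[str, str], stages: List[Callable[[Document], Document]]) -> List[str]:
    """Chain content access, the processing stages, and publishing."""
    docs: Iterable[Document] = connector(repository)
    for stage in stages:
        docs = map(stage, docs)  # each stage consumes the previous stage's output
    return publisher(docs)
```

Calling `run({"hr/policy.txt": "Contact alice@example.com"}, [extract_emails])` returns one JSON string containing the original fields plus an `emails` field. Additional processing stages can be appended to the list without changing the connector or the publisher.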

Functional Component Hierarchy


  • Component - Atomic piece of Aspire logic.
  • Configurable Component - Single component wrapped with a dxf so it can be used with the Admin UI.
  • Application or Application Bundle - Multiple components wrapped with a dxf and possibly configuration files.

 
