Introduction

Aspire is a framework and a library of extensible components for building solutions that acquire data from one or more content repositories (such as file systems, relational databases, cloud storage, or content management systems), extract metadata and text from the documents, analyze, modify, and enhance the content and metadata as needed, and then publish each document, together with its metadata, to a search engine or other target application.

Aspire uses Apache Felix (an open source implementation of OSGi) to install, start, stop, update, and uninstall Aspire components and applications without requiring a reboot, supporting improved uptime and making system administration easier. Each individual piece of processing functionality within Aspire is a modular component that can be used by itself, or in conjunction with other components to create an Aspire application.

What is Aspire used for?

Aspire is used in many types of customer applications. Here are some examples:

Enterprise search to enrich content with additional metadata to support advanced navigation
Staffing and recruitment to provide search and match solutions between candidate CVs and job descriptions
State government information site to extract metadata from OCR files and normalize the data prior to indexing
Records management to automatically categorize corporate data as it is migrated into SharePoint where content needs to be aggregated and categorized before searching
Legal research to find and analyze content for forward and reverse citations to other content to improve recall and analysis
Company intranet to automatically create enterprise-wide sitemaps for browsing-style investigation
Federal government information site to intelligently split up large single files pertaining to laws into searchable chapters and clauses
Basic content access (connector) to one or more content repositories for search engines
Analyzing and grouping content geospatially for localization

Aspire is extremely flexible. By pulling the data processing pipelines out of the search engine, Aspire can manipulate content and metadata more powerfully and efficiently, process it in multiple pipelines simultaneously (and across multiple machines) for higher performance, and then feed it to one or more engines for indexing.

The Aspire framework supports creating Natural Language Processing (NLP), Machine Learning, and other analytic processing for text through a rich set of basic components. More detailed descriptions can be found on this page: Natural Language Processing (NLP)

If you want to start using Aspire, see here.

Administrator’s View

The administrator is responsible for installing, configuring, and maintaining Aspire deployments. Aspire deployments are managed through a web-based, point-and-click interface, the same one used by the Aspire developer; however, an administrator is expected only to fill in configuration information. Depending on your environment, you may wish to have a single Aspire system administrator, or several, each responsible for different content sources.

The System Administration UI has the following main functions.

Content Source administration functions include:

Configuring properties for Aspire connectors
Setting up crawling schedules for repositories
Managing full and incremental crawls
Managing security
Monitoring system health and performance
Monitoring crawl statistics and performance
Index Auditing

Administration UI Security:

Securing the Administration UI

Connection Security:

Securing connections to Aspire

Document Level Security:

Solr Document security filtering
Google Search Appliance filtering - Any Aspire connector that provides the document ACLs is sufficient; normal GSA filtering then works.
SharePoint 2013 filtering (on premise)

The release of Aspire 2 included some major changes in administration. If you are administering Aspire 2, click here for more in-depth information.

If you are administering Aspire 1.x, click here for more in-depth information.

Developer’s View

Aspire deployments are dynamically built from components and subcomponents. Aspire also includes the concept of “application bundles,” which are essentially groups of components that are pre-packaged to perform a specific function and that carry embedded files defining their look and feel within the Aspire Administration UI. System developers can easily combine components in various ways to process data according to the needs of the application.

Standard Aspire components can be mixed with custom third-party components and with new components. The high-level developer’s view of Aspire processing control is based on three major component types:

Component Managers
Pipeline Managers
Tokenization Manager

If you are developing for Aspire 2, click here for more in-depth information.

If you are developing for Aspire 1.x, click here for more in-depth information.

Aspire Community vs. Enterprise Distributions

Aspire Features

Performance and reliability
- Distributed processing and automatic threading
- The ability to split document processing jobs into sub jobs that can run in parallel
- Standard technology for managing and restarting processes on servers for high availability
- Can be placed within an architecture for Backup Failover

Ease of administration
- Making dynamic (on-the-fly) configuration changes
- Dynamically adding new components
  - Dynamic refresh of component code
  - Rich built-in XML processing methods including XPath and XSLT
  - Hierarchical component configuration
- Rich and comprehensive web-based administration and control interface

A strong developer environment
- Intuitive workflow interface
- Supports processing content in diverse languages
- Easy mapping of document fields to search fields
- Rich built-in JSON and XML processing methods, including XPath, XSLT
- Use of scripting to build complex processing components
- Hierarchical component configuration
- Tightly integrated with Maven repositories for sharing and loading component code
- Process streams of tokens, for performing text analytics
  - Entity extraction
  - Latent Semantic Analysis
  - Document vector creation and comparison
  - Topic Analysis

Support for security
- Handle Proxy LDAP requests, including:
  - Authenticating users
  - Determining user group membership across a multitude of systems

Support for federating search requests
- Distribute queries to multiple search engines
- Merge search results
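
The two federation steps listed above (distribute a query, then merge the results) can be sketched as follows. This is only an illustration, not Aspire's API: the function name `federated_search`, the `(document id, score)` hit shape, and the merge rule (keep the highest score per document) are all assumptions made for the example. Real engines rarely produce directly comparable scores, so a production merge would need score normalization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical shapes for this sketch: a hit is (document id, relevance score),
# and an "engine" is any callable that maps a query string to a list of hits.
Hit = Tuple[str, float]
Engine = Callable[[str], List[Hit]]


def federated_search(query: str, engines: List[Engine], limit: int = 10) -> List[Hit]:
    """Send the query to every engine in parallel and merge the hit lists."""
    # Distribute: one worker per engine (at least one, so an empty list is allowed).
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    # Merge: keep the best score seen for each document id (assumed rule).
    best: Dict[str, float] = {}
    for hits in result_lists:
        for doc_id, score in hits:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score

    # Highest score first; ties are broken by document id for a stable order.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Each engine here is just a function, so stand-ins for real search back ends can be swapped in without changing the merge logic.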

Support for Hadoop
- Ability to write to HDFS
- Ability to include Aspire within Map/Reduce jobs

Structure of an Aspire Solution

Aspire deployments can be divided into three high-level functional areas: content access, content processing, and publishing.

Content access fetches the documents and associated metadata from the content repositories. The applications that perform this function are called Aspire Connectors. These use the supported application programming interfaces (APIs) of target repositories to access content, metadata, and security credentials. Where available, Aspire connectors capture the full directory structure from the repository, to support browsable enterprise site maps.

Content processing analyzes, augments, and transforms content. Depending on the needs of the application, this can range from simple use of regular expressions to a wide range of complex semantic and statistical processing techniques. Content processing can spawn Hadoop Map/Reduce jobs for large processing tasks.

Publishing refers to the components in an Aspire deployment that are responsible for pushing the processed text from the content processing pipeline(s) to the target system, typically a search engine or file directory, in the correct form, and where available using the search engine’s ingestion API. The applications that perform this function are called Aspire Publishers. XML and JSON output is also available.
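
The three functional areas described above can be pictured as a chain of stages. The sketch below is a toy model in plain Python and does not use any Aspire API: the `Document` class, the stage functions, and the in-memory "repository" and "targets" are all hypothetical stand-ins for a connector, a processing pipeline, and one or more publishers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical document: an id, its text content, and a metadata map."""
    doc_id: str
    content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def access(repository: Dict[str, str]) -> Iterator[Document]:
    """Content access: stand-in for a connector reading from a repository."""
    for doc_id, text in repository.items():
        yield Document(doc_id, text, {"source": "in-memory"})


def normalize_whitespace(doc: Document) -> Document:
    """Processing stage: collapse runs of whitespace in the content."""
    doc.content = " ".join(doc.content.split())
    return doc


def add_word_count(doc: Document) -> Document:
    """Processing stage: enrich the metadata with a word count."""
    doc.metadata["word_count"] = len(doc.content.split())
    return doc


def process(docs: Iterable[Document],
            stages: List[Callable[[Document], Document]]) -> Iterator[Document]:
    """Content processing: run every document through each stage in order."""
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        yield doc


def publish(docs: Iterable[Document], targets: List[list]) -> int:
    """Publishing: push each document, with its metadata, to every target."""
    count = 0
    for doc in docs:
        record = {"id": doc.doc_id, "content": doc.content, **doc.metadata}
        for target in targets:
            target.append(record)
        count += 1
    return count
```

A call such as `publish(process(access(repo), [normalize_whitespace, add_word_count]), [index_a, index_b])` then moves every document through all three areas and feeds two targets at once, mirroring the "one or more engines" idea.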

Functional Component Hierarchy

Component - an atomic piece of Aspire logic.
Configurable Component - a single component wrapped with a DXF so it can be used with the Admin UI.
Application or Application Bundle - multiple components wrapped with a DXF (and possibly configuration files).

Version Numbering

Aspire core releases are given version numbers to help identify what software an Aspire solution is built upon. The leftmost digit is the major version, which is reserved to denote the overall architecture. The second digit is the minor version and denotes a release with new features. The third digit, if present, is the stability release version and denotes a release with multiple bug fixes. In rare cases there can be a fourth digit, when it is necessary to release a version with one or just a few bug fixes.

Currently, the version numbers for Aspire connectors match the major and minor release of Aspire core. For example, the current Jive connector and Aspire core are both 2.1. Over time, the version numbers after the major digit can diverge. With the release of Aspire 2.0, the version dependencies between Aspire core, connectors, and publishers have been eliminated. This allows Search Technologies to release new versions of connectors or publishers between Aspire core releases. The major version number must always match.
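
As an illustration of the numbering scheme and the major-version rule, here is a minimal sketch. It is not part of Aspire: the names `AspireVersion`, `parse_version`, and `is_compatible` are hypothetical, and the sketch assumes a version string always carries at least a major and a minor number, separated by dots.

```python
from typing import NamedTuple, Optional


class AspireVersion(NamedTuple):
    """Hypothetical holder for the two to four parts of a version number."""
    major: int                        # overall architecture
    minor: int                        # release with new features
    stability: Optional[int] = None   # release with multiple bug fixes
    single_fix: Optional[int] = None  # rare release with one or a few fixes


def parse_version(text: str) -> AspireVersion:
    """Parse strings such as '2.1' or '2.1.3.1' (assumed dot-separated)."""
    parts = text.strip().split(".")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 dot-separated numbers, got {text!r}")
    # int() raises ValueError on non-numeric parts, which is what we want here.
    return AspireVersion(*(int(part) for part in parts))


def is_compatible(core: str, component: str) -> bool:
    """Core and a connector or publisher are compatible when majors match."""
    return parse_version(core).major == parse_version(component).major
```

Under this rule, a connector at 2.3.1 would work with a 2.1 core, while a 1.4 connector would not.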

Release Notes

Aspire release notes

Open Source Components

Download Aspire

To download Aspire 2.0, please follow this link:

Aspire 2.0 Downloads

For Aspire 1.X:

Aspire 1.X Downloads

Version Specific Information

This section has links to the detailed information for managing and developing for each major Aspire release. The release notes for each Aspire version can be accessed by clicking here. The release of Aspire 2 included changes significant enough that we decided to create its own branch of the wiki while maintaining the Aspire 1.x branch. The two links below allow you to navigate to these branches.

Aspire Versions 2.x

Aspire Versions 1.x and older
