Aspire is a framework and a library of extensible components for building solutions that acquire data from one or more content repositories (such as file systems, relational databases, cloud storage, or content management systems), extract metadata and text from the documents, analyze, modify, and enhance the content and metadata as needed, and then publish each document, together with its metadata, to a search engine or other target application.
Aspire uses Apache Felix (an open source implementation of OSGi) to install, start, stop, update, and uninstall Aspire components and applications without requiring a reboot, which improves uptime and simplifies system administration. Each individual piece of processing functionality within Aspire is a modular component that can be used on its own or combined with other components to create an Aspire application.
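To make the acquire, extract, enhance, and publish flow described above concrete, here is a minimal Python sketch. It is not Aspire's API (Aspire itself is built on Java and OSGi); the `Document` class, the stage functions, and the in-memory "repository" and "index" are hypothetical stand-ins chosen only to show how each document passes through the stages in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Document:
    """Hypothetical document: an id, raw content, extracted text and metadata."""
    doc_id: str
    raw: bytes
    text: str = ""
    metadata: dict = field(default_factory=dict)


def acquire(repository: dict) -> Iterator[Document]:
    # Stand-in connector: read every item from an in-memory "repository".
    for doc_id, raw in repository.items():
        yield Document(doc_id=doc_id, raw=raw)


def extract(doc: Document) -> Document:
    # Stand-in text extraction: decode the bytes and record a simple metadata field.
    doc.text = doc.raw.decode("utf-8")
    doc.metadata["length"] = str(len(doc.text))
    return doc


def enhance(doc: Document) -> Document:
    # Example enrichment: use the first line of the text as a title.
    lines = doc.text.splitlines()
    doc.metadata["title"] = lines[0].strip() if lines else ""
    return doc


def run(repository: dict, publish: Callable[[Document], None]) -> None:
    # Each document flows acquire -> extract -> enhance -> publish.
    for doc in acquire(repository):
        publish(enhance(extract(doc)))


index = []  # hypothetical stand-in for a search engine or other target application
run({"a.txt": b"Quarterly report\nRevenue grew.", "b.txt": b"Memo\nMeeting at 10."}, index.append)
print([(d.doc_id, d.metadata["title"]) for d in index])
# [('a.txt', 'Quarterly report'), ('b.txt', 'Memo')]
```

Passing `publish` in as a function keeps the pipeline independent of the target, which mirrors the idea of feeding the same processed content to different applications.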
What is Aspire Used For?
Aspire is used in many types of customer applications. Here are some examples.
- Enterprise search to enrich content with additional metadata to support advanced navigation
- Staffing and recruitment to provide search and match solutions between candidate CVs and job descriptions
- State government information site to extract metadata from OCR files and normalize the data prior to indexing
- Records management to automatically categorize corporate data as it is migrated into SharePoint, where content must be aggregated and categorized before it can be searched
- Legal research to find and analyze forward and reverse citations between documents to improve recall and analysis
- Company intranet to automatically create enterprise-wide sitemaps for browsing-style investigation
- Federal government information site to intelligently split large single files pertaining to laws into searchable chapters and clauses
- Basic content access (connector) to one or more content repositories for search engines
- Analyzing and grouping content geospatially for localization
Aspire is extremely flexible. By moving the data processing pipelines out of the search engine, Aspire can manipulate content and metadata more powerfully and efficiently, process it in multiple pipelines simultaneously (and across multiple machines) for higher performance, and then feed it to one or more engines for indexing.
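The idea of processing content in several pipelines at once and feeding the results to more than one engine can be sketched as follows. This is a hypothetical, single-process illustration, not how Aspire distributes work: `normalize`, `process_and_feed`, and the two list-based "engines" are invented for the example, and a thread pool only models concurrency on one machine, not distribution across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(doc: dict) -> dict:
    # Example processing step: lowercase the text and add a token count.
    text = doc["text"].lower()
    return {**doc, "text": text, "tokens": len(text.split())}


def process_and_feed(docs, targets, workers=4):
    # Process documents concurrently, then publish each result to every target.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(normalize, docs):  # map() yields results in input order
            for target in targets:
                target.append(result)


engine_a, engine_b = [], []  # hypothetical stand-ins for two indexing engines
docs = [{"id": 1, "text": "Hello World"}, {"id": 2, "text": "Aspire Pipelines Scale Out"}]
process_and_feed(docs, [engine_a, engine_b])
print([d["tokens"] for d in engine_a])  # [2, 4]
```

Publishing happens on the calling thread, so the target lists are never appended to from multiple threads at once; for CPU-heavy steps in Python, a process pool would be the more typical choice.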
The Aspire framework supports Natural Language Processing (NLP), machine learning, and other analytic processing of text through a rich set of basic components.
More detailed descriptions can be found on the Natural Language Processing (NLP) page.
If you want to start using Aspire, see here.
The administrator is responsible for installing, configuring, and maintaining Aspire deployments. Aspire deployments are managed through a web-based, point-and-click interface, the same one used by Aspire developers; however, an administrator is expected only to fill in configuration information. Depending on your environment, you may have a single Aspire system administrator or several, each responsible for different content sources.
The System Administration UI has the following main functions.
Content Source administration functions include:
- Configuring properties for Aspire connectors
- Setting up crawling schedules for repositories
- Managing full and incremental crawls
- Managing security
- Monitoring system health and performance
- Monitoring crawl statistics and performance
- Index Auditing
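The difference between a full and an incremental crawl can be illustrated with a generic snapshot comparison. This sketch does not describe how Aspire connectors detect changes (many repositories expose modification timestamps or change logs instead); `signature` and `incremental_diff` are hypothetical helpers that hash each item's content and compare the result against the snapshot saved by the previous crawl.

```python
import hashlib


def signature(content: bytes) -> str:
    # Content hash used to detect changes between crawls.
    return hashlib.sha256(content).hexdigest()


def incremental_diff(previous: dict, current_items: dict):
    """Compare the previous crawl snapshot {id: signature} with the current
    repository contents {id: bytes}; return (adds, updates, deletes, snapshot)."""
    snapshot = {doc_id: signature(data) for doc_id, data in current_items.items()}
    adds = sorted(k for k in snapshot if k not in previous)
    updates = sorted(k for k in snapshot if k in previous and previous[k] != snapshot[k])
    deletes = sorted(k for k in previous if k not in snapshot)
    return adds, updates, deletes, snapshot


# A full crawl starts from an empty snapshot, so every item is reported as new.
adds, updates, deletes, snap = incremental_diff({}, {"a": b"v1", "b": b"v1"})
print(adds, updates, deletes)  # ['a', 'b'] [] []

# A later incremental crawl reports only what changed since that snapshot.
adds, updates, deletes, snap = incremental_diff(snap, {"a": b"v2", "c": b"v1"})
print(adds, updates, deletes)  # ['c'] ['a'] ['b']
```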
Administration UI Security
Document Level Security
Aspire deployments are built dynamically from components and subcomponents. Aspire also includes the concept of “application bundles,” which are groups of components pre-packaged to perform a specific function, with embedded files that define their look and feel within the Aspire Administration UI. System developers can easily combine components in various ways to process data according to the needs of the application.
Standard Aspire components can be mixed with custom third-party components and with new components. The high-level developer’s view of Aspire processing control is based on three major component types:
- Component Managers
- Pipeline Managers
- Tokenization Manager
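As a rough mental model of the first two component types, the sketch below pairs a registry of named components with a manager that routes each job through an ordered list of component names. Both classes, their method names, and the example components are hypothetical and do not reproduce Aspire's Java interfaces; the Tokenization Manager is not modeled.

```python
from typing import Callable, Dict, List

Component = Callable[[dict], dict]


class ComponentManager:
    """Hypothetical registry that hosts named components."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def install(self, name: str, component: Component) -> None:
        # Installing under an existing name replaces the old component.
        self._components[name] = component

    def uninstall(self, name: str) -> None:
        self._components.pop(name, None)

    def get(self, name: str) -> Component:
        return self._components[name]


class PipelineManager:
    """Hypothetical manager that routes each job through an ordered list of components."""

    def __init__(self, components: ComponentManager, pipelines: Dict[str, List[str]]) -> None:
        self._components = components
        self._pipelines = pipelines

    def process(self, pipeline: str, job: dict) -> dict:
        for stage in self._pipelines[pipeline]:
            # Components are looked up by name for every job, so a replaced
            # component is picked up by the next job without a restart.
            job = self._components.get(stage)(job)
        return job


cm = ComponentManager()
cm.install("strip", lambda job: {**job, "text": job["text"].strip()})
cm.install("tag", lambda job: {**job, "tags": ["report"] if "report" in job["text"] else []})
pm = PipelineManager(cm, {"default": ["strip", "tag"]})

result = pm.process("default", {"text": "  annual report  "})
print(result)  # {'text': 'annual report', 'tags': ['report']}

# Swap in a new "tag" component at runtime; the pipeline definition is unchanged.
cm.install("tag", lambda job: {**job, "tags": ["swapped"]})
print(pm.process("default", {"text": "memo"})["tags"])  # ['swapped']
```

Keeping pipelines as lists of names rather than direct object references is one simple way to let components be replaced independently of the pipelines that use them.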