
As a content acquisition, processing, and indexing platform, Aspire offers different features for each task. The main ones are listed here.

Content Acquisition with Connectors

Built-in connectors to dozens of different data sources (see the list of available connectors)
- Scalable: Automatically distributes ingestion jobs across a cluster of nodes
- Elastic: Add and remove nodes at any time
- Resilient: Crawl state is carefully tracked at all points
  - Jobs on failed nodes are automatically picked up by other nodes
  - After a full system crash, crawling restarts from where it left off
- High Performance: Crawls are typically constrained only by the limits of the source system
- Incremental: Automatically identifies incremental changes and processes only those changes
  - The method for detecting incremental changes depends on what the underlying content storage technology provides.
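
As an illustration of incremental processing, the sketch below shows one generic approach: keep a snapshot of content signatures from the previous crawl and compare it with the current crawl. This is not Aspire's implementation, since the real detection method varies by content source. The names `signature` and `detect_changes` and the in-memory dictionaries are hypothetical.

```python
import hashlib


def signature(content: bytes) -> str:
    """Return a stable fingerprint for an item's content (hypothetical helper)."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current_items):
    """Compare the current crawl with the previous snapshot.

    previous:      dict mapping item id -> signature from the last crawl
    current_items: dict mapping item id -> raw content bytes seen in this crawl
    Returns (changes, new_snapshot); only the items listed in `changes`
    would need to be processed again.
    """
    new_snapshot = {
        item_id: signature(content) for item_id, content in current_items.items()
    }
    changes = {
        "added": sorted(i for i in new_snapshot if i not in previous),
        "updated": sorted(
            i for i in new_snapshot if i in previous and previous[i] != new_snapshot[i]
        ),
        "deleted": sorted(i for i in previous if i not in new_snapshot),
    }
    return changes, new_snapshot


# First crawl: nothing is known yet, so every item counts as added.
first_changes, snapshot = detect_changes({}, {"a.txt": b"v1", "b.txt": b"v1"})

# Second crawl: a.txt was edited, b.txt was removed, c.txt is new.
second_changes, snapshot = detect_changes(
    snapshot, {"a.txt": b"v2", "c.txt": b"v1"}
)
print(second_changes)
```

A source that exposes a change log or modification timestamps would not need content hashing at all; the snapshot comparison is simply the fallback that works when the source offers nothing better.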

Content Publication with Publishers

Built-in publishers to most commonly available search engines
- Including but not limited to:
  - Elasticsearch
  - Solr
  - SharePoint
  - Google Cloud Search
  - Amazon Kendra
Content migration publishers for cloud-based storage solutions
Publishers for real-time streaming systems such as:
- Amazon Kinesis
- Apache Kafka


Metadata/Content extraction & manipulation

Built-in components for many common content processing tasks
- Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.

Scripting for easy manipulation of metadata
Document rendering as images (for thumbnail previews)

Document Level Security

Fully understands document-level security
- Ingests ACLs for each content source
- Provides cached, high-performance group expansion* for each content source
  - *Group expansion flattens user-group memberships so that, for any given user, a single flat list contains every group the user belongs to, including the parent groups of the groups assigned to the user directly.
    - For example:
      - User Ann has been assigned to the Developers group only
      - The Developers group is a member of another group called IT_Operations
      - After group expansion, Ann is listed as a member of both Developers and IT_Operations
- Multi-domain identity extraction and mapping
- Identity publication
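
The group-expansion example above can be sketched as a breadth-first walk over nested group memberships. This is a minimal illustration of the concept, not Aspire's implementation. The function name `expand_groups` and the two dictionaries are hypothetical, and a real system would read memberships from the content source's directory and cache the results.

```python
from collections import deque


def expand_groups(direct_groups, parent_groups, user):
    """Return every group `user` belongs to, directly or through nesting.

    direct_groups: dict mapping user -> groups assigned directly to that user
    parent_groups: dict mapping group -> groups that contain it
    """
    expanded = set()
    queue = deque(direct_groups.get(user, ()))
    while queue:
        group = queue.popleft()
        if group in expanded:
            continue  # already visited; this also guards against cyclic nesting
        expanded.add(group)
        queue.extend(parent_groups.get(group, ()))
    return sorted(expanded)


# Ann is assigned to Developers only; Developers is nested in IT_Operations.
direct_groups = {"Ann": ["Developers"]}
parent_groups = {"Developers": ["IT_Operations"]}

ann_groups = expand_groups(direct_groups, parent_groups, "Ann")
print(ann_groups)  # ['Developers', 'IT_Operations']
```

Computing the flat list once and caching it means a search-time security check only has to compare the user's expanded groups against a document's ACL, rather than walking the group hierarchy on every query.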

Customizable components

Aspire is designed to host and use independent components; connectors, publishers, and content-manipulation components are all examples. If no built-in component covers what you need, Aspire provides a set of SDK frameworks for developing new ones.

So you can:

Create custom connectors and publishers
Create custom pipelines and workflow controls
Create custom components

Ease of deployment

Components and configurations are deployed through Maven
Properties allow for anything to be parameterized (e.g. server destinations, credentials, file directory locations, etc.)
Content source configurations can be exported from any cluster and imported on another
Container images for easy deployment with container orchestration tools such as Kubernetes
