Content Acquisition with Connectors

Built-in connectors for dozens of different data sources (see the list of available connectors)
- Scalable: Automatically distributes ingestion jobs across a cluster of nodes
- Elastic: Add and remove nodes at any time
- Resilient: Crawl state is carefully tracked at all points
  - Jobs on failed nodes are automatically picked up by other nodes
  - After a full system crash, crawling restarts from where it left off
- High Performance: Crawl throughput is typically limited only by what the source system can deliver
- Incremental: Automatically identifies incremental changes and processes only those changes
  - The method used to detect incremental changes depends on what the underlying content storage technology provides.
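
As an illustration of the incremental behavior described above, here is a minimal sketch of one generic technique: comparing a stored snapshot of content signatures against the current state of the source. This is an assumption for illustration only, not Aspire's actual implementation, which depends on the source system. The names `signature` and `detect_changes` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def signature(content: bytes) -> str:
    """Return a stable fingerprint for a document's content."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(
    previous: Dict[str, str], current: Dict[str, bytes]
) -> Tuple[List[str], List[str], List[str], Dict[str, str]]:
    """Compare the last snapshot (id -> signature) with the current
    content (id -> bytes) and report what changed.

    Hypothetical helper: a snapshot comparison is only one way to detect
    changes; a source with a change log would not need it.
    """
    new_snapshot = {doc_id: signature(body) for doc_id, body in current.items()}
    added = sorted(new_snapshot.keys() - previous.keys())
    deleted = sorted(previous.keys() - new_snapshot.keys())
    updated = sorted(
        doc_id
        for doc_id in new_snapshot.keys() & previous.keys()
        if new_snapshot[doc_id] != previous[doc_id]
    )
    return added, updated, deleted, new_snapshot
```

Only the documents in the added, updated, and deleted lists would need processing; the returned snapshot is stored for the next crawl.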

Content Publication with Publishers

Built-in publishers for the most commonly used search engines
- Including but not limited to:
  - Elasticsearch
  - Solr
  - SharePoint
  - Google Cloud Search
  - Amazon Kendra (should we include this?)
Content migration publishers for cloud-based storage solutions such as
- Amazon S3
- Google Cloud Storage (is this available?)
- Azure Blob Storage (is this available?)
Publishers for real-time streaming systems such as
- Amazon Kinesis
- Apache Kafka

Content manipulation

Built-in components for many common content processing tasks
- Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.

Scripting for easy manipulation of metadata
Document rendering as images (for thumbnail previews)

Document Level Security

Fully understands document-level security
- Ingests ACLs for each content source
- Provides cached, high-performance group expansion* for each content source
  - *Group expansion flattens user-group memberships so that, for any user, a single flat list contains every group the user belongs to, including the parent groups of the groups assigned to the user directly.
    - For example:
      - User Ann is assigned only to the Developers group
      - Developers is a member of another group called IT_Operations
      - After group expansion, Ann is listed as a member of both Developers and IT_Operations
- Multi-domain identity extraction and mapping
- Identity publication
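
The group-expansion example above can be sketched as a breadth-first walk over the group hierarchy. This is a minimal illustration under assumed data structures (two plain dictionaries), not Aspire's implementation; `expand_groups`, `direct_groups`, and `parent_groups` are hypothetical names.

```python
from collections import deque
from typing import Dict, Set


def expand_groups(
    user: str,
    direct_groups: Dict[str, Set[str]],
    parent_groups: Dict[str, Set[str]],
) -> Set[str]:
    """Return every group the user belongs to, directly or through
    nested group membership."""
    expanded: Set[str] = set()
    queue = deque(direct_groups.get(user, ()))
    while queue:
        group = queue.popleft()
        if group in expanded:
            continue  # already visited; also protects against cycles
        expanded.add(group)
        queue.extend(parent_groups.get(group, ()))
    return expanded


# The example from the text: Ann -> Developers -> IT_Operations
direct = {"Ann": {"Developers"}}
parents = {"Developers": {"IT_Operations"}}
print(sorted(expand_groups("Ann", direct, parents)))
# prints ['Developers', 'IT_Operations']
```

Caching the result per user is what makes security filtering fast at query time, since the hierarchy does not have to be walked again for every search.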

Customizable components

Aspire is designed to host and use independent components (connectors, publishers, and content-manipulation stages are all components). If there is no built-in component for what you need, Aspire provides a set of

...

SDK frameworks to develop new components.

So you can:

Create custom connectors and publishers
Create custom pipelines and workflow controls
Create custom components

Ease of deployment

Components and configurations are deployed through Maven
Properties allow anything to be parameterized (e.g. server destinations, credentials, file directory locations, etc.)
Content source configurations can be exported from any cluster and imported on another
Container images simplify deployment on container orchestration tools such as Kubernetes
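
To illustrate the idea of parameterizing a configuration with properties, here is a minimal sketch of placeholder substitution. The `${name}` placeholder syntax, the property names, and the `resolve` helper are assumptions made for this example only; they are not taken from Aspire's documentation.

```python
from string import Template
from typing import Dict


def resolve(config_template: str, properties: Dict[str, str]) -> str:
    """Replace every ${name} placeholder with its property value.

    Raises KeyError if the template references an undefined property,
    so a missing setting is caught at deployment time.
    """
    return Template(config_template).substitute(properties)


# Hypothetical configuration fragment and per-environment properties
template = "server=${search_host}\nindex_dir=${data_dir}/index"
staging = {"search_host": "staging.example.com", "data_dir": "/var/data"}
print(resolve(template, staging))
```

Keeping environment-specific values in properties means the same configuration can be moved between clusters unchanged.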
