As a content acquisition, processing, and indexing platform, Aspire offers features for each of these tasks. The main ones are listed below.

Content Acquisition with Connectors

  • Built-in connectors for dozens of different data sources (see the list of available connectors here)
    • Scalable: Automatically distributes ingestion jobs across a cluster of nodes
    • Elastic: Add and remove nodes at any time
    • Resilient: Crawl state is tracked at every stage
      • Jobs on failed nodes are automatically picked up by other nodes
      • After a full system crash, crawling restarts from where it left off
    • High Performance: Crawl speed is typically limited only by the capacity of the source system
    • Incremental: Automatically identifies changes since the last crawl and processes only those changes
      • The method for detecting changes depends on what the underlying content storage technology provides.
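When a source system offers no change log, one common way to detect incremental changes is to compare a snapshot of document IDs and content hashes with the snapshot recorded during the previous crawl. The sketch below illustrates that technique only; it is an assumption for illustration, not Aspire's implementation, and `detect_changes` is a hypothetical name.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Return a stable fingerprint of a document's content."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict[str, str], current_docs: dict[str, bytes]):
    """Compare the previous crawl's {doc_id: hash} snapshot with the current documents.

    Returns sorted lists of added, updated and deleted doc IDs, plus the new
    snapshot to persist for the next crawl.
    """
    snapshot = {doc_id: content_hash(body) for doc_id, body in current_docs.items()}
    added = sorted(d for d in snapshot if d not in previous)
    updated = sorted(d for d in snapshot if d in previous and previous[d] != snapshot[d])
    deleted = sorted(d for d in previous if d not in snapshot)
    return added, updated, deleted, snapshot


# First crawl: everything is new; keep the snapshot for next time.
_, _, _, snap = detect_changes({}, {"a.txt": b"hello", "b.txt": b"world"})
# Second crawl: b.txt changed, c.txt appeared, a.txt is untouched.
added, updated, deleted, snap = detect_changes(
    snap, {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new"}
)
print(added, updated, deleted)  # ['c.txt'] ['b.txt'] []
```

Only the IDs in `added`, `updated` and `deleted` would then need to be sent downstream, which is what makes the crawl incremental.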

Content Publication with Publishers

  • Built-in publishers for the most commonly used search engines
    • Including but not limited to:
      • Elasticsearch
      • Solr
      • SharePoint
      • Google Cloud Search
      • Amazon Kendra
  • Content migration publishers for cloud-based storage solutions
  • Publishers for real-time streaming systems, such as:
    • Amazon Kinesis
    • Apache Kafka
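To give a sense of what a publisher does at the end of a pipeline, the sketch below formats processed documents as the newline-delimited JSON body accepted by the Elasticsearch `_bulk` API: an `index` action line followed by the document source, or a lone `delete` action line. The index name, the `id` field and `build_bulk_body` are hypothetical; this is not how Aspire's Elasticsearch publisher is implemented, and it only builds the request body without sending it.

```python
import json


def build_bulk_body(index: str, docs: list[dict], deleted_ids: list[str]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each doc must carry an "id" key (an assumption of this sketch); it is
    used as the Elasticsearch _id and removed from the stored source.
    """
    lines = []
    for doc in docs:
        source = dict(doc)  # copy so the caller's dict is not mutated
        doc_id = source.pop("id")
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    for doc_id in deleted_ids:
        # Delete actions have no source line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(build_bulk_body("docs", [{"id": "1", "title": "Hello"}], ["2"]), end="")
```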

Metadata/Content extraction & manipulation

  • Built-in components for many common content processing tasks
    • For example: text extraction, OCR, field mapping, domain mapping, and archive file extraction
  • Scripting for easy manipulation of metadata
  • Document rendering as images (for thumbnail previews)
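Field mapping can be pictured as renaming source metadata fields to the names the target index expects, optionally dropping fields that have no mapping. A minimal sketch; the mapping table, field names and `map_fields` are hypothetical, and this is not Aspire's configuration format.

```python
def map_fields(doc: dict, mapping: dict[str, str], keep_unmapped: bool = False) -> dict:
    """Rename fields of doc according to mapping {source_name: target_name}.

    Fields absent from the mapping are dropped unless keep_unmapped is True.
    """
    out = {}
    for key, value in doc.items():
        if key in mapping:
            out[mapping[key]] = value
        elif keep_unmapped:
            out[key] = value
    return out


source_doc = {"dc:title": "Quarterly report", "Author": "Ann", "internal_flag": True}
print(map_fields(source_doc, {"dc:title": "title", "Author": "author"}))
# {'title': 'Quarterly report', 'author': 'Ann'}
```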

Document Level Security

  • Fully understands document-level security
    • Ingests ACLs for each content source
    • Provides cached, high-performance group-expansion* for each content source
      • *Group expansion flattens user-group memberships so that, for any user, a single list contains every group the user belongs to, including the parent groups of the groups directly assigned to them.
        • For example:
          • User Ann is directly assigned to the Developers group only
          • The Developers group is a member of another group, IT_Operations
          • After group expansion, Ann is listed as a member of both Developers and IT_Operations
    • Multi-domain identity extraction and mapping
    • Identity publication
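Group expansion amounts to a transitive closure over the group-membership graph. The sketch below computes it with a breadth-first traversal, using the Ann / Developers / IT_Operations example; the data structures and the name `expand_groups` are hypothetical, and Aspire's own cached implementation may differ.

```python
from collections import deque


def expand_groups(direct: dict[str, set[str]], parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each user, return every group reachable from the directly assigned groups.

    direct:  user -> groups the user is directly assigned to
    parents: group -> groups that this group is itself a member of
    """
    expanded = {}
    for user, groups in direct.items():
        seen: set[str] = set()
        queue = deque(groups)
        while queue:
            group = queue.popleft()
            if group in seen:
                continue  # already visited; also guards against membership cycles
            seen.add(group)
            queue.extend(parents.get(group, ()))
        expanded[user] = seen
    return expanded


direct = {"Ann": {"Developers"}}
parents = {"Developers": {"IT_Operations"}}
print(sorted(expand_groups(direct, parents)["Ann"]))  # ['Developers', 'IT_Operations']
```

Because group hierarchies change far less often than documents are queried, caching the expanded result per user, as the feature list describes, avoids repeating this traversal on every lookup.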

Customizable components

Aspire is designed to host and run independent components; connectors, publishers, and content-manipulation stages are all components. If no built-in component covers what you need, Aspire provides a set of SDK frameworks for developing new ones.

With these SDKs, you can:

  • Create custom connectors and publishers
  • Create custom pipelines and workflow controls
  • Create custom components
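Conceptually, a content-manipulation component receives a document, transforms it, and passes it on, and a pipeline chains such components in order. The sketch below models only that idea, with a hypothetical `Component` protocol and example components; Aspire's SDK frameworks define their own interfaces, and none of these names come from them.

```python
from typing import Protocol


class Component(Protocol):
    """Hypothetical interface: any object with a process(doc) -> doc method."""

    def process(self, doc: dict) -> dict: ...


class LowercaseTitle:
    """Example component: normalizes the title field to lower case."""

    def process(self, doc: dict) -> dict:
        doc["title"] = doc.get("title", "").lower()
        return doc


class AddSource:
    """Example component: tags each document with the name of its content source."""

    def __init__(self, source: str):
        self.source = source

    def process(self, doc: dict) -> dict:
        doc["source"] = self.source
        return doc


def run_pipeline(components: list[Component], doc: dict) -> dict:
    """Pass the document through each component in order."""
    for component in components:
        doc = component.process(doc)
    return doc


print(run_pipeline([LowercaseTitle(), AddSource("fileshare")], {"title": "HELLO"}))
# {'title': 'hello', 'source': 'fileshare'}
```

Keeping each component to a single, narrow transformation is what lets custom components be mixed freely with built-in ones in a pipeline.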

Ease of deployment

  • Components and configurations are deployed through Maven
  • Properties allow anything to be parameterized (e.g., server destinations, credentials, file directory locations)
  • Content source configurations can be exported from any cluster and imported on another
  • Container images for easy deployment with container orchestration tools such as Kubernetes
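Property-based parameterization typically works by substituting placeholders in a configuration with environment-specific values, so the same configuration can move between clusters unchanged. A minimal sketch, assuming a `${name}` placeholder syntax and a plain dictionary of properties; both, and the name `resolve`, are illustrative and not necessarily Aspire's property format.

```python
import re

# Matches ${some.property.name}; the syntax is an assumption of this sketch.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, properties: dict[str, str]) -> str:
    """Replace every ${name} in template with properties[name].

    Raises KeyError for undefined properties so misconfigurations fail loudly.
    """

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in properties:
            raise KeyError(f"undefined property: {name}")
        return properties[name]

    return _PLACEHOLDER.sub(lookup, template)


dev = {"es.host": "localhost", "es.port": "9200"}
print(resolve("http://${es.host}:${es.port}/", dev))  # http://localhost:9200/
```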