Aspire is a content acquisition, processing, and indexing platform that offers a different set of features for each of these tasks. The main ones are listed here.

Content Acquisition with Connectors

  • Built-in connectors to dozens of different data sources (see the list of available connectors)
    • Scalable:  Automatically distributes ingestion jobs across a cluster of nodes
    • Elastic:  Add and remove nodes at any time
    • Resilient:  Crawl state is carefully tracked at all points
      • Jobs on failed nodes are automatically picked up by other nodes
      • After a full system crash, crawling restarts from where it left off
    • High Performance:  Crawl speed is typically constrained only by the limits of the source system
    • Incremental:  Automatically identifies incremental changes and processes only those changes
      • The method for detecting incremental changes is based on what is provided by the underlying content storage technology.
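When a content source offers no native change log, one common way to detect incremental changes is to compare a snapshot of the previous crawl with the current one. The sketch below illustrates that general technique only; it is not Aspire's implementation, and the function names, the metadata fields, and the `add`/`update`/`delete` labels are all hypothetical.

```python
import hashlib


def signature(metadata: dict) -> str:
    """Hash the metadata fields that indicate a change (hypothetical field choice)."""
    canonical = "|".join(f"{key}={metadata[key]}" for key in sorted(metadata))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two {item_id: signature} snapshots and classify each change."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        item for item in current.keys() & previous.keys()
        if current[item] != previous[item]
    )
    return {"add": added, "update": updated, "delete": deleted}


# Snapshot saved at the end of the previous crawl (illustrative data).
previous = {
    "/docs/a.txt": signature({"size": 10, "modified": "2024-01-01"}),
    "/docs/b.txt": signature({"size": 20, "modified": "2024-01-01"}),
}
# Snapshot built during the current crawl.
current = {
    "/docs/a.txt": signature({"size": 10, "modified": "2024-01-01"}),
    "/docs/b.txt": signature({"size": 25, "modified": "2024-02-01"}),
    "/docs/c.txt": signature({"size": 5, "modified": "2024-02-01"}),
}

changes = diff_snapshots(previous, current)
print(changes)
```

Only the items listed under `add`, `update`, and `delete` would need to be processed again; unchanged items such as `/docs/a.txt` are skipped.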

Content Publication with Publishers

  • Built-in publishers to most commonly available search engines
    • Including but not limited to:
      • Elasticsearch
      • Solr
      • SharePoint
      • Google Cloud Search
      • Amazon Kendra
  • Content migration publishers for cloud-based storage solutions such as
    • Amazon S3
    • Google Cloud Storage
    • Azure Blob Storage
  • Real-time streaming systems such as
    • Amazon Kinesis
    • Apache Kafka

    Metadata/Content extraction & manipulation

    • Built-in components for many common content processing tasks
      • Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.
    • Scripting for easy manipulation of metadata
    • Document rendering as images (for thumbnail previews)

    Document Level Security

    • Fully understands document-level security
      • Ingests ACLs for each content source
      • Provides cached, high-performance group-expansion* for each content source
        • *Group expansion is the process of flattening user-group memberships so that, for any given user, a single flat list contains every group the user belongs to, including the parent groups of the groups assigned to them directly.
          • For example
            • user Ann has been assigned to the Developers group only
            • group Developers is part of another group called IT_Operations
            • After the group-expansion process, Ann is listed as a member of both Developers and IT_Operations
      • Multi-domain identity extraction and mapping
      • Identity publication
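The group-expansion example above can be sketched as a traversal of the group hierarchy. This is a minimal illustration of the idea, not Aspire's implementation: the `expand_groups` function and the two dictionaries are hypothetical, and a real deployment would read memberships from the identity source and cache the result.

```python
def expand_groups(user: str, direct_groups: dict, parent_groups: dict) -> set:
    """Return every group the user belongs to, directly or through nesting."""
    expanded = set()
    # Start from the groups assigned directly to the user.
    stack = list(direct_groups.get(user, ()))
    while stack:
        group = stack.pop()
        if group in expanded:
            continue  # already visited; also protects against cyclic nesting
        expanded.add(group)
        # Walk up to the groups that contain this group.
        stack.extend(parent_groups.get(group, ()))
    return expanded


# Ann is assigned to Developers only; Developers is nested in IT_Operations.
direct_groups = {"Ann": {"Developers"}}
parent_groups = {"Developers": {"IT_Operations"}}

ann_groups = expand_groups("Ann", direct_groups, parent_groups)
print(sorted(ann_groups))
```

Storing the flattened list per user means a security check at query time needs a single lookup instead of a walk through the group hierarchy.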

    Customizable components

    Aspire is designed to host and run independent components; connectors, publishers, and content-manipulation stages are all components. If no built-in component covers what you need, Aspire provides a set of SDK frameworks for developing new ones.

    So you can:

    • Create custom connectors and publishers
    • Create custom pipelines and workflow controls
    • Create custom components

    Ease of deployment

    • Components and configurations are deployed through Maven
    • Properties allow for anything to be parameterized (e.g. server destinations, credentials, file directory locations, etc.)
    • Content source configurations can be exported from any cluster and imported on another
    • Container images are available for deployment with container orchestration tools such as Kubernetes
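The idea behind parameterizing a configuration with properties can be illustrated with a small sketch. It uses Python's standard `string.Template` only to show the concept; the placeholder syntax, the property names, and the values below are hypothetical and do not describe Aspire's actual configuration format.

```python
from string import Template

# One configuration template shared by every environment (hypothetical keys).
config_template = Template(
    "search_url=${search_url}\n"
    "search_user=${search_user}\n"
    "data_dir=${data_dir}"
)

# Environment-specific properties; real credentials would come from a secure store.
dev_properties = {
    "search_url": "http://localhost:9200",
    "search_user": "dev_user",
    "data_dir": "/tmp/aspire-data",
}
prod_properties = {
    "search_url": "https://search.example.com:9200",
    "search_user": "svc_indexer",
    "data_dir": "/var/lib/aspire-data",
}

# substitute() raises KeyError if a property is missing, so gaps surface early.
dev_config = config_template.substitute(dev_properties)
prod_config = config_template.substitute(prod_properties)
print(dev_config)
```

Keeping environment-specific values out of the configuration itself is what allows the same content source configuration to be exported from one cluster and imported on another.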