Content Acquisition with Connectors
- Built-in connectors for dozens of data sources (see the list of available connectors)
- Scalable: Automatically distributes ingestion jobs across a cluster of nodes
- Elastic: Add and remove nodes at any time
- Resilient: Crawl state is carefully tracked at all points
- Jobs on failed nodes are automatically picked up by other nodes
- After a full system crash, crawling restarts from where it left off
- High Performance: Crawl speed is typically limited only by how fast the source system can serve content
- Incremental: Automatically identifies incremental changes and processes only those changes
- The method for detecting incremental changes is based on what is provided by the underlying content storage technology.
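When a content source offers no change log of its own, one common fallback for detecting incremental changes is to compare a stored snapshot of the previous crawl with the current one. The sketch below illustrates that general idea only; it is not Aspire's implementation, and the function name `diff_snapshots` and the `{document id: signature}` layout are assumptions made for the example.

```python
def diff_snapshots(previous, current):
    """Classify changes between two crawls.

    Both arguments are hypothetical {document_id: signature} maps, where the
    signature is anything that changes when the document changes
    (for example a last-modified timestamp or a content hash).
    """
    added = sorted(doc for doc in current if doc not in previous)
    deleted = sorted(doc for doc in previous if doc not in current)
    updated = sorted(
        doc for doc in current
        if doc in previous and current[doc] != previous[doc]
    )
    return {"add": added, "update": updated, "delete": deleted}


# Illustrative data only: signatures from the last crawl and from this crawl.
last_crawl = {"a.docx": "v1", "b.pdf": "v1", "c.txt": "v1"}
this_crawl = {"a.docx": "v1", "b.pdf": "v2", "d.html": "v1"}

changes = diff_snapshots(last_crawl, this_crawl)
print(changes)
```

Only the documents listed under `add`, `update`, and `delete` would need processing; unchanged documents such as `a.docx` are skipped.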
Content Publication with Publishers
- Built-in publishers for the most commonly used search engines
- Including but not limited to:
- Elasticsearch
- Solr
- SharePoint
- Google Cloud Search
- Amazon Kendra (should we include this?)
- Content migration publishers for Cloud-based storage solutions such as
- Amazon S3
- Google Cloud Storage (is this available?)
- Azure Blob Storage (is this available?)
- Real-Time Streaming systems such as
- Amazon Kinesis
- Apache Kafka
Content manipulation
- Built-in components for many common content processing tasks
- Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.
- Scripting for easy manipulation of metadata
- Document rendering as images (for thumbnail previews)
Document Level Security
- Fully understands document-level security
- Ingests ACLs for each content source
- Provides cached, high-performance group-expansion* for each content source
- *Group-expansion is the process of flattening user-group memberships so that, for any given user, a single flat list contains every group the user belongs to, including the parent groups of the groups assigned to the user directly.
- For example
- user Ann has been assigned to the Developers group only
- Developers is part of another group called IT_Operations
- After the group-expansion process, Ann is listed as a member of both Developers and IT_Operations
- Multi-domain identity extraction and mapping
- Identity publication
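The group-expansion example with Ann can be sketched as a walk over the group hierarchy. This is a minimal illustration under stated assumptions, not Aspire's actual implementation or API: the function `expand_groups` and the two dictionaries are hypothetical names chosen for the example.

```python
from collections import deque


def expand_groups(direct_groups, parent_groups):
    """Return every group a user belongs to, directly or through nesting.

    direct_groups: groups assigned to the user directly.
    parent_groups: hypothetical {group: [parent groups]} map.
    """
    expanded = set()
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        if group in expanded:
            continue  # already visited; also guards against cyclic nesting
        expanded.add(group)
        queue.extend(parent_groups.get(group, ()))
    return sorted(expanded)


# Data taken from the example in the list above.
user_groups = {"Ann": ["Developers"]}
parent_groups = {"Developers": ["IT_Operations"]}

ann_groups = expand_groups(user_groups["Ann"], parent_groups)
print(ann_groups)  # ['Developers', 'IT_Operations']
```

Caching the flattened list per user, as the feature list describes, would avoid repeating this walk for every security check.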
Customizable components
Aspire is designed to host and use independent components (connectors, publishers, and content-manipulation components are all examples). If there is no built-in component for what you need, Aspire provides a set of