...
| Repository Type | Connector | Description | Aspire 4.x | Aspire 5.x |
|---|---|---|---|---|
| File System | File System | Extracts documents from a locally accessible file system path | ✓ | ✓ |
| | SMB | Extracts documents from remote file-sharing servers using the Server Message Block (SMB) protocol | ✓ | ✓ |
| | FTP | Extracts documents from remote servers using the File Transfer Protocol (FTP) | ✓ | ☐ |
| | | Extracts documents from S3 buckets in any region on AWS | ✓ | ✓ |
| | Box.com | Extracts documents from Box.com | ✓ | ☐ |
| | HDFS | Extracts documents from the Hadoop Distributed File System (HDFS) via WebHDFS | ✓ | ☐ |
| | OneDrive | Extracts documents from Microsoft OneDrive accounts | ✓ | ✓ |
| | | Extracts documents from Microsoft Azure Data Lake Store | ✓ | ✓ |
| | | Extracts documents from the Microsoft Azure Blob Storage service | ✓ | ✓ |
| | | Extracts documents from the Microsoft Azure File Storage service | ✓ | ✓ |
| Events, Messaging and Streaming | | Extracts events from the Microsoft Azure Event Hubs service | ✓ | ✓ |
| | | Extracts events from the Apache Kafka event streaming platform | ✓ | ✓ |
| | | Extracts records from Amazon Kinesis Data Streams | ✓ | ☐ |
| | RSS | Extracts items from RSS feeds | ✓ | ☐ |
| Relational Databases | RDB via Table | Extracts content from relational databases using SQL queries, and performs incremental updates based on update-table queries | ✓ | ✓ |
| | RDB via Snapshots | Extracts content from relational databases using SQL queries, and performs incremental updates using a content-digest snapshot table | ✓ | ✓ |
| | | Scans all databases within a server, extracts table information from all databases, and extracts rows from all tables | ✓ | ✓ |
| Content Management Systems | Documentum | Extracts documents stored in docbases, cabinets, folders, and sub-folders within Documentum | ✓ | ☐ |
| | | Extracts documents using the Documentum Query Language (DQL) for full and incremental crawls. ACL extraction is also expressed as DQL statements. | ✓ | ✓ |
| | Dropbox | Crawls pages, folders, and files from a Dropbox repository. Supports identity crawling and snapshot-based incremental crawls, and respects the document hierarchy. | ☐ | ✓ |
| | SharePoint 2013 | Extracts documents from Microsoft SharePoint 2013 (sites, lists, external lists, folders, documents or list items, attachments) | ✓ | ✓ |
| | SharePoint 2016 | Extracts documents from Microsoft SharePoint 2016 (sites, lists, external lists, folders, documents or list items, attachments) | ✓ | ✓ |
| | SharePoint 2019 | Extracts documents from Microsoft SharePoint 2019 (sites, lists, external lists, folders, documents or list items, attachments) | ☐ | ✓ |
| | | Extracts documents from Microsoft SharePoint Online (sites, lists, external lists, folders, documents or list items, attachments) | ✓ | ✓ |
| Collaboration | | Extracts documents from Confluence repositories, including spaces, blogs, pages, attachments, and comments | ✓ | ✓ |
| | IBM Connections | Extracts content from IBM Connections servers, including Activities, Blogs, Bookmarks, Files, Forums, Wikis, Profiles, and Communities | ✓ | ☐ |
| | Atlassian Jira | Extracts content from Jira issue types such as Bug, CCB, Device Profile, Epic, Improvement, Information, Inquiry, New Feature, and Question | ✓ | ☐ |
| | | Extracts content from Salesforce, including Accounts, Campaigns, Cases, Contracts, Contacts, Chatters, Documents, Groups, Ideas, Leads, Opportunities, Partners, Pricebooks, Products, Profiles, Solutions, Tasks, User, Knowledge Articles, and Attachments | ✓ | ✓ |
| | | Extracts content from ServiceNow, including Knowledge Articles, Article Categories, Knowledge Bases, Attachments, ACLs, Users, and Catalog Items | ✓ | ✓ |
| | | Extracts content from an Adobe Experience Manager (AEM) server, including all page and asset objects | ✓ | ✓ |
| | | Extracts content from Veeva Vault using a Vault Query Language (VQL) statement | ☐ | ✓ |
| | Kinesis | Fetches data from Amazon Kinesis Data Streams | ✓ | ☐ |
| CRM | RightNow | Extracts content from a RightNow instance, including Answers, Attachments, and Incidents | ✓ | ☐ |
| Web Crawler | | Extracts pages and documents from websites by following links inside HTML pages. Supports static websites and multiple authentication mechanisms. | ✓ | ✓ |
| | | Extracts pages and documents from websites by following links inside HTML pages. Supports dynamic websites by using the Selenium framework to render pages in real browser instances. Browser behavior can be scripted for flexible crawling. | ✓ | ✓ |
| Social Networks | Jive | Extracts content from any Jive Community using REST API v3. Includes documents stored in spaces, groups, projects, blogs, and any sub-folders. | ✓ | ☐ |
| | | Extracts tweets and metadata from any Twitter account, including tweet text, URL links, geolocation, hashtags, user mentions, media entities, and retweet count | ✓ | ☐ |
| | Yammer | Extracts content from Yammer messages by Group, Thread, and Topic | ✓ | ☐ |
| NoSQL Database | HBase | Extracts content stored in the objectData field of the tables in an HBase server | ✓ | ☐ |
| | | Extracts documents stored in an Elasticsearch index, using a query to filter the documents to extract | ✓ | ✓ |
| Identity Providers | Group Expansion | Crawls and expands identities from the Identity Cache | ☐ | ✓ |
| | LDAP Identity | Retrieves users, groups, and memberships from any LDAP server | ✓ | ✓ |
| | | Retrieves users, groups, and memberships stored in Confluence repositories | ✓ | ✓ |
| | | Retrieves users, groups, and memberships from Azure Active Directory | ✓ | ✓ |
| Other | MS Exchange | Extracts content from Exchange servers, including mail (with attachments), calendars, and contacts | ☐ | ☐ |
| | REST | Retrieves data from any JSON-based REST endpoint | ☐ | ✓ |
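
The "RDB via Snapshots" row describes incremental updates driven by a content-digest snapshot table. As a rough illustration of that general idea (not Aspire's actual implementation), the sketch below hashes each row and compares the digests with those saved from the previous crawl. The function names `digest` and `diff_snapshot`, the row layout, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


def digest(row: dict) -> str:
    """Return a stable SHA-256 digest of a row's content (hypothetical helper)."""
    # Sort the keys so the same content always produces the same digest.
    canonical = "\x1f".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_snapshot(previous: dict, rows: dict):
    """Compare the rows of this crawl against the previous snapshot.

    previous: {row_id: digest} saved by the last crawl
    rows:     {row_id: row_dict} fetched in this crawl
    Returns (added, updated, deleted, new_snapshot).
    """
    current = {row_id: digest(row) for row_id, row in rows.items()}
    added = sorted(rid for rid in current if rid not in previous)
    updated = sorted(
        rid for rid in current
        if rid in previous and current[rid] != previous[rid]
    )
    deleted = sorted(rid for rid in previous if rid not in current)
    return added, updated, deleted, current


# First (full) crawl: every row is new, and the snapshot is stored.
_, _, _, snapshot = diff_snapshot({}, {1: {"title": "a"}, 2: {"title": "b"}})

# Second (incremental) crawl: row 2 changed and row 3 appeared.
added, updated, deleted, snapshot = diff_snapshot(
    snapshot, {1: {"title": "a"}, 2: {"title": "B"}, 3: {"title": "c"}}
)
print(added, updated, deleted)  # [3] [2] []
```

Storing only digests keeps the snapshot small, at the cost of re-reading every row on each crawl in order to recompute them.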
...