The Confluence connector will crawl content from any Confluence content repository. The connector will retrieve spaces, pages, blogs, attachments and comments.

The connector uses the Confluence REST API to crawl Confluence content, and it supports both on-premises and Cloud installations of Confluence.
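
As a rough illustration of what crawling through the REST API involves, the sketch below pages through the `GET /rest/api/content` endpoint for one space using the `spaceKey`, `type`, `start` and `limit` query parameters. It is not the connector's implementation. The `fetch_json` callable, the helper names and the "stop on a short page" rule are assumptions made for this example, and response details can differ between Cloud and on-premises versions.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Illustrative sketch only; not the Aspire connector's code.


def content_url(base_url: str, space_key: str, content_type: str,
                start: int, limit: int) -> str:
    """Build a URL for the Confluence REST endpoint GET /rest/api/content."""
    query = urlencode({
        "spaceKey": space_key,
        "type": content_type,   # "page" or "blogpost"
        "start": start,         # offset of the first result to return
        "limit": limit,         # maximum number of results per response
    })
    return f"{base_url.rstrip('/')}/rest/api/content?{query}"


def iter_content(fetch_json: Callable[[str], dict], base_url: str,
                 space_key: str, content_type: str = "page",
                 limit: int = 25) -> Iterator[dict]:
    """Yield every content item of one type in a space, one response at a time.

    fetch_json is a hypothetical callable that GETs a URL and returns the
    decoded JSON body (for example a thin wrapper around urllib).
    """
    start = 0
    while True:
        body = fetch_json(content_url(base_url, space_key, content_type, start, limit))
        results = body.get("results", [])
        yield from results
        if len(results) < limit:  # assumption: a short or empty response is the last one
            return
        start += len(results)
```

Injecting `fetch_json` keeps the paging logic independent of any HTTP library, so it can be exercised with canned responses.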

Introduction

Some of the features of the Confluence connector include:

Performs incremental crawling (so that only new/updated documents are indexed)
Fetches access control lists (ACLs) for document level security
Is search engine independent
Runs from any machine with access to the given Confluence URLs
Supports HTTP and HTTPS
Is designed to support early binding mechanisms
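
With early binding, the ACLs fetched at crawl time are stored alongside each document in the index, and queries are filtered against them without calling back to Confluence. The sketch below shows that idea in miniature. The `IndexedDoc` record, its `allow`/`deny` fields and the rule that a deny entry overrides an allow entry are assumptions for illustration, not the connector's actual ACL schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch of early-binding security; field names are hypothetical.


@dataclass
class IndexedDoc:
    doc_id: str
    # Users or groups allowed to read the document, captured at crawl time.
    allow: set[str] = field(default_factory=set)
    # Users or groups explicitly denied (assumed to override allow entries).
    deny: set[str] = field(default_factory=set)


def visible_to(docs: list[IndexedDoc], user: str, groups: set[str]) -> list[str]:
    """Return the ids of documents the user may see, using only the ACLs
    stored in the index (no call to Confluence at query time)."""
    identities = {user} | groups
    return [
        d.doc_id
        for d in docs
        if (d.allow & identities) and not (d.deny & identities)
    ]
```

Because the check only reads data already in the index, query-time cost does not depend on Confluence being reachable; the trade-off is that permission changes are only picked up on the next crawl.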

For a complete tutorial on Confluence, see here.

Summary of Confluence organization

This is the hierarchy of spaces/pages/blogs/attachments/comments for Confluence versions:

Dashboard: The first page you see when you log in to Confluence. It provides quick access to the top-level features of Confluence.
- Spaces: Spaces are containers that group content related to a specific theme or topic. Spaces contain pages and blogs.
  - Pages: Like a web page or a page in a book, pages are places where you write content related to a specific theme or topic. Pages can contain attachments and comments.
    - Attachments: Documents (images, files, videos, etc.) that are embedded in a page or blog and contain relevant information about the topic or theme the page or blog covers.
    - Comments: Remarks users leave on a page or blog to share information with other users.
  - Blogs: A blog is a discussion or informational site published on the World Wide Web, consisting of discrete entries ("posts") typically displayed in reverse chronological order. Confluence blogs can contain attachments and comments.
    - Attachments
    - Comments
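
The hierarchy above is a tree, and a crawler that reports document hierarchy typically records each item's ancestors. The sketch below walks such a tree depth-first and yields a path for every node. The `Node` type and the slash-joined title path are assumptions made for illustration; the connector's real hierarchy metadata may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Iterator

# Illustrative model of the space > page/blog > attachment/comment hierarchy.


@dataclass
class Node:
    kind: str  # "space", "page", "blogpost", "attachment" or "comment"
    title: str
    children: list["Node"] = field(default_factory=list)


def walk(node: Node, parents: tuple[str, ...] = ()) -> Iterator[tuple[str, str]]:
    """Depth-first walk yielding (kind, path) for every node, where path
    joins the titles of the node's ancestors and itself with '/'."""
    path = parents + (node.title,)
    yield node.kind, "/".join(path)
    for child in node.children:
        yield from walk(child, path)
```

For example, an attachment `setup.pdf` on the page `Install Guide` in the space `Docs` is reported with the path `Docs/Install Guide/setup.pdf`.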

Environment and Access Requirements

Repository Support

The Aspire Confluence connector was created and tested using Confluence version 7.19.2.

Before installing the Confluence connector, make sure that:

Confluence is up and running.
The Confluence REST API is available.
If your Confluence instance uses a secure connection (HTTPS), you have all the certificates needed to connect to it.
You have a Confluence login with sufficient permissions to crawl documents for indexing (at least Admin-level permissions).
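
One way to check the last three prerequisites before configuring the connector is to send an authenticated request to the REST API yourself. The sketch below does this with the Python standard library using HTTP Basic authentication, which Confluence on-premises accepts with a username and password and Confluence Cloud accepts with an account email and API token. The choice of `/rest/api/space?limit=1` as the probe endpoint and the helper names are assumptions; this is not a check the connector itself performs.

```python
import base64
import ssl
import urllib.request
from typing import Optional

# Hypothetical pre-flight check; not part of the Aspire connector.


def build_probe(base_url: str, user: str, secret: str) -> urllib.request.Request:
    """Build a GET request that checks the REST API is reachable and the
    crawl account can authenticate (HTTP Basic auth)."""
    token = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/rest/api/space?limit=1",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def probe(base_url: str, user: str, secret: str,
          cafile: Optional[str] = None) -> int:
    """Send the probe and return the HTTP status code.

    cafile is an optional PEM bundle with the certificates needed for an
    HTTPS instance signed by a private or self-signed CA. urlopen raises
    urllib.error.HTTPError for responses such as 401 Unauthorized.
    """
    context = ssl.create_default_context(cafile=cafile)
    request = build_probe(base_url, user, secret)
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return response.status
```

A status of 200 suggests the API is up and the credentials are accepted; it does not by itself prove the account can read every space you intend to crawl.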

Account Privileges

To access Confluence, you must supply a user account with sufficient privileges. A site administrator account is recommended.

Environment Requirements

There are no special environment requirements.

Framework and Connector Features

Framework Features

Name	Supported
Content Crawling	yes
Identity Crawling	yes
Snapshot-based Incrementals	yes
Non-snapshot-based Incrementals	no
Document Hierarchy	yes

Connector Features

The connector can operate in two modes: full and incremental.
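
In a snapshot-based incremental crawl, the connector keeps a record of what it saw on the previous crawl and compares it with the current state of the repository. The sketch below shows that comparison for a snapshot that maps each content id to its Confluence version number. Using the version number as the change signal and the `Changes` record are assumptions for illustration; the Aspire framework's actual snapshot format is not described here.

```python
from dataclasses import dataclass

# Illustrative snapshot comparison; not the Aspire framework's snapshot format.


@dataclass
class Changes:
    added: list[str]
    updated: list[str]
    deleted: list[str]


def diff_snapshot(previous: dict[str, int], current: dict[str, int]) -> Changes:
    """Compare two snapshots mapping content id -> version number.

    A full crawl indexes everything in `current` and stores it as the next
    snapshot; an incremental crawl only processes the ids returned here.
    """
    added = sorted(cid for cid in current if cid not in previous)
    updated = sorted(cid for cid in current
                     if cid in previous and current[cid] != previous[cid])
    deleted = sorted(cid for cid in previous if cid not in current)
    return Changes(added, updated, deleted)
```

Deletions can only be detected this way because the previous snapshot is available: an id that disappears from the repository is simply absent from `current`.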

Content Crawled

The Confluence connector is able to crawl the following objects:

Name	Type	Relevant Metadata	Content Fetch & Extraction	Description
space		table fields	NA	Fields requested by SQL

Limitations

No limitations have been defined.

Contact Us: [email protected]
