The Confluence connector crawls content from any Confluence repository, retrieving spaces, pages, blogs, attachments, and comments.

The connector uses the Confluence REST API to crawl Confluence content, and supports both Confluence On-premise and Cloud installations.
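To make the API usage concrete, here is a minimal sketch of the kind of REST call the connector depends on. This is an illustration only, not the connector's implementation; the base URL and credentials are placeholders (Cloud instances typically authenticate with an email address plus API token, while On-premise instances accept a username and password).

import requests

BASE_URL = "https://confluence.example.com"  # placeholder base URL
AUTH = ("crawl_user", "secret")              # placeholder credentials

# List the first batch of spaces through the Confluence REST API.
resp = requests.get(f"{BASE_URL}/rest/api/space", auth=AUTH, params={"limit": 25})
resp.raise_for_status()
for space in resp.json()["results"]:
    print(space["key"], "-", space["name"])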


Introduction

Some features of the Confluence connector include:

  • Performs full crawling.
  • Performs incremental crawling (so that only new/updated documents are indexed).
  • Fetches access control lists (ACLs) for document level security.
  • Is search engine independent.
  • Runs from any machine with access to the given Confluence URLs.
  • Supports HTTP and HTTPS.

For a complete tutorial on Confluence, see here.

Summary of Confluence Organization

This is the hierarchy of spaces/pages/blogs/attachments/comments in Confluence (the code sketch after this list walks the same hierarchy via the REST API):

  • Dashboard: The first page you see when you log in to Confluence; it provides quick access to the top-level features of Confluence.
    • Spaces: Containers that group content related to a specific theme or topic. Spaces contain pages and blogs.
      • Pages: Like a web page or a page in a book, pages are places where you write content related to a specific theme or topic. Pages can contain attachments and comments.
        • Attachments: Documents (images, files, videos, etc.) embedded in a page or blog that contain information relevant to the topic or theme of that page/blog.
        • Comments: Remarks users leave on a page or blog to share information with other users.
      • Blogs: A blog is a discussion or informational site published on the World Wide Web, consisting of discrete entries ("posts") typically displayed in reverse chronological order. Confluence blogs can contain attachments and comments.
        • Attachments
        • Comments
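The sketch below makes the hierarchy concrete by walking it top-down with the standard Confluence REST API: spaces first, then the pages in each space, then each page's attachments and comments. It is an illustration only, not the connector's implementation; the base URL and credentials are placeholders.

import requests

BASE_URL = "https://confluence.example.com"  # placeholder base URL
AUTH = ("crawl_user", "secret")              # placeholder credentials

def get_json(path, **params):
    # Small helper around the REST API; raises on HTTP errors.
    resp = requests.get(f"{BASE_URL}{path}", auth=AUTH, params=params)
    resp.raise_for_status()
    return resp.json()

# Spaces are the top-level containers.
for space in get_json("/rest/api/space", limit=25)["results"]:
    print("space:", space["key"])
    # Pages live inside a space; use type="blogpost" to list blogs instead.
    for page in get_json("/rest/api/content", spaceKey=space["key"],
                         type="page", limit=25)["results"]:
        print("  page:", page["title"])
        # Attachments and comments hang off a page (or blog post).
        for att in get_json(f"/rest/api/content/{page['id']}/child/attachment")["results"]:
            print("    attachment:", att["title"])
        for com in get_json(f"/rest/api/content/{page['id']}/child/comment")["results"]:
            print("    comment:", com["id"])

A real crawl would also follow the paging parameters (start/limit) that every listing endpoint returns, rather than stopping at the first batch.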

Environment and Access Requirements

Repository Support

The Aspire Confluence connector was created and tested using version 7.19.2 of Confluence.

Before installing the Confluence connector, make sure that:

  • Confluence is up and running.
  • The Confluence REST API is available.
  • You have all the certificates you need to log into the site if your Confluence instance is on a secure connection (HTTPS).
  • You have a Confluence client login with sufficient permissions to crawl documents for indexing (at least Admin level permissions).
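One way to verify all four points above before installing the connector is a single authenticated request against the REST API, as sketched below. The base URL, credentials, and CA bundle path are placeholders.

import requests

BASE_URL = "https://confluence.example.com"  # placeholder base URL
AUTH = ("crawl_user", "secret")              # placeholder crawl account
CA_BUNDLE = "/path/to/ca-bundle.pem"         # placeholder; only needed for a private CA

# A single call exercises all of the prerequisites: the server is up, the REST
# API answers, the HTTPS certificate chain validates, and the account can log in.
resp = requests.get(f"{BASE_URL}/rest/api/space",
                    auth=AUTH, verify=CA_BUNDLE, params={"limit": 1})
if resp.status_code == 200:
    print("Confluence is reachable and the crawl account can read spaces.")
elif resp.status_code in (401, 403):
    print("Connected, but authentication or permissions failed:", resp.status_code)
else:
    resp.raise_for_status()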

Account Privileges

To access Confluence, a user account with sufficient privileges must be supplied. It is recommended that the account be the site administrator.


Environment Requirements

No special requirements here.

Framework and Connector Features

Framework Features

Name                              Supported
Content Crawling                  yes
Identity Crawling                 yes
Snapshot-based Incrementals       yes
Non-snapshot-based Incrementals   no
Document Hierarchy                yes

Connector Features

The connector can operate in two modes: full and incremental.

Full Mode

In full mode, the connector crawls all of the configured Confluence content (spaces, pages, blogs, attachments, and comments) and submits each item for processing in Aspire.

Incremental Mode

In incremental mode, the connector uses a snapshot of the previous crawl so that only content that has been added or updated since that crawl is submitted.
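To illustrate the idea behind snapshot-based incremental crawling (this is not the connector's internal snapshot format, just a sketch of the technique; the snapshot file name is hypothetical and the base URL and credentials are placeholders), the code below records each page's version number and diffs it against the previous run:

import json
import requests

BASE_URL = "https://confluence.example.com"  # placeholder base URL
AUTH = ("crawl_user", "secret")              # placeholder credentials
SNAPSHOT_FILE = "snapshot.json"              # hypothetical local snapshot store

def current_versions():
    # Map content id -> version number for one batch of pages.
    resp = requests.get(f"{BASE_URL}/rest/api/content", auth=AUTH,
                        params={"type": "page", "expand": "version", "limit": 100})
    resp.raise_for_status()
    return {item["id"]: item["version"]["number"] for item in resp.json()["results"]}

try:
    with open(SNAPSHOT_FILE) as f:
        previous = json.load(f)
except FileNotFoundError:
    previous = {}  # no snapshot yet: behaves like a full crawl

current = current_versions()
added   = [i for i in current if i not in previous]
updated = [i for i in current if i in previous and current[i] != previous[i]]
deleted = [i for i in previous if i not in current]
print("add:", added, "update:", updated, "delete:", deleted)

# Persist the new snapshot for the next incremental run.
with open(SNAPSHOT_FILE, "w") as f:
    json.dump(current, f)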


Content Crawled

The Confluence connector can crawl the following objects:

Name         Type        Relevant Metadata   Content Fetch & Extraction
space        container   space fields        NA
blog         container   blog fields         yes
page         container   page fields         yes
attachment   document    attachment fields   yes
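The "Content Fetch & Extraction" column above means the connector downloads the item's body in addition to its metadata. A rough sketch of what that looks like against the REST API follows; the page id, base URL, and credentials are placeholders.

import requests

BASE_URL = "https://confluence.example.com"  # placeholder base URL
AUTH = ("crawl_user", "secret")              # placeholder credentials
PAGE_ID = "12345"                            # placeholder content id

# Pages and blog posts expose their body through the "storage" representation.
resp = requests.get(f"{BASE_URL}/rest/api/content/{PAGE_ID}",
                    auth=AUTH, params={"expand": "body.storage,version"})
resp.raise_for_status()
page = resp.json()
print(page["title"], "version", page["version"]["number"])
xhtml_body = page["body"]["storage"]["value"]  # XHTML storage-format body

# Attachments are fetched through the download link each attachment result
# carries (relative to the site base URL).
atts = requests.get(f"{BASE_URL}/rest/api/content/{PAGE_ID}/child/attachment",
                    auth=AUTH).json()["results"]
for att in atts:
    data = requests.get(BASE_URL + att["_links"]["download"], auth=AUTH).content
    print(att["title"], len(data), "bytes")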

Limitations

No limitations defined.