The RDB via Table connector crawls content from any relational database that can be accessed using JDBC.

  • The connector extracts data based on SQL statements and submits this data to Aspire for processing.
  • The connector extracts the data directly, so there is no separate fetch phase. However, if your database includes references to external data (for example, URLs of web sites or paths to external files), a fetch stage may be invoked.


Introduction


The Group Expansion connector crawls and expands identities from the Identity Cache. The Identity Cache is part of the Aspire crawl state database. Typically, Elasticsearch is used as the repository for the crawl state database, and the Identity Cache is stored in the index aspire-identitycache. Connectors use the Identity Cache to store their identities, such as users and groups. For example, the Identity Cache can hold LDAP users and groups, Confluence users and groups, and so on. The purpose of the Group Expansion connector is to crawl the identities for the required seeds, perform group expansion, and publish the expanded identities. The connector also supports custom mapping configuration for selected seeds.

What is group expansion

Consider this example. If the user User1 is a member of the group Grp1, and Grp1 is in turn a member of the group Grp0, then group expansion updates the information for User1: instead of listing only Grp1, the user information now lists both groups, Grp1 and Grp0. This is what expanding groups means.
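The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the connector's implementation: the function name expand_groups and the parent_groups dictionary are hypothetical, and the data mirrors the User1, Grp1, Grp0 example.

```python
from collections import deque


def expand_groups(direct_groups, parent_groups):
    """Return the direct groups plus every group reachable through nesting.

    direct_groups: groups the user belongs to directly.
    parent_groups: hypothetical mapping of group -> groups that contain it.
    """
    expanded = []
    seen = set()
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # skip duplicates and guard against membership cycles
        seen.add(group)
        expanded.append(group)
        queue.extend(parent_groups.get(group, []))
    return expanded


# Hypothetical data: User1 is in Grp1, and Grp1 is in Grp0.
parent_groups = {"Grp1": ["Grp0"]}
user1_groups = expand_groups(["Grp1"], parent_groups)
print(user1_groups)  # ['Grp1', 'Grp0']
```

The seen set keeps the walk finite even if two groups contain each other.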

What is custom mapping

Some seeds require custom mapping for key attribute names. For example, we may need to map a user name coming from the Confluence connector to the standard AD name. Two kinds of mapping are supported:

  • local - defines which attribute name of the current identity is used as the identity key
  • external - defines a seed, and the mapping attributes, from which identities are fetched for the purpose of mapping. These are typically LDAP/AD seeds.
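The difference between the two kinds of mapping can be sketched as follows. Everything here is an assumption made for illustration: the function names, the attribute names, and the sample identities are hypothetical and do not reflect the connector's actual configuration format.

```python
def map_local(identity, key_attribute):
    """Local mapping: take the key from an attribute of the identity itself."""
    return identity[key_attribute]


def map_external(identity, local_attribute, external_identities,
                 external_attribute, external_key_attribute):
    """External mapping: find a matching identity from another seed
    (for example LDAP/AD) and use that identity's key attribute.

    Returns None when no external identity matches.
    """
    wanted = identity.get(local_attribute)
    for candidate in external_identities:
        if candidate.get(external_attribute) == wanted:
            return candidate.get(external_key_attribute)
    return None


# Hypothetical identities: one from a Confluence seed, one from an AD seed.
confluence_user = {"name": "jdoe", "email": "jdoe@example.com"}
ad_identities = [{"mail": "jdoe@example.com", "accountName": "john.doe"}]

local_result = map_local(confluence_user, "email")
external_result = map_external(
    confluence_user, "email", ad_identities, "mail", "accountName"
)
print(local_result)     # jdoe@example.com
print(external_result)  # john.doe
```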

Environment and Access Requirements

Repository Support

The Group Expansion connector crawls identities from the Identity Cache. The Identity Cache is part of the Aspire crawl state database. Typically, Elasticsearch is used as the repository for the crawl state database, and the Identity Cache is stored in the index aspire-identitycache.

Account Privileges

Not relevant here

RDB via Table connector features include the following:

  • Connects to a database server using JDBC drivers (which must be downloaded separately)
  • Performs full crawling
  • Performs incremental crawling, so that only new or updated documents are indexed, using tables to hold identifiers of updated content
  • Fetches data from the database using SQL statements
  • Is search engine independent
  • Runs from any machine with access to the given database

Environment and Access Requirements


Repository Support

JDBC Drivers

The RDB via Table connector connects to databases via JDBC, so you'll need the appropriate JDBC client (driver) JAR file for the database you want to connect to. Drivers are available for most (if not all) major database vendors, and your first port of call for the driver should be the vendor's website.

Account Privileges

A prerequisite for crawling any RDBMS is to have an RDBMS account. The recommended name for this account is "aspire_crawl_account" or something similar. The username and password for this account will be required below.

The "aspire_crawl_account" will need to have sufficient access rights to read all of the documents in the RDBMS that you wish to crawl.

To set the rights for your "aspire_crawl_account", do the following:

  1. Log into the RDBMS as an Administrator.
  2. Make the role of the "aspire_crawl_account" either administrator or superuser (so that it has access to all RDBMS content).

You will need this login information later in these procedures, when entering properties for your RDB via Table connector.

Environment Requirements

No special requirements here

Framework and Connector Features


Framework Features

Name | Supported
Content Crawling | yes
Identity Crawling | no
Snapshot-based Incrementals | yes / no
Non-snapshot-based Incrementals | no / yes
Document Hierarchy | no

Connector Features

The Group Expansion connector has the following features:

  • Seeds filtering using include and exclude lists.
  • Custom mapping configuration for selected seeds

The RDB via Table connector can operate in two modes: full and incremental.

Important: The data submitted to Aspire by this connector depends entirely on the SQL that is configured. Therefore, it is quite possible to submit all of the data in an incremental crawl, or only some of the data in a full crawl.

Full Mode

In full mode, the connector executes a single SQL database statement and submits each row returned for processing in Aspire.
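As a rough illustration of full mode, the sketch below runs one SQL statement and turns each returned row into a record keyed by the returned column names. It is not the connector's code: it uses Python's built-in sqlite3 module as a stand-in for a JDBC-accessible database, and the documents table, its columns, and the full_crawl function are hypothetical.

```python
import sqlite3


def full_crawl(connection, sql):
    """Run a single SQL statement and yield each row as a dict
    keyed by the column names returned by the query."""
    cursor = connection.execute(sql)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# Hypothetical in-memory database standing in for the crawled RDBMS.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(1, "First"), (2, "Second")],
)

# Each dict represents one row that would be submitted for processing.
submitted = list(full_crawl(connection, "SELECT id, title FROM documents ORDER BY id"))
print(submitted)  # [{'id': 1, 'title': 'First'}, {'id': 2, 'title': 'Second'}]
```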

Incremental Mode

In incremental mode, there are three stages of processing: pre-processing, crawling, and post-processing.

1 - Pre-processing

(Optional) This stage runs a SQL statement against the database that can be used to mark rows to crawl (i.e., they have changed since the previous run).

2 - Crawling

This stage (similar to full mode) executes a single SQL database statement and submits each row returned for processing in Aspire. Typically, the result set is a subset of the full data that may be filtered using information updated in the (optional) pre-processing stage.

3 - Post-processing

(Optional) For each row of data submitted to Aspire, a SQL statement can be executed to update the row's status in the database. For example, the statement may reset a flag set in the (optional) pre-processing stage, thereby marking the item as processed. Different SQL can be executed for rows that were processed successfully and for rows that were not.
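The three stages can be sketched end to end as shown below. This is a minimal illustration under stated assumptions, not the connector's implementation: it uses Python's sqlite3 module in place of a JDBC database, and the documents table, the to_crawl flag, the failures counter, the four SQL statements, and the process callback are all hypothetical.

```python
import sqlite3

# Hypothetical SQL for each stage.
PRE_SQL = "UPDATE documents SET to_crawl = 1 WHERE updated_at > ?"
CRAWL_SQL = "SELECT id, title FROM documents WHERE to_crawl = 1 ORDER BY id"
POST_OK_SQL = "UPDATE documents SET to_crawl = 0 WHERE id = ?"
POST_FAIL_SQL = "UPDATE documents SET failures = failures + 1 WHERE id = ?"


def incremental_crawl(connection, last_run, process):
    """Mark changed rows, crawl the marked rows, then update each row's status."""
    # 1 - Pre-processing: flag rows changed since the previous run.
    connection.execute(PRE_SQL, (last_run,))
    # 2 - Crawling: select only the flagged rows.
    rows = connection.execute(CRAWL_SQL).fetchall()
    results = {}
    for row_id, title in rows:
        succeeded = process(row_id, title)
        # 3 - Post-processing: different SQL for success and failure.
        connection.execute(POST_OK_SQL if succeeded else POST_FAIL_SQL, (row_id,))
        results[row_id] = succeeded
    connection.commit()
    return results


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, "
    "updated_at INTEGER, to_crawl INTEGER DEFAULT 0, failures INTEGER DEFAULT 0)"
)
connection.executemany(
    "INSERT INTO documents (id, title, updated_at) VALUES (?, ?, ?)",
    [(1, "Old", 10), (2, "Changed", 20), (3, "Broken", 30)],
)

# Pretend that processing fails for the row titled "Broken".
results = incremental_crawl(connection, 15, lambda row_id, title: title != "Broken")
status = connection.execute(
    "SELECT id, to_crawl, failures FROM documents ORDER BY id"
).fetchall()
print(results)  # {2: True, 3: False}
print(status)   # [(1, 0, 0), (2, 0, 0), (3, 1, 1)]
```

In this sketch a failed row keeps its flag, so it is picked up again on the next incremental run.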


Content Crawled


The content retrieved by the connector is defined entirely using SQL statements, so you can select all or subsets of columns from one or more tables. Initially, the data is inserted into Aspire using the returned column names, but this may be changed by further Aspire processing.

The RDB via Table connector is able to crawl the following objects:

Name | Type | Relevant Metadata | Content Fetch & Extraction | Description
database row | table fields | | NA | Fields requested by SQL

The Group Expansion connector is able to crawl the following objects:

Name | Type | Relevant Metadata | Content Fetch & Extraction | Description
Seed | container | | NA | The identities are grouped by seeds, and the connector crawls the identities belonging to each seed
Identity | | key, source, groups | NA | The identities with expanded groups

Limitations


No limitations defined