This tutorial walks through the steps necessary to crawl an IBM Connections site using the Aspire IBM Connections connector.

Step 1: Set IBM Connections Access Rights

The "aspire_crawl_account" must be a user account with the search-admin role so that it can read all of the documents in every IBM Connections application you wish to crawl. IBM Connections runs on WebSphere® Application Server.

To set the rights for your "aspire_crawl_account", do the following:

  1. Log in to the WebSphere Application Server Integrated Solutions Console on the IBM Connections server as a WebSphere Application Server administrative user.
  2. Select Applications > Enterprise Applications.
  3. When the list of enterprise applications is displayed, select an application and click Security role to user/group mapping.
  4. Assign an administrative user to the role named search-admin.
  5. Click OK twice, and save the changes to the master configuration.
  6. Repeat steps 3, 4, and 5 for all enterprise applications in IBM Connections.

You will need this login information later in these procedures, when entering properties for your IBM Connections Connector.
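
If you prefer to script the role mapping rather than use the console, the same assignment can be made with the wsadmin tool. The following is a minimal sketch in wsadmin's Jython mode; the application name "Search" and the user name "aspire_crawl_account" are illustrative placeholders, so substitute the names used in your environment and repeat the edit for each enterprise application.

    # Minimal wsadmin (Jython) sketch: map a user to the search-admin role.
    # "Search" and "aspire_crawl_account" are illustrative placeholders.
    appName = 'Search'
    userName = 'aspire_crawl_account'

    # Each mapping row is [role, everyone?, all-authenticated?, users, groups].
    AdminApp.edit(appName, ['-MapRolesToUsers',
                            [['search-admin', 'No', 'No', userName, '']]])

    # Persist the change to the master configuration.
    AdminConfig.save()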

Step 2: Launch Aspire and open the Content Source Management Page

Launch Aspire (if it's not already running), then browse to http://localhost:50505. For details on using the Aspire Content Source Management page, please refer to UI Introduction.
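
If you want to confirm from a script that the Aspire admin UI is reachable before continuing, a quick check like the one below works. This is a minimal sketch in Python; it assumes Aspire is running on the default port 50505 of the local machine.

    # Quick reachability check for the Aspire admin UI (default port 50505).
    import urllib.request

    try:
        with urllib.request.urlopen('http://localhost:50505', timeout=5) as resp:
            print('Aspire responded with HTTP status', resp.status)
    except OSError as err:
        print('Aspire does not appear to be running:', err)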

Step 3: Add a new IBM Connections Content Source



Aspire Content Source Management Page

To create a new content source:

  1. From the Aspire 2 Home page, click the "Add Source" button.
  2. Click on "IBM Connections Connector".

If the IBM Connections content source is not in the list, you can choose Custom and enter the following Maven coordinates:

Group Id: com.searchtechnologies.aspire
Artifact Id: app-ibmconnections-connector
Version: 2.0


Step 3a: Specify Basic Information



General Configuration Tab

In the "General" tab in the Add New Content Source window, specify basic information for the content source:

  1. Enter a content source name in the "Name" field.

    Use any name that clearly identifies the source. It will be displayed in the content source page, in error messages, etc.

  2. Click on the "Active?" checkbox to add a checkmark.

    Unchecking the "Active?" option allows you to configure content sources without enabling them. This is useful if the repository will be under maintenance and no crawls should run during that period.

  3. Click on the "Schedule" drop-down list and select one of the following: Manually, Periodically, Daily, or Weekly.

    Aspire can automatically schedule content sources to be crawled on a set schedule, such as once a day, several times a week, or periodically (every N minutes or hours). For the purposes of this tutorial, you may want to select Manually and then set up a regular crawling schedule later.

  4. After selecting a Schedule type, specify the details, if applicable:

    Manually: No additional options.
    Periodically: Specify the "Run job every:" options by entering the number of "hours" and "minutes."
    Daily: Specify the "Start time:" by clicking the hours and minutes drop-down lists and selecting options.
    Weekly: Specify the "Start time:" by clicking the hours and minutes drop-down lists and selecting options, then click the day checkboxes to specify the days of the week on which to run the crawl.

Step 3b: Specify the Connector Information



Connector Configuration Tab

In the "Connector" tab, specify the connection information to crawl the IBM Connections repository.

  1. Server URL: The URL of the IBM Connections server to crawl.
  2. User Name and Password: The search administrator user and password configured in the previous section (Set IBM Connections Access Rights).
  3. Page Size: Specifies the number of entries per page to return in the crawl.
  4. Use SSL: Indicates whether the connector should connect over SSL.
    1. SSL Certificate Directory: Path to the WebSphere trust store (based on the security certificate; see Prerequisites), if required.
    2. SSL Password: The certificate password.
  5. Extract ACLs: IBM Connections returns ACL entries as UUIDs (for example, <wplc:acl>4F6A3B85-BF45-40B3-89E9-932080F215D5</wplc:acl>). To convert each UUID value to the proper LDAP user, the connector must connect to the LDAP server used by the IBM Connections server. Configure the following options (see the sketch after this list):
    1. LDAP URL: The URL of the LDAP server used by IBM Connections.
    2. User Name (Distinguished Name (DN)): User credential to connect to LDAP. (Ex. cn=administrator,cn=Users,dc=stlab,dc=local)
    3. DN Password: Password to connect to LDAP.
    4. All Users and Groups Search Filter: LDAP filter that returns all users and groups. (Ex. (|(objectClass=person)(objectClass=group)))
    5. LDAP Search Base: LDAP base DN under which to search for users and groups. (Ex. cn=Users,dc=stlab,dc=local)
    6. GUID Attribute: LDAP attribute that holds the user's object GUID.
    7. Name Attribute: LDAP attribute that holds the user's name.
  6. Index Specific Endpoints: Indicates whether the connector will crawl all available applications (Activities, Blogs, Bookmarks, Communities, Files, Forums, Profiles, Wikis, Libraries, Communities Events, and Status Updates). If selected, you can choose the applications to crawl one by one.
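
The sketch below illustrates what the Extract ACLs configuration amounts to: resolving an ACL UUID from the seedlist to an LDAP user. It is a rough example using the Python ldap3 library; the host, credentials, search base, and the assumption that the GUID attribute is an Active Directory-style binary objectGUID are all illustrative, so substitute the values you configured above.

    # Rough sketch: resolve an IBM Connections ACL UUID to an LDAP user.
    # Host, credentials, and base DN below are illustrative placeholders;
    # assumes an AD-style binary objectGUID attribute.
    import uuid
    from ldap3 import Server, Connection

    def guid_filter(acl_uuid):
        # AD stores objectGUID as little-endian bytes; each byte must be
        # escaped as \xx inside the LDAP filter.
        raw = uuid.UUID(acl_uuid).bytes_le
        return '(objectGUID=%s)' % ''.join('\\%02x' % b for b in raw)

    conn = Connection(Server('ldap://stlab.local'),
                      user='cn=administrator,cn=Users,dc=stlab,dc=local',
                      password='secret', auto_bind=True)

    conn.search('cn=Users,dc=stlab,dc=local',
                guid_filter('4F6A3B85-BF45-40B3-89E9-932080F215D5'),
                attributes=['cn'])
    for entry in conn.entries:
        print(entry.cn)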

Advanced Properties (optional)

This section shows how to configure the advanced properties of the connector. You can also check Connector Properties for more information.

IBM Connections Working Directory: The directory where timestamps will be placed.




Advanced Properties


Group Expansion

The IBM Connections connector does not include group expansion. Instead, the connector uses the Extract ACLs section to extract the ACLs returned by the IBM Connections seedlist (SPI) and match them against the LDAP server. Make sure you use the same LDAP configuration in the connector and in the general Group Expansion Manager.

For more information, see the Group Expansion Service (Aspire 2).
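
Group expansion therefore runs against the same LDAP data the connector uses. As a rough illustration of the kind of query a group expansion stage issues, the sketch below lists the groups a user belongs to using the Python ldap3 library; the host, DNs, and the (member=...) filter reflect a typical Active Directory layout and are assumptions, not Aspire's actual implementation.

    # Rough sketch: list the groups a given user DN is a member of.
    # Host and DNs are illustrative placeholders.
    from ldap3 import Server, Connection

    conn = Connection(Server('ldap://stlab.local'),
                      user='cn=administrator,cn=Users,dc=stlab,dc=local',
                      password='secret', auto_bind=True)

    user_dn = 'cn=jdoe,cn=Users,dc=stlab,dc=local'
    conn.search('cn=Users,dc=stlab,dc=local',
                '(&(objectClass=group)(member=%s))' % user_dn,
                attributes=['cn'])
    print([str(entry.cn) for entry in conn.entries])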

Step 3c: Specify Workflow Information



Workflow Configuration Tab

In the "Workflow" tab, specify the workflow steps for the jobs that come out of the crawl. Drag and drop rules to determine which steps an item should follow after being crawled. These rules can specify where to publish the document or which transformations to apply to the data before sending it to a search engine. See Workflow for more information.

  1. For the purpose of this tutorial, drag and drop the Publish To File rule found under the Publishers tab to the onPublish Workflow tree.
    1. Specify a Name and Description for the Publisher.
    2. Click Add.

After completing these steps, click the Save button and you'll be sent back to the Home page.
