The IBM Connections connector will crawl content from an IBM Connections server.

Features


The IBM Connections connector will crawl content from any of the IBM Connections applications (Activities, Blogs, Bookmarks, Communities, Files, Forums, Profiles, and Wikis).

Note that the IBM Connections connector is not part of the Aspire Enterprise bundle; however, it may be purchased separately.

Some of the features of the IBM Connections connector include:

  • Performs incremental crawling, so that only new, updated, or deleted documents are indexed
  • Extracts metadata
  • Fetches access control lists (ACLs) for document-level security (this feature requires configuring the LDAP credentials in the connector UI section "Extract ACL")
  • Is search engine independent
  • Runs from any machine with access to the given IBM Connections site
  • Filters the crawled documents by file name using regex patterns (see the sketch after this list)
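
The file-name filter is pattern based. As a rough, hypothetical sketch of how include/exclude regex patterns behave (the patterns and the function below are illustrative only and are not taken from the connector UI):

    import re

    # Hypothetical include/exclude patterns; in the real connector these would be
    # entered in the UI rather than hard-coded.
    INCLUDE_PATTERNS = [r".*\.(docx?|pdf|pptx?)$"]    # keep Office and PDF files
    EXCLUDE_PATTERNS = [r".*~\$.*", r".*\.tmp$"]      # skip temp/lock files

    def should_index(file_name: str) -> bool:
        """Return True if the file name passes the include/exclude filters."""
        if any(re.match(p, file_name, re.IGNORECASE) for p in EXCLUDE_PATTERNS):
            return False
        if INCLUDE_PATTERNS:
            return any(re.match(p, file_name, re.IGNORECASE) for p in INCLUDE_PATTERNS)
        return True    # no include patterns: everything not excluded passes

    print(should_index("Minutes.docx"))      # True
    print(should_index("Minutes.tmp"))       # False, matches an exclude pattern
    print(should_index("Budget 2020.xlsx"))  # False, not in the include list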

Content Retrieved


The IBM Connections connector retrieves several types of documents; the inclusions and exclusions are listed below.

Include

Exclude

  • Only Trash documents are excluded

Limitations 


Due to API restrictions, the IBM Connections connector has the following limitations:

  • Example limitation
    • Example Reason

Operation Mode 

The connector uses the ATOM-based Seedlist service provider interface (SPI) provided with IBM Connections, which allows a search engine to integrate with IBM Connections content over HTTP or HTTPS. The connector acquires content by doing the following:

  • Goes recursively through all items and documents of the IBM Connections site, creating sub-jobs for each object discovered. Each sub-job contains all of the available metadata, including ACLs.
  • Saves the item states into a MongoDB instance in order to compare crawls and identify added, updated, and deleted items for incremental crawling (see the sketch after this list).
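
To make the role of the MongoDB state store concrete, below is a minimal Python sketch of how saved item states could be compared against the current crawl to classify items as added, updated, or deleted. The connection string, collection name, and signature field are assumptions made for illustration; they do not reflect the connector's actual schema.

    from pymongo import MongoClient

    # Assumed connection string and collection names, for illustration only.
    client = MongoClient("mongodb://localhost:27017")
    state = client["aspire_demo"]["ibm_connections_state"]

    def classify_items(current_items):
        """Compare this crawl's items against the stored state.

        current_items maps item id -> a change signature (e.g. last-modified date).
        Returns (added, updated, deleted) id lists and refreshes the stored state.
        """
        added, updated = [], []
        seen_ids = set(current_items)

        for item_id, signature in current_items.items():
            previous = state.find_one({"_id": item_id})
            if previous is None:
                added.append(item_id)
            elif previous.get("signature") != signature:
                updated.append(item_id)
            state.update_one({"_id": item_id},
                             {"$set": {"signature": signature}},
                             upsert=True)

        # Anything stored from an earlier crawl but absent now was deleted.
        deleted = [doc["_id"] for doc in state.find({}, {"_id": 1})
                   if doc["_id"] not in seen_ids]
        state.delete_many({"_id": {"$in": deleted}})

        return added, updated, deleted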

The following features are not currently implemented, but are on the development plan:

  • Example future plan

Anything we should add? Please let us know.

A crawl against the Seedlist SPI works as follows (see the sketch at the end of this section):

  • Send a GET request to the seedlist feed of each application whose data you want to crawl:

    http://<servername>/activities/seedlist/myserver
    http://<servername>/blogs/seedlist/myserver
    http://<servername>/dogear/seedlist/myserver
    http://<servername>/communities/seedlist/myserver
    http://<servername>/files/seedlist/myserver
    http://<servername>/forums/seedlist/myserver
    http://<servername>/profiles/seedlist/myserver
    http://<servername>/wikis/seedlist/myserver

  • Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.
  • Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.
  • Store the value of the <wplc:timestamp> element; you must pass that value as a parameter when you perform a subsequent crawl of the data.
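
As a minimal illustration of the paging loop described above, the Python sketch below follows the rel="next" links until a <wplc:timestamp> element appears, and then returns the timestamp for use in the next incremental crawl. The server name, credentials, and entry handling are placeholders, and the timestamp element is matched by its local name only, since the exact wplc namespace URI is not shown here.

    import requests
    import xml.etree.ElementTree as ET

    ATOM_NS = "{http://www.w3.org/2005/Atom}"

    def process_entry(entry):
        # Placeholder for whatever processes a discovered document downstream.
        title = entry.findtext(ATOM_NS + "title", default="(untitled)")
        print("discovered:", title)

    def crawl_seedlist(seedlist_url, auth):
        """Page through a seedlist feed until the server returns a timestamp.

        The returned timestamp is what a later incremental crawl passes back
        to the server so that only changes since this crawl are returned.
        """
        url = seedlist_url
        while url:
            response = requests.get(url, auth=auth, timeout=60)
            response.raise_for_status()
            feed = ET.fromstring(response.content)

            for entry in feed.findall(ATOM_NS + "entry"):
                process_entry(entry)

            # A timestamp element in the body signals the end of the seedlist.
            for elem in feed.iter():
                if elem.tag.endswith("}timestamp"):    # wplc:timestamp, matched by local name
                    return elem.text

            # Otherwise follow the rel="next" link and keep paging.
            url = None
            for link in feed.findall(ATOM_NS + "link"):
                if link.get("rel") == "next":
                    url = link.get("href")
                    break
        return None

    # Placeholder server name and credentials.
    timestamp = crawl_seedlist("http://connections.example.com/wikis/seedlist/myserver",
                               auth=("crawl_user", "password"))
    print("store this value for the next incremental crawl:", timestamp)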