
The eRoom connector crawls content from an eRoom server instance (site) that has the XML Query option enabled ("Allow XML queries and commands from external applications").


Features


Some of the features of the eRoom connector include:

  • Performs incremental crawling, so that only new or updated documents are indexed.
  • Fetches LDAP (including Active Directory) access control lists (ACLs), covering users and groups, for document-level security.
  • Extracts metadata.
  • Is search engine independent.
  • Runs from any machine with access to the given eRoom site.
  • Is designed to support early binding mechanisms and group expansion of nested permissions.
  • Filters the crawled documents by file name using regex patterns.
  • Supports Windows/Linux/macOS file shares.
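
The file name filtering listed above can be illustrated with a minimal sketch. The pattern lists and the `should_crawl` helper are hypothetical names chosen for this example; they are not the connector's actual configuration keys or code, and the include-then-exclude order is an assumption.

```python
import re

# Hypothetical patterns for illustration only.
INCLUDE_PATTERNS = [r".*\.(docx?|pdf)$"]   # Word and PDF documents
EXCLUDE_PATTERNS = [r"^~\$.*"]             # temporary Office lock files

def should_crawl(file_name, include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if the name matches no exclude pattern and at least one include pattern."""
    if any(re.search(p, file_name) for p in exclude):
        return False
    # An empty include list is treated as "include everything" (an assumption).
    return any(re.search(p, file_name) for p in include) if include else True

names = ["report.pdf", "~$report.docx", "notes.txt", "plan.doc"]
kept = [n for n in names if should_crawl(n)]
print(kept)
```

Checking exclusions first means an exclude pattern always wins when a name matches both lists; the real connector may resolve such conflicts differently.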


Content Retrieved


The eRoom connector retrieves several types of eRoom items. The item types it includes and excludes are listed below.

Include

  • Folder
  • Discussion
  • Note
  • Poll
  • Calendar and Events
  • Project Plan and Tasks
  • Database and Rows
  • Inbox (basic information)
  • Other Files
  • Link
  • Dashboard

Exclude

  • Comments on items
  • Emails in Inbox items
  • Content of the pages that a Link points to


Limitations 


The eRoom API imposes the following limitation on the connector:

  • Multi-threading limitation

    Aspire 4.0 introduced a multi-threaded platform for performing crawls, but because of architecture and API limitations the eRoom connector only works as a single-threaded connector. The server does not appear to allow multiple connections to run queries at the same time.

    At some point during a crawl that uses multiple threads, the connector receives this error from the server: "No target objects were found evaluating the command's select attribute". The query itself is valid; it works when we run it with a SOAP/XML test program provided by EMC.

    In addition, the latest comments in a post on the EMC forums (eRoom Forum) suggest that eRoom is not thread-safe: when multiple processes are spawned, behavior is unpredictable. For this reason the eRoom connector is single-threaded, which limits its performance in Aspire 4.0 because it cannot take advantage of the multi-threaded platform.
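
As a rough illustration of what "one request at a time" means in practice, the sketch below serializes calls from several threads behind a lock. It is not the connector's code: `SerializedClient` and `fake_send` are hypothetical names, and `fake_send` only stands in for a real SOAP/XML request.

```python
import threading
import time

class SerializedClient:
    """Wraps a request function so that only one call runs at a time."""

    def __init__(self, send):
        self._send = send              # hypothetical function that posts one XML query
        self._lock = threading.Lock()
        self._active = 0
        self.max_active = 0            # highest number of concurrent calls observed

    def query(self, xml):
        with self._lock:               # every caller waits here for its turn
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            try:
                return self._send(xml)
            finally:
                self._active -= 1

def fake_send(xml):
    time.sleep(0.01)                   # stands in for the network round trip
    return f"<response for='{xml}'/>"

client = SerializedClient(fake_send)
workers = [threading.Thread(target=client.query, args=(f"q{i}",)) for i in range(5)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(client.max_active)
```

Even with five worker threads, the lock keeps a single query in flight, which is why extra threads would not speed up an eRoom crawl.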


Operation Mode

The connector uses SOAP/XML over HTTP or HTTPS to acquire information about eRoom content. It acquires content as follows:

  • Recursively goes through all items and documents of an eRoom site and creates a sub-job for each object discovered. Each sub-job contains all available metadata, including ACLs.
  • Saves the item states in a MongoDB instance so that incremental crawls can compare them and detect added, updated, and deleted items.
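
The two steps above can be sketched as a recursive walk followed by a comparison of snapshots. This is a simplified illustration under stated assumptions: items are plain dictionaries with hypothetical `name`, `modified`, and `children` fields, an in-memory dict stands in for the MongoDB state store, and the signature scheme is invented for the example.

```python
import hashlib

def walk(item, path=""):
    """Yield (path, item) for an item and all of its descendants, depth first."""
    here = f"{path}/{item['name']}"
    yield here, item
    for child in item.get("children", []):
        yield from walk(child, here)

def signature(item):
    """Hash the fields that indicate a change (hypothetical choice of fields)."""
    raw = f"{item['name']}|{item.get('modified', '')}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def snapshot(root):
    """Map each item path to its signature; a dict stands in for the state store."""
    return {path: signature(item) for path, item in walk(root)}

def diff(previous, current):
    """Compare two snapshots and return sorted (added, updated, deleted) paths."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    updated = sorted(p for p in current if p in previous and previous[p] != current[p])
    return added, updated, deleted

first = {"name": "site", "modified": 1, "children": [
    {"name": "A", "modified": 1},
    {"name": "B", "modified": 1},
]}
second = {"name": "site", "modified": 1, "children": [
    {"name": "A", "modified": 2},   # changed since the first crawl
    {"name": "C", "modified": 1},   # new; B has been removed
]}

added, updated, deleted = diff(snapshot(first), snapshot(second))
print(added, updated, deleted)
```

Only the paths reported by `diff` would need to be sent for indexing or deletion on an incremental crawl; unchanged items such as the site root are skipped.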

