Group Expansion Service Introduction (Aspire 2)

Group Expansion Introduction

Group expansion in Aspire refers to the process of receiving a username and calculating from it the full set of groups to which the user belongs. Group expansion is most often considered at the time a search is performed: a user has logged in to a search application with their username and wants to retrieve data. The data in the search engine has been secured using access control lists, and those access control lists contain the names of users and groups that are permitted or denied access to each item. To provide a comprehensive result set, covering all the documents the user has access to, the search engine must know the groups that the user belongs to. However, at login time the user provided only their username, and hence group expansion is required.

Different search applications can request group expansion with varying frequencies. Some applications may request expansion when the user logs in, whereas some may request it for every search. The frequency with which requests are made is not in itself important, as long as the servers are not overloaded. What is important is the speed with which the information is returned. Aspire should be able to respond to a request in a timely manner. Whilst a request made during login may be able to wait a few seconds, a request made during a search must be processed very quickly.

The format of the incoming request is also important. Search applications may expect to submit requests to an LDAP server (the GSA for example) or via HTTP. Group expansion must support both of these request types. There are other things to consider:

An Aspire installation will probably have multiple content repositories (Documentum, SharePoint etc). Whilst these will quite often connect to LDAP or Active Directory and use groups from there, it is quite possible they use groups that are unique to the content repository and so it is important that these “local” groups are correctly handled.

If we do have multiple repositories, it’s possible that the username used in one content source is not the same as the username used in another, so this situation must be accounted for.

It is also possible that user and group information from differing repositories comes in different forms (with or without a domain for example).

Finally, the username passed to group expansion could come with or without a domain, or may not even be a “standard” username (it could be a SID/GUID for example).

The use of caching in group expansion

As mentioned above, it’s important that the requests for group expansion are processed as quickly as possible. This would suggest that some sort of caching is required. There is a potential issue here. What happens if the cache holds an entry that places the user in a group from which they have since been removed? It is theoretically possible that the user will be shown a search result for a document they can no longer access. However, most system implementations will then direct the user to the original repository to retrieve the document, at which point the request will be rejected, so any “risk” of the user seeing prohibited content is mitigated.

Group Expansion Process in Aspire

At a very high level, group expansion inside Aspire can be split into two processes:

Group collection and caching
Processing expansion requests

However, the two steps noted need to be performed on a per content source basis, so the group expansion process must include a step that takes the single incoming request, sends it to each of the content sources we have installed (or at least those we want to perform expansion for) and merges the results. This step is performed by the Group Expansion Manager, so the high level architecture for group expansion becomes:

Content Source
- Group collection and caching
- Processing expansion requests
Group Expansion Manager

The group expansion manager will also handle conversion of requests from external sources via HTTP and LDAP to a format Aspire can use, and then convert the responses back to the appropriate form.

The high level architecture can be seen in the diagram below:

Group collection and caching

Whatever the content source, request format or frequency, the key to the group expansion process is the ability to get the groups for a user. Since, for content source data extraction purposes, we have already built a connector that understands how to connect to the content source and has all the appropriate jar files for the content source API built in, we use this connector to collect the groups and insert them into a cache.

A scheduler component will periodically send the content source connector component a job to tell it to reload the cache. When the connector component receives this job, it will open a cache to hold the newly downloaded groups. Then, in a separate thread (so as not to block any repository scanning) it will connect to the repository and request:

a list of all users and the groups to which they belong, and
a list of groups and the groups to which they belong (for nested group expansion)

Once the connector has this information, it will:

Consider each user in turn and perform un-nesting of the groups, to produce a cache of users against the (full set of) groups to which they belong.
Switch in the new cache in an atomic operation so group expansion requests are always using a consistent cache (that is, you do not encounter the situation where expansion for one user is the result that was cached today and for another user was cached yesterday).

Un-nesting the groups involves taking the user, looking up the groups to which the user belongs, and then looking up each of the groups found to see whether those are themselves members of other groups. This process is repeated until no new groups are found. Thus, if a user is a member of group “one” and group “two” and group “two” is a member of group “three” and group “four”, the entry in the cache will record the user as a member of groups “one”, “two”, “three” and “four”.
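The un-nesting step can be sketched as a breadth-first walk over the group memberships. This is a minimal illustration rather than the connector's actual code: the function name expand_groups and the two dictionaries are hypothetical, and the data mirrors the “one” to “four” example above.

```python
from collections import deque


def expand_groups(direct_groups, parent_groups):
    """Return every group reachable from the user's direct groups."""
    seen = set(direct_groups)
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        # Look up the groups that this group is itself a member of.
        for parent in parent_groups.get(group, ()):
            if parent not in seen:  # also guards against cyclic nesting
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical input: users -> direct groups, groups -> groups they belong to.
user_groups = {"tesla": ["one", "two"]}
group_parents = {"two": ["three", "four"]}

cache = {user: expand_groups(groups, group_parents)
         for user, groups in user_groups.items()}
print(sorted(cache["tesla"]))  # ['four', 'one', 'three', 'two']
```

Tracking the groups already seen is what ends the loop once no new groups are found, and it also keeps the walk from looping forever if two groups are members of each other.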

An added complication when calculating group membership is that some content source repositories can use external groups (typically from an LDAP or Active Directory server). These groups can typically be members of groups that are “local” to the repository, but they are not always reported back in the list of groups for the repository, so it can be difficult to work out which “local” groups they belong to. Thus the connector can optionally be supplied with a list of groups from an external source. These groups are then looked up by the connector to ensure they are handled correctly.

At the end of the group collection process, the cache will contain a consistent set of users against the groups to which they belong. This cache will then be used to serve group expansion requests until the next time the scheduler determines the cache should be reloaded. At this point the process begins again.
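The reload-and-swap cycle described above might look something like the following sketch. The class GroupCache and its methods are hypothetical names, not Aspire APIs; the point is only that the new cache is built in full before a single reference is replaced, so a lookup sees either the old snapshot or the new one.

```python
import threading


class GroupCache:
    """Holds a snapshot of user -> groups that is replaced as a whole."""

    def __init__(self):
        self._snapshot = {}
        self._reload_lock = threading.Lock()  # allow one reload at a time

    def groups_for(self, username):
        # Read the reference once so a concurrent swap cannot mix two snapshots.
        snapshot = self._snapshot
        return snapshot.get(username, frozenset())

    def reload(self, load_all_users):
        with self._reload_lock:
            # Build the new cache completely before any reader can see it.
            fresh = {user: frozenset(groups)
                     for user, groups in load_all_users().items()}
            # Rebinding one attribute is the swap: readers see old or new.
            self._snapshot = fresh


cache = GroupCache()
cache.reload(lambda: {"tesla": ["scientists", "italians"]})
print(sorted(cache.groups_for("tesla")))  # ['italians', 'scientists']
```

In this sketch the scheduler's job would simply call reload with a function that downloads the users and groups from the repository (an assumption about how the pieces fit together).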

Processing expansion requests

At the point at which group expansion needs to be performed for a user, the user will be looked up in the cache created by the connector. The input to this process (known as a Group Expansion Request) will be an Aspire job with particular attributes, including the username of the user to be looked up. The response (a Group Expansion Response) will include a set of groups to be returned to the requester.

The cache is assumed to have already been populated and the process uses a client to retrieve the information. Whilst the methods to write to the cache needed to understand how to connect to the repository, the methods to read do not. They only need to be able to understand the information needed to look up items in the cache and how to format the information returned.

Since the connector already has the cache open, we make it implement a “Group Expansion Server” interface and have a generic client that takes the “group expansion request”, extracts the username to look up, calls a method on the “Group Expansion Server” interface to get the groups for that user and adds the returned groups to the “group expansion response”.
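The split between the connector (which owns the cache) and the generic client can be illustrated as follows. All names here (GroupExpansionServer, get_groups, CachingConnector, handle_request) and the dictionary-shaped request and response are assumptions made for the sketch; the real interface and job format are defined by Aspire.

```python
from __future__ import annotations

from typing import Protocol


class GroupExpansionServer(Protocol):
    """The only thing the generic client needs from a connector."""

    def get_groups(self, username: str) -> set[str]: ...


class CachingConnector:
    """Stands in for a connector that already holds the populated cache."""

    def __init__(self, cache: dict[str, set[str]]):
        self._cache = cache

    def get_groups(self, username: str) -> set[str]:
        return set(self._cache.get(username, ()))


def handle_request(request: dict, server: GroupExpansionServer) -> dict:
    """Generic client: knows nothing about the repository behind the server."""
    username = request["username"]
    groups = server.get_groups(username)
    return {"username": username, "groups": sorted(groups)}


connector = CachingConnector({"tesla": {"scientists", "italians"}})
print(handle_request({"username": "tesla"}, connector))
# {'username': 'tesla', 'groups': ['italians', 'scientists']}
```

Because the client depends only on the lookup method, the same client code can serve every content source regardless of how its cache was filled.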

Group Expansion Manager

The group expansion manager is responsible for four main areas of processing:

Receive external requests
- And return a response when all processing has been done.
Provide workflow components that allow the operator to make changes to the request and response.
Add external group data into the request
- See later.
Route the request to all content sources for which group expansion should be performed.

The architecture can be seen below:

Receiving requests

The group expansion manager includes the components for receiving group expansion requests. It publishes a servlet in Aspire with the (default) path /groupExpansion (i.e. http://localhost:50505/groupExpansion). This servlet expects a single parameter (username) and sends a group expansion job into Aspire. Once processing has been performed, the servlet will return a list of groups to which the user belongs. Thus a call to

http://localhost:50505/groupExpansion?username=tesla

provides a response in the form:

<groups>
  <group>tesla</group>
  <group>scientists</group>
  <group>italians</group>
  <group>group1</group>
  <group>group2</group>
  <group>group3</group>
  <group>group4</group>
  <group>PUBLIC:ALL</group>
  <group>xxxxxx</group>
</groups>

Note that the user itself (tesla in the above example) is returned as a pseudo group.
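A client of the servlet could build the request URL and read the response along the following lines. The helper names expansion_url and parse_groups are hypothetical, and the sample response is a shortened copy of the one shown above; the sketch parses a string rather than calling a live server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def expansion_url(username, base="http://localhost:50505/groupExpansion"):
    """Build the servlet URL, encoding the username safely."""
    return f"{base}?{urlencode({'username': username})}"


def parse_groups(xml_text):
    """Return the group names from a <groups> response, in document order."""
    root = ET.fromstring(xml_text)
    return [group.text for group in root.iter("group")]


sample = """<groups>
  <group>tesla</group>
  <group>scientists</group>
  <group>PUBLIC:ALL</group>
</groups>"""

print(expansion_url("tesla"))  # http://localhost:50505/groupExpansion?username=tesla
print(parse_groups(sample))    # ['tesla', 'scientists', 'PUBLIC:ALL']
```

Encoding the parameter matters when the username carries a domain or other characters (a backslash or an @ sign, for example) that are not safe in a query string.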

The group expansion manager also includes a “proxy LDAP server”. This is disabled by default and requires pre-installation of another service (LDAP Group Cache). When enabled, this proxy allows search engines such as the GSA to use Aspire for group expansion. The proxy expects to receive all requests from the engine. Requests for a user’s groups are intercepted: the username is extracted from the LDAP request and sent as a “group expansion request” job to the same pipeline as the HTTP requests. Once the expansion has been performed, the returned groups are gathered, formatted as an LDAP server response and sent back to the engine.

Requests which are not requests for groups (such as general LDAP searches or login requests) are forwarded to a “real” LDAP server via an LDAP connection component. This LDAP connection component is not installed as part of the group expansion manager and must be configured externally. It is included in the LDAP Group Cache service (see later).

Workflow

The group expansion manager includes a number of workflow processors to allow the system administrator the chance to manipulate the group expansion requests or responses and change either the user to be looked up, or the groups to be returned.

Workflow rules can be added via the UI and are actioned at the following points in the process:

Workflow name	Position in process	Usage
onRequest	Immediately after the request is received	Change or modify the user name to be looked up
afterLdap	After external information has been added to the request (if enabled)	Change or modify the user name to be looked up. Add extra external group information
onExpansion	After the request has passed through the router, but before the request is sent to the group expander for expansion	Change or modify the user name to look up for an individual group expander. A single request for expansion could cause multiple jobs in this workflow, assuming more than one expander is configured
onResponse	After all expansion has been performed, before the groups are returned back to the requester	Modify the groups returned to the requester

External group data

External group data (typically from LDAP or Active Directory) can be added during the expansion process before the request is sent to the group expanders in the content sources. The data is added before the expansion at the content source for two reasons:

Some legacy connectors require external groups to be provided on the request so they can perform their expansion
Adding LDAP information at this point allows the router to use this information later in the expansion process, so that differing LDAP or Active Directory attributes can be used for lookup at different expanders. This is particularly useful if (say) your username for one repository is different from your main username and that different username is held in an LDAP or Active Directory attribute.

Addition of external group data is disabled by default. When enabled, you must specify the Aspire application that provides external group data. This would typically be the LDAP Group Cache (see below).

When enabled, the following group expansion request

<doc type="groupExpansion">
  <feederLabel>GroupExpander</feederLabel>
  <username source="HTTPFeederServlet">tesla</username>
  <aspireHttpFeederServlet fullPath="/groupExpansion" remoteAddr="127.0.0.1" remoteHost="127.0.0.1" remotePort="52644" serverName="localhost" serverPort="50505" servletPath="/groupExpansion" source="HTTPFeederServlet">
    <queryString>username=tesla</queryString>
  </aspireHttpFeederServlet>
  <groupExpander>
    <expanders>
      <route lookupAttribute="myLookupAttr"/>
    </expanders>
  </groupExpander>
</doc>

would have LDAP information added and would then appear something like this:

<doc type="groupExpansion">
  <feederLabel>GroupExpander</feederLabel>
  <username source="HTTPFeederServlet">tesla</username>
  <aspireHttpFeederServlet fullPath="/groupExpansion" remoteAddr="127.0.0.1" remoteHost="127.0.0.1" remotePort="52644" serverName="localhost" serverPort="50505" servletPath="/groupExpansion" source="HTTPFeederServlet">
    <queryString>username=tesla</queryString>
  </aspireHttpFeederServlet>
  <groupExpander>
    <expanders>
      <route lookupAttribute="myLookupAttr"/>
    </expanders>
  </groupExpander>
  <ldap source="ldap">
    <dn>uid=tesla,dc=example,dc=com</dn>
    <mail>[email protected]</mail>
    <gidNumber>99999</gidNumber>
    <uidNumber>88888</uidNumber>
    <uid>tesla</uid>
    <objectClass>inetOrgPerson</objectClass>
    <objectClass>organizationalPerson</objectClass>
    <objectClass>person</objectClass>
    <objectClass>top</objectClass>
    <objectClass>posixAccount</objectClass>
    <homeDirectory>home</homeDirectory>
    <sn>Tesla</sn>
    <cn>Nikola Tesla</cn>
  </ldap>
  <groups>
    <group source="ldap">tesla</group>
    <group source="ldap">scientists</group>
    <group source="ldap">italians</group>
  </groups>
</doc>

Note the <ldap> section, which holds a number of the user’s LDAP attributes, and the <groups> section, which holds the LDAP groups.
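One way to picture how an LDAP attribute could supply the name used at a particular expander is the sketch below. The function lookup_name is hypothetical, and falling back to the original username when the attribute is absent is an assumption made for the illustration, not documented Aspire behaviour. The sample document is a trimmed version of the enriched request above.

```python
import xml.etree.ElementTree as ET


def lookup_name(doc_xml, lookup_attribute=None):
    """Pick the name to look up: an <ldap> attribute if present, else <username>."""
    doc = ET.fromstring(doc_xml)
    if lookup_attribute:
        value = doc.findtext(f"ldap/{lookup_attribute}")
        if value:
            return value
    # Assumed fallback: use the username supplied on the request.
    return doc.findtext("username")


sample = """<doc type="groupExpansion">
  <username source="HTTPFeederServlet">tesla</username>
  <ldap source="ldap">
    <uid>tesla</uid>
    <uidNumber>88888</uidNumber>
  </ldap>
</doc>"""

print(lookup_name(sample, "uidNumber"))     # 88888
print(lookup_name(sample, "myLookupAttr"))  # tesla
print(lookup_name(sample))                  # tesla
```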

Routing

Routing is the method by which a single request gets sent to multiple content sources for expansion. If (say) you are using group expansion and have Lotus Notes, SharePoint, Documentum and Confluence you will almost certainly have four group expanders.

Routing occurs after external group information has been added (if enabled).

When configuring routing, you specify at least one content source to which the group expansion request should be sent. You can specify as many or as few as you wish. The incoming expansion request is replicated to produce “child” expansion requests, and each “child” group expansion request is sent (using a new Aspire job) to the desired content source (after being passed through the workflow component).

The router then waits for the group expansion jobs to complete. The wait time is configurable but defaults to 15 seconds. The groups from any “child” expansion requests that complete within the given time are added to the original request. Once all child requests have returned, or the wait time has been exceeded, the original request, with the set of groups gathered so far, will continue and will eventually be passed back to the requester as a group expansion response. Any “child” requests that complete after the wait time will be ignored.
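The fan-out, wait and merge behaviour can be sketched with a thread pool. This is an illustration only: route, fast and slow are hypothetical names, the expanders are plain functions rather than Aspire jobs, and dropping the result of a child that raises an error is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def route(username, expanders, wait_seconds=15.0):
    """Send one child request per expander and merge whatever returns in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(expanders)))
    children = [pool.submit(expand, username) for expand in expanders.values()]
    done, late = wait(children, timeout=wait_seconds)
    groups = set()
    for child in done:
        if child.exception() is None:  # assumed: a failed child adds nothing
            groups.update(child.result())
    # Do not block on children that missed the deadline; they are ignored.
    pool.shutdown(wait=False)
    return groups


def fast(username):
    return {"scientists"}


def slow(username):
    time.sleep(0.5)  # simulates a repository that answers too late
    return {"late-group"}


print(route("tesla", {"sharepoint": fast, "documentum": slow}, wait_seconds=0.1))
# {'scientists'}
```

With a generous wait time both sets of groups are merged; with a short one the slow expander's groups are simply missing from the response, which is the trade-off the configurable wait time controls.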

Domain Handling in the Group Expansion Manager

By default, the group expansion manager does not alter the domains of incoming requests or outgoing responses. However, the manager allows the following options for both incoming and outgoing domains:

Option	Description
Leave alone	The username is untouched. If the username has a domain it will be left alone. If it doesn’t, none will be added
Strip	Any domain will be removed from the user or group name
Add	The specified domain name will be added to the user or group name (replacing any existing domain name)
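The three options can be illustrated with a small function. The name apply_domain_option and the option strings are hypothetical, and the sketch assumes names are written in the DOMAIN\name form; other forms (such as name@domain) would need their own parsing.

```python
def apply_domain_option(name, option, domain=None):
    """Apply 'leave', 'strip' or 'add' to a name in the assumed DOMAIN\\name form."""
    # Text after the last backslash, or the whole name if there is no domain.
    bare = name.rpartition("\\")[2]
    if option == "leave":
        return name
    if option == "strip":
        return bare
    if option == "add":
        if not domain:
            raise ValueError("the 'add' option needs a domain")
        return f"{domain}\\{bare}"  # replaces any existing domain
    raise ValueError(f"unknown option: {option}")


print(apply_domain_option("CORP\\tesla", "leave"))       # CORP\tesla
print(apply_domain_option("CORP\\tesla", "strip"))       # tesla
print(apply_domain_option("CORP\\tesla", "add", "LAB"))  # LAB\tesla
print(apply_domain_option("tesla", "add", "LAB"))        # LAB\tesla
```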

Supplementary groups

The group expansion manager will allow you to add supplementary groups should you need to. These groups are added after all other expansion has been performed.

PUBLIC:ALL

The group expansion manager by default adds the Aspire “PUBLIC:ALL” group. This group is used by connectors to indicate content that is identified as public. The addition of this can be turned off if required.

Static Groups

The group expansion manager will optionally add static groups. These are any additional groups that you wish to be added to expansion responses. You may configure as many as you wish by specifying the name of each group (including domain if required) in the UI. Note that the groups are added exactly as entered and any domain remains unaltered, even if you have configured domain handling as described above.
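The addition of supplementary groups after expansion can be pictured as below. The function add_supplementary_groups and the static group name all-staff are hypothetical; the ordering (PUBLIC:ALL before static groups) and the skipping of duplicates are assumptions made for the sketch.

```python
def add_supplementary_groups(expanded, static_groups=(), add_public_all=True):
    """Append PUBLIC:ALL (unless turned off) and static groups to the result."""
    result = list(expanded)
    extras = (["PUBLIC:ALL"] if add_public_all else []) + list(static_groups)
    for group in extras:
        if group not in result:    # assumed: do not repeat an existing group
            result.append(group)   # static groups are added exactly as entered
    return result


print(add_supplementary_groups(["tesla", "scientists"], ["all-staff"]))
# ['tesla', 'scientists', 'PUBLIC:ALL', 'all-staff']
print(add_supplementary_groups(["tesla"], add_public_all=False))
# ['tesla']
```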
