Feature only available with Aspire Enterprise

Group Expansion Introduction

Group expansion is most often considered at the time a search query is being executed – a user has logged in to a search application with their user name and wants to retrieve data. The data in the search engine has been secured using access control lists (ACLs) which contain the names of users and groups that are permitted or denied access to each item. To provide the user with a comprehensive result set, covering all the documents the user has access to, the search engine must know the groups that the user belongs to. However, at login time, the user only provides their username and hence, group expansion is required.

Different search applications can request group expansion with varying frequencies. Some applications may request expansion when the user logs in whereas some may request it for every search. The frequency with which requests are made is not specifically important, as long as the servers are not overloaded. What is important is the speed with which the information is returned. Aspire should be able to respond to a request in a timely manner. Whilst a request made during login may be able to wait a few seconds, a request made during a search must be processed very quickly.

The format of the incoming request is also important. Search applications may expect to submit requests to an LDAP server (the GSA for example) or via HTTP. Group expansion must support both of these request types. There are other things to consider:

An Aspire installation will probably have multiple content repositories (Documentum, SharePoint etc). Whilst these will quite often connect to LDAP or Active Directory and use groups from there, it is quite possible they use groups that are unique to the content repository and so it is important that these “local” groups are correctly handled.

If we do have multiple repositories, it’s possible that the username used in one content source is not the same as the username used in another, so this situation must be accounted for.

It is also possible that user and group information from differing repositories comes in different forms (with or without a domain for example).

Finally, the username passed to group expansion could come with or without a domain, or may not even be a “standard” username (it could be a SID/GUID for example).

The use of caching in group expansion

As mentioned above, it’s important that the requests for group expansion are processed as quickly as possible. This would suggest that some sort of caching is required. There is a potential issue here. What happens if you have an entry in the cache that places the user in a group from which he is later removed? It is theoretically possible that a search will return a result that he can no longer access. However, most system implementations will then direct the user to the original repository to retrieve the document, at which point the request will be rejected, so any “risk” of the user seeing prohibited content is mitigated.

Group Expansion Process in Aspire

At a very high level, group expansion inside Aspire can be split into two processes:

Group collection and caching
Processing expansion requests

The first process is the responsibility of each content source or the LDAP Cache service. Each connector knows how to retrieve the users and their groups from its repository, and the connector framework stores them in the Group Expansion Manager database cache.

The second process is executed by the Group Expansion Manager service, which receives the HTTP request and queries its database cache for the username and its groups.

The high level architecture for group expansion becomes:

Content Source
- Group collection and caching
Group Expansion Manager
- Processing expansion requests

The group expansion manager will also handle conversion of requests from external sources via HTTP or LDAP to a format Aspire can use, and then convert the responses back to the appropriate form.

The high level architecture can be seen in the diagram below:

[Diagram: high level architecture for group expansion]

Group collection and caching

Whatever the content source, request format or frequency, the key to the group expansion process is the ability to get the groups for a user. Since, for content source data extraction purposes, we have already built a connector that understands how to connect to the content source and has all the appropriate jar files for the content source API built in, we use this to collect the groups and insert them into a cache.

A scheduler component will periodically send the content source connector component a job to tell it to reload the cache. When the connector component receives this job, it will update the group expansion manager database cache with the newly downloaded groups. These groups are downloaded in a separate thread (so as not to block any repository scanning). The connector requests:

a list of all users and the groups to which they belong, and
a list of groups and the groups to which they belong (for nested group expansion)

Once the connector has this information, it will:

Consider each user in turn and perform un-nesting of the groups, to produce a cache of users against the (full set of) groups to which they belong.

After each reload the cache will contain up-to-date user and group data, since the connector will remove any deleted users, update the groups of any existing user and add any new users.

Un-nesting the groups involves taking the user and looking up the groups to which he belongs, then looking up each of the groups found to see if those are members of other groups. This process is repeated until no new groups are found. Thus, if a user is a member of group “one” and group “two” and group “two” is a member of group “three” and group “four”, the entry in the cache will record the user as a member of groups “one”, “two”, “three” and “four”.
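The un-nesting described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the connector's actual code: the function name and the two dictionaries (users to direct groups, groups to parent groups) are assumptions standing in for the two lists the connector requests.

```python
from collections import deque


def unnest_groups(user_groups, group_parents):
    """Return {user: set of every group the user belongs to, directly or via nesting}."""
    expanded = {}
    for user, direct in user_groups.items():
        seen = set(direct)      # groups already recorded for this user
        queue = deque(direct)   # groups whose parents still need looking up
        while queue:            # repeat until no new groups are found
            group = queue.popleft()
            for parent in group_parents.get(group, ()):
                if parent not in seen:   # also guards against circular nesting
                    seen.add(parent)
                    queue.append(parent)
        expanded[user] = seen
    return expanded


# The example from the text: "two" is a member of "three" and "four".
user_groups = {"user": ["one", "two"]}
group_parents = {"two": ["three", "four"]}
cache = unnest_groups(user_groups, group_parents)
print(sorted(cache["user"]))  # ['four', 'one', 'three', 'two']
```

Tracking the groups already seen means a circular membership (group A in group B and B in A) cannot cause an endless loop.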

An added complication when calculating group membership is that some content source repositories can use external groups (typically from an LDAP or Active Directory server). These groups can typically be members of groups that are “local” to the repository, but they are not always reported back in the list of groups for the repository, so it can be difficult to work out which “local” groups they belong to. Thus the connector can optionally be supplied with a list of groups from an external source. These groups are then looked up by the connector to ensure they are handled correctly.

At the end of the group collection process, the cache will contain a consistent set of users against the groups to which they belong. This cache will then be used to serve group expansion requests until the next time the scheduler determines the cache should be reloaded. At this point the process begins again.

Processing expansion requests

At the point at which group expansion needs to be performed for a user, the user will be looked up in the group expansion manager database cache. The input to this process (known as a Group Expansion Request) will be an Aspire job with particular attributes, including the username of the user to be looked up. The response (a Group Expansion Response) will include a set of groups to be returned to the requester.

The cache is assumed to have already been populated, and the group expansion manager executes a single query against the database to obtain the results, which will be merged before being returned (there can be multiple entries for a single user coming from different repositories with different groups).
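As a rough illustration of the single query and merge step, the sketch below uses an in-memory SQLite database. The table name, column names and sample rows are hypothetical; the actual schema of the group expansion manager database cache is not described here.

```python
import sqlite3


def expand(conn, username):
    """Run one query for the user and merge the groups found across all repositories."""
    rows = conn.execute(
        "SELECT DISTINCT group_name FROM group_cache "
        "WHERE username = ? ORDER BY group_name",
        (username,),
    ).fetchall()
    return [group for (group,) in rows]


# Hypothetical cache contents: one row per (repository, user, group).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_cache (source TEXT, username TEXT, group_name TEXT)")
conn.executemany(
    "INSERT INTO group_cache VALUES (?, ?, ?)",
    [
        ("sharepoint", "tesla", "scientists"),
        ("sharepoint", "tesla", "group1"),
        ("documentum", "tesla", "scientists"),
        ("documentum", "tesla", "group2"),
        ("documentum", "edison", "group3"),
    ],
)
print(expand(conn, "tesla"))  # ['group1', 'group2', 'scientists']
```

The group "scientists" is reported by both hypothetical repositories but appears only once in the merged result.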

Group Expansion Manager

The group expansion manager is responsible for four main areas of processing:

Receive external requests
- And return a response when all processing has been done.
Provide workflow components that allow the operator to make changes to the request and response.
Add external group data in to the responses
- See later.
Query the username in its database cache (populated by the different connectors)

The architecture can be seen below:

[Diagram: group expansion manager architecture]

Receiving requests

The group expansion manager includes the components for receiving group expansion requests. It publishes a servlet in Aspire with the (default) path /groupExpansion (i.e. http://localhost:50505/groupExpansion). This servlet expects a single parameter (username) and sends a group expansion job in to Aspire. Once processing has been performed, the servlet will return a list of groups to which the user belongs. Thus a call to

http://localhost:50505/groupExpansion?username=tesla

provides a response in the form:

  <groups>
    <group>tesla</group>
    <group>scientists</group>
    <group>italians</group>
    <group>group1</group>
    <group>group2</group>
    <group>group3</group>
    <group>group4</group>
    <group>PUBLIC:ALL</group>
    <group>xxxxxx</group>
  </groups>

Note that the user itself (tesla in the above example) is returned as a pseudo group.
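A client could call the servlet and read the response along these lines. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the default URL is simply the example address above, so it would need adjusting for a real installation.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def parse_groups(xml_data):
    """Extract the group names from a <groups> response (str or bytes)."""
    root = ET.fromstring(xml_data)
    return [group.text for group in root.findall("group")]


def fetch_groups(username, base_url="http://localhost:50505/groupExpansion"):
    """Request expansion for one user; needs a running Aspire instance at base_url."""
    url = base_url + "?" + urllib.parse.urlencode({"username": username})
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_groups(response.read())


# Parsing a shortened copy of the sample response, without any network call.
sample = """<groups>
  <group>tesla</group>
  <group>scientists</group>
  <group>PUBLIC:ALL</group>
</groups>"""
print(parse_groups(sample))  # ['tesla', 'scientists', 'PUBLIC:ALL']
```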

The group expansion manager also includes a “proxy LDAP server”. This is disabled by default and requires pre-installation of another service (Ldap group cache). When enabled, this proxy allows search engines such as the GSA to use Aspire for group expansion. The proxy expects to receive all requests from the engine. Requests for the groups of a user are intercepted. The username is extracted from the LDAP request and sent as a “group expansion request” job to the same pipeline as the HTTP requests. Once the expansion has been performed, the returned groups are gathered, formatted as an LDAP server response and sent back to the engine.

Requests which are not requests for groups (such as general LDAP searches or login requests) are forwarded to a “real” LDAP server via an LDAP connection component. This LDAP connection component is not installed as part of the group expansion manager and must be configured externally. The LDAP connection component is included in the Ldap Group Cache service (see later).

Workflow

The group expansion manager includes a number of workflow processors to allow the system administrator the chance to manipulate the group expansion requests or responses and change either the user to be looked up, or the groups to be returned.

Workflow rules can be added via the UI and are executed at the following points in the process:

Workflow name	Position in process	Usage
onRequest	Immediately after the request is received	Change or modify the user name to be looked up
onResponse	After all expansion has been performed, before the groups are returned back to the requester	Modify the groups returned to the requester

External group data (typically from LDAP or Active Directory) can be added during the expansion process before the request is sent to the group expanders in the content sources. The information is added before the expansion at the content source for two reasons:

Some legacy connectors require external groups to be provided on the request so they can perform their expansion
Adding LDAP information at this point allows the router to use this information later in the expansion process, so that differing LDAP or Active Directory attributes can be used for lookup at different expanders. This is particularly useful if (say) your username for one repository is different to your main username and that different username is held in an LDAP or Active Directory attribute.

Addition of external group data is disabled by default. When enabled, you must specify the Aspire application that provides external group data. This would typically be the LDAP Group Cache (see below).

When enabled, the following group expansion request

<doc type="groupExpansion">
  <feederLabel>GroupExpander</feederLabel>
  <username source="HTTPFeederServlet">tesla</username>
  <aspireHttpFeederServlet fullPath="/groupExpansion" remoteAddr="127.0.0.1" remoteHost="127.0.0.1" remotePort="52644" serverName="localhost" serverPort="50505" servletPath="/groupExpansion" source="HTTPFeederServlet">
    <queryString>username=tesla</queryString>
  </aspireHttpFeederServlet>
  <groupExpander>
    <expanders>
      <route lookupAttribute="myLookupAttr"/>
    </expanders>
  </groupExpander>
</doc>

would have ldap information added and would then appear something like this:

<doc type="groupExpansion">
  <feederLabel>GroupExpander</feederLabel>
  <username source="HTTPFeederServlet">tesla</username>
  <aspireHttpFeederServlet fullPath="/groupExpansion" remoteAddr="127.0.0.1" remoteHost="127.0.0.1" remotePort="52644" serverName="localhost" serverPort="50505" servletPath="/groupExpansion" source="HTTPFeederServlet">
    <queryString>username=tesla</queryString>
  </aspireHttpFeederServlet>
  <groupExpander>
    <expanders>
      <route lookupAttribute="myLookupAttr"/>
    </expanders>
  </groupExpander>
  <ldap source="ldap">
    <dn>uid=tesla,dc=example,dc=com</dn>
    <mail>[email protected]</mail>
    <gidNumber>99999</gidNumber>
    <uidNumber>88888</uidNumber>
    <uid>tesla</uid>
    <objectClass>inetOrgPerson</objectClass>
    <objectClass>organizationalPerson</objectClass>
    <objectClass>person</objectClass>
    <objectClass>top</objectClass>
    <objectClass>posixAccount</objectClass>
    <homeDirectory>home</homeDirectory>
    <sn>Tesla</sn>
    <cn>Nikola Tesla</cn>
  </ldap>
  <groups>
    <group source="ldap">tesla</group>
    <group source="ldap">scientists</group>
    <group source="ldap">italians</group>
  </groups>
</doc>

Note the <ldap> section which has a number of attributes under it and the <groups> section with the ldap groups.

Routing

Routing is the method by which a single request gets sent to multiple content sources for expansion. If (say) you are using group expansion and have Lotus Notes, SharePoint, Documentum and Confluence you will almost certainly have four group expanders.

Routing occurs after external group information has been added (if enabled).

When configuring routing you specify at least one content source you want the group expansion request to be sent to. You can specify as many or as few as you wish. The incoming expansion request is replicated to produce “child” expansion requests, and each “child” group expansion request is sent (using a new Aspire job) to the desired content source (after being passed through the workflow component).

The router then waits for the group expansion jobs to complete. The wait time is configurable but defaults to 15 seconds. The groups from any “child” expansion requests that complete within the given time are added to the original request. Once all child requests have returned, or the wait time has been exceeded, the original request, with the set of groups collected so far, will continue and will eventually be passed back to the requester as a group expansion response. Any “child” requests that complete after the wait time will be ignored.
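The fan-out and bounded wait described above can be sketched with Python's standard thread pool. This only illustrates the pattern and is not how Aspire implements routing: the route function, the expander callables and the repository names are hypothetical, and treating a failed child request like a late one is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def route(username, expanders, wait_seconds=15.0):
    """Send one child request per expander and merge whatever returns in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(expanders)))
    futures = [pool.submit(expander, username) for expander in expanders.values()]
    done, _late = wait(futures, timeout=wait_seconds)  # late children are ignored
    pool.shutdown(wait=False)                          # do not block on late children
    groups = set()
    for future in done:
        if future.exception() is None:                 # assumption: failures are skipped
            groups.update(future.result())
    return groups


def fast_expander(username):
    return {"scientists"}


def slow_expander(username):
    time.sleep(1.0)  # simulates a repository that answers too late
    return {"late-group"}


result = route(
    "tesla",
    {"sharepoint": fast_expander, "documentum": slow_expander},
    wait_seconds=0.2,
)
print(sorted(result))  # ['scientists']
```

The short wait time in the example is only there to keep the demonstration quick; the default in the function mirrors the 15 seconds mentioned above.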

Domain Handling in the Group Expansion Manager

By default, the group expansion manager does not alter the domains of incoming requests or outgoing responses. However, the manager allows the following options for both incoming and outgoing domains:

Option	Description
Leave alone	The username is untouched. If the username has a domain it will be left alone. If it doesn’t, none will be added
Strip	Any domain will be removed from the user or group name
Add	The specified domain name will be added to the user or group name (replacing any existing domain name)
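The three options can be illustrated with a small Python function. The option names and the function are hypothetical, and the sketch assumes names are written with a backslash separator between domain and name; other forms (such as a name followed by an @ and a domain) are not handled.

```python
SEP = "\\"  # assumed separator between domain and name


def apply_domain_option(name, option="leave", domain=None):
    """Apply one of the three domain handling options to a user or group name."""
    if option == "leave":
        return name                        # untouched, with or without a domain
    bare = name.split(SEP, 1)[1] if SEP in name else name
    if option == "strip":
        return bare                        # any domain removed
    if option == "add":
        if not domain:
            raise ValueError("the 'add' option needs a domain")
        return domain + SEP + bare         # replaces any existing domain
    raise ValueError("unknown option: " + option)


print(apply_domain_option("CORP\\tesla", "leave"))       # CORP\tesla
print(apply_domain_option("CORP\\tesla", "strip"))       # tesla
print(apply_domain_option("CORP\\tesla", "add", "LAB"))  # LAB\tesla
print(apply_domain_option("tesla", "add", "LAB"))        # LAB\tesla
```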

Supplementary groups

The group expansion manager will allow you to add supplementary groups should you need to. These groups are added after all other expansion has been performed.

PUBLIC:ALL

The group expansion manager by default adds the Aspire “PUBLIC:ALL” group. This group is used by connectors to indicate content that is identified as public. The addition of this can be turned off if required.

Additional Static Groups

The group expansion manager will optionally add static groups. These are any additional groups that you wish to be added to expansion responses. You may configure as many as you wish by specifying the name of the group (including domain if required) in the UI. Note that the groups are added exactly as entered and any domain remains unaltered, even if you have configured domain handling as described above.
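The ordering described in this section can be summarised in a short sketch. The function and parameter names are hypothetical, and it is an assumption that “PUBLIC:ALL” is also added without passing through domain handling; only the static groups are stated to be left unaltered.

```python
def build_response(expanded_groups, static_groups=(), add_public_all=True,
                   domain_transform=None):
    """Assemble the final group list once all other expansion has been performed."""
    # Domain handling (if configured) applies only to the expanded groups.
    if domain_transform is not None:
        groups = [domain_transform(group) for group in expanded_groups]
    else:
        groups = list(expanded_groups)
    if add_public_all:
        groups.append("PUBLIC:ALL")   # added by default, can be turned off
    groups.extend(static_groups)      # added exactly as entered
    return groups


def strip_domain(name):
    """Hypothetical 'strip' transform for names using a backslash separator."""
    return name.split("\\")[-1]


response = build_response(
    ["CORP\\scientists"],
    static_groups=["CORP\\auditors"],
    domain_transform=strip_domain,
)
print(response == ["scientists", "PUBLIC:ALL", "CORP\\auditors"])  # True
```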
