Group expansion is most often needed at the time a search query is executed: a user has logged in to a search application with their username and wants to retrieve data. The data in the search engine has been secured using access control lists (ACLs), which contain the names of the users and groups that are permitted or denied access to each item. To give users a comprehensive result set, covering every document they have access to, the search engine must know the groups each user belongs to. However, at login the user provides only a username, so those groups must be looked up separately. This lookup is group expansion.
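To illustrate why the groups matter, here is a minimal Python sketch of ACL filtering at query time. It is not how any particular search engine evaluates ACLs: the `Acl` class, the function names and the rule that a matching deny entry overrides any allow entry are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Hypothetical ACL: names of users or groups that are allowed or denied."""
    allow: set[str] = field(default_factory=set)
    deny: set[str] = field(default_factory=set)


def can_access(username: str, groups: set[str], acl: Acl) -> bool:
    # The user's principals are the username plus every expanded group.
    principals = {username} | groups
    if principals & acl.deny:
        return False  # assumed rule: any matching deny entry wins
    return bool(principals & acl.allow)


def visible_documents(username: str, groups: set[str], index: dict[str, Acl]) -> list[str]:
    # Keep only the documents whose ACL admits the user or one of the groups.
    return sorted(doc for doc, acl in index.items() if can_access(username, groups, acl))
```

With `index = {"doc1": Acl(allow={"scientists"})}`, `visible_documents("tesla", set(), index)` returns an empty list, while passing the expanded groups `{"scientists"}` returns `["doc1"]`. Without expansion the user would miss every document secured by group.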
Different search applications can request group expansion with varying frequency. Some request expansion when the user logs in, whereas others request it for every search. The frequency of requests is not especially important, as long as the servers are not overloaded. What matters is the speed with which the information is returned: Aspire must respond to each request in a timely manner. Whilst a request made during login may be able to wait a few seconds, a request made during a search must be processed very quickly.
The format of the incoming request is also important. Search applications may expect to submit requests to an LDAP server (the Google Search Appliance (GSA), for example) or over HTTP, and group expansion must support both request types. There are other things to consider:
As mentioned above, group expansion requests must be processed as quickly as possible, which suggests that some form of caching is required. Caching introduces a potential issue: what happens if the cache holds an entry that gives a user membership of a group from which he has since been removed? In theory, a search could then return a result that he can no longer access. However, most implementations direct the user to the original repository to retrieve the document, at which point the request is rejected, so any “risk” of the user seeing prohibited content is mitigated.
At a very high level, group expansion inside Aspire can be split into two processes:
The first process, populating the cache, is the responsibility of each content source connector or of the LDAP Cache service. Each connector knows how to retrieve the users and their groups from its repository, and the connector framework stores them in the Group Expansion Manager database cache.
The second process is executed by the Group Expansion Manager service, which receives the HTTP request and queries its database cache for the username and the groups cached against it.
The high level architecture for group expansion becomes:
The group expansion manager also converts requests arriving from external sources over HTTP or LDAP into a format Aspire can use, and converts the responses back into the appropriate form.
The high level architecture can be seen in the diagram below:
Whatever the content source, request format or frequency, the key to the group expansion process is the ability to get the groups for a user. For content extraction we have already built a connector that understands how to connect to the content source and includes all the jar files its API requires, so we reuse that connector to collect the groups and insert them into a cache.
A scheduler component will periodically send the content source connector component a job to tell it to reload the cache. When the connector component receives this job, it will update the group expansion manager database cache with the newly downloaded groups. These groups are downloaded in a separate thread (so as not to block any repository scanning). The connector requests:
Once the connector has this information, it will:
Un-nesting the groups involves taking the user and looking up the groups to which he belongs, then looking up each of those groups to see whether it is itself a member of other groups. This process is repeated until no new groups are found. Thus, if a user is a member of groups “one” and “two”, and group “two” is a member of groups “three” and “four”, the cache will record the user as a member of groups “one”, “two”, “three” and “four”.
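Un-nesting is a transitive closure over the group membership graph. The following is a minimal Python sketch using a breadth-first traversal, assuming the connector has already collected two maps (the names `user_groups` and `group_parents` are hypothetical): the groups each user belongs to directly, and the groups each group belongs to directly. The `seen` set doubles as a guard against membership cycles, a case the text does not discuss.

```python
from collections import deque


def unnest_groups(user_groups: dict[str, set[str]],
                  group_parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each user, the full set of groups after un-nesting.

    user_groups:   user  -> groups the user is a direct member of
    group_parents: group -> groups that group is a direct member of
    """
    expanded: dict[str, set[str]] = {}
    for user, direct in user_groups.items():
        seen: set[str] = set()
        queue = deque(direct)
        while queue:  # repeat until no new groups are found
            group = queue.popleft()
            if group in seen:
                continue  # already expanded; also stops cycles looping forever
            seen.add(group)
            queue.extend(group_parents.get(group, ()))
        expanded[user] = seen
    return expanded
```

For the example in the text, `unnest_groups({"tesla": {"one", "two"}}, {"two": {"three", "four"}})` records `tesla` as a member of `one`, `two`, `three` and `four`.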
An added complication when calculating group membership is that some content source repositories can use external groups (typically from an LDAP or Active Directory server). These external groups can be members of groups local to the repository, but they are not always reported in the repository's list of groups, so it can be difficult to work out which local groups they belong to. The connector can therefore optionally be supplied with a list of groups from an external source. The connector then looks up each of these groups to ensure they are handled correctly.
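One way to picture this, as a sketch only: before un-nesting, the connector extends its group-to-parent map with an entry for every supplied external group. The `lookup_parents` callable is a hypothetical stand-in for whatever repository call reports the local groups an external group belongs to; Aspire's connectors may handle this differently.

```python
from typing import Callable, Iterable


def add_external_groups(group_parents: dict[str, set[str]],
                        external_groups: Iterable[str],
                        lookup_parents: Callable[[str], set[str]]) -> dict[str, set[str]]:
    """Return a copy of group_parents that also covers the supplied external groups.

    lookup_parents is hypothetical: it stands in for the repository lookup that
    reports which local groups an external group is a member of.
    """
    # Copy so the connector's original map is left untouched.
    merged = {group: set(parents) for group, parents in group_parents.items()}
    for group in external_groups:
        # Merge with any memberships the repository did report for this group.
        merged.setdefault(group, set()).update(lookup_parents(group))
    return merged
```

The resulting map can then be un-nested exactly like the repository's own groups, so users who belong to an external group also pick up the local groups it is a member of.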
At the end of the group collection process, the cache will contain a consistent set of users against the groups to which they belong. This cache will then be used to serve group expansion requests until the next time the scheduler determines the cache should be reloaded. At this point the process begins again.
At the point at which group expansion needs to be performed for a user, the user will be looked up in the group expansion manager database cache. The input to this process (known as a Group Expansion Request) will be an Aspire job with particular attributes, including the username of the user to be looked up. The response (a Group Expansion Response) will include a set of groups to be returned to the requester.
The cache is assumed to have already been populated. The group expansion manager executes a single query against the database, and the results are merged before being returned (a single user can have multiple entries, coming from different repositories with different groups).
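A minimal sketch of both sides of the cache, using Python's built-in `sqlite3` module. The table layout, and the choice to replace one repository's rows inside a single transaction so that requests keep seeing a consistent set until the reload commits, are assumptions for illustration; the Group Expansion Manager's actual database and schema are not described here. The lookup is a single query, with `DISTINCT` merging the entries that different repositories hold for the same user.

```python
import sqlite3

# Hypothetical schema: one row per (repository, user, group).
SCHEMA = """
CREATE TABLE IF NOT EXISTS group_cache (
    source   TEXT NOT NULL,
    username TEXT NOT NULL,
    grp      TEXT NOT NULL,
    PRIMARY KEY (source, username, grp)
)
"""


def reload_source(conn: sqlite3.Connection, source: str,
                  expanded: dict[str, set[str]]) -> None:
    """Replace every cached row for one repository in a single transaction."""
    with conn:  # commits on success, rolls back if anything fails
        conn.execute("DELETE FROM group_cache WHERE source = ?", (source,))
        conn.executemany(
            "INSERT INTO group_cache (source, username, grp) VALUES (?, ?, ?)",
            [(source, user, g) for user, groups in expanded.items() for g in groups],
        )


def expand(conn: sqlite3.Connection, username: str) -> list[str]:
    """One query; DISTINCT merges the groups contributed by different repositories."""
    rows = conn.execute(
        "SELECT DISTINCT grp FROM group_cache WHERE username = ? ORDER BY grp",
        (username,),
    )
    return [grp for (grp,) in rows]
```

For example, after `reload_source(conn, "repoA", {"tesla": {"scientists", "group1"}})` and `reload_source(conn, "repoB", {"tesla": {"scientists", "italians"}})` on an in-memory database, `expand(conn, "tesla")` returns `["group1", "italians", "scientists"]`, with `scientists` reported once.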
The group expansion manager is responsible for four main areas of processing:
The architecture can be seen below:
The group expansion manager includes the components for receiving group expansion requests. It publishes a servlet in Aspire with the (default) path /groupExpansion (i.e. http://localhost:50505/groupExpansion). This servlet expects a single parameter, username, and sends a group expansion job into Aspire. Once processing has been performed, the servlet returns the list of groups to which the user belongs. Thus a call to
http://localhost:50505/groupExpansion?username=tesla
provides a response in the form:
<groups> <group>tesla</group> <group>scientists</group> <group>italians</group> <group>group1</group> <group>group2</group> <group>group3</group> <group>group4</group> <group>PUBLIC:ALL</group> <group>xxxxxx</group> </groups>
Note that the user itself (tesla in the above example) is returned as a pseudo group.
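A small Python client for the servlet, as a sketch: it builds the request URL with the `username` parameter and parses the `<groups>` response with the standard library. The default URL is the one given above; the function names and the five-second timeout are illustrative choices, and error handling is omitted.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Union

# Default servlet location from the text; adjust host and port for your installation.
BASE_URL = "http://localhost:50505/groupExpansion"


def parse_groups(payload: Union[str, bytes]) -> list[str]:
    """Return the text of each <group> element, in document order."""
    root = ET.fromstring(payload)
    return [(group.text or "").strip() for group in root.findall("group")]


def expand_user(username: str, base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Request the groups for one user from the groupExpansion servlet."""
    query = urllib.parse.urlencode({"username": username})
    with urllib.request.urlopen(f"{base_url}?{query}", timeout=timeout) as response:
        # Pass the raw bytes so the XML parser honours any declared encoding.
        return parse_groups(response.read())
```

Against a running instance, `expand_user("tesla")` would return the groups from the sample response above, starting with the `tesla` pseudo group.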
The group expansion manager also includes a “proxy LDAP server”. This is disabled by default and requires another service (Ldap Group Cache) to be installed first. When enabled, this proxy allows search engines such as the GSA to use Aspire for group expansion. The proxy expects to receive all LDAP requests from the engine and intercepts those asking for a user's groups. The username is extracted from the LDAP request and sent as a “group expansion request” job to the same pipeline as the HTTP requests. Once the expansion has been performed, the returned groups are gathered, formatted as an LDAP server response and sent back to the engine.
Requests that are not requests for groups (such as general LDAP searches or login requests) are forwarded to a “real” LDAP server via an LDAP connection component. This component is not installed as part of the group expansion manager and must be configured externally; it is included in the Ldap Group Cache service (see later).
The group expansion manager includes a number of workflow processors that give the system administrator the chance to manipulate group expansion requests or responses, changing either the user to be looked up or the groups to be returned.
Workflow rules can be added via the UI and are executed at the following points in the process:
Workflow name | Position in process | Usage |
---|---|---|
onRequest | Immediately after the request is received | Change the username to be looked up |
onResponse | After all expansion has been performed, before the groups are returned back to the requester | Modify the groups returned to the requester |
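The two hook points can be pictured as a pair of rule lists wrapped around the cache lookup. This is a Python sketch only: in Aspire the rules are configured through the UI, and the `run_expansion` function and the rule signatures below are hypothetical.

```python
from typing import Callable, Iterable

RequestRule = Callable[[str], str]                     # username -> new username
ResponseRule = Callable[[str, list[str]], list[str]]   # (username, groups) -> new groups


def run_expansion(username: str,
                  lookup: Callable[[str], Iterable[str]],
                  on_request: Iterable[RequestRule] = (),
                  on_response: Iterable[ResponseRule] = ()) -> list[str]:
    # onRequest: rules may rewrite the username before it is looked up.
    for rule in on_request:
        username = rule(username)
    groups = list(lookup(username))
    # onResponse: rules may rewrite the groups before they are returned.
    for rule in on_response:
        groups = rule(username, groups)
    return groups
```

For instance, `on_request=[str.lower]` would normalise the username's case, and an `on_response` rule such as `lambda user, groups: [g for g in groups if not g.startswith("tmp_")]` would hide groups with a (hypothetical) `tmp_` prefix from the requester.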
By default, the group expansion manager does not alter the domains of incoming requests or outgoing responses. However, the manager allows the following options for both incoming and outgoing domains:
Option | Description |
---|---|
Leave alone | The user or group name is left untouched: if it has a domain, the domain is kept; if it doesn’t, none is added |
Strip | Any domain will be removed from the user or group name |
Add | The specified domain name is added to the user or group name (replacing any existing domain name) |
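The three options reduce to a small function. This sketch assumes names carry their domain in the `DOMAIN\name` form; the text does not specify the format Aspire uses (a `name@domain` form would need a different split), and the enum and function names are illustrative.

```python
from enum import Enum
from typing import Optional


class DomainOption(Enum):
    LEAVE_ALONE = "leave"
    STRIP = "strip"
    ADD = "add"


def split_domain(name: str) -> tuple[Optional[str], str]:
    # Assumed format: DOMAIN\name. A name without a backslash has no domain.
    domain, sep, bare = name.partition("\\")
    return (domain, bare) if sep else (None, name)


def apply_domain(name: str, option: DomainOption, domain: Optional[str] = None) -> str:
    """Apply one domain-handling option to a user or group name."""
    if option is DomainOption.LEAVE_ALONE:
        return name  # keep any existing domain, never add one
    _, bare = split_domain(name)
    if option is DomainOption.STRIP:
        return bare
    if not domain:
        raise ValueError("the Add option requires a domain name")
    return f"{domain}\\{bare}"  # replaces any existing domain
```

The same function can serve both directions: once for the incoming username and once for each outgoing group name, each with its own configured option.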
The group expansion manager will allow you to add supplementary groups should you need to. These groups are added after all other expansion has been performed.
The group expansion manager by default adds the Aspire “PUBLIC:ALL” group. This group is used by connectors to indicate content that is identified as public. The addition of this can be turned off if required.
The group expansion manager will optionally add static groups. These are any additional groups that you wish to be added to expansion responses. You may configure as many as you wish by specifying the name of each group (including the domain, if required) in the UI. Note that these groups are added exactly as entered: their domains remain unaltered even if you have configured domain handling as described above.
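Putting the last steps together, a Python sketch of how a response might be finalised. The ordering (user pseudo group, expanded groups, `PUBLIC:ALL`, then static groups) follows the sample response shown earlier, but where Aspire actually inserts the user pseudo group, and the removal of duplicate names, are assumptions. The hypothetical `format_name` parameter stands in for the configured outgoing domain handling and is deliberately not applied to the static groups.

```python
from typing import Callable, Iterable

PUBLIC_GROUP = "PUBLIC:ALL"


def finalise_response(username: str,
                      expanded: Iterable[str],
                      format_name: Callable[[str], str] = lambda name: name,
                      add_public: bool = True,
                      static_groups: Iterable[str] = ()) -> list[str]:
    # The user itself is returned as a pseudo group, followed by its groups,
    # all passed through the (assumed) outgoing domain handling.
    result = [format_name(username)] + [format_name(g) for g in expanded]
    if add_public:
        result.append(PUBLIC_GROUP)  # added by default; can be switched off
    # Static groups are appended exactly as configured: no domain handling.
    result.extend(static_groups)
    # Assumption: drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(result))
```

With a stripping `format_name`, a static group such as `CORP\extra` still keeps its domain, which matches the note above that static groups are added exactly as entered.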