The Federation Dispatcher stage takes a job (most likely originating from the HTTP Feeder) and dispatches it to a number of applications (which normally include stages such as the FAST Query Builder and Fetch URL) that perform a query on a search engine and place the results back in their Aspire document.

The dispatcher then collects these federated results but does not merge them; merging is done by the Merger.

Federation Dispatcher
Factory Name: com.searchtechnologies.aspire:aspire-federation
subType: default
Inputs: Aspire Jobs
Outputs: Aspire Jobs

Detail

The dispatcher holds a list of servers to post queries to. These servers are organised into zones. When a job is received, the stage attempts to find a zone to federate to by looking at a tag in the incoming document; failing that, it falls back to a default zone if one is configured. Once the zone has been established, the dispatcher gets the servers for the zone and creates a child job for each server. Each child job is published via the branch handler, configured with a route to the query application that is associated with the server.

This query application will typically be search-vendor specific and will do any conversion of parameters required to allow the search to be executed and the results loaded, typically using stages such as the FAST Query Builder, Fetch URL and XML Loader.

The Federation Dispatcher uses the job listener to detect the completion of the child jobs, takes the results from each individual search and adds them to the parent job.

Servers

The list of configured servers held by the Federation Dispatcher defines the search applications that may be queried. The definition of a server consists of an id for identification (which should be unique), a searchUrl which defines the actual URL on which the search should be executed, and a query application, which is an Aspire application that performs the actual search and handles any conversion required (both before the query, to convert parameters, and after the query, to load and convert results to the form required by the Merger).

A simplistic definition is shown below:

 <server id="fast" searchUrl="http://host/cgi-bin/xsearch" queryApplication="/FederationFastQuery" />

Server Failure list

Search servers may be inaccessible at times. This can take time to establish and can therefore slow the delivery of results to the client. When a job has been dispatched, the dispatcher will assume after a certain timeout (default 15s) that the server is unavailable and will return without that result set. When this happens, a failure count for the server is incremented. If this count reaches a certain threshold (default 3), the server will be added to a deny list and no further jobs will be dispatched to this server for a period of time (the default period is 15 minutes). Once this period has passed, the failure count for the server will be reset and jobs will be dispatched to it once more.

Parameters can be set in the server definition thus:

 <server id="fast" searchUrl="http://host/cgi-bin/xsearch" queryApplication="/FederationFastQuery" blacklistThreshold="5" blacklistPeriod="180000" />

Boosting when merging

When merging by rank, you may require results from a certain server to be boosted. This can be achieved by setting the boost attribute.

 <server id="fast" searchUrl="http://host/cgi-bin/xsearch" queryApplication="/FederationFastQuery" boost="2.5" />

Server parameter set

The full server parameter set is shown below:

Element | Type | Default | Description
server/@id | String | (none) | Mandatory. The id of the server.
server/@searchUrl | String | (none) | Mandatory. The URL of the search server application for this server.
server/@queryApplication | String | (none) | Mandatory. The Aspire application to which jobs for this server should be routed.
server/@boost | Float | 1.0 | The boost factor for results from this server when the Merger is merging by rank.
server/@blacklistThreshold | int | 3 | The number of server failures (in a row) that will cause this server to be blacklisted.
server/@blacklistPeriod | long | 900000 ms (= 15 minutes) | The period in ms for which a server will remain blacklisted once it reaches the blacklist threshold.

Zones

Zones allow grouping of servers to perform federation. You need at least one zone, and a zone without any servers does not make sense. Zones are identified by an id (which should be unique) and reference servers by ids that should match those in the server configuration.

When the Federation Dispatcher receives a job, it looks in the attached document for a zone (or falls back to a default). Once the zone has been established, the dispatcher publishes a job for each server configured in the zone.
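To illustrate the zone lookup, the fragment below is a minimal sketch of an incoming document that selects a zone. It assumes the default zonePath of /doc/federationZone listed under Configuration and a zone with the id zoneOne; any other elements a real job document carries (query terms and so on) are omitted.

```xml
<!-- Illustrative incoming job document; only the zone tag is shown -->
<doc>
  <!-- Read via the default zonePath, /doc/federationZone -->
  <federationZone>zoneOne</federationZone>
</doc>
```

If the element were absent, the dispatcher would fall back to the configured defaultZone, if there is one.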

A simple zone definition would be:

 <zone id="zoneOne">
   <server id="server1"/>
   <server id="server2"/>
 </zone>

As mentioned above, servers may be unavailable, and the dispatcher times out requests after a certain period of time. This timeout may be specified in the zone definition:

 <zone id="zoneOne" timeout="15001">
   <server id="server1"/>
 </zone>

Zone parameter set

The full zone parameter set is shown below:

Element | Type | Default | Description
zone/@id | String | (none) | Mandatory. The id of the zone.
zone/@timeout | long | 15000 ms (= 15 seconds), but can be globally overridden | The timeout for requests in this zone.
zone/@mergeType | String | (none) | The suggested method used for merging (merging is performed in the Merger).
zone/server/@id | String | (none) | Mandatory. The id of the server to add to this zone (multiples allowed).
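As a sketch of how the zone attributes combine with server boosting, the fragment below defines a zone that suggests rank merging and gives one of its servers a boost. The ids, URLs, timeout and boost values are illustrative; the mergeType value "rank" is taken from the Example Configuration, and no other merge types are assumed here.

```xml
<servers>
  <!-- Results from server1 are boosted when the Merger merges by rank -->
  <server id="server1" searchUrl="http://myServer1/mySearchUrl1" queryApplication="/federateQuery1" boost="2.0" />
  <!-- server2 uses the default boost of 1.0 -->
  <server id="server2" searchUrl="http://myServer2/mySearchUrl2" queryApplication="/federateQuery2" />
</servers>
<zones>
  <!-- Requests in this zone time out after 5 seconds -->
  <zone id="zoneRanked" timeout="5000" mergeType="rank">
    <server id="server1" />
    <server id="server2" />
  </zone>
</zones>
```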

Result Collection

The dispatcher does not merge the results from the federated query applications (this is done by the Merger), but it does collect them under a single tag in the document passed to the stage. The primary reason for this is that the jobs containing the individual results may no longer be available when the job from this stage reaches the Merger.

During result collection, the dispatcher looks for a named tag in the document from each child job of the federated query and adds it as a child of the parent document. The result is a single node in the parent document containing multiple children, where each child contains the results from a single search.
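The shape of the collected results might look roughly like the sketch below. It assumes the default tag names listed under Configuration (aspireFederationResult for the collecting node and SEGMENTS for each result set) and a zone with two servers; the contents of each SEGMENTS element depend on the query application and are not shown.

```xml
<!-- Illustrative parent document after result collection -->
<doc>
  <federationZone>zoneTwoOne</federationZone>
  <aspireFederationResult>
    <!-- Copied from the child job dispatched to the first server -->
    <SEGMENTS>
      <!-- results from the first search -->
    </SEGMENTS>
    <!-- Copied from the child job dispatched to the second server -->
    <SEGMENTS>
      <!-- results from the second search -->
    </SEGMENTS>
  </aspireFederationResult>
</doc>
```

A server that timed out would simply contribute no child to the collecting node.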

Configuration

The following configuration items are supported:

Element | Type | Default | Description
federationResultTag | String | aspireFederationResult | The document tag to hold all of the federation result sets.
resultTag | String | SEGMENTS | The result tag in the federated result set.
federationEvent | String | onFederation | The event to publish the federation jobs on.
federationBranchEvent | String | (none) | The branch to set on the job if federation occurred.
noFederationBranchEvent | String | (none) | The branch to set on the job if federation did not occur.
zonePath | String | /doc/federationZone | The element in the document that holds the name of the zone to federate to.
defaultZone | String | (none) | The default zone if one is not found in the document.
timeout | long | 15000 ms (= 15 s) | The default timeout for zones. After this period, outstanding federation queries will be deemed to have failed.
blacklistThreshold | long | 3 | The number of federation queries that can fail before the server is blacklisted.
blacklistPeriod | long | 900000 ms (= 15 min) | The default period for which a blacklisted server will remain blacklisted.
servers | see Servers above | (none) | Mandatory. One or more servers specifying where queries should be federated to.
zones | see Zones above | (none) | Mandatory. One or more zones specifying how queries should be federated.
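As a sketch of overriding the stage-wide defaults, the fragment below sets the global timeout and blacklist values. It assumes each item is given as a child element of the component, in the same way that zonePath and defaultZone appear in the Example Configuration; the values, ids and URLs are illustrative.

```xml
<component name="Dispatcher" subType="default" factoryName="aspire-federation">
  <!-- Default zone timeout: 10 seconds instead of 15 -->
  <timeout>10000</timeout>
  <!-- Blacklist a server after 5 failures, for 5 minutes -->
  <blacklistThreshold>5</blacklistThreshold>
  <blacklistPeriod>300000</blacklistPeriod>
  <defaultZone>zoneOne</defaultZone>
  <servers>
    <server id="server1" searchUrl="http://myServer1/mySearchUrl1" queryApplication="/federateQuery1" />
  </servers>
  <zones>
    <!-- No timeout attribute, so the global timeout above applies -->
    <zone id="zoneOne">
      <server id="server1" />
    </zone>
  </zones>
  <branches>
    <branch event="onFederation" pipelineManager="../FederationPipelineManager"/>
  </branches>
</component>
```

Per-server blacklistThreshold and blacklistPeriod attributes and a per-zone timeout attribute would still take precedence where they are set.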

Example Configuration

 <component name="Dispatcher" subType="default" factoryName="aspire-federation">
   <debug>${debug}</debug>
   <zonePath>${zonePath}</zonePath>
   <defaultZone>${defaultZone}</defaultZone>
   <servers>
     <server id="server1" searchUrl="http://myServer1/mySearchUrl1" queryApplication="/federateQuery1" blacklistThreshold="11" blacklistPeriod="180001" boost="0.1" />
     <server id="server2" searchUrl="http://myServer2/mySearchUrl2" queryApplication="/federateQuery2" blacklistThreshold="12"  boost="0.2" />
     <server id="server3" searchUrl="http://myServer3/mySearchUrl3" queryApplication="/federateQuery3" />
     <server id="server4" searchUrl="http://myServer4/mySearchUrl4" queryApplication="/federateQuery4" />
     <server id="server5" searchUrl="http://myServer4/mySearchUrl5" queryApplication="/federateQuery5" />
   </servers>
   <zones>
     <zone id="zoneNone" />
     <zone id="zoneOne" timeout="15001">
       <server id="server1" />
     </zone>
     <zone id="zoneTwoOne">
       <server id="server1" />
       <server id="server2" />
     </zone>
     <zone id="zoneTwoTwo">
       <server id="server3" />
       <server id="server4" />
     </zone>
     <zone id="zoneAll" timeout="2500" mergeType="rank">
       <server id="server1" />
       <server id="server2" />
       <server id="server3" />
       <server id="server4" />
     </zone>
   </zones>
   <branches>
     <branch event="onFederation" pipelineManager="../FederationPipelineManager"/>
   </branches>  
 </component>

