The Federation Merger stage takes a job (most likely originating from the Dispatcher) and merges the result sets from a number of different federated queries to form a single result set that can be sent back to the client.
The Federation Merger uses a tag in the document to identify the results to be merged and assumes that each child of this tag is a single result set from a federated query. The format of this child result set is detailed below.
The Federation Merger can use different merge methods, with the actual method specified in the incoming document. Once the result sets have been merged, the resulting set is added to the document and the source result sets are removed (to reduce the payload returned to the client).
While merging the result sets, the Federation Merger also selects the appropriate page of results based on the incoming job parameters.
The Federation Merger is designed to merge XML result sets from FAST search engines. If your search application is not FAST search, the query application specified in the Dispatcher should include a stage to convert the results to the FAST format.
The FAST XML format is shown below. This information is taken from the FAST documentation (ESP Query Integration Guide).
XML Template | Description |
---|---|
<SEGMENTS> <SEGMENT NAME="webcluster"> | Normally only one segment (cluster) returned |
<QUERYTRANSFORMS> <QUERYTRANSFORM NAME= ACTION= QUERY= CUSTOM= MESSAGE= MESSAGEID= INSTANCE= /> ... </QUERYTRANSFORMS> | Query transformation block; one query transformation feedback; NAME element; ACTION element; QUERY element; CUSTOM element; MESSAGE element; MESSAGE ID element; INSTANCE element. Refer to Query Transformations in the Query Language and Parameters Guide for a description of the elements. |
<NAVIGATION ENTRIES= > <NAVIGATIONENTRY NAME= USEDHITS= DISPLAYNAME= TYPE= UNIT= MODIFIER= SCORE= SAMPLECOUNT= HITCOUNT= RATIO= MIN= MAX= MEAN= ENTROPY= SUM= > <NAVIGATIONELEMENTS COUNT= > <NAVIGATIONELEMENT NAME= MODIFIER= COUNT= /> ... (more modifiers) </NAVIGATIONELEMENTS> </NAVIGATIONENTRY> ... (more navigators) </NAVIGATION> | Navigators; number of navigators; navigator name; number of used (considered) hits for each navigator; display name; navigator type; unit; modifier; score; sample count; hit count; ratio; min value; max value; mean value; entropy; aggregated sum of all values; navigator name; modifier; document count |
<CLUSTERS> <CLUSTER TYPE= > <NODE ID= SUBMEMCNT= > <LABELS COUNT= > <LABEL>...</LABEL> ... (more labels) </LABELS> <MEMBERS COUNT= > <MEMBER OFFSET= > ... (more members) </MEMBERS> </NODE> ... (more nodes) </CLUSTER> ... (more clusters) </CLUSTERS> | Clusters and cluster nodes; a cluster node ID (e.g. "S.0.1"); number of sub-members; cluster label; cluster member |
<RESULTSET FIRSTHIT= LASTHIT= HITS= TOTALHITS= MAXRANK= TIME= > <HIT NO= RANK= SITEID= MOREHITS= > <FIELD NAME= > field_content </FIELD> ... (more fields) </HIT> ... (more hits) </RESULTSET> | Start of query result set; index to first hit in result set; index to last hit in result set; number of hits presented; total number of hits for query. MAXRANK is a theoretical maximum rank for a document for a specific query (if the document contained all the query terms close to each other, early in the document, in all the important fields, etc.). In practice the best document in the result set will usually have a rank score much lower than MAXRANK. Time used to process query; index to this result entry; rank value for result entry. Field collapse entries: SITEID = field ID; MOREHITS = 1 if collapsed entries exist below the entry. Field name and content; end of this result entry |
<PAGENAVIGATION> <NEXTPAGE FIRSTHIT= LASTHIT= URL= /> <PREVPAGE FIRSTHIT= LASTHIT= URL= /> </PAGENAVIGATION> | Information about next page in result set: first hit on next page (f); last hit on next page (l); URL to retrieve next page (u) |
</SEGMENT> </SEGMENTS> | Normally only one segment (cluster) returned |
Certain attributes of the FAST XML result set are read or updated during the merge, and the operation of the merger is undefined if they are not present. These attributes are detailed below:
Element | Description |
---|---|
NAVIGATION/@ENTRIES | Updated to hold the correct number of navigators. |
NAVIGATION/NAVIGATIONENTRY/@NAME | Navigators from different result sets with the same name will be merged. |
NAVIGATION/NAVIGATIONENTRY/NAVIGATIONELEMENTS/@COUNT | Updated to hold the correct number of elements for this navigator. |
NAVIGATION/NAVIGATIONENTRY/NAVIGATIONELEMENTS/NAVIGATIONELEMENT/@COUNT | Updated to hold the correct number of hits for this navigator element. |
RESULTSET/@FIRSTHIT | Updated to hold the hit number of the first hit in this result page. |
RESULTSET/@LASTHIT | Updated to hold the hit number of the last hit in this result page. |
RESULTSET/@HITS | Updated to hold the number of hits in this result page. |
RESULTSET/@TOTALHITS | Updated to hold the total number of hits in this result set. |
RESULTSET/@MAXRANK | Updated to hold the maximum rank of this result set. |
RESULTSET/HIT/@NO | Updated to hold the correct hit number for this hit. |
The Federation Merger merges result sets from the incoming Aspire document. The document includes a node containing a number of result sets (typically one from each server the query was federated to). The result sets should be in the FAST format described above. The merge process splits the result sets into their constituent parts (QUERYTRANSFORMS, NAVIGATION, CLUSTERS and RESULTSET) and merges each in turn. A single result set is then re-created from the merged pieces.
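The split step can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the Aspire code: the function name `split_result_sets` is hypothetical, and the sketch assumes each result set is an element named by `result_tag` somewhere below the federation result node, with the four parts as its direct children.

```python
import xml.etree.ElementTree as ET

PARTS = ("QUERYTRANSFORMS", "NAVIGATION", "CLUSTERS", "RESULTSET")


def split_result_sets(federation_result, result_tag="SEGMENT"):
    """Group the constituent parts of every result set by part name."""
    parts = {name: [] for name in PARTS}
    # iter() walks all descendants, so the result sets may be nested
    # at any depth (an assumption of this sketch).
    for result_set in federation_result.iter(result_tag):
        for name in PARTS:
            node = result_set.find(name)
            if node is not None:  # a part may be absent from a result set
                parts[name].append(node)
    return parts


doc = ET.fromstring(
    "<aspireFederationResult>"
    '<SEGMENTS><SEGMENT NAME="a"><NAVIGATION/><CLUSTERS/><RESULTSET/></SEGMENT></SEGMENTS>'
    '<SEGMENTS><SEGMENT NAME="b"><RESULTSET/></SEGMENT></SEGMENTS>'
    "</aspireFederationResult>"
)
grouped = split_result_sets(doc)
print({name: len(nodes) for name, nodes in grouped.items()})
```

Each list in the returned dictionary can then be handed to the merge step for that part.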
Merging of the query transforms simply concatenates the query transforms from each result set.
Navigation merge examines the navigators returned from each server in turn. For the first server, all navigators are simply added to the merged set. For subsequent servers, navigators are matched by name, and a navigator with the same name as one already in the merged set is merged with it.
Merging is similar for the navigator elements.
The counts for the navigators and elements are also updated as part of the merge.
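The navigator merge can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Aspire implementation: navigators are plain dictionaries, the function name `merge_navigators` is hypothetical, navigators and elements with no match are assumed to be appended to the merged set, and counts for matching elements are assumed to be summed.

```python
def merge_navigators(per_server):
    """Merge navigators by name across servers.

    per_server: one list per server, each holding navigators of the form
    {"name": str, "elements": {element_name: hit_count}}.
    """
    merged = {}  # dicts keep insertion order, so first-seen order is kept
    for navigators in per_server:
        for nav in navigators:
            target = merged.setdefault(
                nav["name"], {"name": nav["name"], "elements": {}}
            )
            for element_name, count in nav["elements"].items():
                # Assumption: counts for the same element are added together.
                previous = target["elements"].get(element_name, 0)
                target["elements"][element_name] = previous + count
    return list(merged.values())


server_a = [{"name": "type", "elements": {"pdf": 3, "html": 2}}]
server_b = [
    {"name": "type", "elements": {"pdf": 1}},
    {"name": "lang", "elements": {"en": 4}},
]
navigators = merge_navigators([server_a, server_b])
print(len(navigators))                 # value for NAVIGATION/@ENTRIES
print(len(navigators[0]["elements"]))  # value for NAVIGATIONELEMENTS/@COUNT
```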
Merging of the clusters simply concatenates the clusters from each result set.
Result set merging takes the result sets extracted from the incoming document and merges them using the scheme suggested by the Dispatcher zone (or falling back to the default). The appropriate page of results (as requested by the query) is then selected.
The following types of merge are supported:
In the round robin merge method, a single hit is taken from each result set (from a specific server) in turn and added to a merged hit list. Once the hit list for a specific server is exhausted, it is no longer considered, and the lists for the remaining servers are used in turn until all result sets from all servers are exhausted. As hits are added to the list, the hit number is adjusted to the correct value. The total hits and max rank for the result set are also updated. The appropriate page of results is then selected.
In the rank merge method, the results are assumed to be in descending rank order. The highest ranking hit from all result sets is removed and added to a merged hit list. This continues until all result sets from all servers are exhausted. As hits are added to the list, the hit number is adjusted to the correct value. The total hits and max rank for the result set are also updated. The appropriate page of results is then selected.
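A minimal Python sketch of the round robin method follows. It is an illustration only: the function name `round_robin_merge` is hypothetical, hits are modelled as dictionaries, and hit numbers are assumed to start at 1.

```python
_EXHAUSTED = object()  # sentinel marking a server with no hits left


def round_robin_merge(result_sets):
    """Take one hit from each result set in turn until all are exhausted."""
    iterators = [iter(hits) for hits in result_sets]
    merged = []
    while iterators:
        still_active = []
        for hits in iterators:
            hit = next(hits, _EXHAUSTED)
            if hit is _EXHAUSTED:
                continue  # this server is no longer considered
            # Copy the hit and assign its position in the merged list
            merged.append(dict(hit, NO=len(merged) + 1))
            still_active.append(hits)
        iterators = still_active
    return merged


server_a = [{"id": "a1"}, {"id": "a2"}, {"id": "a3"}]
server_b = [{"id": "b1"}]
server_c = [{"id": "c1"}, {"id": "c2"}]
for hit in round_robin_merge([server_a, server_b, server_c]):
    print(hit["NO"], hit["id"])
```

In this example the merged order is a1, b1, c1, a2, c2, a3: once server_b runs out, the remaining servers continue to alternate.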
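The rank method is a k-way merge of lists that are already sorted by descending rank. The following Python sketch uses the standard library's `heapq.merge` for this; it is not the Aspire implementation, the function name `rank_merge` is hypothetical, and hits are modelled as dictionaries with a numeric `RANK` and hit numbers starting at 1.

```python
import heapq


def rank_merge(result_sets):
    """Merge result sets that are each sorted by descending RANK."""
    merged = heapq.merge(
        *result_sets,
        key=lambda hit: hit["RANK"],
        reverse=True,  # inputs and output are ordered highest rank first
    )
    # Copy each hit and assign its position in the merged list
    return [dict(hit, NO=number) for number, hit in enumerate(merged, start=1)]


server_a = [{"id": "a1", "RANK": 90}, {"id": "a2", "RANK": 40}]
server_b = [{"id": "b1", "RANK": 70}, {"id": "b2", "RANK": 60}]
for hit in rank_merge([server_a, server_b]):
    print(hit["NO"], hit["id"], hit["RANK"])
```

If an input list is not in descending rank order, the output order is not meaningful, which matches the assumption stated for this merge method.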
NOTE: The actual implementations of the merge algorithms are optimised for performance and only collect the required page of results.
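One way to collect only the required page is to merge lazily and stop once the page is full. The Python sketch below shows the idea for a rank-ordered merge; how Aspire actually does this is not documented here, and the function name `rank_merge_page` and the parameters `first_hit` and `page_size` are hypothetical.

```python
import heapq
from itertools import islice


def rank_merge_page(result_sets, first_hit, page_size):
    """Return only the requested page of a rank-ordered merge.

    first_hit is the 1-based number of the first hit on the page.
    """
    merged = heapq.merge(
        *result_sets, key=lambda hit: hit["RANK"], reverse=True
    )  # a lazy iterator: nothing is merged until it is consumed
    start = first_hit - 1
    page = islice(merged, start, start + page_size)  # stops after the page
    return [dict(hit, NO=first_hit + offset) for offset, hit in enumerate(page)]


server_a = [{"id": "a1", "RANK": 90}, {"id": "a2", "RANK": 40}]
server_b = [{"id": "b1", "RANK": 70}, {"id": "b2", "RANK": 60}]
print(rank_merge_page([server_a, server_b], first_hit=2, page_size=2))
```

Hits before the page still have to be stepped over, but hits after it are never pulled from the inputs.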
The following configuration items are supported:
Element | Type | Default | Description |
---|---|---|---|
federationResultTag | String | aspireFederationResult | The tag in the document holding all of the result sets from the federated queries. |
resultTag | String | SEGMENT | The tag of elements holding the individual result sets. |
mergeType | String | robin | The default merge type to use if the merge type is not given in the document. |
<component subType="merge" name="Merger" factoryName="aspire-federation">
  <resultTag>SEGMENTS</resultTag>
  <federationResultTag>aspireFederationResult</federationResultTag>
  <mergeType>robin</mergeType>
  <debug>false</debug>
</component>