The Branch Handler is a common utility used by many components to specify how jobs are routed.
See Programming Components which use the Branch Handler for information on how to code your component to use the branch handler.
Components and stages which use the branch handler will branch the job based on a particular type of event. Most components use the "onPublish" event, the most common type, which simply indicates that the job is being published, i.e., the job is ready to be sent someplace else.
The pipeline manager also has "onError", "onComplete" and "onTerminate" events, which branch jobs on exceptions, job completion and job termination. See the Pipeline Manager for more details.
Other types of branch events may be defined by other components. See the component configuration for a description of the various branch events that they define and what they mean.
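As a rough sketch of how the pipeline manager events mentioned above might be routed, the fragment below attaches one branch per event to a pipeline. The event names come from the text. The placement of <branches> inside <pipeline> follows the pipeline example later in this document. The target pipeline names "post-process-pipeline" and "cleanup-pipeline" are hypothetical and only illustrate the shape of the configuration.

```xml
<pipeline name="doc-process" default="true">
  <stages>
    <stage component="fetchUrl" />
    <stage component="extractText" />
  </stages>
  <branches>
    <!-- Taken when an exception occurs while processing the job -->
    <branch event="onError" pipeline="error-pipeline" />
    <!-- Taken on job completion; the target name is a hypothetical placeholder -->
    <branch event="onComplete" pipeline="post-process-pipeline" />
    <!-- Taken on job termination; the target name is a hypothetical placeholder -->
    <branch event="onTerminate" pipeline="cleanup-pipeline" />
  </branches>
</pipeline>
```

See the Pipeline Manager documentation for the exact semantics of each event before relying on this layout.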
Components and stages which use the branch handler will require a <branches> tag in their <config> section, as shown in the following examples.
For each branch, you can determine which pipeline manager will receive the job when the branch event occurs:
<branches> <branch event="onPublish" pipelineManager="ProcessPatentPipelineManager" /> </branches>
In the above example, the job will be sent to the default pipeline within "ProcessPatentPipelineManager" when the "onPublish" event occurs for the job.
You can also identify which pipeline should receive the job:
<branches> <branch event="onPublish" pipelineManager="ProcessPatentPipelineManager" pipeline="process-application" /> </branches>
As well as which stage within the pipeline:
<branches> <branch event="onPublish" pipelineManager="ProcessPatentPipelineManager" pipeline="process-application" stage="processInventors"/> </branches>
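To show what these branches point at, here is a minimal sketch of what the receiving pipeline manager might look like. The component attributes follow the pipeline manager example later in this document. The names "hypothetical-default-pipeline" and "hypotheticalFirstStage" are placeholders, and the assumption that a branch's stage attribute matches the component name of a <stage> entry is inferred from that same example.

```xml
<component name="ProcessPatentPipelineManager" subType="pipeline" factoryName="aspire-application">
  <config>
    <components>
      <!-- Component definitions (for example, processInventors) would go here -->
    </components>
    <pipelines>
      <!-- Used when the branch names only the pipeline manager -->
      <pipeline name="hypothetical-default-pipeline" default="true">
        <stages>
          <stage component="hypotheticalFirstStage" />
        </stages>
      </pipeline>
      <!-- Used when the branch also sets pipeline="process-application" -->
      <pipeline name="process-application">
        <stages>
          <stage component="hypotheticalFirstStage" />
          <!-- Targeted when the branch also sets stage="processInventors" -->
          <stage component="processInventors" />
        </stages>
      </pipeline>
    </pipelines>
  </config>
</component>
```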
You can use the @writeToFile attribute to write all jobs branched from a branch handler to a file. This is typically used for unit testing of components.
<branches> <branch event="onPublish" writeToFile="testout/scanDirTest.out"/> </branches>
See Programming Components which use the Branch Handler for more details.
The above method, using branchhandler.enqueue() to enqueue a job on a pipeline manager, is good for new jobs.
If you want to make the current job go someplace else, the best method is to use the pipeline manager's branching structure. This is done by calling job.setBranch("branchLabel") in your component, for example:
job.setBranch("onMissingData");
Note that this can be called by Groovy scripting components as well.
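For instance, a Groovy scripting component that raises this event might be sketched as follows. The component attributes are borrowed from the Groovy example later in this document. The dataIsMissing flag is a hypothetical stand-in for whatever check your script actually performs.

```xml
<component name="checkForMissingData" subType="default" factoryName="aspire-groovy">
  <config>
    <script>
      <![CDATA[
        // Hypothetical placeholder: replace with your own test on the job's data
        boolean dataIsMissing = true;

        // Raise the branch event; the pipeline configuration decides where the job goes
        if (dataIsMissing) {
          job.setBranch("onMissingData");
        }
      ]]>
    </script>
  </config>
</component>
```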
Next, in your pipeline manager, specify where the branch event should go:
<pipeline name="doc-process" default="true">
  <stages>
    <stage component="fetchUrl" />
    <stage component="extractText" />
    <stage component="splitter" />
    <!-- The following pipeline stage causes an "onMissingData" event -->
    <stage component="checkForMissingData" />
  </stages>
  <branches>
    <branch event="onError" pipeline="error-pipeline" />
    <branch event="onMissingData" pipelineManager="dataEnhancementPipelineManager"/>
  </branches>
</pipeline>
See the Pipeline Manager for more details on configuring pipelines.
Using this technique, the pipeline stage can be written to cause certain events to occur. It is the pipeline configuration that determines the actual location where the job will be sent.
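To illustrate this separation, the sketch below uses the same stage, raising the same "onMissingData" event, in two pipelines that route the job differently. The second pipeline, "doc-process-strict", is a hypothetical name added for illustration, and the two pipelines are assumed to live in the same pipeline manager.

```xml
<!-- Jobs with missing data are handed to another pipeline manager -->
<pipeline name="doc-process" default="true">
  <stages>
    <stage component="checkForMissingData" />
  </stages>
  <branches>
    <branch event="onMissingData" pipelineManager="dataEnhancementPipelineManager" />
  </branches>
</pipeline>

<!-- Hypothetical second pipeline: same stage and event, but routed to a local pipeline -->
<pipeline name="doc-process-strict">
  <stages>
    <stage component="checkForMissingData" />
  </stages>
  <branches>
    <branch event="onMissingData" pipeline="error-pipeline" />
  </branches>
</pipeline>
```

The component itself is unchanged in both cases; only the <branches> section differs.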
Notes:
A frequent source of confusion with the branch handler is where to place the <branches> configuration.
If you are enqueuing or processing jobs from a component (typically feeders, or enqueue() or process() in Groovy scripts), you should place the <branches> tag within the component configuration. In this case, the branch must contain a reference to the pipelineManager. You cannot enqueue() or process() jobs on a branch where the pipeline manager is not specified.
<component name="httpFeeder" subType="default" factoryName="aspire-http-feeder">
  <config>
    <servletName>cgi-bin</servletName>
    <feederLabel>httpFeeder</feederLabel>
    <branches>
      <branch event="onPublish" pipelineManager="pipeManager" pipeline="query"/>  <!-- WORKS: the pipeline manager is specified -->
      <branch event="onUpdate" pipeline="query"/>  <!-- THROWS EXCEPTION WHEN JOB ENQUEUED: no pipeline manager is specified -->
    </branches>
  </config>
</component>
If you are branching the current job from a component (say, aspire-tools/conditionalBranch or job.setBranch("event") in Groovy scripts), then you should place the <branches> tag within the pipeline configuration of the pipeline manager (see pipeline manager branches for more details). In this case, if the branch does not contain a reference to the pipelineManager, it is assumed to be the current one.
<component name="pipeManager" subType="pipeline" factoryName="aspire-application">
  <config>
    <components>
      .
      .
      <component name="federate" subType="default" factoryName="aspire-groovy">
        <config>
          <script>
            <![CDATA[
              .
              .
              // Set the main job to branch so we miss the unfederated query
              job.setBranch("onFederatedQuery");
            ]]>
          </script>
        </config>
      </component>
      .
      .
    </components>
    <pipelines>
      <pipeline name="query" default="true">
        <stages>
          .
          .
          <stage component="federate" />
          <stage component="loadXMLResults" />
          <stage component="waitForFederate" />
          .
          .
        </stages>
        <branches>
          <branch event="onFederatedQuery" stage="waitForFederate"/>
        </branches>
      </pipeline>
    </pipelines>
  </config>
</component>
The branch handler can be configured to handle batches of jobs. Batching groups several smaller jobs into a larger one, which can reduce the number of transactions or requests that pipeline stages must make and so improve performance.
When batching is enabled, only components that support batching (such as Post HTTP) take advantage of it. Other components continue to process jobs one at a time.
See each component's documentation to find out whether it supports job batching.
Example configuration:
<branches>
  <branch event="onPublish"
          pipelineManager="ProcessPatentPipelineManager"
          pipeline="process-application"
          batching="true"
          batchSize="10"
          batchTimeout="1000"
          simultaneousBatches="2"
          batchPipeline="batchCompletedPipeline"
          batchPipelineManager="ProcessCompletedBatchPipelineManager"
          batchStage="batchCompletedStage" />
</branches>
The batchPipelineManager, batchPipeline and batchStage options are configured in the same way as pipelineManager, pipeline and stage. The difference is that they determine where the batch job is sent once the batch is complete. For further details, see Batch Job.