Configuration
The scheduler recognizes the following configuration tags.
Element | Type | Default | Description |
---|---|---|---|
enabled | boolean | true | Whether the scheduler is enabled. If false, then no jobs will be submitted for any configured schedule. |
schedules | | | One or more schedules on which jobs will be fired. Also see the section on schedules stored in a database below. |
schedules/schedule | | | A schedule on which jobs will be fired. |
schedules/schedule/@name | String | | The (optional) name for the schedule. |
schedules/schedule/@enabled | boolean | true | Whether this specific schedule is enabled. If false, then no jobs will be submitted for this schedule. |
schedules/schedule/@singleton | boolean | true | Specifies that this schedule may only fire one job at a time. If true and the scheduled time is reached again, then a new job will only be published if the previous job has completed. |
schedules/schedule/cron | String | Mandatory for this schedule | Specifies the schedule in cron style (see above for the format). This must be specified for any schedule configured here. |
schedules/schedule/job | String | | Specifies the job data that will be published when the scheduled time is reached. The data can be specified in either XML or JSON style (indicated by the type attribute; see below). The data will have the scheduler information added as attributes to the root node. If not specified, an empty document will be published. NOTE: this configuration item is a String, so XML/JSON text should be wrapped in <![CDATA[ ]]>. |
schedules/schedule/job/@type | String | xml | Specifies the style of the data in the <job> tag. Can be either xml or json. |
schedules/schedule/event | String | Mandatory for this schedule | Specifies the event to publish the job to. Must match one of the events configured in the branch handler <branches> configuration. |
quartz | N/A | | Container for the properties to be passed to the Quartz Scheduler. |
quartz/property | String | | The value of the property to be passed to the Quartz Scheduler. |
quartz/property/@name | String | | The name of the property to be passed to the Quartz Scheduler. |
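As a rough illustration of the quartz container described above, the fragment below passes a single property through to Quartz. `org.quartz.threadPool.threadCount` is a standard Quartz property name; the value shown and the placement of `<quartz>` directly under the component are assumptions made for this sketch, not settings taken from this page.

```xml
<!-- Sketch only: the placement under <component> and the value 5 are assumptions -->
<component name="myScheduler" subType="default" factoryName="aspire-scheduler">
  <quartz>
    <!-- @name holds the Quartz property name, the element text holds its value -->
    <property name="org.quartz.threadPool.threadCount">5</property>
  </quartz>
</component>
```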
The scheduler can read its schedules from a database. To configure this, the following configuration can be used:
Element | Type | Description |
---|---|---|
rdb/@component | String | If schedules should be loaded from a database, this attribute holds the path to the Aspire database connection pool component (aspire-rdb). |
rdb/sql/schedules | String | If schedules should be loaded from a database, this element holds the SQL that will be used to extract the schedules from the database configured via the rdb/@component attribute. See below for the columns that should be returned. |
rdb/sql/jobRunningCheck | String | If schedules taken from the RDB are singletons, this SQL will be run when the schedule fires to check whether a job is still running. If not specified, no check on the database will be performed, but the existing check making sure that the number of outstanding jobs is 0 may still prevent the job from firing. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobStarted | String | This SQL is run when a job is started. Typically it is used to allow singleton control via an external database. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobStopped | String | This SQL is run when a stop job is sent. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobPaused | String | This SQL is run when a pause job is sent. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobResumed | String | This SQL is run when a resume job is sent. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobFinished | String | This SQL is run when a job finishes successfully. Typically it is used to allow singleton control via an external database. This SQL may be blank, to allow completion of a job to be marked by an external process. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/jobFailed | String | This SQL is run when a job finishes with an error. Typically it is used to allow singleton control via an external database. This SQL may be blank, to allow completion of a job to be marked by an external process. However, if the job failed, the external process may not have marked the job as complete, meaning singleton jobs would be blocked. The SQL provided is a template that has values substituted. See below for the values that may be substituted. |
rdb/sql/crawlId | String | The SQL used to determine the crawl id. If this SQL exists, it is run whenever a job is published and the result is added to the job in the crawlId attribute of the document. The first column of the first row of the result set is used as the crawl ID. |
rdb/autoReloadSchedules | long | Time in milliseconds between automatic reloads of the schedules from the RDB. If missing or 0, automatic reloads will be disabled. |
Database Schedule Selection SQL
The SQL should return the mandatory columns and may return the optional columns from the following:
Column | Description |
---|---|
name | The schedule name |
enabled | True if the schedule is enabled (defaults to true). |
singleton | True if this schedule is a singleton (defaults to true). |
cron | The cron schedule (mandatory). |
jobType | The type of data given in the jobData column (defaults to XML). |
jobData | The data to be sent in the job when the scheduled time is reached. This may be given in XML or JSON format as specified by the jobType column and should be given as a string. |
event | The event to publish the job on (mandatory). |
sourceId | The external ID (of the source) to be added to the job (if available). |
The format of the columns follows the formats given in the Basic Configuration section above. Column names can be enforced by use of the SQL “AS” keyword.
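A schedule selection query might be configured along the following lines. This is a sketch under stated assumptions: the table `my_schedules`, its column names, the connection pool path, and the reload interval are all hypothetical, and the nesting of elements is inferred from the element paths in the table above. Only the aliases after AS (name, enabled, singleton, cron, jobType, jobData, event, sourceId) come from the column list.

```xml
<!-- Sketch only: table, source columns and component path are hypothetical -->
<rdb component="/myApp/myConnectionPool">
  <sql>
    <schedules>
      <![CDATA[
        SELECT sched_name  AS name,
               is_enabled  AS enabled,
               is_single   AS singleton,
               cron_expr   AS cron,      -- mandatory
               payload_fmt AS jobType,
               payload     AS jobData,
               event_name  AS event,     -- mandatory
               source_key  AS sourceId
        FROM my_schedules
      ]]>
    </schedules>
  </sql>
  <!-- Reload the schedules every 60 seconds; omit or set to 0 to disable -->
  <autoReloadSchedules>60000</autoReloadSchedules>
</rdb>
```

Aliasing every column with AS keeps the query independent of how the underlying table happens to name its columns.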
Database Job Control SQL
SQL contained in the jobRunningCheck, jobStarted, jobStopped, jobPaused, jobResumed, jobFinished and jobFailed elements may contain variables for substitution. Variables are surrounded with { } (see Simple Templates for more details). The following variables may be specified:
Variable | Available | Description |
---|---|---|
scheduler | always | The component name of the scheduler. |
scheduleId | always | The ID of the schedule that fired this job. |
sourceName | always | The name of the source that fired this job. |
sourceId | always | The source ID of the source that fired this job if available (from the sourceId column of the schedule SQL). |
jobNumber | jobStarted, jobStopped, jobPaused, jobResumed, jobFinished, jobFailed | The unique number allocated to this job from the scheduler. |
jobId | jobStarted, jobStopped, jobPaused, jobResumed, jobFinished, jobFailed | The job ID associated to the Job object published for this schedule. |
jobSuccess | jobFinished, jobFailed | true if the job listener received a JobComplete event (i.e. the job completed the pipeline without failure), false otherwise. |
jobResult | jobFinished, jobFailed | XML representation of the result from the JobEvent. |
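To show how the substitution variables might be used for external singleton control, here is a minimal sketch of job control SQL. The table `my_job_log`, its columns, the status values, and the connection pool path are hypothetical. Whether substituted values need to be quoted in the SQL is also an assumption; this page does not say how values are rendered into the template.

```xml
<!-- Sketch only: table, columns, status values and quoting are assumptions -->
<rdb component="/myApp/myConnectionPool">
  <sql>
    <jobStarted>
      <![CDATA[
        INSERT INTO my_job_log (scheduler, schedule_id, job_number, job_id, status)
        VALUES ('{scheduler}', '{scheduleId}', {jobNumber}, '{jobId}', 'RUNNING')
      ]]>
    </jobStarted>
    <jobFinished>
      <![CDATA[
        UPDATE my_job_log SET status = 'DONE'
        WHERE schedule_id = '{scheduleId}' AND job_number = {jobNumber}
      ]]>
    </jobFinished>
    <jobFailed>
      <![CDATA[
        UPDATE my_job_log SET status = 'FAILED'
        WHERE schedule_id = '{scheduleId}' AND job_number = {jobNumber}
      ]]>
    </jobFailed>
  </sql>
</rdb>
```

Providing jobFailed alongside jobFinished means a failed job is still marked as no longer running, so a singleton schedule is not left blocked. A jobRunningCheck query is left out because the result it is expected to return is not described here.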
Branch Configuration
The Aspire Scheduler publishes jobs using the branch manager. Thus it requires the standard Branch Handler configuration detailed below:
Element | Type | Description |
---|---|---|
branches/branch/@event | String | The event to configure. At the very least, you should include the onPublish event. |
branches/branch/@pipelineManager | String | The URL of the pipeline manager to publish to. Can be relative. |
branches/branch/@pipeline | String | The name of the pipeline to publish to. |
branches/branch/@stage | String | The name of the stage to publish to. |
Example Configuration
<component name="myScheduler" subType="default" factoryName="aspire-scheduler">
  <schedules>
    <schedule name="myFirstSchedule" enabled="false">
      <cron>1/10 * * * * ?</cron>
      <event>onPublish</event>
      <job>
        <![CDATA[
          <doc>
            <fetchUrl>support.searchtechnologies.com</fetchUrl>
          </doc>
        ]]>
      </job>
    </schedule>
    <schedule enabled="false">
      <cron>2/10 * * * * ?</cron>
      <event>onPublish2</event>
    </schedule>
    <schedule enabled="false">
      <cron>3/10 * * * * ?</cron>
      <event>onPublish3</event>
      <job type="json">
        <![CDATA[
          { "doc" : { "fetchUrl" : "www.searchtechnologies.com" } }
        ]]>
      </job>
    </schedule>
    <schedule enabled="false">
      <cron>4/10 * * * * ?</cron>
      <event>onPublish4</event>
      <job type="json">
        <![CDATA[
          { "doc" : { "fetchUrl" : "repositories.searchtechnologies.com" } }
        ]]>
      </job>
    </schedule>
  </schedules>
  <branches>
    <branch event="onPublish" pipelineManager="PipelineManager" />
    <branch event="onPublish2" pipelineManager="PipelineManager" pipeline="myPipeline" />
    <branch event="onPublish3" pipelineManager="PipelineManager" pipeline="myPipeline" stage="myStage" />
    <branch event="onPublish4" pipelineManager="PipelineManager-not-exist" />
  </branches>
</component>
Servlet Commands
The following servlet commands are available via the scheduler (via http://server:port/scheduler?cmd=XXXX&param=value):
Command | Description | Parameters |
---|---|---|
add | Adds a schedule to the scheduler | event: the event the schedule should publish to; cron: the cron schedule |
delete | Deletes a schedule from the scheduler | extId: the external ID of the schedule to be deleted (optional, but this or schedId must be specified); schedId: the ID of the schedule to be deleted (optional, but this or extId must be specified) |
disable | Disables the scheduler, or a schedule if specified | extId: the external ID of the schedule to be disabled (optional); schedId: the ID of the schedule to be disabled (optional) |
enable | Enables the scheduler, or a schedule if specified | extId: the external ID of the schedule to be enabled (optional); schedId: the ID of the schedule to be enabled (optional) |
reload | Reloads all the schedules from the database. | None |
start | Sends a 'start' job for the given schedule | extId: the source (external) ID of the schedule to be started (optional, but this or schedId must be specified); schedId: the ID of the schedule to be started (optional, but this or extId must be specified) |
stop | Sends a 'stop' job for the given schedule | extId: the source (external) ID of the schedule to be stopped (optional, but this or schedId must be specified); schedId: the ID of the schedule to be stopped (optional, but this or extId must be specified) |
pause | Sends a 'pause' job for the given schedule | extId: the source (external) ID of the schedule to be paused (optional, but this or schedId must be specified); schedId: the ID of the schedule to be paused (optional, but this or extId must be specified) |
resume | Sends a 'resume' job for the given schedule | extId: the source (external) ID of the schedule to be resumed (optional, but this or schedId must be specified); schedId: the ID of the schedule to be resumed (optional, but this or extId must be specified) |
Services interface
Other components can access the scheduler through two interfaces: one to handle the schedules and one to handle the scheduler.
The component exposes the following interface to handle schedules:
AspireSchedule.java
The component exposes the following interface to handle the scheduler:
AspireScheduler.java