Configuration
Configuration for Connecting to Remote Relational Databases
Element | Type | Default | Description |
---|---|---|---|
jdbcDriverJar | string | null | (Required for Remote Database) The location where the JDBC driver Jar file can be located. The driver jar is usually available for download from the Relational Database vendor (see below). The jar should be copied into a directory in your Aspire distribution and then referenced from here. The path specified for "jdbcDriverJar" is relative to the Aspire Home directory. See below for an example. |
jdbcDriverClass | string | specified in the META-INF/services/java.sql.Driver file in the driver Jar file | (Optional, but required for some database vendors) The name of the JDBC driver class if the class name from the META-INF/services/java.sql.Driver file in the driver Jar file should not be used, or that file does not exist in the driver jar file (Oracle). |
jdbcUrl | string | null | (Required for Remote Database) The JDBC connection URL for accessing the relational database. This is a standard JDBC URL string which specifies the connection protocol, database type, server, port, etc. See below for an example. |
user | string | default | (Optional) The user name to use when connecting to the relational database. |
password | string | default | (Optional) The password to use when connecting to the relational database. |
jdbcProperties/property/@name jdbcProperties/property | string | N/A | (Optional) Arbitrary JDBC connection properties (multiples allowed). |
startup | string | default | (Optional) SQL commands to execute on startup. Note that these commands are executed every time the component is started or initialized (i.e. restarted, configuration file reload, etc). See below for more details. |
Configuration for controlling the connection pool
These parameters are for any type of database, remote or embedded. All of these parameters are optional.
Element | Type | Default | Description |
---|---|---|---|
timeout | int | 0 (= connections never closed) | The time in ms after which a connection in the pool should be considered for closing and purging from the pool. NOTE: the connection will not actually be closed until the next purge, so still could be reused if a request for a connection is received before the next purge occurs. |
purgeThreshold | int | 0 (= infinite connections) | The minimum number of available connections in the pool before a purge will take place. It is not guaranteed that the pool will ever have this many connections available, but the pool will not be purged unless it does. |
purgePoll | int | 60000 (= 60s) | The period in ms between purges of the connection pool. |
Example Configuration
Simple
<component name="RDBConnection" subType="default" factoryName="aspire-rdb"> <jdbcUrl>jdbc:sqlserver://localhost\SQLEXPRESS:1433</jdbcUrl> <jdbcDriverJar>lib/sqljdbc4.jar</jdbcDriverJar> <user>test</user> <password>test</password> </component>
Complex
<component name="RDBConnection" subType="default" factoryName="aspire-rdb"> <jdbcUrl>jdbc:sqlserver://localhost\SQLEXPRESS:1433</jdbcUrl> <jdbcDriverJar>lib/sqljdbc4.jar</jdbcDriverJar> <user>test</user> <password>test</password> <timeout>60000</timeout> <purgePoll>300000</purgePoll> <purgeThreshold>10</purgeThreshold> <jdbcProperties> <property name="jdbcCollection">NULLIDRA</property> </jdbcProperties> <startup> ignore errors; create table URLS ( httpcode varchar(100), url varchar(100)); notice errors; delete from URLs; </startup> </component>
Example of connection to existing Apache Derby RDB
Download driver JAR from Derby Download Page. Extract and get derbyclient.jar from "lib" folder.
<component name="CWSExternalRDBConnector" subType="default" factoryName="aspire-rdb"> <debug>true</debug> <jdbcDriverClass>org.apache.derby.jdbc.ClientDriver</jdbcDriverClass> <enabled>true</enabled> <jdbcUrl>jdbc:derby://localhost:1527/data/CSManager/db/csManager</jdbcUrl> <jdbcDriverJar>C:\jdbc\derbyclient.jar</jdbcDriverJar> <user>app</user> <password></password> <timeout>60000</timeout> <purgePoll>300000</purgePoll> </component>
The above configuration would connect you to the CSManager database (Aspire 1.X version). Note the use of the user name app. The following explanation is taken from here:
If a database is created in Derby using the embedded driver and no user name is specified, the default schema used becomes APP. Therefore any tables created in the database have a schema name of APP. However, when creating a Derby database using the Network Server, the value for the schema becomes the value of the username used to connect with as part of the database URL
Example Use Within A Pipeline
<pipeline name="process-patent" default="true"> <stages> <stage component="FetchUrl" /> <stage component="LoadXML" /> <stage component="RDBConnection"/> <!-- Now that the connection has been opened and attached to the job, you can use the connection inside of Groovy scripts with the "jdbc." variable. In the future, there may be other components which will use it as well. --> </stages> </pipeline>
Example use from within a Groovy scripting component
Typically, Groovy users will want to use the standard Groovy "Sql" object, this can be done as follows:
<component name="createCPAVars" subType="default" factoryName="aspire-groovy"> <script> import groovy.sql.Sql; import com.searchtechnologies.aspire.services.AXML; doc.putVariable("gsql", new Sql(jdbc)); gsql.execute("""insert into MyTable(country, pp_num, kind) values(${country}, ${docNumber}, ${kind})"""); gsql.commit(); row = gsql.firstRow("select @@IDENTITY ident"); println "The new identity = ${row.ident}"; </script> </component>
Note how, in the above, the JDBC connector can be accessed simply as the variable "jdbc". If you want to use this variable directly (it is the type "java.sql.Connection"), feel free.
Example Use from within a Custom Pipeline Stage
Getting and using the RDB connection from within your custom pipeline stage is easy:
public void process(Job j) throws AspireException { AspireObject doc = (AspireObject) j.getObject(); Connection rdbConn = doc.getObject("jdbc"); . . // Use the connection here . // Note: No need to return the connection back to the connection pool. // It will be returned to the pool automatically when the job is completed }
Example For Requesting Connections from Non-Pipeline-Stage Custom Components
If you are writing your own component in Java and you need an RDB connection, you can get components from an aspire-rdb stage, as follows.
Notes:
- You will need to specify "com.searchtechnologies.aspire.rdb" in your <Import-Package> statement in your pom.xml file for your component (see How to Include Third-Party Jars in Your Component (Aspire 2) for more details).
- Your Maven project will also need to specify the "com.searchtechnologies.aspire:aspire-rdb" maven artifact as a dependency.
- Because your component will be dependent on "com.searchtechnologies.aspire.rdb", you will need to make sure that the RDB Component bundle is loaded before your component in the Component Manager (Aspire 2) configuration.
- This can be done by loading the aspire-rdb bundle before the bundles for dependent components
- It can also be done by <autoStart> the appropriate settings configuration file (one with an aspire-rdb bundle) before your own system configuration file. For more information on auto-starting, see the settings configuration.
// Your component will need a service tracker variable: // Service tracker for the CCD. ServiceTracker rdbTracker = null; // *** Somewhere in your initialization code, do the following: *** { . . . // Probably you should get the name of the ''connection component'' from your configuration file String rdbConnectionComponentName = getStringFromConfig(config,"connectionPoolName","/system/RDBConnectionPool"); rdbTracker = this.getComponentServiceTracker(rdbConnectionComponentName); . . . } // ** Once the tracker is initialized, you can use it to get RDBMS connections: { . . . // Note: The following should be done *every time* you need to access a connection // Do not put the results into member variables for convenience. If the RDB connection is // shut down dynamically and then started back up, you want the following code to access // the latest and greatest RDBMS pool component. // Fortunately, the following executes very, very fast. So there are no performance issues. rdbPool = (RDBMSConnectionPool) rdbTracker.getService(); if(rdbPool == null) { rdbPool = (RDBMSConnectionPool) rdbTracker.waitForService(AspireConstants.STANDARD_COMPONENT_AVAILABLE_TIMEOUT); } if(rdbPool == null) { // throw an aspire connection - the pool was not available // even after waiting a reasonable amount of time. } // Now, fetch a connection: Connection rdbConn = rdbPool.getConnection(); . . // Now, use it . // Now, return it to the pool rdbPool.returnConnectionToPool(rdbConn); }
Startup SQL Statements
Some limited capability is available to execute SQL statements whenever an aspire-rdb component is initialized. These statements are executed using the <startup> command:
<startup> ignore errors; create table people(age int, name varchar(100)); notice errors; delete from people; insert into people values(34, 'James Dean'), (92, 'Lady Godiva'); </startup>
Startup commands are especially useful for in-memory databases such as aspire-derby, which are completely empty when first initialized.
The language between the <startup> tags is simply a list of SQL statements, each statement terminated with a semi-colon, with a few additional special commands.
Language Details:
- Statements can be any SQL statement other than SELECT (insert, update, drop, create, delete, etc.)
- Every statement must end with a semi-colon
- String constants are surrounded by single-quotes
- Embedded single quotes are escaped with a pair of quotes, for example:
'William ''Buddy'' Johnson'
- Embedded single quotes are escaped with a pair of quotes, for example:
Additional Commands Include:
- ignore errors - Ignore any SQL errors which occur, for example creating a table which already exists.
- notice errors - Errors in executing SQL statements are trapped and will cause initialization to fail.
- Note that these errors will be considered to be 'fatal', meaning that all jobs sent to the component will fail with an exception error.
- commit - Commit all changes.
- Not typically needed, because auto commit is turned on by default.
- rollback - Rollback all changes.
- Probably that not useful a command.
- auto commit off - Turn off auto commit.
- auto commit on - Turn on auto commit.
Improving Performance with JDBC Properties
Many JDBC connectors have properties that you can modify to change behavior or performance metrics.
For example, in Oracle, you might try the following to improve fetching performance with SELECT statements that retrieve lots of rows:
<jdbcProperties> <property name="defaultRowPrefetch">500</property> </jdbcProperties>
This will change the default number of rows fetched on a transaction from 10 to 500. This will result in fewer overall transactions and may result in better SELECT performance.