Best Practices To Improve Performance in JDBC
This topic illustrates the best practices to improve performance in JDBC with the
following sections:
• Overview of JDBC
• Choosing the right Driver
• Optimization with Connection
• Set optimal row pre-fetch value
• Use Connection pool
• Control transaction
• Choose optimal isolation level
• Close Connection when finished
• Optimization with Statement
• Choose right Statement interface
• Do batch update
• Do batch retrieval using Statement
• Close Statement when finished
• Optimization with ResultSet
• Do batch retrieval using ResultSet
• Setup proper direction of processing rows
• Use proper getXXX() methods
• Close ResultSet when finished
• Optimization with SQL Query
• Cache the read-only and read-mostly data
• Fetch small amount of data iteratively instead of fetching whole data at once
• Key Points
Overview of JDBC
JDBC defines how a Java program can communicate with a database. This section
focuses mainly on the JDBC 2.0 API. The JDBC API provides two packages, java.sql
and javax.sql. By using the JDBC API, you can connect to virtually any database, send
SQL queries to it and process the results.
The JDBC architecture defines different layers for working with any database from Java:
the JDBC API interfaces and classes at the top layer (the Java side), a driver in the
middle layer (which implements the JDBC API interfaces and maps Java calls to the
database specific language), and the database at the bottom (which stores the physical data).
The following figure illustrates the JDBC architecture.
The JDBC API provides interfaces and classes to work with databases. The Connection
interface encapsulates the database connection, the Statement interface encapsulates the
representation and execution of SQL queries, and the ResultSet interface encapsulates
retrieval of the data produced by executing an SQL query through a Statement.
We will look at these areas one by one: what type of driver you need to load, how to use
the Connection interface in the best manner, how to use the different Statement interfaces,
how to process results using ResultSet and finally how to optimize SQL queries to improve
JDBC performance.
Note 1: Your JDBC driver should be fully compatible with the JDBC 2.0 features in order to
use some of the suggestions mentioned in this section.
Note 2: This section assumes that the reader has basic knowledge of JDBC.
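To make the roles of these interfaces concrete before going further, here is a minimal
end-to-end sketch. The driver class name, URL, credentials and employee table are
illustrative assumptions, not taken from the original text; substitute your own driver and schema.

import java.sql.*;

public class JdbcOverviewExample {
    public static void main(String[] args) throws Exception {
        // Register the driver (illustrative Oracle thin driver class name)
        Class.forName("oracle.jdbc.driver.OracleDriver");
        // Get a Connection (illustrative URL and credentials)
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@host:1521:demo", "scott", "tiger");
        Statement stmt = null;
        ResultSet rs = null;
        try {
            // A Statement represents and executes the SQL query
            stmt = con.createStatement();
            rs = stmt.executeQuery("SELECT name, salary FROM employee");
            // The ResultSet gives access to the returned rows
            while (rs.next()) {
                System.out.println(rs.getString("name") + " " + rs.getDouble("salary"));
            }
        } finally {
            // Close in reverse order of creation
            if (rs != null) rs.close();
            if (stmt != null) stmt.close();
            con.close();
        }
    }
}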
Choosing the right Driver
The driver is the key player in a JDBC application; it acts as a mediator between the Java
application and the database. It implements the JDBC API interfaces for a particular database,
for example the Oracle driver for an Oracle database or the Sybase driver for a Sybase
database. It maps the Java calls to the database specific language, including SQL.
JDBC defines four types of drivers to work with. Depending on your requirements, you
can choose one among them.
Type 1 (two tier) - JDBC-ODBC Bridge driver: converts JDBC calls to ODBC calls through
the JDBC-ODBC Bridge driver, which in turn converts them to database calls. The client
requires ODBC libraries.

Type 2 (two tier) - Native API, partly Java driver: converts JDBC calls to database
specific native calls. The client requires database specific libraries.

Type 3 (three tier) - JDBC-Net, all Java driver: passes calls to a proxy server through a
network protocol; the proxy in turn converts them to database calls and passes them on
through the database specific protocol. The client does not require any driver.

Type 4 (two tier) - Native protocol, all Java driver: calls the database directly. The
client does not require any driver.
Obviously the choice of driver depends on availability and on your requirements.
Generally all databases have their own drivers or drivers from third party vendors. If you
don't have a driver for your database, the JDBC-ODBC Bridge driver is the only choice,
because almost all vendors support ODBC. If your application has a tiered requirement
(two tier or three tier), you can filter down your choices further; for example, if your
application is three tiered, you can go for a Type 3 driver between the client and the proxy
server, as shown below. If you want to connect to the database from a Java applet, you have
to use a Type 4 driver, because it is the only driver that supports that feature. This figure
shows the overall picture of drivers from a tiered perspective.
This figure illustrates the drivers that can be used for two tiered and three tiered
applications. Only a three tiered application can use the Type 3 driver (between the client
and the middle tier), while Type 1, 2 and 4 drivers can be used in both two tiered and three
tiered applications. To be more precise, for Java applications (non-applet) you can use a
Type 1, 2 or 4 driver. Here is exactly where you may make a mistake by choosing a driver
without taking performance into consideration. Let us look at that perspective in the
following section.
Type 3 and Type 4 drivers are faster than the other drivers: a Type 3 driver lets you take
advantage of optimization techniques provided by the application server such as connection
pooling, caching and load balancing, and a Type 4 driver does not need to translate calls to
ODBC or to a native connectivity interface. Type 1 drivers are slow because JDBC calls first
have to be converted to ODBC calls through the JDBC-ODBC Bridge, and the ODBC driver
then converts them into database specific calls. Type 2 drivers give average performance
compared to Type 3 and 4 drivers because the JDBC calls still have to be converted into
database specific native calls, but they perform better than Type 1 drivers.
Here are the guidelines for choosing a driver:
1. Use a Type 2 driver for two tiered applications that communicate from a Java client to
the database; it gives better performance than a Type 1 driver.
2. Use a Type 4 driver for applet to database communication; it is the only driver that
supports this, and it also performs well in two tiered and three tiered applications.
3. Use a Type 1 driver only if your database has no other driver available. This is a rare
situation because almost all major databases have their own drivers or drivers from third
party vendors.
4. Use a Type 3 driver to communicate between the client and a proxy server (weblogic,
websphere etc.) in three tiered applications; it gives better performance when compared to
Type 1 and 2 drivers.
Optimization with Connection
Connection-related performance can be improved with the following techniques:
1. Set optimal row pre-fetch value
2. Use Connection pool
3. Control transaction
4. Choose optimal isolation level
5. Close Connection when finished
Each of these operations affects performance. We will walk through them one by one.
1. Set optimal row pre-fetch value
There are different ways to establish a connection with the database. The first approach is
to use java.sql.DriverManager, which provides the following overloaded methods:
1. DriverManager.getConnection(String url)
2. DriverManager.getConnection(String url, Properties props)
3. DriverManager.getConnection(String url, String user, String password)
The Properties variant lets you pass driver specific properties along with the user name and
password, for example (the URL and property values shown are illustrative):
java.util.Properties props = new java.util.Properties();
props.put("user","scott");
props.put("password","tiger");
props.put("defaultRowPrefetch","30");
props.put("defaultBatchValue","5");
Connection con = DriverManager.getConnection("jdbc:oracle:thin:@host:1521:demo", props);
You need to figure out appropriate values for the above properties, depending on your
application's requirements, to get better performance. For example, if you set these
properties for a search facility, increasing defaultRowPrefetch can improve performance
significantly.
The second approach is to get the connection through the javax.sql.DataSource interface.
The advantage of this approach is that DataSource works with JNDI. The DataSource
implementation is provided by the vendor; you can find this feature in weblogic, websphere
etc. The vendor simply creates a DataSource implementation class and binds it to the JNDI
tree. The following code shows how a vendor might create the implementation class and
bind it to the JNDI tree.
// DataSourceImpl stands in for the vendor's DataSource implementation class
DataSourceImpl dsi = new DataSourceImpl();
dsi.setServerName("oracle8i");
dsi.setDatabaseName("Demo");
Context ctx = new InitialContext();
ctx.bind("jdbc/demoDB", dsi);
This code registers the DataSourceImpl object in the JNDI tree; the programmer can then
get the DataSource reference from the JNDI tree without any knowledge of the underlying
implementation.
Context ctx = new InitialContext();
DataSource ds = (DataSource)ctx.lookup("jdbc/demoDB");
Connection con = ds.getConnection();
Using this approach can improve performance. Nearly all major application servers such as
weblogic and websphere implement DataSource so that it hands out connections from a
connection pool rather than creating a new connection every time; the application server
creates the connection pool by default. We will discuss how connection pools improve
performance in the next section.
2. Use Connection pool
Creating a connection to the database server is expensive, and it is even more expensive if
the server is located on another machine. A connection pool contains a number of open
database connections, bounded by the minimum and maximum sizes you specify; the pool
expands and shrinks between those bounds according to the incremental capacity. You need
to give the minimum, maximum and incremental sizes as properties to the pool in order to
get this behaviour, and you get connections from the pool rather than creating them
directly. For example, if you set the min, max and incremental sizes to 3, 10 and 1, the
pool is created with 3 connections initially; when all 3 are in use and another client
requests a connection, the pool grows by 1 at a time until it reaches 10 connections, after
which further clients are queued until a connection is released.
There are three ways to get a connection pool:
1. You can rely on the application server if it supports this feature; generally all
application servers support connection pools. The application server creates the connection
pool on your behalf when it starts; you only need to give it properties such as the minimum,
maximum and incremental sizes.
2. You can use the JDBC 2.0 pooling interfaces (ConnectionPoolDataSource and
PooledConnection) if your driver implements them.
3. Or you can create your own connection pool if you are not using an application server
or a JDBC 2.0 compatible driver.
By using any of these options you can increase performance significantly, but you need to
take care with the min, max and incremental sizes. The maximum number of connections
depends on your application's requirements, that is, how many clients need to access the
database concurrently, as well as on the maximum number of connections your database can
provide.
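Whichever of these options you choose, the calling code looks much the same: look up the
pool-backed DataSource, borrow a connection, and return it to the pool by calling close().
A minimal sketch, assuming the pool is bound under the JNDI name jdbc/demoDB used in the
earlier example:

import java.sql.Connection;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PooledQuery {
    public void run() throws Exception {
        Context ctx = new InitialContext();
        // The application server returns a pool-backed DataSource
        DataSource ds = (DataSource) ctx.lookup("jdbc/demoDB");
        Connection con = ds.getConnection();   // borrows a connection from the pool
        try {
            // ... use the connection: create statements, execute queries ...
        } finally {
            con.close();                       // returns the connection to the pool
        }
    }
}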
3. Control transaction
In general, a transaction represents one unit of work: a piece of the program that executes
in its entirety or not at all. To be precise, it is all or no work. In JDBC, a transaction
is a set of one or more statements that execute as a single unit. The java.sql.Connection
interface provides the following methods to control transactions:
void setAutoCommit(boolean autoCommit);
boolean getAutoCommit();
void commit();
void rollback();
By default, a JDBC transaction starts and commits after each statement executed on a
connection; that is, auto-commit mode is true, and the programmer does not need to call
commit() explicitly after each statement.
Obviously this default behaviour is convenient if you only want to execute a single
statement, but it gives poor performance when multiple statements have to be executed on a
connection, because a commit is issued after every statement and those unnecessary commits
cost time. The remedy is to set auto-commit to false and issue a single commit() after a set
of statements has executed; this is called a batch transaction. Use rollback() in the catch
block to roll back the transaction whenever an exception occurs in your program. The
following code illustrates the batch transaction approach.
connection.setAutoCommit(false);
PreparedStatement ps = null;
PreparedStatement ps1 = null;
try{
    // The SQL text is illustrative; the original snippet does not show how the statements are prepared
    ps = connection.prepareStatement("UPDATE employee SET address=? WHERE name=?");
    ps.setString(1,"Austin");
    ps.setString(2,"RR");
    ps.executeUpdate();
    ps1 = connection.prepareStatement("UPDATE employee SET dept=? WHERE name=?");
    ps1.setString(1,"Sales");   // illustrative value for the first parameter
    ps1.setString(2,"RR");
    ps1.executeUpdate();
    connection.commit();
    connection.setAutoCommit(true);
}catch(SQLException e){
    connection.rollback();      // undo both updates if either one fails
}finally{
    if(ps != null){ps.close();}
    if(ps1 != null){ps1.close();}
    if(connection != null){connection.close();}
}
This batch transaction approach gives good performance by avoiding a commit after each
statement's execution.
4. Choose optimal isolation level
The isolation level represents how far a database goes to maintain data integrity against
problems like dirty reads, phantom reads and non-repeatable reads, which can occur due to
concurrent transactions. The java.sql.Connection interface provides methods and constants
to avoid the above mentioned problems by setting different isolation levels:
int getTransactionIsolation();
void setTransactionIsolation(int level);
You can read the current isolation level with getTransactionIsolation() and set it with
setTransactionIsolation(int level), passing one of the Connection isolation level constants
shown in the table below.
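For example, here is a minimal sketch of raising the isolation level for a sensitive unit of
work, assuming con is a Connection obtained as in the earlier examples and that the database
supports the requested level:

// Raise the isolation level before starting the sensitive work
con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
int level = con.getTransactionIsolation();   // returns Connection.TRANSACTION_SERIALIZABLE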
The following table shows, for each isolation level, which problems it still permits and its
relative performance impact:

Transaction level                Dirty reads   Non-repeatable reads   Phantom reads   Performance impact
TRANSACTION_NONE                 N/A           N/A                    N/A             FASTEST
TRANSACTION_READ_UNCOMMITTED     YES           YES                    YES             FASTEST
TRANSACTION_READ_COMMITTED       NO            YES                    YES             FAST
TRANSACTION_REPEATABLE_READ      NO            NO                     YES             MEDIUM
TRANSACTION_SERIALIZABLE         NO            NO                     NO              SLOW

YES means that the isolation level does not prevent the problem; NO means that it does.
As the table shows, the isolation level you set has a direct impact on performance.
Databases use read and write locks to enforce these isolation levels. Let us look at each of
the problems and then at the impact on performance.
Dirty read: suppose transaction T1 updates PRICE from 10 to 20 for PRODUCT = A001 but has
not yet committed, while transaction T2 reads the row. The problem is that T2 gets the wrong
PRICE = 20 for PRODUCT = A001 instead of 10 because of the uncommitted read. Obviously this
is very dangerous in critical transactions if you read inconsistent data. If you are sure
the data is not accessed concurrently, you can tolerate this problem and set
TRANSACTION_READ_UNCOMMITTED or TRANSACTION_NONE, which improves performance; otherwise you
have to use TRANSACTION_READ_COMMITTED to avoid it.
Non-repeatable read: here the problem is that transaction T1 reads a value of 10 the first
time and 20 the second time, even though within one transaction it should see 10 every time
it reads that record; another transaction changed and committed the value between the two
reads. You can control this problem by setting the isolation level to
TRANSACTION_REPEATABLE_READ.
Phantom read: here the problem is that transaction T1 gets 2 rows instead of 1 row when it
runs the same query (for example, selecting the rows with COMPANY_ID = 10) a second time,
because another transaction inserted a matching row in between. You can control this problem
by setting the isolation level to TRANSACTION_SERIALIZABLE.
If you are writing a critical program such as a banking or stock analysis application where
you want to prevent all of the above problems, choose TRANSACTION_SERIALIZABLE for maximum
safety. Here there is a tradeoff between safety and performance, and ultimately safety wins.
If your application does not have to deal with concurrent transactions, the best choice is
TRANSACTION_NONE to improve performance.
The other two isolation levels need a good understanding of your requirements. If your
application only needs committed records, TRANSACTION_READ_COMMITTED is a good choice. If
your application needs to read a row and be sure it does not change until you finish your
work, TRANSACTION_REPEATABLE_READ is the best choice.
Note: Be aware of which isolation levels your database server supports; servers may not
support all of them. Oracle, for example, supports only TRANSACTION_READ_COMMITTED and
TRANSACTION_SERIALIZABLE, and its default isolation level is TRANSACTION_READ_COMMITTED.
Optimization with Statement
Statement-related performance can be improved with the following techniques:
1. Choose right Statement interface
2. Do batch update
3. Do batch retrieval using Statement
4. Close Statement when finished

1. Choose right Statement interface
There are three types of Statement interfaces in JDBC for representing and executing an SQL
query: Statement, PreparedStatement and CallableStatement. Statement is used for static SQL
statements with no input or output parameters, PreparedStatement is used for dynamic SQL
statements with input parameters, and CallableStatement is used for dynamic SQL statements
with both input and output parameters. PreparedStatement and CallableStatement can be used
for static SQL statements as well, and CallableStatement is mainly meant for stored
procedures.
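PreparedStatement pays off when the same statement is executed repeatedly, because the query
is parsed once and only the parameter values change. A minimal sketch, assuming con is an
open Connection and an employee table with name and salary columns (both illustrative):

// Parsed once, executed many times with different parameter values
PreparedStatement ps = con.prepareStatement(
        "UPDATE employee SET salary = ? WHERE name = ?");
try {
    ps.setDouble(1, 4000.00);
    ps.setString(2, "RR");
    ps.executeUpdate();

    ps.setDouble(1, 3500.00);   // reuse the same statement with new values
    ps.setString(2, "Austin");
    ps.executeUpdate();
} finally {
    ps.close();
}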
2. Do batch update
You can send multiple queries to the database at a time using the batch update feature of
Statement objects; this reduces the number of JDBC calls and improves performance. Here is
an example of how you can do a batch update (the SQL text is illustrative):
statement.addBatch("UPDATE employee SET dept='Sales' WHERE name='RR'");
statement.addBatch("UPDATE employee SET dept='HR' WHERE name='Austin'");
statement.executeBatch();
3. Do batch retrieval using Statement
The driver fetches a default number of rows at a time from the database. You can improve
performance by increasing the number of rows fetched per round trip using the setFetchSize()
method of the Statement object:
statement.setFetchSize(30);
Here it retrieves 30 rows at a time for all result sets produced by this statement.
4. Close Statement when finished
Close the Statement object as soon as you finish working with it; this releases database
resources and explicitly gives the garbage collector a chance to reclaim memory as early as
possible, which in turn helps performance.
statement.close();
Optimization with ResultSet
1. Do batch retrieval using ResultSet
The ResultSet interface also provides a batch retrieval facility like Statement, as
mentioned above; a fetch size set on the ResultSet overrides the value set on the Statement.
This feature significantly improves performance when you are retrieving a large number of
rows, as in a search facility.
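A minimal sketch, assuming stmt is a Statement and the query text is illustrative:

ResultSet rs = stmt.executeQuery("SELECT name, salary FROM employee");
rs.setFetchSize(50);          // overrides the fetch size set on the Statement
while (rs.next()) {
    // process each row; the driver fetches 50 rows per round trip
}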
2. Setup proper direction of processing rows
ResultSet lets you give the driver a hint about the direction in which you intend to process
the rows; it has three constants for this purpose: FETCH_FORWARD, FETCH_REVERSE and
FETCH_UNKNOWN. For example:
resultSet.setFetchDirection(ResultSet.FETCH_REVERSE);
3. Use proper getXXX() methods
The ResultSet interface provides many getXXX() methods to read database data types and
convert them to Java data types, and it is flexible enough to convert values to types that
do not match the column. That flexibility has a cost: if you ask for a non-recommended type,
the driver has to cast the value to the requested Java type, which is expensive. For
example, if a product's id is stored as a numeric column and you read it with getString() or
getObject() instead of getInt(), the driver converts every value; for a search over a huge
database that returns millions of records, all those conversions become very expensive.
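A minimal sketch of the difference, with the product table and its numeric id column taken
as illustrative assumptions:

ResultSet rs = stmt.executeQuery("SELECT id FROM product");
while (rs.next()) {
    int id = rs.getInt(1);                    // matches the column type: no conversion
    // String idText = rs.getString(1);       // would force a number-to-string conversion per row
}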
4. Close ResultSet when finished
Close the ResultSet object as soon as you finish working with it. Even though the Statement
object closes its ResultSet implicitly when it is closed, closing the ResultSet explicitly
gives the garbage collector a chance to reclaim memory as early as possible, because a
ResultSet may occupy a lot of memory depending on the query.
resultSet.close();
Optimization with SQL Query
This is one of the areas where programmers generally make mistakes. If, for example, you
write a query like SELECT * FROM employee WHERE name='RR', the returned result set contains
data for every column, although you may only need the salary for RR. The more precise query
SELECT salary FROM employee WHERE name='RR' returns only the data you actually need.
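As a minimal sketch in JDBC terms (the employee table, its columns and the name value are
illustrative assumptions):

// Ask the database for only the column you need
PreparedStatement ps = con.prepareStatement(
        "SELECT salary FROM employee WHERE name = ?");
ps.setString(1, "RR");
ResultSet rs = ps.executeQuery();
if (rs.next()) {
    double salary = rs.getDouble(1);
}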
1. Cache the read-only and read-mostly data
Many applications have tables whose data is read-only (it never changes, such as code or
lookup tables) or read-mostly (it changes only occasionally). If an application reads data
from these tables for every client request, that work is redundant, unnecessary and
expensive. The solution is to cache the read-only table data by reading it once, and to
cache the read-mostly table data by reading it and refreshing it periodically. This solution
improves performance significantly. See the following link for the source code of such a
caching mechanism.
http://www.javaworld.com/javaworld/jw-07-2001/jw-0720-cache.html
You can tweak this code as per your application's requirements. For read-only data you never
need to refresh the cache during its lifetime; for read-mostly data you need to refresh it
on a time limit. It is better to put this refresh interval in a properties file so that it
can be changed at any time.
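As a rough illustration of the idea (not the code behind the link above), here is a minimal
sketch of a time-limited cache; the loader callback and refresh interval are assumptions:

import java.util.List;

// Caches read-mostly data and reloads it only when the refresh interval has passed
public class ReadMostlyCache<T> {
    public interface CacheLoader<E> { List<E> load(); }

    private final long refreshMillis;        // e.g. read from a properties file
    private final CacheLoader<T> loader;     // callback that queries the table
    private List<T> data;
    private long loadedAt;

    public ReadMostlyCache(long refreshMillis, CacheLoader<T> loader) {
        this.refreshMillis = refreshMillis;
        this.loader = loader;
    }

    public synchronized List<T> get() {
        long now = System.currentTimeMillis();
        if (data == null || now - loadedAt > refreshMillis) {
            data = loader.load();            // hits the database only when the cache is stale
            loadedAt = now;
        }
        return data;
    }
}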
2. Fetch small amount of data iteratively instead of fetching whole data at once
Applications such as search sometimes need to return a large amount of data to the client.
Instead of returning the whole result at once, return it iteratively in small chunks. There
are two ways to do this:
1. Cache the search data on the server side and return it to the client iteratively. For
example, if the search returns 1000 records, return them to the client in 10 iterations of
100 records each.
2. Use stored procedures to return the data iteratively. This approach does not use
server-side caching; instead the server-side application uses stored procedures to return a
small amount of data in each call.
Of these two solutions, the second gives better performance because it does not have to keep
the data in an in-memory cache; the first is useful when the total amount of data to be
returned is not huge.
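Here is a minimal sketch of returning results to the client one page at a time using a
scrollable result set (a simpler stand-in for the server-side cache described in the first
solution); the query, table and page size are illustrative assumptions:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class SearchPager {
    // Returns one page of results (pageNo starts at 0)
    public static List<String> fetchPage(Connection con, int pageNo, int pageSize)
            throws SQLException {
        List<String> page = new ArrayList<String>();
        Statement stmt = con.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        try {
            ResultSet rs = stmt.executeQuery("SELECT name FROM employee ORDER BY name");
            // Position the cursor on the last row of the previous page; absolute(0) is not allowed
            if (pageNo == 0 || rs.absolute(pageNo * pageSize)) {
                while (page.size() < pageSize && rs.next()) {
                    page.add(rs.getString(1));
                }
            }
            rs.close();
        } finally {
            stmt.close();
        }
        return page;
    }
}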
Key Points
1. Use a Type 2 driver for two tiered applications to communicate from a Java
   client to the database; it gives better performance than a Type 1 driver.
2. Use a Type 4 driver for applet to database communication (a two tiered
   application); it also performs well in two and three tiered applications when
   compared to other drivers.
3. Use a Type 1 driver if you don't have a driver for your database. This is a
   rare situation because all major databases have their own drivers or drivers
   available from third party vendors.
4. Use a Type 3 driver to communicate between the client and a proxy server
   (weblogic, websphere etc.) in three tiered applications; it gives better
   performance when compared to Type 1 and 2 drivers.
5. Pass database specific properties such as defaultRowPrefetch if your database
   supports them.
6. Get database connections from a connection pool rather than getting them
   directly.
7. Use batch transactions.
8. Choose the right isolation level as per your requirements.
   TRANSACTION_READ_UNCOMMITTED gives the best performance for concurrent
   transaction based applications; TRANSACTION_NONE gives the best performance
   for non-concurrent transaction based applications.
9. Your database server may not support all isolation levels; be aware of the
   features your server provides.
10. Use PreparedStatement when you execute the same statement more
than once.
11. Use CallableStatement when you want results from multiple and complex
    statements for a single request.
12. Use batch update facility available in Statements.
13. Use batch retrieval facility available in Statements or ResultSet.
14. Set up proper direction for processing rows.
15. Use proper getXXX() methods.
16. Close ResultSet, Statement and Connection whenever you finish your
work with them.
17. Write precise SQL queries.
18. Cache read-only and read-mostly tables data.
19. Fetch small amounts of data iteratively rather than the whole result at once
    when retrieving large amounts of data, as in database searches.