Working with Oracle

Abstract

I recently had to port a product to an Oracle backend. That was an opportunity for me to work with Oracle again, and with 10g [1] for the first time. I ran into quite a few issues and thought I would write them down somewhere so I won't forget them later. This article is aimed at developers working with Oracle and Java JDBC. (September 2005).

Paging

The objective is to retrieve blocks of n rows from the results. If the page is not the first one, then the first skip rows must be discarded before the n rows of the page of interest are read. The skip rows could be read and then discarded, but that would be an expensive operation if thousands of rows had to be skipped. A better way is to have the SQL discard them so that only the rows of the page of interest are returned. Below are the two most common SQL forms to do so in Oracle, the first form being the fastest according to AskTom [2].

-- Form 1
SELECT * FROM
  (SELECT Q.*, rownum AS rnum FROM 
      (THE_QUERY) Q WHERE rownum <= (skip+n))
WHERE rnum > skip

-- Form 2
SELECT * FROM
  (SELECT Q.*, rownum AS rnum FROM (THE_QUERY) Q)
WHERE rnum BETWEEN skip+1 AND skip+n

Though the first form is faster, the second form is more generic. Experience has shown that the first form does not work well when THE_QUERY contains JOINs or an ORDER BY clause on a non-unique column. In those cases, records can be repeated from page to page, sometimes resulting in identical pages.
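As an illustration, a minimal JDBC sketch of Form 2 could look like the following (con, theQuery, skip and n are placeholder variables; binding the row window as parameters is my assumption, not part of the original forms):

// Fetch one page of results using Form 2; the row window is bound
// as parameters so the statement can be reused across pages.
String sql =
    "SELECT * FROM"
  + " (SELECT Q.*, rownum AS rnum FROM (" + theQuery + ") Q)"
  + " WHERE rnum BETWEEN ? AND ?";
PreparedStatement pstmt = con.prepareStatement(sql);
pstmt.setInt(1, skip + 1);  // first row of the page (rownum is 1-based)
pstmt.setInt(2, skip + n);  // last row of the page
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
    // process one row of the page
}
rs.close();
pstmt.close();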

Large Objects (LOBs)

Large objects are a set of datatypes designed to hold large amounts of data (on the order of terabytes). There are three types of LOBs in Oracle: binary (BLOB), character (CLOB) and BFILE. The latter is a read-only datatype that points to an external file.
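For instance, a table mixing the three types could be declared as follows (the table and column names are made up for illustration):

CREATE TABLE lobdemo (
  id     NUMBER,
  doc    CLOB,   -- character data stored inside the database
  img    BLOB,   -- binary data stored inside the database
  extdoc BFILE   -- read-only pointer to an external file in a DIRECTORY object
);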

Restrictions

There are some restrictions on using LOBs; Table 1 below summarizes them. Oracle also requires that all columns in the ORDER BY clause appear in the SELECT clause, so the following statement yields an error: SELECT col1, col3 FROM atable ORDER BY col2.

Table 1 - Unsupported use of LOBs in SQL.
Unsupported SQL Operations    Example of unsupported usage
SELECT DISTINCT/UNIQUE        SELECT DISTINCT clobCol FROM ...
ORDER BY                      SELECT ... ORDER BY clobCol
GROUP BY                      SELECT avg(num) FROM ... GROUP BY clobCol
UNION, INTERSECT, MINUS       SELECT clobCol1 FROM table1 UNION SELECT clobCol2 FROM table2
                              (note that UNION ALL works for LOBs)
JOIN queries                  SELECT ... FROM ... WHERE table1.clobCol = table2.clobCol
INDEX columns                 CREATE INDEX clobIndx ON atable(clobCol) ...

Functions and Operations on LOBs

Aggregate functions (count, min, max, grouping) and Unicode functions are not supported for use with LOBs. Table 2 lists the functions and operations by category and specifies whether they are supported in SQL and PL/SQL. CNV indicates that the corresponding function is performed by converting the CLOB to a character datatype such as VARCHAR2; in a SQL environment, only the first 4K bytes of the CLOB are converted and used in the operation, while in a PL/SQL environment, only the first 32K bytes are converted and used.

Table 2 - Character Functions and Operations on LOBs.

Category             Operation / Function                 SQL Example / Comments                                SQL   PL/SQL
Concatenation        ||, concat()                         SELECT clobCol1 || clobCol2 FROM atable               Yes   Yes
Comparison           =, !=, >, >=, <, <=, <>, ^=          IF clobCol1 = clobCol2 THEN ...                       No    Yes
                     IN, NOT IN                           IF clobCol NOT IN (clob1, clob2, clob3) THEN ...      No    Yes
                     BETWEEN                              IF clobCol BETWEEN clobCol2 AND clobCol3 THEN ...     No    Yes
                     LIKE [ESCAPE]                        IF clobCol LIKE '%pattern%' THEN ...                  Yes   Yes
                     IS [NOT] NULL                        ... WHERE clobCol IS NOT NULL                         Yes   Yes
Character Functions  lower, nls_lower, upper, nls_upper   ... WHERE upper(clobCol1) = upper(clobCol2)           Yes   Yes
                     lpad, rpad                           SELECT rpad(clobCol, 10000, 'a') FROM ...             Yes   Yes
                     trim, ltrim, rtrim                   ... WHERE ltrim(clobCol, '0') = '100'                 Yes   Yes
                     replace                              SELECT replace(clobCol, 'old', 'new') FROM ...        Yes   Yes
                     substr                               ... WHERE substr(clobCol, 1, 4) = 'this'              Yes   Yes
                     translate                            SELECT translate(clobCol, '123', 'abc') FROM ...      CNV   CNV
                     ascii                                SELECT ascii(clobCol) FROM ...                        CNV   CNV
                     instr                                ... WHERE instr(clobCol, 'book') = 11                 Yes   Yes
                     length                               ... WHERE length(clobCol) < 50                        Yes   Yes
                     nlssort                              ... WHERE nlssort(clobCol, 'NLS_SORT = French') >
                                                              nlssort('S', 'NLS_SORT = French')                 CNV   CNV
Regular Expression   regexp_like, regexp_replace,         ... WHERE NOT regexp_like(clobCol, '%pattern%')       Yes   Yes
                     regexp_instr, regexp_substr
Conversion           chartorowid                          chartorowid(clobCol)                                  CNV   CNV
                     hextoraw                             hextoraw(clobCol)                                     No    CNV
                     to_date                              to_date(clobCol)                                      CNV   CNV
                     to_number                            to_number(clobCol)                                    CNV   CNV
                     to_timestamp                         to_timestamp(clobCol)                                 No    CNV
                     to_char, to_nchar                    to_char(clobCol)                                      Yes   Yes
                     to_clob, to_nclob                    to_clob(varchar2Col)                                  Yes   Yes

LOBs and JDBC

Data interface for LOBs

The usual way to set LOB values is through the java.sql.Blob and java.sql.Clob interfaces or Oracle's extensions oracle.sql.BLOB and oracle.sql.CLOB.

Connection con = DriverManager
    .getConnection("jdbc:oracle:thin:@localhost:1521:orcl", dbprops);
PreparedStatement pstmt =
    con.prepareStatement("INSERT INTO blobtable VALUES (?)");
// create a session-scoped temporary LOB (oracle.sql.BLOB)
Blob blob = BLOB.createTemporary(con, true, BLOB.DURATION_SESSION);
OutputStream out = blob.setBinaryStream(1L); // LOB positions are 1-based
out.write(byteValues);
out.close();
pstmt.setBlob(1, blob);
pstmt.executeUpdate();

In Oracle 10g, the setBytes, setBinaryStream, setString, setCharacterStream and setAsciiStream methods of PreparedStatement have been extended to work on BLOB and CLOB columns. According to Oracle's documentation, setBytes automatically switches to setBinaryStream for data larger than 2000 bytes (not 2KB), and setString switches to setCharacterStream for data larger than 32766 characters.

Conversely, the getter methods have been extended to work on LOB columns. The same auto-switching rules apply to the getters as well.
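As a minimal sketch of this data interface (the clobtable table, its columns and the surrounding variables con, stmt, id and largeText are made up for illustration):

// 10g data interface: write a CLOB column directly with setString...
PreparedStatement ins =
    con.prepareStatement("INSERT INTO clobtable VALUES (?, ?)");
ins.setLong(1, id);
ins.setString(2, largeText);  // works against a CLOB column in 10g
ins.executeUpdate();

// ...and read it back with getString, without going through java.sql.Clob
ResultSet rs = stmt.executeQuery("SELECT text FROM clobtable WHERE id = " + id);
if (rs.next()) {
    String text = rs.getString(1);
}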

However, there are bugs in the 10.1 driver that prevent some of the auto-switching from working properly. Even worse, setBinaryStream and setCharacterStream do not work when calling stored procedures with large amounts of data. Fortunately, these bugs have been fixed in 10.2, but there are still some bugs related to using national characters (see Oracle & Unicode). Therefore, the safest way to work with LOBs is still through the Blob and Clob interfaces.

Passing NULL values

It is good practice to use PreparedStatement.setNull to set a null value on a LOB column; otherwise an ArrayIndexOutOfBoundsException may occur. You can download test 1 and run it to reproduce the error, or just look at its source. The test uses the lobtest table below and a stored procedure c_lobtest to insert a record into the table.

CREATE TABLE lobtest (id NUMBER, sz INTEGER, text CLOB, binary BLOB);
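The stored procedure itself is part of the test source; a minimal sketch of what c_lobtest might look like (the parameter names are assumptions) would be:

-- a sketch only: the actual procedure is defined in the test source
CREATE OR REPLACE PROCEDURE c_lobtest(
    p_id NUMBER, p_sz INTEGER, p_text CLOB, p_binary BLOB) AS
BEGIN
  INSERT INTO lobtest VALUES (p_id, p_sz, p_text, p_binary);
END;
/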

The test consists of inserting two records into the lobtest table: the first record with text and no binary data, and the second with no text and some binary data. Note the use of setBytes and setString on lines 78 and 90 respectively to set the null values. The test works fine if only the second record is inserted, if data files bigger than 100 KB are used, or if the stored procedure is replaced with a regular SQL INSERT statement.

According to the folks at Oracle, the error comes from the fact that the PL/SQL procedure is invoked the first time with an assignment of a CLOB, which places the SQL in the shared pool with a CLOB parameter, and the second time with an assignment of a String, which is inconsistent with what is now in the shared pool. The workaround is shown below:

// insert the text record: CLOB parameter set, BLOB parameter explicitly null
pstmt.setLong(1, id);
pstmt.setInt(2, size);
pstmt.setCharacterStream(3, new InputStreamReader(bain), size);
pstmt.setNull(4, java.sql.Types.BLOB);
// ...
// insert the binary record: CLOB parameter explicitly null, BLOB parameter set
pstmt.setLong(1, id);
pstmt.setInt(2, size);
pstmt.setNull(3, java.sql.Types.CLOB);
pstmt.setBinaryStream(4, new FileInputStream(binFile), size); // binFile holds the binary data

LONG Datatypes

LONG and LONG RAW are older datatypes used to store large data. Starting with 8i, Oracle deprecates the LONG datatypes and encourages the use of LOBs instead. There are several advantages to doing so, among them:

- a table can have several LOB columns, but at most one LONG column;
- LOBs can hold more data (terabytes) than LONG and LONG RAW (2 GB);
- LOB data can be accessed randomly, piece by piece, while LONG data must be read sequentially;
- most SQL functions and operations work on LOBs (see Table 2) but not on LONGs.

However, if you work with legacy applications or if you have to access the system tables, then you would probably run into LONG datatypes.

Streaming LONG Datatypes

The getBytes and getString methods of ResultSet can be used to read the content of a LONG or LONG RAW column. However, to conserve memory, you can also read the content as a stream using getBinaryStream, getAsciiStream and getUnicodeStream.
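As an illustration, a LONG column could be streamed roughly as follows (stmt, atable and longCol are placeholder names):

ResultSet rs = stmt.executeQuery("SELECT longCol FROM atable");
while (rs.next()) {
    // stream the LONG column instead of materializing it as a String
    InputStream in = rs.getAsciiStream(1);
    byte[] buf = new byte[4096];
    for (int len; (len = in.read(buf)) != -1; ) {
        // process len bytes from buf
    }
    in.close(); // the stream must be consumed before moving to the next row
}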

LONG RAW Data Conversions

A call to getBinaryStream returns the raw data, that is, the bytes in their raw form.
A call to getAsciiStream converts the raw bytes to hexadecimal and returns the ASCII representation.
A call to getUnicodeStream converts the raw bytes to hexadecimal and returns the Unicode characters.

LONG Data Conversions

When getAsciiStream is used, the driver assumes that the underlying data in the database uses the US7ASCII or WE8ISO8859P1 character set, and it simply returns the bytes as the corresponding ASCII characters. If this assumption is false, then the returned data could be meaningless.

When getUnicodeStream is used, the driver returns a stream of Unicode characters in UTF-16 encoding. This applies to all underlying character sets that Oracle supports.

When getBinaryStream is used, there are two possible cases: if the database character set is US7ASCII or WE8ISO8859P1, the data is returned as-is, without conversion; otherwise, the driver returns the data encoded in UTF-8.

Sequential Access

If a query fetches multiple columns, one of which is a LONG or LONG RAW column, then the columns must be read sequentially. Any attempt to read a column beyond the LONG or LONG RAW column closes the latter. Reading the LONG or LONG RAW column afterwards (in Java) results in a "java.sql.SQLException: Stream has already been closed" error, as shown in the example below.

// supposing we have the following query
ResultSet rs = stmt.executeQuery("SELECT col1, longCol, col3 FROM atable");

// reading in this order is OK
rs.getString("col1");
rs.getString("longCol");
rs.getString("col3");

// reading "col3" before "longCol" results in
// "java.sql.SQLException: Stream has already been closed"
rs.getString("col1");
rs.getString("col3");
rs.getString("longCol");

In the above example, if col3 must be read before longCol (for instance, when the processing of longCol depends on the value of col3), the order of the columns in the query can simply be rearranged so that col3 comes first.

Oracle & Unicode

Oracle supports storing characters in encodings other than ASCII. This is commonly known as National Language Support (NLS) or Globalization Support.

NLS Parameters

A set of NLS parameters specifies, among other things, the character set used by Oracle (i.e. the encoding used for storing characters in the database). The following SQL command displays all NLS parameters:

SQL> SELECT * FROM nls_database_parameters;

The ones that are of interest to us, as far as Unicode support is concerned, are:

- NLS_CHARACTERSET: the database character set, used for CHAR, VARCHAR2 and CLOB columns;
- NLS_NCHAR_CHARACTERSET: the national character set, used for NCHAR, NVARCHAR2 and NCLOB columns.

A Unicode encoding can be specified as the database character set if all applications on the database support Unicode. Otherwise, using the national-language columns allows Unicode to be used alongside existing applications that only support ASCII. It also allows existing applications to be converted to Unicode gradually.

Setting Character Set

In Oracle 10g, the default value for NLS_CHARACTERSET is an 8-bit West European encoding (the exact encoding depends on the platform the Oracle server is installed on), and the default value for NLS_NCHAR_CHARACTERSET is AL16UTF16, a UTF-16 encoding. Follow the steps below to change their values:

Log into the database using SQL*Plus with sysdba privilege:

% sqlplus sys/<password>@<oracle_sid> as sysdba

Run the database in restricted mode in order to alter it:

SQL> shutdown immediate    -- or normal
SQL> startup mount
SQL> alter system enable restricted session;
SQL> alter system set job_queue_processes=0;
SQL> alter database open;
SQL> alter database national character set UTF8;
SQL> shutdown immediate    -- or normal
SQL> startup

Altering the database national character set (NLS_NCHAR_CHARACTERSET) is only allowed when no table in the database is defined with a national-language column. Some default schemas in Oracle already have tables with such columns, but they can be dropped as follows:

SQL> DROP USER oe CASCADE;
SQL> DROP USER pm CASCADE;

I18n and JDBC

The Oracle JDBC driver (ojdbc14.jar) only contains the US7ASCII, WE8DEC, WE8ISO8859P1, UTF8 and AL16UTF16 character sets. If other character sets are to be used, then orai18n.jar must be included on the classpath as well.
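For instance (the application jar and main class below are placeholders):

% java -cp app.jar:ojdbc14.jar:orai18n.jar com.example.Main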

By default, Oracle's JDBC driver treats all character columns as CHAR. In order to insert Java strings into NCHAR, NVARCHAR2 and NCLOB columns, OraclePreparedStatement.setFormOfUse has to be invoked for each national-language column, as in the code example below:

PreparedStatement pstmt = con.prepareStatement("INSERT INTO i18n VALUES (?,?,?)");
pstmt.setLong(1, id); // number column
pstmt.setString(2, "name"); // VARCHAR2 column
((OraclePreparedStatement)pstmt)
    .setFormOfUse(3, OraclePreparedStatement.FORM_NCHAR);
pstmt.setString(3, "some unicode string"); // NVARCHAR2 column

Alternatively, if the system property oracle.jdbc.defaultNChar or the connection property defaultNChar is set to true, then JDBC treats all character columns as national-language, and there is no need to call setFormOfUse on each national-language column.
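For example, a connection could be opened with the property set as follows (the URL and credentials are placeholders):

// enable defaultNChar for this connection only
Properties props = new Properties();
props.setProperty("user", "scott");        // placeholder credentials
props.setProperty("password", "tiger");
props.setProperty("defaultNChar", "true"); // treat all character columns as national-language
Connection con = DriverManager.getConnection(
    "jdbc:oracle:thin:@localhost:1521:orcl", props);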

However, if defaultNChar is set to true, the database will implicitly convert all CHAR data into NCHAR. This conversion has a substantial performance impact, in which case setFormOfUse(OraclePreparedStatement.FORM_CHAR) can be used to tell the database not to convert plain CHAR datatypes.
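For instance, with defaultNChar set to true, a plain VARCHAR2 parameter could be opted out as follows (the parameter position is made up for illustration):

// opt this VARCHAR2 parameter out of the implicit NCHAR conversion
((OraclePreparedStatement)pstmt)
    .setFormOfUse(2, OraclePreparedStatement.FORM_CHAR);
pstmt.setString(2, "plain ascii value"); // VARCHAR2 column, no conversion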

Issues with defaultNChar

It appears that an "ORA-01460: unimplemented or unreasonable conversion requested" error occurs when using setCharacterStream and setString to pass strings (both ASCII and Unicode) of length between 2K and 16K as national-language parameters to a stored procedure. This error occurs when using the thin JDBC 10.x driver against 10.x and 9.x databases. It does not occur when using JDBC 9.2.0.6 against a 9.2.0.6 database.

If the parameter is an NCLOB, then setClob may be used as a workaround. However, I don't know of a workaround when the parameter is an NVARCHAR2.

Test 2 is a set of two tests that illustrate the issue.

Unicode and LIKE ESCAPE

Wildcard characters such as '%' and '_' may need to be escaped in the pattern of a LIKE clause. Oracle allows specifying the escape character with the ESCAPE clause. For instance, the following query uses '\' to escape '_', thus looking for all texts that start with "SYS_".

SELECT * FROM atable WHERE textcol LIKE 'SYS\_%' ESCAPE '\';

For national-character columns, Oracle seems to implicitly convert the pattern to the encoding of the column. It seems to do the same with the escape character, except when the column is an NVARCHAR2. That is, the above SQL query works fine if textcol is an NCLOB, but an error (ORA-01425: escape character must be character string of length 1) occurs if it is an NVARCHAR2. In that case, the escape character must be converted into the right encoding with to_nchar.
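For example, something like the following should work for an NVARCHAR2 column (nvarcharCol is a placeholder name):

SELECT * FROM atable WHERE nvarcharCol LIKE 'SYS\_%' ESCAPE to_nchar('\');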

Optimizing NOT IN

Oracle is very slow when a sub-query is used in a NOT IN clause [3]. (This is specific to Oracle; other databases handle it fine.) One optimization of NOT IN consists of rewriting the query to use NOT EXISTS, as shown below.

-- This query uses NOT IN and a sub-query
SELECT name FROM Foo F1
WHERE F1.id NOT IN
    (SELECT id FROM Bar B1 WHERE B1.priority > 9)

-- But it's much faster when using NOT EXISTS
SELECT name FROM Foo F1
WHERE NOT EXISTS
    (SELECT id FROM Bar B1 WHERE B1.priority > 9 AND B1.id = F1.id)

-- Also faster when adding IS NOT NULL checks
SELECT name FROM Foo F1
WHERE F1.id IS NOT NULL AND F1.id NOT IN
    (SELECT id FROM Bar B1 WHERE B1.id IS NOT NULL AND B1.priority > 9)

References

[1] Oracle Database Online Documentation
Online documentation for Oracle 10g release 2 (requires TechNet login)
[2] Paging in Oracle
Discussion on paging in Oracle on AskTom
[3] Speeding up NOT IN
"Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN", Roger Schrag
Last updated: 2006-04-16 15:44:15 -0700