All These Oracle SQL Statement IDsBy Sergei Romanenko
Edited by Gary Gordhamer
Submitted to SELECT Journal in January, 2013
Published by SELECT Journal in vol. 20, Number 2, 2013
Different IDs have been used in Oracle database to refer a particular SQL statement. How many different IDs do you know for a SQL statement? Look into the data provided by the v$sql view: HASH_VALUE, ADDRESS, CHILD_ADDRESS, SQL_ID, EXACT_MATCHING_SIGNATURE, FORCE_MATCHING_SIGNATURE... It may be difficult to know if one ID is the same, will another ID be same as well? Does the same ID across databases always mean the same SQL statement? If one set of IDs are observed in development will the IDs be the same in production? Will it change if database parameters are changed? This article explains the different SQL IDs.
SQL ADDRESS, CHILD_ADDRESS, CHILD_NUMBER
Before Oracle 10g, SQL ADDRESS and SQL HASH were the only identifiers used to refer to a SQL statement. For example, to find out which SQL statement is executed by the active session, the row in v$session view would contain these two corresponding columns. These two attributes are still there, including Oracle 11g. According to the Oracle documentation, ADDRESS means: "Address of the handle to the parent for this cursor." This is not very descriptive and my understanding is that (at least in the latest versions) it is not the address, such as the pointer in C language, simply because it points outside of the accessible space of the oracle process. Probably this value is a handle itself, which is used to locate the structures in the shared pool. Anyway, this value is of little use other than just something that can be joined to other views to find related data.
For every SQL statement in Oracle there is a parent and child cursors. Very briefly, there can be several versions of the same SQL statement. For example, if current schema is SCOTT, then "select * from EMP" is accessing SCOTT.EMP table. But if current schema is not SCOTT, this will be a different table and another parsed version of the same SQL statement will be stored in shared pool. Generally, it is the best, when only one child cursor (one version) exists for every SQL statement. Multiple child cursors mean that SQL statement is not shared, wasting shared pool. This can indicate the database application design flaw (and sometimes Oracle bug) and should be investigated using v$sql_shared_cursor view.
Using this query
select address, child_address, child_number from v$sql order by address, child_number;
We get the result in Figure 1 showing three child cursors having the same parent cursor, circled in red.
The CHILD_ADDRESS is similar to ADDRESS identifying the child cursor. All children of the parent are numbered in CHILD_NUMBER column, which normally starts with 0. Note, that one ADDRESS (Parent cursor) has one or more CHILD_ADDRESS (child cursors).
Also note, that at any given moment, both CHILD_ADDRESS and (ADDRESS, CHILD_NUMBER) are unique in v$sql, which has a record for every single child cursor cached in the shared pool. Over time, the same CHILD_ADDRESS may not necessarily mean same cursor, as the aging algorithm can flush the cursor out of the shared pool. Same is true for the ADDRESS. As result of discarding cursors, CHILD_NUMBER over time can become non-sequential, as can be seen in the picture above: all child cursors 0 - 68 expired from cache.
The parent cursor contains data and attributes common to all child cursors, such as SQL text. Is it possible for two parent cursors with different ADDRESS to have same SQL text? The answer depends on Oracle version and if the cursor is obsolete or not. For versions before 126.96.36.199 the answer is always "no". For 188.8.131.52 and higher the answer is "yes". But when limiting the scope to IS_OBSOLETE='N', non-obsolete cursors, the answer will be "no" in all versions. Therefore there is one-to-one relationship between SQL Text and ADDRESS for non-obsolete parent cursors.
The SQL text is the most important attribute which drives many things in Oracle, therefore many IDs are based on SQL text. Let's see in the next section how it is done.
MD5 Hash Values Derived From SQL Text
Oracle is using standard MD5 hash function for calculating SQL_ID, HASH_VALUE, OLD_HASH_VALUE, and SIGNATURE. The MD5 takes SQL text (including terminating 0-byte) as a parameter and returns 128-bit value. This value can be represented as an array of 16 1-byte integer values, or array of 2 64-bit integer values.
Let's demonstrate this using a very simple SQL statement: "select 0 from dual". First, we will create a text file containing this SQL text and 0-byte:
echo -en 'select 0 from dual\x00'>a.txt
Then execute 'md5sum' on this text:
md5sum a.txt 77922cbb2a50dfb5d7ce56a3566656f3 *a.txt
The output of 'md5sum' is 128-bit value. It is a sequence of 16 bytes presented in hexadecimal format. For example, the first byte is 0x77, second 0x92 etc. Later we will see how this MD5 hash value is used to form several IDs.
Let's get HASH_VALUE for the SQL statement mentioned above. First we execute the statement, then query HASH_VALUE column in v$sql view. The HASH_VALUE is an integer number; the following query presents it in decimal and hexadecimal formats.
First lets populate the SQL cache with our statement:
select 0 from dual;
Now lets find our ADDRESS and HASH_VALUE with results shows in Figure 2.
select ADDRESS, HASH_VALUE, to_char(HASH_VALUE,'FMXXXXXXXX') HASH_VALUE_HEX from v$sql where sql_text='select 0 from dual';
Comparing it to MD5 value, we can see that HASH_VALUE is the last 4 bytes of MD5 in reverse order: '566656f3'. This is because an integer number in hexadecimal notation shows the most significant bytes on the left, while the MD5 value has the most significant bytes on the right.
Now we can conclude that there is one-to-one relationship between SQL text and SQL HASH, assuming that no hash collisions happens. Although these collisions are possible, for practical purposes this one-to-one relationship works quite well.
For the sake of completeness, the OLD_HASH_VALUE column should be mentioned. Before Oracle 10g, SQL text hash value was calculated using proprietary algorithm. In 10g+ versions, this value is reported in OLD_HASH_VALUE and can be used to compare performance of the SQL statements before and after the upgrade. As this value is not used for other purposes, it can be ignored. Probably it will be dropped completely in future versions.
Starting with 10g, the SQL_ID column was introduced. Since then it was used almost in every v$ view dealing with SQL statements. The last 8 bytes of MD5 value are used to form SQL_ID. From the example above, these will be: d7ce56a3 566656f3. Let's reverse the order of bytes in each 4-byte sequences and form two 32-bit integer numbers: a356ced7 f3566656. Together these two sequences form a 64-bit integer number: a356ced7f3566656. This number in the binary format will look as following:
0xA356CED7F3566656 = 1010001101010110110011101101011111110011010101100110011001010110
Let's divide the 64-bit value into groups of 5-bit each starting from least significant bits (from right to left): there will be 12 groups 5-bit each, and the last group will have remaining 4-bit. Each 5-bit group can be represented by a 5-bit integer number in the 0-31 range:
1010 00110 10101 10110 01110 11010 11111 11001 10101 01100 11001 10010 10110 10 6 21 22 14 26 31 25 21 12 25 18 22
Finally, we are going to encode each 5-bit number into a single character using this lookup table:
Here is a final result, 13-character string, used in Oracle as SQL_ID:
10 6 21 22 14 26 31 25 21 12 25 18 22 a 6 p q f u z t p c t k q SQL_ID = 'a6pqfuztpctkq'
Let's check that with the following SQL and results in Figure 3.
select ADDRESS, SQL_ID, HASH_VALUE, to_char(HASH_VALUE,'FMXXXXXXXX') HASH_VALUE_HEX from v$sql where sql_text='select 0 from dual';
Let's make few practical conclusions. The possibility of SQL_ID collisions is significantly lower (about 4 billion times lower) than for HASH_VALUE. That means much stronger one-to-one relationship between SQL text and SQL_ID. Therefore, the same SQL_ID means the same SQL text. It no longer matters if they were on different databases, different versions, or different platforms. In this case, same text means that both texts are byte-to-byte identical, including white-space characters.
Similar to SQL_ID, SQL Signatures derive from MD5 hash value of the SQL Text. Unlike SQL_ID, the SQL Text is first converted into normal (standard) SQL form. There are deterministic rules used to normalize SQL text, such as removing extra white spaces, converting character case of object names etc. For the SQL statement discussed above, the normal form is 'SELECT 0 FROM DUAL'. Let's calculate MD5 value on of this text. Note, this time we are not including terminating 0 byte:
echo -en 'SELECT 0 FROM DUAL' | md5sum 7c45b1d3a5efa2f41cd0c69e6969eeb3 *-
Taking last 8 bytes (1cd0c69e 6969eeb3) and reversing the order of bytes, as done for SQL_ID, we get 64-bit integer: 9ec6d01c b3ee6969
0x9EC6D01CB3EE6969 = 11441060725077731689
Let's query the signature reported for the SQL statement in v$sql (10g R2 and later):
select SQL_ID, EXACT_MATCHING_SIGNATURE, to_char(EXACT_MATCHING_SIGNATURE,'FMXXXXXXXXXXXXXXXX') EXACT_SIGNATURE_HEX, FORCE_MATCHING_SIGNATURE, to_char(FORCE_MATCHING_SIGNATURE,'FMXXXXXXXXXXXXXXXX') FORCE_SIGNATURE_HEX from v$sql where sql_text='select 0 from dual';
As shown in Figure 4, the EXACT_MATCHING_SIGNATURE column shows same number.
We can conclude that SQL Signature is tolerant to some variations of SQL, such as extra white spaces or case of some elements of SQL text. Therefore several statements with different text, and different corresponding SQL_ID, can have same signature. Technically, there is one-to-many relationship between Exact Matching Signature and SQL Text. But, in fact, all corresponding SQL Texts are variations of the essentially same SQL Statement.
SQL Force Matching Signature
You may notice the existence of two signature values, which happened to be different for our simple SQL statement. Until now we ignored distinction between them. As we already know, Oracle SQL engine normalizes text, and the Exact Matching Signature is calculated on the normal text. Optionally, SQL engine can go even further and recognize that '0' in the query represents a number and can be replaced with the bind variable. The resulting statement will be more generic and will represent semantically different statements, such as 'select 1 from dual' or similar ones. Now the parsed and optimized SQL statement can be shared between more SQL statements. This type of optimization is controlled by cursor_sharing instance parameter. By default, this parameter set to EXACT value and cursor sharing is based on EXACT_MATCHING_SIGNATURE. To enable this extra optimization, cursor_sharing parameter must be changed to FORCE value, and then the FORCE_MATCHING_SIGNITURE will be used.
It is logical to expect that Oracle SQL engine would substitute '0' in the query with the bind variable, and then calculate the signature value. Indeed, Oracle is going to modify SQL text to: 'SELECT :"SYS_B_0" FROM DUAL'. To confirm, let's calculate MD5 on this text:
echo -en 'SELECT :"SYS_B_0" FROM DUAL' | md5sum 92b295a98642a44092f98992865a0d52 *-
Taking last 8 bytes (92f98992 865a0d52) and reversing (9289f992 520d5a86) the order of bytes, as done for SQL_ID, we get 64-bit integer:
0x9289F992520D5A86 = 10559245208183986822
This is same value reported in FORCE_MATCHING_SIGNATURE column as in Figure 4.
The 'SELECT :"SYS_B_0" FROM DUAL' statement is more generic than 'SELECT 0 FROM DUAL' because it will be used for any of the following queries:
select 1 from dual; select 1.2 from dual; select 'a' from dual;
All these statements will be hashed to the same FORCE_MATCHING_SIGNATURE value, despite different data type (NUMBER and VARCHAR2) of the constant. On the other hand, these statements have different EXACT_MATCHING_SIGNATURE values. Therefore, the relationship is as shown in Figure5.
If EXACT_MATCHING_SIGNATURE and FORCE_MATCHING_SIGNATURE values are equal, this indicates that SQL statement doesn't have literals or constants that potentially can be replaced by bind variables. If they are not equal, then the literals in these SQL statements can be replaced with bind variables, potentially decreasing the number of parent cursors and improving cursor sharing. This fact can be used in finding the candidates for switching to bind variables:
select FORCE_MATCHING_SIGNATURE, count(*) from v$sql where EXACT_MATCHING_SIGNATURE != FORCE_MATCHING_SIGNATURE group by FORCE_MATCHING_SIGNATURE having count(*) > 10 order by count(*) desc;
Then check SQL statement, corresponding to top-5 or top-10 signatures, and consider re-writing them using bind variables.
It should be mentioned that both types of signature could be calculated using DBMS_SQLTUNE package. All you need is to provide SQL text and force_match flag. For example:
select DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE('select 0 from dual', force_match=>1) n from dual;
With the advent of 11g, another SQL identifier was introduced - SQL_HANDLE. This identifier is stored in sys.sql$text table and used to refer SQL statements in plan stability handling. New records are added to this dictionary table, when a new SQL plan baseline or SQL profile is created. You can see SQL_HANDLE column in DBA_SQL_PLAN_BASELINES view, and it is often used as a call parameter to DBMS_SPM package.
You cannot specify a value for the SQL_HANDLE, it is generated automatically using SQL statement signature. For example, the SQL statement 'select 0 from dual' the default SQL_HANDLE value will be 'SQL_9ec6d01cb3ee6969'. Recall that EXACT_MATCHING_SIGNATURE for this statement was 11441060725077731689 or 0x9EC6D01CB3EE6969 in hexadecimal notation. So you can easily see how SQL_HANDLE is formed: string 'SQL_'||'hexadecimal value in lower case'.
SQL_HANDLE is just another alias to SQL signature, this one-to-one relationship is instantiated in the sys.sql$text table. Both SIGNATURE and SQL_HANDLE columns have unique indexes; therefore it is not possible to have two records with same SIGNATURE or SQL_HANDLE. It is not clear why Oracle designers introduced SQL_HANDLE which is functionally identical to the SIGNATURE, at least in 11g. The DBMS_SPM package could use SIGNATURE parameter in place of SQL_HANDLE. This redundancy invites a certain level of confusion and, hopefully, this article clarifies the issue. We can make a guess that Oracle wanted to reserve a logical ID which will not be tied up to any particular implementation.
There are many speculative explanations of the relationship between different SQL statements IDs used in Oracle database without giving further explanations. This article exposes mechanics used to form these IDs, eliminating any guesswork. This understanding is very important if not critical for anyone involved in SQL plan stability management because it helps to relate SQL statements referred by numerous identifiers. And I hope that readers curious about variety of SQL IDs will see how they evolved over time reflecting improvements in the Oracle database system. The diagram in Figure 6 summarizes the uncovered relationship between SQL statement IDs.
Note: all examples were done using Oracle Database 184.108.40.206 64-bit on Linux x86.