Performance troubleshooting techniques
Lab128 supports three major performance troubleshooting/tuning techniques:
- Response time-based using active session history (ASH);
- Innovative SQL statistics-based using frequent snapshots from v$sqlstats or v$sql views;
- System-wide tuning
Let's take a look at them in more details.
Response time-based tuning. Active Session History (ASH)
The response time of the database system is the elapsed time between submitting a SQL query and receiving the result. The response time is the most important metric to the end user, as it is a major component of perceived database performance. Response time-based technique focuses on database user actions for which the performance needs to be improved. It then employs Oracle Wait Interface (OWI) and identifies SQL statements that cause the most of the response time. The required actions are based on analysis of the elapsed time components for the selected statements. After these actions are taken, it may be necessary to iterate through this process again until a desired effect is achieved.
Oracle Wait Interface includes v$system_event, v$session_event, v$session_wait, v$event_name views. Oracle 10g extended OWI, introducing v$active_session_history view, emphasizing the importance of OWI-based tuning. More specifically, v$active_session_history is a collection of v$session view snapshots taken at a regular time interval for the active sessions. Lab128 uses a similar collection technique although it is not dependent on the v$active_session_history view, so it is compatible with earlier Oracle versions. Please note that v$active_session_history is a part of the Oracle 10g Diagnostic Pack and requires a purchase of the ODP license. Therefore, Lab128 provides additional savings by implementing ASH functions and not requiring the purchase of ODP.
Having ASH data collected, Lab128 provides sophisticated graphical Activity Explorer, where system-wide wait events are combined with ASH data. Here you can identify which database resources were the biggest bottlenecks during any selected time period, which SQL statement took the most time to execute and see wait time broken down by different events. The text of a SQL statement is displayed as well as statement's other detailed data.
SQL statistics-based tuning
SQL statistics-based tuning is a new approach. To our best knowledge, there is no other tool that has implemented this technique. It is based on frequent snapshots of the v$sql (or v$sqlstats in 10g Release 2) view. The delta between snapshots can be calculated for any period of time. It has been a pleasant surprise to realize how accurate and productive this technique is in reality. This approach is a viable (if not better) alternative to ASH. It really shines in tuning OLTP systems with frequent short queries, where ASH is less effective. SQL statistics can be used along with ASH data to greatly enhance troubleshooting effectiveness. You may have heard that AWR (Automatic Workload Repository) in Oracle 10g has already explored this opportunity by capturing v$sql every hour. The default capture rate in Lab128 is one snapshot every 15 seconds, so there is simply no comparison to the scope of new possibilities and unprecedented accuracy. In Lab128, it is truly possible to track statistics of individual SQL statement over time. Now it is easy to find which SQL was responsible for intermittent CPU spikes, disk reads, etc - something that is not possible with AWR.
There are many challenges in collecting SQL statistics: the sheer amount of data,
the transient nature of SQL statements that can go away and reappear
in the SQL area, and the interpretation and presentation of collected data.
When we contacted Oracle Corporation asking if they plan to incorporate
this approach in the database, they answered that this would be prohibitively
expensive. Well, it was difficult but not impossible. Heavy use of compression
and adaptive algorithms made this a reality. Now you can take advantage of
this pioneering work and employ this technique. Because of its effectiveness,
it could be one of the most important Oracle troubleshooting techniques
in the future.
(Also see related article: "Oracle Tuning Using Frequent SQL Area Statistics Snapshots")
System-wide tuning is based on the analysis of Oracle performance statistics. This rather classical approach may not be efficient when a clear disparity exists between the contributions of different SQL statements to the degraded performance. Meanwhile, for some situations when different SQL statements show uniform contribution to the response time, or when performance problems are experienced across many areas of user interaction, the real cause can be of a system-wide nature (such as an undersized system, for example). System-wide statistics will expose the problem, providing data for required actions.
Oracle performance data collection with uncompromised entirety
Oracle tuning and monitoring is based on v$ views that provide access to database performance data. Since most of the statistics in v$ views return cumulative values, it is not truly possible to say what state the database is in when only one snapshot is taken. Measuring the change of values between snapshots gives information about the activity that happened between these snapshots. Measurements taken more frequently provide greater precision about the time when events occur, i.e. the time series with higher a sampling rate will have higher time resolution. It would be natural to assume that a higher sampling rate is better; in reality, each measurement may carry a cost to the measured system. This is particularly true for Oracle, where each request to v$ views partly shares an execution path with a regular query.
Another consideration is the number and variety of statistics to capture. The traditional approach is to focus on the most important ones, such as those in v$sysstat, v$system_event, etc. Out of this large selection, which statistics should be collected? Every manual on this topic would have its own list of the most important statistics depending on the nature and scope of the problem. What if the nature or scope of the problem is unknown? Why not collect all the available statistics?
These two factors - what to collect and how often - have been the subject of compromise for many monitoring products. The approach used in Lab128 is a unique one: it captures all statistics originated from v$sysstat, v$system_event, v$filestat, v$rollstat, v$latch, and many more. That's more than 1,000 statistics to keep track of, not to mention that some statistics may have multiple values (like those in v$system_event, v$latch, v$filestat, etc). Another unique feature of Lab128 is its high sampling rate of 6-12 seconds between measurements. Effectively, sampling rate is limited by the cost of executing queries against v$ views. The rate of 6-12 seconds is optimal, having low overhead for the measured system. In order to keep the impact of data collection to a minimum, Lab128 can automatically lower the sampling rate.
Lab128 stores collected data compressed in its own in-memory database. For relatively busy database it is able to store hours and even days of data using small amount of memory, for 9 hours it needs 15-20 MB of RAM. This would not be possible without extremely effective compression: Lab128 uses a proprietary lossless compression algorithm that provides both a high compression ratio and instant random access to the stored data.
Sophisticated, real-time monitoring with an intuitive user interface
While it's important to know how an Oracle instance performed in the past, it is very important to see how and what it is doing now. Lab128 keeps the latest state of the Oracle instance internally, intelligently refreshing only those areas being viewed by the user. Using this technique, Lab128 offers extensive picture at low cost to the Oracle instance being monitored. As the user shifts attention into new area of interest, more detailed information is immediately available. The development team of Lab128 believes that the usability and convenience of an application takes the same priority as its sophistication. If something is not convenient, it will not be used. The user interface broadly uses hot links or drill-down technique to make even the most intricate details only few clicks away. That gives an overstressed DBA a tremendous advantage in overseeing internal processes, tracking down problems, or simply learning and understanding Oracle functioning. Some certain features of Lab128 real-time monitoring, such as "Block Buffer Explorer" and others, have never been explored in other Oracle monitoring tools.
Easy installation and maintenance.
To install, just copy the Lab128 executable and run it. That's all! Well, to tell the truth, you also need a Help file unless you are already familiar with the workings of Lab128. The program does not use the Windows registry, as all settings are stored in text files in the directory where Lab128 was started. It is advisable to create a dedicated directory, for example d:\Program Files\Lab128, and keep files there. This directory should be writeable to allow for the saving of user preferences. To uninstall Lab128, simply delete the Lab128 directory.
State of the art implementation
There are two components to the performance of any monitoring tool: the impact on the system being monitored, and the speed of the user interface.
The philosophy behind the development of Lab128 was to present the lowest possible load to the Oracle instance. In order to achieve this goal, Lab128 exploits many techniques. For example, instead of running queries against two or more Oracle views with joins between them, it would be better to run simpler queries against each of the views and make the joins locally in the application. Similarly, anything accessed by Lab128 is kept in the local cache to avoid unnecessary query in the future. To conserve resources on the Oracle side, Lab128 never uses Oracle to store any data (such as segment extents data stored by Quest's monitoring tool) that will be used locally in the application and never asks Oracle to perform processing (such as binary search implemented in PL/SQL code in the aforementioned tool) if it could be done locally. For example, Lab128's Block Buffer Explorer takes its feed from v$bh view, but all rollup aggregation is done locally in the application. This is true for many other processes.
When it comes to usability, the speed of the user interface is a big concern. There are some great tools written in Java but the sloppiness of their user interface makes them undesirable for everyday use. Lab128 is written in C++ and uses efficient data processing algorithms that scale well to the large amounts of data being stored and processed. It has an efficient processing engine with a small footprint, and employs multi-threading to keep all background tasks from interfering with user interface code. As a result, Lab128 runs well even on old computers. As it consumes very few resources, it can be kept running on workstations without any impact on other applications.
What Lab128 is not
You can find many uses of Lab128. It is an excellent monitoring tool but it lacks the feature of sending alerts to the user, as of yet. Lab128 assumes a lot of user interaction, as it is more about advanced tuning and exploration of thousands of statistics. This tool was not intended for administering Oracle, although you can use the SQL Central editor with the extendable query repository for doing database administrative work. Finally, this tool does not give you advisory or problem fix solutions; it is all about collecting facts. We respect your experience and knowledge of your particular database application. We believe that making your own decisions based on collected facts is the best way to maintain top performance of the database.