Tuesday, January 13, 2009

Reading Statspack


In Oracle, Performance Tuning is based on the following formula:

Response Time = Service Time + Wait Time

Where

  • Service Time is time spent on the CPU
  • Wait Time is the sum of time spent on Wait Events i.e. non-idle time spent waiting for an event to complete or for a resource to become available.

Service Time is comprised of time spent on the CPU for Parsing, Recursive CPU usage (for PLSQL and recursive SQL) and CPU used for execution of SQL statements (CPU Other).

Service Time = CPU Parse + CPU Recursive + CPU Other

The above components of Service Time can be found from the following statistics:

· Service Time from CPU used by this session

· CPU Parse from parse time cpu

· CPU Recursive from recursive cpu usage

From these, CPU Other can be calculated as follows:

CPU other = CPU used by this session - parse time CPU - recursive CPU usage

Many performance-tuning tools (including Statspack) produce a list of the top wait events. For example, Statspack’s report contains the "Top 5 Wait Events" section.(Pre-Oracle9i Release 2).

It is a common mistake to start dealing with Wait Events first and not taking in consideration the corresponding response time. So always compare the time consumed by the top wait events to the 'CPU used by this session' and identify the biggest consumers.

Here is an example where CPU Other was found to be a significant component of total Response Time even though the report shows direct path read as top wait event:

Top 5 Wait Events

Events

Waits

Wait Time(cs)

% Total Wt Time

direct path read

4232

10827

52.01

db file scattered read

6105

6264

30.09

direct path write

1992

3268

15.70

control file parallel write

893

198

.95

db file parallel write

40

131

.63

Statistic

Total

Per Second

Per Trans

CPU used by this session

358806

130.5

12372.6

parse time cpu

38

0.0

1.3

recursive cpu usage

186636

67.9

6435.7

From these figures we can obtain:

· Wait Time = 10,827 x 100% / 52,01% = 20,817 cs

· Service Time = 358,806 cs

· Response Time = 358,806 + 20,817 = 379,623 cs

· CPU Other = 358,806 - 38 - 186,636 = 172,132 cs

If we now calculate percentages for the top Response Time components:

· CPU Other = 45.34%

· CPU Recursive = 49.16%

· direct path read = 2.85%

· etc. etc.

So we can see the I/O-related Wait Events actually are not a significant component of the overall Response Time. For us it makes sense concentrate our tuning effort on the service time component.

CPU Other is a significant component of Response Time, so a possible next step is to look at the CPU intensive SQL and not at direct path read wait event.

Starting with Oracle9i Release 2, Statspack presents Service Time (obtained from the statistic CPU used by this session) together with the top Wait Events in a section called Top 5 Timed Events, which replaces the section Top 5 Wait Events of previous releases.

Here is an example:

Top 5 Timed Events

Events

Waits

Time(s)

% Total Ela Time

library cache lock

141

424

76.52

db file scattered read

3367

96

17.4

CPU time

32

5.79

db file sequential read

161

1

.18

control file parallel write

40

0

.05

Statistic

Total

Per Second

Per Trans

CPU used by this session

3211

4.3

1605.5

parse time cpu

59

0.1

29.5

recursive cpu usage

232

0.3

116.0

These figures give us directly the percentages of the Wait Events against the total Response Time so no further calculations are necessary to assess the impact of Wait Events. Service Time is presented as CPU time in this section and corresponds to the total CPU utilisation. We can drill down to the various components of Service Time as follows:

· CPU Other = 3,211 - 59 - 232 = 2,920 cs

· CPU Other = 2,920 / 3,211 x 5.79% = 5.26%

· CPU Parse = 59 / 3,211 x 5.79% = 0.11%

· CPU Recursive = 232 / 3,211 x 5.79% = 0.42%

In this example, the main performance problem was an issue related to the Library Cache.
The second most important time consumer was waiting for physical I/O due to multiblock reads (db file scattered read).

Identifying problematic SQL’s from Statspack

From the above calculations you will get the significant components which caused the performance problem. Based on this components lets decide on the various Statspack section to identify the problematic SQL’s.

  • Other CPU

If this shows CPU other as being significant the next step will be to look at the SQL performing most block accesses in the SQL by Gets section of the Statspack report. A better execution plan for this statement resulting in fewer Gets/Exec will reduce its CPU consumption.

  • CPU Parse

If CPU Parse time is a significant component of Response Time, it can be because cursors are repeatedly opened and closed every time they are executed instead of being opened once, kept open for multiple executions and only closed when they are no longer required. The SQL ordered by Parse Calls can help find such cursors.

  • Disk I/O related waits.

Identifying SQL statements responsible for most physical reads from the Statspack section SQL ordered by Reads has similar concepts as for SQL ordered by Gets.
% Total can be used to evaluate the impact of each statement. Reads per Exec together with Executions can be used as a hint of whether the statement has a suboptimal execution plan causing many physical reads or if it is there simply because it is executed often. Possible reasons for high Reads per Exec are use of unselective indexes require large numbers of blocks to be fetched where such blocks are not cached well in the buffer cache, index fragmentation, large Clustering Factor in index etc
.

  • Latch related waits.

Statspack has 2 sections to help find such unsharable statements, SQL ordered by Sharable Memory and SQL ordered by Version Count. This can help with Shared Pool and Library Cache/Shared Pool latch tuning. Statements with many versions (multiple child cursors with the same parent cursor i.e. identical SQL text but different properties such as owning schema of objects, optimizer session settings, types & lengths of bind variables etc.) are unsharable. This means they can consume excessive memory resources in the Shared Pool and cause performance problems related to parsing e.g. Library Cache and Shared Pool latch contention or lookup time e.g. Library Cache latch contention.

No comments:

Post a Comment