Constantly watching CPU utilization is critical to a database administrator's understanding of their database system. The sar utility, while seemingly small and insignificant can and should be used to help initially determine if any high-level CPU resource usage is out of balance, potentially signaling a database performance tuning problem.
Being an effective database
administrator means that we must at times be able to use system level utilities
to find things like CPU, memory, and I/O bottlenecks/issues to help point us in
the proper direction.
So often, I see database administrators diving straight into
the database to determine performance problems, forgetting about the operating
system and the wealth of information it contains. Being an effective DBA means
that we must at times be able to use system level utilities to find things like
CPU, memory, and I/O bottlenecks/issues to help point us in the proper
direction. These three high-level resources are the main limiting factors to
database performance that cannot be altered; we have only so many CPU cycles, so
much memory, and a specific disk configurations. Without properly
understanding how they are currently behaving, before looking inside the
database, will almost always gets us in trouble. This article looks at
how we as DBAs might monitor CPU utilization from the operating system and when
it might signal issues we need to look into.
The sar utility is part of the sysstat package and will
collect, store in logs, and report on a wide variety of system activities. Of
all the system level commands that you can use, the sar utility has multiple
switches/options that allow the DBA to quickly and easily monitor those
resources (CPU, context switches, interrupts, paging, memory usage, buffer
usage, network usage) that can give us quick insight into the health of our
database systems. What makes sar very nice to use is that sar constantly
collects and logs system activity. To see this, and get some rudimentary, but
powerful, commands out of the way, the following examples help show how to call
sar for days and time intervals; remembering that sar maintains daily logs for
each day in a file such as /var/log/sa/sa<dd>, where <dd> is the
day of the month:
Use sar to
look at today’s collected activity:
[oracle@wagner ~]$ sar
Use sar to
look at today’s collected activity; reporting every 10 seconds for 15 times
[oracle@wagner ~]$ sar 10 15
Use sar to
look at the daily activity on the 13th:
[oracle@wagner ~]$ sar -f /var/log/sa/sa13
Use sar to
look at today’s collected activity between 6:00am and 7:00am
[oracle@wagner ~]$ sar -s 06:00:00 -e 07:00:00
Use sar to
look at the daily activity on the 13th and between 6:00am and 7:00am
[oracle@wagner ~]$ sar -s 06:00:00 -e 07:00:00 -f /var/log/sa/sa13
For the
above examples, the following options were used:
-f : is the filename
to be used in the report
-s : is the
starting period of the report
-e : is the
ending period of the report
Now that we have some of the preliminaries out of the way,
finding and looking at report data with sar to help us tune a database really
is all about looking at that system level resource and then diving into the
specific database engine, where possible, to see what might be consuming that
resource. Granted, there may be ill-use of the resource that doesn’t
necessarily show up at this high level but what looking at the resource from
the system level does for us is give us a good picture as to whether the
resource is limiting our database from performing optimally.
Of the most basic options to the sar command, the –u option
lets us report on CPU utilization. Combining the above examples to look at the
CPU utilization over a given time period is critical for zeroing in on when a
system, at least from a CPU perspective, is having issues, might be overloaded,
or, under better circumstances, will allow you to schedule additional workload
through the system at low periods. A sample output, on a VERY idle system,
might look like the following:
[oracle@wagner ~]$ sar -u -s 15:00:00 -e 16:00:00 -f /var/log/sa/sa13
Linux 2.6.18-194.3.1.el5 (wagner) 07/13/2010
03:00:01 PM CPU %user %nice %system %iowait %steal %idle
03:10:01 PM all 1.04 0.00 3.97 0.05 0.00 94.94
03:20:01 PM all 0.77 0.00 2.95 0.00 0.00 96.27
03:30:01 PM all 0.72 0.11 3.29 0.01 0.00 95.88
03:40:01 PM all 0.73 0.00 2.91 0.01 0.00 96.36
03:50:01 PM all 0.71 0.00 2.89 0.01 0.00 96.39
Average: all 0.79 0.02 3.20 0.02 0.00 95.97
Obviously, this system is very idle from a CPU perspective
during this time period.
Often times we are asked how the system is currently
performing and issuing the sar command to take a look at the CPU could easily
produce the following output; showing a clear indication that our CPU is
overtaxed.
[oracle@wagner ~]$ sar -u 5 30
Linux 2.6.18-194.3.1.el5 (wagner) 07/13/2010
08:31:07 PM CPU %user %nice %system %iowait %steal %idle
08:31:12 PM all 0.40 0.00 1.40 0.00 0.00 98.20
08:31:17 PM all 10.58 0.00 31.54 1.80 0.00 56.09
08:31:22 PM all 11.98 0.00 88.02 0.00 0.00 0.00
08:31:27 PM all 13.80 0.00 86.20 0.00 0.00 0.00
08:31:32 PM all 12.35 0.00 87.65 0.00 0.00 0.00
08:31:37 PM all 11.38 0.00 88.62 0.00 0.00 0.00
From this point, it is a good idea to ask ourselves what is
obviously happening on the system that would cause the system to go from 98.20 %idle
to 0.00 %idle. From the above, it should be noted that we are also not waiting
for any I/O as the %iowait is basically 0.00. Actually, the issue here is
either the CPU running in system mode or the CPU running in user mode. The key
here is to look at the processes that are consuming CPU resources and determine
if there is anything we can do about it. Issuing a ps command such as the
following, or using the top command, can quickly get us to the processes on the
system that are consuming CPU resources.
[root@wagner ~]# ps -e -o pcpu,pid,user,args | sort -r | head
%CPU PID USER COMMAND
8.1 6235 oracle sqlplus as sysdba
1.9 2862 root /usr/bin/Xorg :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
0.8 3325 oracle ora_vktm_vm11r2
0.3 6260 root top
0.3 3226 root gnome-terminal
0.1 6236 oracle oraclevm11r2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
0.1 3345 oracle ora_ckpt_vm11r2
0.1 3341 oracle ora_dbw0_vm11r2
0.1 3337 oracle ora_dia0_vm11r2
Clearly, the issue here is that someone has used SQLPlus to
log into the database and issue some commands that are consuming a majority of
the CPU resources. From here, we leave the operating system, log into the
database (Oracle), and begin to investigate the appropriate objects to extract
what this particular user is doing. Here is a quick SQL that can get you to
that process, user, and SQL:
SQL> select sesion.process,
sesion.username,
sesion.osuser,
sqlarea.sql_text
from v$sqlarea sqlarea, v$session sesion
where sesion.sql_hash_value = sqlarea.hash_value
and sesion.sql_address = sqlarea.address
and sesion.process = 6235;
PROCESS USERNAME OSUSER SQL_TEXT
-------- -------- -------- --------------------------------------------------
6235 SYS oracle select a.*, b.* from dba_tables a, dba_tab_columns b
Obviously, this DBA doesn’t know what he/she is doing as
they are producing a Cartesian product and really consuming CPU resources to
get the result set. Maybe it’s time to talk to them and see exactly what they
are trying to accomplish and then kill the process, rewrite the SQL, and
hopefully go on to the next bottleneck. Not shown in the query, but very
valuable, in the V$SQLAREA view there are two columns, CPU_TIME and
ELAPSED_TIME that provide the microseconds of CPU time and elapsed time used by
the SQL respectively. Together, with the assumed problem noticed by using sar
should help you pinpoint CPU resource intensive processes. I say assumed here
because we don’t want to spend our time messing with too many SQL statements
that spike the CPU occasionally when they may only consume 5-10 microseconds of
the CPU total. Now if these SQL statements are executed continually throughout
the day then that would be another story and research is warranted.
Other things to think about / look at when using the sar command
would be, as is the case with my system, is a low %idle (CPU totally being
consumed) would/might indicate an under-powered CPU, not just the poorly
written SQL statement. Always look at the %wio (percent of waiting on I/O) as
this would/could indicate a SQL statement(s) doing too may I/Os and an
opportunity to tune the SQL, add/remove indexes, or modify applications.
Because CPU utilization is one of the key indicators on the
system that signals good or bad performance, constantly watching it for peaks
and valleys is critical to a DBA’s understanding of their database system. The
sar utility, while seemingly small and insignificant can and should be used to
help initially determine if any high-level CPU resource usage is out of balance,
potentially pointing to database processing that requires tuning.
Additional Resources
Oracle Database 11gR2 I/O Performance Tuning: A Primer
Oracle Database 11g: SQL Performance Analyzer, Part 1
Oracle Database 11g: SQL Performance Analyzer, Part 2
Oracle Database 11g: SQL Performance Analyzer, Part 3
Back to DBAsupport.com