KNOW HOW CPU Monitoring
Recognizing and Resolving CPU Load in the Kernel
Fluctuating Processors
Most CPU load monitors, such as top and xosview, access the Linux kernel's /proc filesystem to try and determine the correct value. However, this interface can deliver incorrect values under certain conditions. We explain why, and show how the patch described in this article resolves the issue.

BY ARNE WIEBALCK, TIMM M. STEINBECK AND VOLKER LINDENSTRUTH

Figure 1: Inexplicable fluctuations of processor load and throughput while transferring data across the wire between two SMP systems (CPU load in % resp. throughput in MB/s over time in seconds; curves: Throughput, CPU send, CPU recv)
The Department of Technical Computer Science at the University of Heidelberg develops data parsing systems for planned large-scale applied research into elementary particles and heavy ions, where Linux clusters can comprise 1000 nodes.

Within this framework one working group has been looking into efficient network communication mechanisms, to avoid processor load during data transfer as far as possible. The group was particularly interested in minimal network transfer overhead using unmodified network card drivers. With this aim in mind, the authors of this article developed a small-footprint kernel module capable of transferring data from a program across a network without using a protocol like TCP/IP.

Inexplicable Fluctuations

This module was intended to control the CPU load for multiple data transfer rates. But our measurements showed an interesting phenomenon on a dual-processor system: despite constant transfer rates, the system load tended to fluctuate between 0 per cent and an upper limit proportional to the transfer rate. This phenomenon affected both the sender and the receiver of the data.

These fluctuations occurred periodically, with a period in the region of several hundred seconds. Figure 1 shows an example of the phenomenon. The effect only occurred when the usleep function was called to limit the transfer rate, but not if the total network bandwidth was used.

A small test program running outside of the kernel and using memcpy to copy data to memory showed a similar reaction. The rate at which this program copies is also configurable via usleep. Calling the same program on a system with a single-processor kernel showed a minimal CPU load (slightly above 0 per cent) despite copying 80 Mbyte/sec.
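The memcpy test program itself is not reproduced in this article; a minimal sketch of the idea, with the block size and the sleep interval chosen purely for illustration, could look like this:

    #include <string.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK (1024*1024)   /* bytes copied per iteration (illustrative) */
    #define PAUSE 10000         /* pause in microseconds to limit the rate   */

    int main( void )
    {
        char *src = malloc( BLOCK );
        char *dst = malloc( BLOCK );

        if ( !src || !dst )
            return 1;

        while ( 1 )
        {
            memcpy( dst, src, BLOCK );   /* the actual work                  */
            usleep( PAUSE );             /* release the rest of the timeslot */
        }
    }

A loop of this kind is all that is needed to reproduce the fluctuating load readings described above.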
The Scheduler

Linux allows processes to run more or less simultaneously on a CPU by dividing the processing time into timeslots. The decision as to which process receives which timeslot, and when, is made by the scheduler. The scheduler applies a scheduling algorithm to select an executable process from a list and allots the process one timeslot's worth of CPU cycles.

A timeslot is normally 10 ms for Linux. The length of the timeslot is important to a system's performance: if it is too short, the overhead caused by the scheduler making a decision and switching from one process (context) to another increases. If the timeslot is too long, the processes will no longer appear to be running simultaneously.

When a process terminates or needs to wait for an external event, it no longer requires the remainder of its timeslot, and thus releases it prematurely, by calling the usleep function, for example. In this case, an exceptional call to the scheduler occurs and the scheduler passes the remainder of the timeslot to the next regular process. Independently of this, the scheduler is again called after the timeslot has elapsed.

Where's Harvey?

This led us to suspect that our program's processes might have escaped monitoring by, and might be partly invisible to, the kernel. To verify this hypothesis we wrote a program that works and sleeps alternately in an infinite loop.
On account of its ability to hide from the kernel's standard process accounting functionality, we called the program "Harvey" (see Listing 1). The infinite loop that starts in line 13 contains the two functional blocks that perform calculations and sleep. While working, Harvey repeatedly calls gettimeofday to ascertain the elapsed time for a block. When the time specified in RUNTIME elapses, Harvey calls usleep( 0 ) to release the remainder of its timeslot (see "The Scheduler" boxout). Figure 2 shows a screenshot of top with Harvey running. Obviously Harvey is not creating any load on the system and thus demonstrates the same behavior as the other programs.

Figure 2: top with Harvey running

Listing 1: Harvey

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* runtime in microseconds */
    #define RUNTIME 9000

    int main( int argc, char** argv )
    {
        unsigned long n=0;
        unsigned long t;
        struct timeval s, e;
        while ( 1 )
        {
            /* work */
            gettimeofday( &s, NULL );
            do
            {
                n++;
                gettimeofday( &e, NULL );
                t = (e.tv_sec-s.tv_sec)*1000000
                    +(e.tv_usec-s.tv_usec);
            }
            while ( t < RUNTIME );

            /* sleep */
            usleep( 0 );
        }
    }

As it is improbable that a process that spends 90 per cent of its runtime in a busy loop creates no noticeable CPU load, we used the CPUmeter program shown as Listing 2 to measure the load. The program's infinite while loop calls the get_iterations function to measure the number of loop iterations per second. If the program is launched with the lowest possible priority, the scheduler assigns it fewer CPU cycles, assuming that at least one other process is running on the system.

Thus, the number of iterations performed per second is a measure of the genuine load on the processor: the fewer iterations, the more load there is. Looking at CPUmeter shows that Harvey creates a load of about 90 per cent on our system, as one would expect from viewing the listings.

Listing 2: CPUmeter

    #include <stdio.h>
    #include <sys/time.h>

    #define TIME 1000000

    unsigned long long
    get_iterations( unsigned long t_musec )
    {
        unsigned long long n = 0;
        struct timeval s, e;
        unsigned long tdiff;

        gettimeofday( &s, NULL );
        while ( 1 )
        {
            n++;
            gettimeofday( &e, NULL );
            tdiff = (e.tv_sec-s.tv_sec)*1000000+(e.tv_usec-s.tv_usec);
            if ( tdiff >= t_musec )
                break;
        }
        return n;
    }

    int main( int argc, char** argv )
    {
        unsigned long long cur;

        while ( 1 )
        {
            cur = get_iterations( TIME );
            printf( " %20Lu iter./s\n", cur );
        }
        return 0;
    }
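The article does not show how CPUmeter was launched; one way to start it with the lowest possible priority (the binary name is our own) is:

    nice -n 19 ./cpumeter

With at least one other runnable process on the system, the printed iteration counts then drop in proportion to the load those other processes genuinely generate.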
Watching the Kernel

The interface used by top and co., /proc/stat, yields the CPU load measured in timeslots only. Thus, one would suspect that the kernel also applies the same level of granularity. In the kernel sources, fs/proc/proc_misc.c is responsible for outputting the /proc/stat pseudo-file, as specified in [1].

The structure used at this point, kstat, contains the data on the timeslots used by each CPU. Each entry is placed in one of three categories: user, nice, and sys. The global counter, jiffies, is used to output non-utilized timeslots, which are either used up by the kernel's idle task or not used at all.
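As an illustration of this granularity, the following user-space sketch computes an overall CPU load figure the same way top and friends do, from two snapshots of the cumulative user, nice, sys, and idle counters in the first line of /proc/stat (the 2.4-era four-column format is assumed):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Read the aggregated jiffy counters: user, nice, sys, idle. */
    static void read_stat( unsigned long v[4] )
    {
        FILE *f = fopen( "/proc/stat", "r" );
        if ( !f )
            exit( 1 );
        if ( fscanf( f, "cpu %lu %lu %lu %lu", &v[0], &v[1], &v[2], &v[3] ) != 4 )
            exit( 1 );
        fclose( f );
    }

    int main( void )
    {
        unsigned long a[4], b[4], used, total;

        read_stat( a );
        sleep( 1 );
        read_stat( b );

        used  = (b[0]-a[0]) + (b[1]-a[1]) + (b[2]-a[2]);  /* user + nice + sys */
        total = used + (b[3]-a[3]);                       /* plus idle         */

        printf( "load: %lu%%\n", total ? 100*used/total : 0 );
        return 0;
    }

Anything that escapes these counters, as Harvey does, is invisible to every tool built on this interface.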
The kstat structure elements are updated in the kernel/timer.c file by the update_process_times function, which the timer interrupt routine calls every time a timeslot elapses. update_process_times checks which process is currently active and, if it is not the idle process, increments one of the three counters: user, nice, or sys. At the same time it decides which process to attribute the elapsed timeslot to. A timeslot is always attributed in full to the process that is active at the point where it elapses. If there is no active process at this point, the kernel marks the timeslot as unused.
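That attribution logic can be condensed into a deliberately simplified, user-space model; this is not the kernel source, but it makes the whole-timeslot granularity explicit:

    /* Simplified model of the per-tick decision in update_process_times:
     * the complete timeslot is booked to whoever is running right now. */
    struct task  { int pid; int nice; };
    struct kstat { unsigned long user, nice, sys, idle; };

    static void account_tick( struct kstat *k, const struct task *p, int user_mode )
    {
        if ( p->pid == 0 )         /* idle task: timeslot counts as unused */
            k->idle++;
        else if ( !user_mode )     /* timeslot elapsed in kernel mode      */
            k->sys++;
        else if ( p->nice > 0 )    /* niced process in user mode           */
            k->nice++;
        else                       /* ordinary process in user mode        */
            k->user++;
    }

A process that is always asleep at the moment the tick arrives, like Harvey, never shows up in user, nice, or sys, no matter how much it computed in between.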
Based on this background knowledge, we can now better understand the behavior shown by Harvey and the other programs. Releasing the timeslot by calling usleep means that there is no active process when update_process_times runs. Thus the routine records the timeslot as unused, although in fact it was partly used.

Phenomenon Observed Previously

A Google search for this problem showed that this issue is not unknown. Early in 2000 Jan Astalos produced a patch for single-processor systems running the 2.2.14 kernel [2]. This patch uses the timestamp counter (TSC) available on more modern processors to count the number of CPU cycles consumed by each process.
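On IA-32 the kernel's get_cycles() boils down to the rdtsc instruction; a user-space equivalent, shown purely for illustration, reads the 64-bit counter like this:

    /* Read the 64-bit timestamp counter on x86 (illustration only). */
    static inline unsigned long long read_tsc( void )
    {
        unsigned int lo, hi;
        __asm__ __volatile__( "rdtsc" : "=a" (lo), "=d" (hi) );
        return ( (unsigned long long)hi << 32 ) | lo;
    }

Differences between two such readings count elapsed CPU cycles at full clock resolution, rather than in 10 ms timeslots.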
When asked for a later, SMP-capable version, Jan sent us a version for the 2.4.0 kernel, which was the basis for the port to the later kernel we were using. We additionally added to the patch the ability to collate the number of CPU cycles consumed either globally or per CPU, and to detect cycles used by interrupts and soft IRQs. The latter was particularly important for the kind of network measurements our department needed to perform.

In contrast to the process accounting performed by the standard kernel, which occurs only at the end of each timeslot, as previously discussed, we call the function shown as Listing 3, update_process_cycles, shortly before the scheduler assigns a new process to the CPU. This is why the function is called from the kernel/sched.c file.

Listing 3: Read CPU cycle statistics for the kernel

    void update_process_cycles(void)
    {
        struct task_struct *p = current;
        int cpu = smp_processor_id();
        cycles_t t = get_cycles();

        p->cycles[cpu] += t - last_cycles[cpu];
        if ( p->pid )
            kstat.used_cycles[cpu] += t - last_cycles[cpu];
        last_cycles[cpu] = t;
    }
In lines 3 through 5, update_process_cycles first ascertains the current process, the active CPU, and the current value of the TSC. Line 7 updates the cycles array added to the process structure by the patch. The last_cycles array used for this purpose contains the value of the TSC for the CPU in question at the point when update_process_cycles was last called (line 10). Lines 8 and 9 update the global counter that contains the used cycles per CPU. This only happens if the current process is not the idle process, which has a process ID (PID) of zero. In a similar fashion, kernel functions count the number of CPU cycles used to handle interrupts and soft IRQs.
Output via /proc/stat

The values ascertained in this way are output via the standard process accounting facilities, that is, the kernel's /proc pseudo-filesystem. /proc/stat lists the number of cycles consumed by processes, by interrupts, and by soft IRQs, as well as the number of unused cycles. Each of these values is displayed as a total for all CPUs and for each individual CPU.

The CPU cycles used by each process are displayed in /proc/PID/stat, again on a per-CPU basis and in total. /proc/interrupts_cycles contains a more detailed breakdown of the cycles used by interrupts and soft IRQs. Listing 4 shows the output of this pseudo-file: the lines starting with numbers contain the cycles used by the corresponding interrupts, and the last four lines show the four different soft IRQ types.

Listing 4: /proc/interrupts_cycles

                           CPU0          CPU1
      0:            20242854219   16586735080   IO-APIC-edge    timer
      1:                1636655       1225320   IO-APIC-edge    keyboard
      2:                      0             0   XT-PIC          cascade
     10:                      0             0   IO-APIC-level   usb-ohci
     14:              251664916     263921645   IO-APIC-edge    ide0
     23:             5601393923    5431463426   IO-APIC-level   eth0
    HI_SOFTIRQ:      3924680280    3191840981
    NET_TX_SOFTIRQ:  1115199064    1274524965
    NET_RX_SOFTIRQ:  5157803275    4835883365
    TASKLET_SOFTIRQ:   90516230      90298686

The system load values caused by our programs were ascertained for the patched kernel, and they correlate to the values indicated by the CPUmeter program (and make sense in programming terms).
Our example in Figure 3 shows a comparison between the load generated by Harvey as measured by the standard kernel and by the patched version. As you can see, Harvey successfully hides from the normal kernel, whereas the patched kernel knows exactly what he is up to. The patched kernel also displays correct values for the load generated by programs running on SMP systems.

Figure 3: Harvey exposed (CPU load in % over time in seconds, original versus patched kernel)

To discover the effect this patch had on the scheduler's performance, we used the LMbench [3] benchmark suite on both the patched and the unpatched single-processor kernel. The values the benchmark returned for context change latency are shown in Table 1. More exact accounting figures cost about a 10 per cent increase in latency for context changes.

Table 1: Context change latency figures

    Process size    Latency unpatched    Latency patched
    0 kByte         0.89 µs              0.97 µs
    4 kByte         1.02 µs              1.11 µs
    16 kByte        4.31 µs              4.44 µs
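LMbench measures context switch latency with its lat_ctx program; invocations along the following lines (the parameters are our choice, matching the process sizes in Table 1) yield figures of this kind:

    lat_ctx -s 0 2
    lat_ctx -s 4 2
    lat_ctx -s 16 2

Here -s sets the working-set size of each process in Kbytes, and the final argument is the number of processes switching between each other.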
Conclusion

The process accounting implementation provided by the Linux kernel can return incorrect values for the system load under specific circumstances. This is caused by the timeslot-based granularity that the kernel applies to measure CPU cycle use.

The two test programs discussed here, Harvey and CPUmeter, paint a clear picture of this issue. The kernel patch discussed in this article, which is available at [4], implements a process accounting method based on the CPU's timestamp counter registers, and returned reliable results for the system load in our lab environment.

INFO

[1] Linux Cross-Reference: http://lxr.linux.no/
[2] 2.2.14 Precise Accounting Patch posting: http://www.beowulf.org/pipermail/beowulf/2000-February/008415.html
[3] LMbench homepage: http://www.bitmover.com/lmbench
[4] Precise Accounting Patch: http://www.ti.uni-hd.de/HLT/documentation/software-and-documentation.html#kernel