Kernel Hacking HOWTO
Andrew Ebling
November 2001
Document version 0.1
Chapter 1
Introduction
1.1 About this document
This document is aimed at:
• Those who are confident compiling the Linux kernel and would like to contribute
to its development, but feel intimidated by the 2 million lines of kernel source
code.
• Those who want to find out if kernel programming is of interest.
• Those who want to improve their kernel programming productivity.
It aims to accelerate the learning process by:
• Bringing together as much relevant information as possible.
• Providing "the basics" in key areas.
• Supplying pointers to more in-depth information.
1.2 Contacting the author
The author of this document can be contacted at this email address: kh.howto@lineone.net
Please use the following subject line conventions (in addition to your own short and
concise subject) to help me handle queries efficiently:
• [KH-HOWTO QUESTION]
for any howto related questions or to suggest an FAQ entry.
• [KH-HOWTO RFC]
for any ”request for change” type suggestions.
• [KH-HOWTO BUG]
for any factual error/bug reports.
• [KH-HOWTO TYPO]
for typo corrections.
• [KH-HOWTO CONTRIB]
to submit a new/to-do section.
• [KH-HOWTO FEEDBACK]
for general feedback.
• [KH-HOWTO]
for any other howto related correspondence.
I ruthlessly filter on subject, so any emails not following these conventions will be
silently and automatically deleted. Thank you for your understanding and co-operation.
1.3 Why hack the kernel?
Some reasons why people get into kernel programming:
• They find the current kernel inadequate in some way and want to help fix the
problem, perhaps by writing a new device driver or by working on improving the
performance of one of the kernel’s subsystems.
• They want to learn how a real operating system works, as opposed to the vague,
high level concepts taught on most computer architecture courses.
• They find application programming too easy or boring and are looking for some-
thing more challenging!
1.4 Prerequisites
What should I know before I start working through this document?
• A fair amount of Linux experience from a user/developer perspective (suggested
minimum of a year).
• A good, working knowledge of the C programming language. If this does not
include you, why not start learning now? There are many good books on C, some
of which are well suited to home/self learning, e.g. the SAMS Publishing "Teach
Yourself in 21 Days/24 Hours" series. Alternatively you can find good C tutorials
online at:
http://directory.google.com/Top/Computers/Programming/Languages/C/Tutorials/
• Some knowledge of operating systems and associated concepts is useful but not
essential.
Chapter 2
Linux Kernel Overview
This section gives some background to the rest of the document; a very brief overview
of the Linux kernel and the key concepts needed to understand its fundamental opera-
tion.
2.1 Introduction
The kernel can be seen as the heart of an operating system. Loaded into RAM at boot
time and remaining resident until power down, it has two main responsibilities:
• To service low level hardware programming requirements (e.g. responding to
hardware interrupts).
• To provide an environment for processes; instances of programs or threads in
execution.
The Linux kernel is said to be monolithic; that is, it is a single large executable
consisting of several logically divided components.
2.2 Kernel Modes
The CPU can operate in one of two modes: user or kernel. Programs normally run in
user mode, in which they have no direct access to kernel data structures or hardware
devices. A switch to kernel mode can be triggered by:
• A system call (a library function that makes a request to the kernel).
• A CPU signaling an exception (an anomalous condition that requires special
attention, e.g. divide by zero).
• An interrupt issued to the CPU by a hardware device to indicate that it requires
attention.
The kernel spends much of its time in kernel mode, running on behalf of a user process.
However, several threads are executed in kernel mode on behalf of the kernel itself,
carrying out "housekeeping" activities. Once the pending operation in kernel mode is
complete, the kernel switches back to user mode.
2.3 Modules
The kernel is capable of dynamically loading additional portions of code (modules) on
the fly, to enhance its functionality. Amongst other things, modules can add support
for file systems or specific hardware devices. When the functionality provided by a
module is no longer required, the module can be unloaded, freeing memory.
2.4 Processes
A process is an instance of a program in execution. Every process has:
• A state, either runnable, interruptible, uninterruptible, stopped or zombie.
• A context, a snapshot of all CPU registers (PC, SP, PSW, general purpose, float-
ing point & memory management).
• A process descriptor, a data structure that holds all the information associated
with a process.
The kernel provides a multiprogramming environment; many processes can be active
simultaneously. Each process contends for the various hardware resources available,
and the kernel must ensure that the resources are shared appropriately. Multiprogramming
is supported by giving each process in the runnable state queue an opportunity
to run on the CPU in turn. The process that "owns" the CPU at a particular instant
is referred to as current. The procedure for swapping between runnable processes is
termed a context switch; it involves saving the context (a snapshot of the CPU state)
of the current process and loading the context of the next runnable process. A process
executing in kernel mode cannot be arbitrarily replaced; context switches take place
only at well-defined points, so the kernel cannot perform immediate context switches
and is termed non-preemptive.
Each user process runs in its own address space, an assigned portion of the total
memory available. Address spaces (or parts of) may be shared between processes upon
request, or automatically if the kernel deems it appropriate. The separation of the
address space of processes prevents one process from interfering with the operation of
another or the operating system as a whole.
In addition to the normal user processes running on a system, several kernel threads
are created at system startup and run permanently in kernel mode, carrying out various
house keeping functions for the kernel.
2.5 Synchronisation
The kernel is reentrant; several processes may be executing in kernel mode at one time.
Of course, on a uniprocessor system only one process can make progress, with all
others blocked and waiting in a queue. Example: A process requests a file read. The
Virtual File System translates the request into a low level disk operation and passes it
to the disk controller, on behalf of the process. Instead of waiting until the operation
is complete (many thousands of CPU cycles later), the process voluntarily gives up the
CPU after making the request and the kernel allows a waiting process to make progress
in kernel mode. When the disk operation is completed (signalled by an interrupt), the
current process gives up the CPU to an interrupt handler and the original process is
woken up, resuming where it left off.
In order to implement a reliable reentrant kernel, care must be taken to ensure the
consistency of kernel data structures. For example if one process modifies a resource
counter behind the back of another waiting process, the result could be potentially
disastrous. The following steps are taken to prevent this occurrence:
• One process may only replace another in kernel mode if it has voluntarily relin-
quished the CPU, leaving data structures in a consistent state.
• Interrupts may be disabled during critical regions; areas of code that must be
completed without interruption.
• Use of a spin lock to control access to data structures. (SMP systems only)
• Access of data structures is carefully controlled using semaphores.
Semaphores consist of the following:
• A counter variable (an integer), initialised to 1.
• A linked list of processes waiting to access the data structure.
• Two atomic methods up() and down() which increment and decrement the
counter respectively.
When a kernel control path accesses a data structure protected by a semaphore, it calls
down(). If the new value of the counter is not negative, access is granted; if it is
negative, access is blocked and the process is added to the semaphore's linked list of
waiting processes. Similarly, when a process has finished with a data structure, it calls
up() and the next process in the waiting list gains access. A sketch of these semantics
in C is given below.
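The following is a minimal C sketch of the counting logic just described, not the
kernel's actual implementation: the real up() and down() perform their updates
atomically, and the two helper routines declared here are hypothetical stand-ins for
the kernel's wait queue handling.

struct wait_list;                                 /* opaque: sleeping processes */
extern void sleep_on_list(struct wait_list **w);  /* hypothetical helper */
extern void wake_first(struct wait_list **w);     /* hypothetical helper */

struct semaphore_sketch {
        int count;                      /* initialised to 1 */
        struct wait_list *waiting;      /* processes blocked on the semaphore */
};

void down_sketch(struct semaphore_sketch *sem)
{
        sem->count--;                   /* done atomically in the real kernel */
        if (sem->count < 0)
                sleep_on_list(&sem->waiting);   /* block until an up() call */
}

void up_sketch(struct semaphore_sketch *sem)
{
        sem->count++;                   /* done atomically in the real kernel */
        if (sem->count <= 0)
                wake_first(&sem->waiting);      /* wake the next waiter */
}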
Precautions must be taken to avoid deadlock, where several processes each hold a
resource while waiting on a resource held by another waiting process. If this list of
waiting processes forms a closed circle, deadlock is reached. For an in-depth
explanation of deadlock see "The Dining Philosophers Problem" on the Internet or in
any computer architecture textbook.
2.6 Signals and Inter Process Communication
A signal is a short message, sent between two processes or between a process and the
kernel. Two types of signal are used to notify processes of system events:
• Asynchronous events (e.g. SIGINT, issued by the Ctrl-C key sequence).
• Synchronous errors/exceptions (e.g. SIGSEGV when a process attempts to ac-
cess an illegal address).
There are about 20 different signals defined in the POSIX standard, some of which may
be ignored. Some signals cannot be ignored and are not even handled by the process
itself. For inter process communication, Linux uses System V IPC, which is made up of:
• Semaphores (requested via the semget() system call).
• Message queues (received via msgget(), sent via msgsnd() system calls).
• Shared memory (requested via shmget(), accessed via shmat() and relinquished
via shmdt() system calls). A user-space sketch of these calls follows.
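As an illustration, here is a minimal user-space sketch of the System V shared memory
calls listed above; the segment size and permissions are arbitrary example values.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
        /* Request a private 4096-byte segment (size and mode are examples). */
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id < 0) {
                perror("shmget");
                return 1;
        }
        char *mem = shmat(id, NULL, 0);         /* attach to our address space */
        if (mem == (void *) -1) {
                perror("shmat");
                return 1;
        }
        strcpy(mem, "hello via System V shared memory");
        printf("%s\n", mem);
        shmdt(mem);                             /* detach from address space */
        shmctl(id, IPC_RMID, NULL);             /* mark segment for removal */
        return 0;
}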
2.7 Memory Management
Linux uses virtual memory, a level of abstraction between process memory requests
(linear addresses) and the physical addresses used to fulfill them. It makes the following
possible:
• Running processes whose memory requirements exceed the physical RAM
available.
• A continuous address space, independent of physical memory organisation.
• Demand paging; portions of data or code are only loaded into RAM when re-
quired.
• Shared images of programs and libraries, making memory use more efficient.
• Transparent relocation of running programs in memory.
The address space is divided into 4kB portions called ”pages”, which form the basic
unit of all virtual memory transactions. Physical RAM is also divided into 4kB por-
tions called “page frames”, each of which can store any arbitrary page. Because the
total address space exceeds that of the RAM available, only a subset of all the available
pages can be held in RAM at one time. However a page must be present in RAM for it
to be accessed as data or executed as a program. Because any page can be relocated in
any page frame, the kernel must keep track of where the used pages are kept. This is
implemented using page tables, which are used to convert logical addresses into phys-
ical ones. On Intel x86 hardware, Linux actually uses a two level page table scheme
(but uses a three level scheme internally to improve portability) to reduce the amount
of memory taken up by page tables. To convert a linear address into a physical one,
the tables are consulted in this order: Page Global Directory then Page Table to yield a
page number and an offset within the page. Therefore a linear address can be broken
down into three parts: Directory, Table and Offset. Because Linux 2.2 can address
4GB of address space (using 32 bit addresses) and uses a 4kB page size, the 10 most
significant bits make up the Directory, the next 10 most significant bits make up the
Table (hence identify the page required) and the 12 least significant bits make up the
offset from the start of the page.
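To make the split concrete, this small user-space sketch extracts the three fields from
an arbitrary example 32-bit linear address:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t linear = 0xc0123456;             /* example address */
        uint32_t dir    = (linear >> 22) & 0x3ff; /* top 10 bits: Directory */
        uint32_t table  = (linear >> 12) & 0x3ff; /* next 10 bits: Table */
        uint32_t offset = linear & 0xfff;         /* low 12 bits: Offset */

        printf("directory=%u table=%u offset=0x%03x\n", dir, table, offset);
        return 0;
}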
2.8 Virtual File System
The VFS is responsible for providing a common interface to all underlying file systems
present on a system: "The Common File Model", which is capable of representing
files on any type of file system. The file systems supported by the VFS fall into three
categories:
• Disk based, including hard disk, floppy disk and CD-ROM.
• Network based, including NFS, AFS and SMB.
• Special file systems, including /proc and /dev/pts.
The common file model can be viewed as object-oriented, with objects being software
constructs (data structures and associated methods/functions) of the following types:
• Super block object; stores information relating to a mounted file system; corre-
sponds to a file system control block stored on disk (for disk based file systems).
• Inode object; stores information relating to a single file; corresponds to a file
control block stored on disk (for disk based file systems).
• File object; stores information relating to the interaction of an open file and a
process. This object only exists while a process is interacting with a file.
• Dentry object; links a directory entry (a pathname) with its corresponding file.
Recently used dentry objects are held in a dentry cache to speed up the translation
from a pathname to the inode of the corresponding file. The dentry cache consists of
two types of data structure:
• Dentry objects in the following states: in use, unused or negative.
• A hash table to speed up pathname to inode translation.
2.9 Disk Caches
**FIXME** This section needs updating for 2.4.x kernels. Linux dynamically sets
aside a certain proportion of the available memory for two disk caches; the buffer
cache and the page cache. Use of caches increases system performance by minimising
the number of time-consuming disk accesses required.
2.9.1 The Buffer Cache
The buffer cache is made up of lots of buffers, each of which refers to a single arbitrary
block on a block device. Each buffer consists of a buffer head data structure and an
area of memory equal to the blocksize of the associated device, used to hold data. To
minimise the CPU overhead involved in maintaining the buffer cache, all the buffers
are held in one of several linked lists. Each of the linked lists contains buffers in the
same state; unused, free, clean, dirty, locked etc. In order to gain a significant perfor-
mance benefit using a cache, the process of checking the buffer cache for a particular
buffer must be as efficient as possible. Every time a call to read() is made, the buffer
cache must be checked for the required block(s) first. To enable buffers to be found
quickly, a hash table is maintained, containing all the buffers present in the cache.
The getblk() function is the main service routine for the buffer cache; it performs
the functions described above. The buffer cache can also be used to improve the disk
writing performance of the system. Instead of carrying out all writes immediately, the
kernel stores data to be written in the buffer cache, waiting to see if the writes can be
grouped together. A buffer that contains data that is waiting to be written to disk is
termed "dirty". A field in the buffer head data structure indicates whether a particular
buffer is dirty or not.
2.9.2 The Page Cache
The page cache is made up of pages, each of which refers to a 4kB portion of data
associated with an open file. The data contained in a page may come from several disk
blocks, which may not be next to each other on the disk. The page cache is largely
used to interface the requirements of the memory management subsystem (which uses
fixed, 4kB pages) to the VFS subsystem (which uses variable size blocks). The page
cache has two important data structures, a page hash table and an inode queue. The
page hash table is used to quickly find the page descriptor of the page holding data
associated with an inode and offset within a file. The inode queue contains lists of
page descriptors relating to open files. The three main service routines for the page
cache are find_page(), add_to_page_cache() and remove_inode_page(). Special
care must be taken to synchronise the two caches, to prevent processes from receiving
stale data. Should the kernel become short on memory, memory can be reclaimed by
emptying the disk caches of old, unused data. This task is performed by a dedicated
kernel thread.
Chapter 3
Source Tour
The kernel source is made up of around two million lines of code. While that may
seem intimidating, it is important to remember that very few people understand all the
subsystems and associated source code in depth. You can improve your programming
productivity if you know where to look for specific code, down to a directory and a
source file.
3.1 What goes where
Fortunately, the source is well organised into a logical directory structure. This section
gives a quick guide to the top level kernel source directory:
• Documentation: Information about specific platforms & devices as well as gen-
eral kernel information.
• arch: Architecture specific code; i386, sparc etc.
• drivers: Device specific code; sound card, network card etc.
• fs: Filesystem specific code; ext2, vfat etc.
• include: Kernel header files.
• init: All the code associated with the boot and initialisation process.
• ipc: Inter Process Communication code; shared memory implementation etc.
• kernel: The core kernel code; scheduling, signals etc.
• lib: Kernel related libraries; image decompression etc.
• mm: Memory Management related code.
• net: Network related code.
• scripts: kernel related scripts (e.g. patch-kernel)
3.2 Key data structures
This section gives a quick guide to some of the fundamental data structures, including
key fields and where to find them in the kernel source tree.
3.2.1 Process descriptor
Process descriptor: task_struct, defined in include/linux/sched.h:281.
3.2.2 Page descriptor
Chapter 4
Tools
This section aims to explain the development tools that are fundamental to the kernel
development process.
4.1 Editors
There is a wide choice when it comes to choosing an editor. Many (heated/religious)
debates have taken place over which editor(s) are best suited to (kernel) programming.
In order to appease both sides and to give a balanced view, both vi/vim and emacs
will be presented here! The idea is to help the undecided figure out which editor best
suits their needs.
4.1.1 vi/vim
Just about every UNIX-like system has at least vi available, if not vim (vi-improved)
which has some extra, tasty features like syntax highlighting. Reasons to use vim for
kernel programming:
• Integrates really nicely with ctags. If you make tags in the top level kernel
source directory, you can then quickly access function & variable definitions by
pressing ctrl-] whilst the cursor is over the function/variable in question.
• Autocompletion of variable and function names can save edit-compile cycles by
reducing spelling mistakes. This particularly applies to kernel programming as
some of the function names are quite long and involved.
• Fast loading time.
• Small memory footprint; leaves more memory for kernel recompiles.
Other reasons to use vim
• If you can use vim, you will always have an editor to hand. Even the most basic
rescue boot disks have at least vi. In short, one day you'll need to know how
to use it, be it sooner or later! The more proficient you are, the less painful that
experience will be.
• The keyboard shortcuts involve less hand movement, for an unmodified key map
at least.
Reasons not to use vim
• vim is linked against a lot of libraries: ldd /usr/bin/vim including some X
libraries. ldd /usr/bin/vi is a much shorter list.
• Some people really don’t get on with the modal way of doing things.
Probably the best way to learn the basics of vim is to follow the tutorial; start vimtutor
and enjoy. The vim HOWTO also has a lot of useful information; it is available from:
http://www.linuxdoc.org/HOWTO/Vim-HOWTO.html. For more specific information
on editing C source code with vim, see the "C editing with Vim HOWTO", available
from: http://www.linuxdoc.org/HOWTO/C-editing-with-VIM-HOWTO/index.html. For
more general information on vim see http://www.vim.org.
4.1.2 emacs
Any die hard emacs user is welcome to make contributions for this section! Reasons
to use emacs for kernel programming:
•
Other reasons to use emacs
•
Reasons not to use emacs
• It is a big package.
• Longer loading time on modest systems.
• Larger memory footprint, so less memory available for kernel recompiles.
For more information on emacs see http://www.gnu.org/software/emacs/.
4.2 Development
4.2.1 make
make is a program that is used to determine which parts of a multi-source-file project
need to be rebuilt. Only those source files which have been modified are rebuilt before
the final linking step. This is a big plus for the kernel programmer as once the
"stock" kernel has been built, recompiles to include changes take seconds instead of
minutes. make operates on a makefile, which contains the relationships/dependencies
between files and the commands used to build them. Just about every directory in the
kernel source tree has a makefile, including the top level directory. When you invoke
make dep bzImage modules, make operates on the top level makefile which in turn
takes it through the other makefiles in subdirectories, depending on which options were
included during the make config step. A small example makefile is sketched below.
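Here is a minimal sketch of a makefile for a hypothetical two-file project (the file
names and flags are illustrative, not from the kernel tree): prog is relinked only when
an object file changes, and each object file is recompiled only when its own sources
change. Note that recipe lines must start with a tab character.

CC = gcc
CFLAGS = -Wall -O2

prog: main.o util.o
	$(CC) -o prog main.o util.o

main.o: main.c util.h
	$(CC) $(CFLAGS) -c main.c

util.o: util.c util.h
	$(CC) $(CFLAGS) -c util.c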
4.2.2 lclint
Some programming errors can be caught early on in the development cycle, saving time
and effort. You may not have bothered with formal analysis of your C code before,
but now would be a good time to start; bugs in kernel code are generally harder to
track down and can have more serious implications than their user-space counterparts.
Generally speaking, a higher standard of programming is called for; while a user may
be prepared to put up with an email client that occasionally core dumps, they certainly
won't put up with kernel freezes or worse, data loss. lclint is a program that can
be used to statically check C code; that is, checking performed after writing code, but
before compilation or execution. Like lint, lclint can be run on unmodified C
source code and used to catch classic programming mistakes. However, lclint can
do a whole lot more for you, but you must give it "clues" by annotating your source
code in the appropriate way. In short, lclint can give you some of the foresight that
a very experienced programmer has at their disposal. Whilst it may not be perfectly
suited to kernel C code (which has some unusual constructs), it can still be used to
prove fragments of code prior to inclusion in the kernel source. More information on
lclint can be found at the home page: http://lclint.cs.virginia.edu/
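As a taste of the annotation style, here is a small sketch using lclint's /*@null@*/
annotation on a hypothetical function; the annotation tells lclint that the returned
pointer may be NULL, so it can warn about callers that fail to check it:

#include <stdlib.h>

/* The returned pointer may be NULL; lclint warns if a caller
 * dereferences it without checking. */
/*@null@*/ char *maybe_alloc(size_t n)
{
        if (n == 0)
                return NULL;
        return malloc(n);
}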
4.3 Source Code Navigation
This section describes some invaluable tools for navigating the Linux Kernel source
code.
4.3.1 grep
grep is used to find lines matching a given pattern; an invaluable tool for quickly
locating variable/function definitions and use. Example: Suppose you want to find all
instances of task_struct:
$grep -r task_struct * | less
grep recursively (-r) scans all files in the source tree and prints lines containing
task_struct; those lines are piped to less, which provides a scrollable output. man
grep for more details!
4.3.2 lxr
lxr enables you to navigate source code via a web browser, where all variables and
functions are links to their respective definitions. Probably the best way to read and
understand code. You can browse the kernel source code online: http://lxr.linux.no/source/,
or download lxr and set it up on your box, useful if you don't have a permanent Internet
connection. The setup process is quite involved but details are given in the README
file that comes with the lxr source code.
4.3.3 cscope
cscope is useful for doing many of the things you could use grep for, but is more
intelligent and provides a nicer interface to work with. You can search for definitions,
uses, strings etc. Before you can use cscope you need to build an index file. This can
be done by issuing the command cscope -b -R -k in the top level source directory:
-b to build the index, -R to search recursively through the source tree and -k to indicate
kernel use; this ensures the appropriate include files are used when generating the index.
To start a cscope session, type cscope -d. You will then get something that looks like
this:
-----------8<--start of screen dump--8<------------
Cscope version 15.3
Press the ? key for help
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
-----------8<--end of screen dump--8<------------
The top half of the screen is used to display search results, whilst the lower half is
used to issue commands. Example: suppose you want to find the definition of the
file_system_type data structure. Use the arrow keys to move the cursor to the Find
this global definition: field and enter the name of the data structure, followed
by enter. If all is well, cscope will find just one instance and will open your favourite
editor (set by the EDITOR environment variable) to display the appropriate section of
the file. Now suppose you want to find the definition of super_block. Follow the
procedure above, which should give you this output:
-----------8<--start of screen dump--8<------------
Global definition: super_block

  File          Line
0 vxfs_extern.h   44 struct super_block;
1 super.c        265 int (*test)(struct super_block *, struct buffer_head *);
2 udfdecl.h       52 struct super_block;
3 fs.h           688 struct super_block {
4 fs.h           936 struct super_block *(*read_super) (struct super_block *, void *, int );
5 udf_fs_sb.h     71 __u32 (*s_partition_func)(struct super_block *, __u32, __u16, __u32);
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
-----------8<--end of screen dump--8<------------
This time, cscope has found multiple possible definitions. You can view each definition
in its context by pressing the number given next to the file in the list; alternatively,
you can use Tab to move to the top half of the screen and the arrow keys to select a
definition. Simply quit out of the editor to return to cscope. Press Tab again to return
to the command area. Note that you can also move up and down using Ctrl-p and
Ctrl-n, which saves moving your hands away from a typing position. To exit cscope,
press Ctrl-d.
cscope source code and documentation is available from http://cscope.sourceforge.net/;
I would advise using version 15.3 or later.
4.3.4 ctags
ctags provides similar functionality to cscope, but integrates closely with your editor,
allowing look-ups with a few key strokes. Like cscope, ctags builds an index file
to speed up searches. The easiest way to generate the index is to type make tags in
the top level kernel source directory. When using vim, moving the cursor to a use
of a function/variable/data structure/type and pressing Ctrl-] should take you to the
definition. Ctrl-t takes you back to where you were. Note that lookups can be nested,
in which case Ctrl-t takes you up one level. Details for using ctags with emacs are
welcome! ctags can be obtained from http://ctags.sourceforge.net/. Debian users will
want to install the exuberant-ctags package.
4.3.5 Summary
A brief summary of the tools presented: lxr is useful for browsing and understanding
code, not so good for editing. cscope is useful for finding a definition if you know the
name of the function/variable/data structure concerned and want quick access to the
source file, perhaps to add a field to a data structure. ctags is useful for doing quick
look-ups during an editing session, but isn't quite as smart as cscope is.
4.4 Source code manipulation
4.4.1 diff
diff is used to compare two files and output any differences between them. When used
in unified mode (-u option) to compare two files (original and modified) a "patch" is
produced:
diff -u linux-2.4.14/drivers/char/keyboard.c linux/drivers/char/keyboard.c > my_keyboard_patch
Here the linux-2.4.14 directory holds the original, unmodified source tree and the linux
directory holds the one you have hacked around with. Distributing ideas and
changes to the kernel source as patches is a lot more convenient and efficient than
distributing complete modified files or source trees. The procedure above can be used
when just one file has been modified, but what if you need to modify a lot of files and
produce a patch?
diff -urN linux-2.4.14 linux > my_hefty_kernel_patch
Note that the convention is to generate patches from the directory above the top level
kernel source directory, i.e. /usr/src if you keep your modified kernel source in
/usr/src/linux. Note that if you are generating a patch to post to the Linux Kernel
Mailing List, be sure to follow the instructions given in the FAQ exactly. The FAQ
can be found at: http://www.tux.org/lkml/. Of course, use of diff is not restricted to
generating patches; it is a useful tool for finding out what has changed between two
kernel releases. Uncompressed patches are human readable to some extent; the format
is fairly self explanatory. You can even grep a patch if you know what you are looking
for.
4.4.2 patch
patch is used to apply patches to a file or a source tree:
cd linux-2.4.13
patch -p1 < patch-2.4.14
This procedure would update a 2.4.13 tree to 2.4.14. The -p1 option strips the top
level directory off all filenames within the patch (as you are patching from inside the
top level source directory). Of course, you could apply the patch from the directory
above, but you would need to have your directories named in the same way as when
the patch was generated. Note that patches are often distributed in compressed form,
to save bandwidth. You can save disk space (and typing) by uncompressing patches as
you apply them:
bzip2 -dc /usr/src/patch-2.4.14.bz2 | patch -p1
Simply replace bzip2 with gzip if the patch was gzipped instead of bzipped. A useful
script is included in the kernel source tree to semi-automate the process of upgrading
a source tree by applying successive patches: linux/scripts/patch-kernel. Read
the script to see what it does and how to use it, instructions are given in the comments
at the top of the file. It is often a good idea to do a ”dry run” of applying a patch,
especially if you are patching a heavily modified tree, or are attempting to apply an old
patch against a newer tree. Backing out a partially applied patch can be time consuming
and generally is not much fun!
bzip2 -dc /usr/src/patch-2.4.14.bz2 | patch -p1 --dry-run
What if the patch does not apply cleanly? If only a couple of files failed, you could ap-
ply the patch anyway and sort things out with a text editor afterwards, alternatively you
could manually go through and apply the patch by reading it and making the changes
by hand. The manual approach is sometimes necessary to resolve conflicting patches
and is a useful technique if you want to understand exactly what a patch changes. What
if you want to remove a previously applied patch? Easy:
bzip2 -dc /usr/src/patch-2.4.14.bz2 | patch -R -p1
This would take your source tree back to 2.4.13.
4.4.3 RCS
Any development work is an incremental process, but particularly so at the debugging
stage. It always pays to keep a record of the process using some form of revision
control system, so that a bad change can be backed out, for example. A very basic
revision control system can be implemented by just keeping a backup of a file before
making major changes, however this approach tends to become cumbersome very
quickly. Enter RCS, the little brother of CVS. CVS is great for large projects with
many contributors, but is overkill for small personal projects. One of the attractions
of RCS is its simplicity:
First create the RCS directory in the same directory as your source files:
mkdir RCS
Then ”check in” a source file:
ci -u some-file.c
You will then be prompted to give a description. By default, RCS deletes the working
file upon check in, so you will want the -u option which automatically checks the file
out again. Check in all the files you are working on in this way. Make some changes to
one of your source files and check it in again. You’ll be prompted for a summary of the
changes and the version number will be incremented. Suppose you have made a mess
of the working file and want to revert to a known good version (1.7), check it out using
this command:
co -l -r1.7 some-file.c
The -l flag locks the file and gives you write access to it (otherwise you get read access
only). RCS stores the initial file and only differences between versions, saving disk
space. For more information on RCS, see the man pages for rcs, ci, co, rcsintro,
rcsdiff, rcsclean, rcsmerge, rlog, rcsfile and ident.
Chapter 5
Tasks
This chapter provides descriptions of some common kernel related tasks.
5.1 General Kernel
The information in this section is a summary of the Kernel-HOWTO, available from:
http://www.linuxdoc.org/HOWTO/Kernel-HOWTO.html
5.1.1 Building and installing a kernel
• Unpack the source code: tar xIvf <tarball name> -C <destination directory>
• make config and answer the list of questions with ”y” to build an option into
the kernel, ”n” to leave it out and ”m” to build it as a module. ”?” gets you help
on most options.
• Review the .config file found in the top level source directory and change as
necessary.
• make bzImage modules
• Become root and make modules install
• Copy System.map to /boot/System-<kernel-version>.map
• Copy arch/i386/boot/bzImage to /boot/bzImage-<kernel-version>
• Edit /etc/lilo.conf and add a new entry for your new image, using one of the
existing ones as a template. Keep the old entries in case the new kernel does not
boot.
• Run /sbin/lilo, as root.
• Shutdown and reboot, be sure to select your new kernel at the lilo prompt (or
whatever bootloader you use), if you did not set it up to boot by default.
• Enjoy!
Note that you can set the default kernel for the next reboot only by re-running lilo
with -R <kernel image label>. A sketch of a lilo.conf entry is shown below.
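For reference, a sketch of what such a lilo.conf entry might look like; the image
name, label and root device are examples and must match your own setup:

image=/boot/bzImage-2.4.14
        label=linux-new
        root=/dev/hda1
        read-only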
5.2 General Hacking
This section is still very incomplete...
5.2.1 Print messages to kernel logs
The printk() function can be used to output information to the kernel logs, for example:
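A minimal sketch (the module name and variable are hypothetical); the KERN_INFO
prefix sets the log level, and the message ends up in the kernel ring buffer (readable
with dmesg) and usually the system logs:

#include <linux/kernel.h>

static int nbufs = 4;   /* hypothetical module state, for illustration */

static void report_status(void)
{
        printk(KERN_INFO "mymodule: initialised, using %d buffers\n", nbufs);
}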
5.2.2 Creating a new module
See the forthcoming revised Linux Module Programming Guide.
5.2.3 Configuration options
The easiest way to add configuration options is to find the existing kernel option that
matches your requirements most closely and copy that. Take a look at and edit the
Config.in file in the associated directory. Then surround your option specific code
with inclusion guards, as sketched below.
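A sketch of such inclusion guards, assuming a hypothetical option CONFIG_MY_OPTION
defined via Config.in; the guarded call is only compiled in when the option is enabled:

#ifdef CONFIG_MY_OPTION
        my_option_init();       /* hypothetical: built only when enabled */
#endif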
5.2.4 Boot time parameters
It is possible to pass information to the kernel at boot time in the form of command line
options. For information about current options, see "The Linux BootPrompt-HOWTO".
Some drivers make use of this facility to set base addresses where hardware auto-
detection is not possible or not implemented. Boot time parameters are especially
useful when drivers or features are compiled into the kernel rather than as a module,
where options can instead be given when the module is loaded.
<description of how to add boot parameters goes here>
5.2.5 Adding a system call
It should be possible to do just about everything you could need to without doing this.
Really.
5.2.6 Add a /proc entry
<description of how to add a /proc entry goes here>
5.3 Drivers
This topic is too diverse to be covered in the first version of this document. Please see
the O'Reilly book entitled "Linux Device Drivers" for a comprehensive guide.
Chapter 6
Kernel Debugging
6.1 When things go wrong
The Linux kernel is not perfect, but it is getting better all the time. Occasionally bugs
even creep into the stable kernel series as ”improvements” are made. What course of
action is appropriate if a problem is discovered? If you are using an unmodified kernel,
try going down this checklist before posting a bug report to the kernel mailing list:
• Is the problem reproducible in the latest stable kernel?
• Has the problem always existed? If so, report it as a bug in the latest kernel. If
not, test successive kernels until you have found the version that introduces the
problem.
• Search the kernel mailing list archives for similar/related reports. If any come to
light, try to concentrate your efforts on providing additional information to that
already supplied.
• Check the changelog for clues. If anything in the changelog looks suspicious,
examine the patch for that kernel version and find out what relevant code changes
occur in that version.
Some (but not all) problems give rise to a screen dump of cryptic debugging informa-
tion, also known as an ”oops”.
6.2 Analysis of an Oops
6.2.1 What is an oops?
When the kernel detects that a serious anomalous condition exists, an "oops" is
triggered. An oops has two main functions:
• To dump useful debugging information that can be used to diagnose the cause of
problem.
• To try and prevent the kernel from going out of control and causing data corrup-
tion, or worse, damage to hardware (although this is very rare).
To the uninitiated, an oops appears completely incomprehensible; lines of hex values
and seemingly cryptic, even amusing, error messages:
CPU:    0
EIP:    0010:[<c011933c>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010002
eax: 00000ce0   ebx: 00001000   ecx: c778a510   edx: 00000610
esi: 00000002   edi: 00000000   ebp: c02165c0   esp: c6663f58
ds: 0018   es: 0018   ss: 0018
Process pcmcia (pid: 1003, stackpage=c6663000)
Stack: 00000000 c02165a0 00000000 c02165c0 c6663fc4 c01193cf c010ac96 c0116406
       c0116340 00000000 00000001 c02165c0 fffffffe c011616a c02165c0 00000000
       c0214900 00000000 c6663fbc 00000046 c010817d 00000000 080caa18 00000000
Call Trace: [<c01193cf>] [<c010ac96>] [<c0116406>] [<c0116340>] [<c011616a>]
       [<c010817d>] [<c0109f48>]
Code: 89 42 04 89 10 c7 41 04 00 00 00 00 c7 01 00 00 00 00 fb 53

>>EIP; c011933c <timer_bh+228/27c>   <=====
Trace; c01193cf <do_timer+3f/70>
Trace; c010ac96 <timer_interrupt+62/110>
Trace; c0116406 <bh_action+1a/48>
Trace; c0116340 <tasklet_hi_action+40/60>
Trace; c011616a <do_softirq+5a/ac>
Trace; c010817d <do_IRQ+a1/b4>
Trace; c0109f48 <call_do_IRQ+5/d>
Code;  c011933c <timer_bh+228/27c>
00000000 <_EIP>:
Code;  c011933c <timer_bh+228/27c>   <=====
   0:   89 42 04                mov    %eax,0x4(%edx)   <=====
Code;  c011933f <timer_bh+22b/27c>
   3:   89 10                   mov    %edx,(%eax)
Code;  c0119341 <timer_bh+22d/27c>
   5:   c7 41 04 00 00 00 00    movl   $0x0,0x4(%ecx)
Code;  c0119348 <timer_bh+234/27c>
   c:   c7 01 00 00 00 00       movl   $0x0,(%ecx)
Code;  c011934e <timer_bh+23a/27c>
  12:   fb                      sti
Code;  c011934f <timer_bh+23b/27c>
  13:   53                      push   %ebx

<0>Kernel panic: Aiee, killing interrupt handler!

3 warnings issued.  Results may not be reliable.
6.2.2 Anatomy of an oops
<to do>
6.2.3 Decoding an oops
The information provided by an oops is in a very "raw" form, some of which is specific
to the kernel image that generated it. Therefore, some post-processing needs to be
carried out to obtain useful information on where to start with the debugging process.
This section contains a step-by-step guide to decoding an oops. <to do>
6.3 Using a Debugger
6.3.1 A Word about using debuggers
Use of a debugger is generally looked down on by the likes of Linus. Consider these
quotes of his from the Linux kernel mailing list:
"’Use the Source, Luke, use the Source.
Be one with the code.’.
Think of Luke Skywalker discarding the automatic firing system
when closing on the deathstar, and firing the proton torpedo (or
whatever) manually.
_Then_ do you have the right mindset for
fixing kernel bugs."
Also:
"I’m afraid that I’ve seen too many people fix bugs by looking
at debugger output, and that almost inevitably leads to fixing
the symptoms rather than the underlying problems."
So are there any good reasons why you should use a debugger? Stop for a moment and
consider how the top kernel programmers go about locating and fixing a problem; how
do they do it? The answer is that they have many years of programming experience to
bring to bear on the situation; chances are they have seen something like this before.
They have those "hunches" that semi-automatically lead them to the right place; to the
real root of the problem. So how does the "up and coming" kernel hacker nurture skills
like these to maturity, especially when time pressure demands a quick solution? The
answer comes in the form of intelligent use of a debugger.
• Use the debugger to collect the evidence surrounding the problem area(s).
• Study the code and think hard about what is going on.
• Try to concentrate on thinking about possible causes of the symptoms you are
seeing in the debugger. Then think about the causes of the causes all the way
down to the real root of the problem. Write a list of the possibilities, placing
them in order of perceived likelihood, and rule them out in turn, one by one. The
process of clarifying thoughts to write them down can be valuable.
• Until you have some experience, you may need to use the debugger to try some
ideas out on the fly by changing variable values etc.
Note that we are using the debugger here as a tool to stimulate rational, logical thought
on what is going on in the code. As you get more experienced at tracking bugs down,
you will be able to use the debugger less; you won't need as many clues before you see
the problem. If you use the debugger wisely, you'll gain the expertise of the "hardcore"
kernel hackers, but in less time. In summary then, some do's and don'ts:
Do:
• Study the code before you set to with the debugger; you will be more productive
if you have thought about the code first.
• Use the debugger to test your assumptions; bugs often come about as a result
of incorrect assumptions (have you ever seen those ”we should never get here”
debugging messages?).
• If your assumptions are proved wrong, make it your business to get to the bottom
of why you got it wrong and make a mental note for next time.
• Discard the debugger as soon as possible; think of Luke Skywalker again here!
Don’t:
• Ignore the real cause of problems that come to light. When you find something
amiss, don’t just shrug your shoulders and fix things up so that everything looks
OK. A classic example is adding another section to a switch statement to cover
an eventuality you hadn't thought of. Do you understand why that scenario is
occurring? Consider the possibility that your approach and/or assumptions are
flawed. Don't go "wallpapering over the cracks"; you'll only have to fix it
properly later on (or worse, someone else will).
• Blindly use the debugger to narrow a problem down; you could well come to the
wrong conclusion and you won’t learn as much in the process.
Now that we have some idea of what debuggers are good for (and not so good for),
some of the options available to you will be presented in the next section.
6.3.2 Debugging Techniques
There are three different approaches to kernel debugging (apart from the printk()
method which doesn’t really count):
• Local kernel debugging (done by running gdb with /proc/kcore as the core file),
this approach is of very limited use; amongst other limitations, no breakpoints
can be set.
• User Mode Linux; a way to run Linux inside Linux. This is good for general
hacking as no extra hardware is required but not so good for troubleshooting
a hardware specific problem. For more information on User Mode Linux, see:
http://user-mode-linux.sourceforge.net
• Remote kernel debugging; the kernel under test runs on a physically separate
machine and communicates with the controlling machine running the debugger,
via a serial link. This is the approach described in this document.
If you have never used the gdb debugger, I strongly suggest you familiarise yourself
with it in userspace before delving into kernel debugging. This gdb reference card is
also very useful: http://www.refcards.net/about/gdb.html. While there are some graph-
ical front ends for gdb (xxgdb, ddd), it is a good idea to get used to using plain gdb on
the console; it is better not to be running X whilst kernel hacking/debugging; you may
miss crucial console output or worse, an oops. Kernel compiles can also be faster on a
machine without X running, especially on machines with less than 64MB RAM, as X
uses a significant amount of RAM.
6.3.3 Setting up the Hardware
The two machines in this example will be referred to as "kernighan" (the workstation/
development machine) and "ritchie" (the testing machine that actually runs the
development kernels).
• First of all, you need a machine to be your testing box. Note that it does not
have to have a monitor, keyboard and mouse. My testing machine was cobbled
together from spare parts! A keyboard can be useful though, in order to be able
to use the magic SysRq key.
• Set up ethernet between the two machines. Check you can ping "ritchie" by
using its hostname.
• Build two serial cables using the pin out given below. One cable will be used by
gdb, the other will be used to give access to the console of "ritchie". This saves
having another monitor on the desk and makes capturing Oopsen a lot easier. The
ethernet connection is useful for multiple remote logins and quickly transferring
kernel images etc. Here is the serial cable pin out (9 pin D-type to 9 pin D-type):
Solder-side pins:
\-----------------------/
 \   1   2   3   4   5 /
  \    6   7   8   9  /
   \-----------------/

Wiring: (use 7 or 10 wire foil screened cable)

1
|
6---------------4
2---------------3
3---------------2
4---------------6
                |
                1
5---------------5
7---------------8
8---------------7
Connect the cable screen to the chassis of one of the connectors; this will help
prevent an earth loop between the two machines. This pin out was taken from the
Text-Terminal-HOWTO; I decided to reproduce it here to remove the confusion over
which of the many pin outs to use. A standard null-modem cable may work if
you don't want to build the cable(s) yourself, however I had to modify mine to
the above configuration to make it work. If you go out and buy a null-modem
cable especially, I'd advise getting one with connectors that can be disassembled
(i.e. not moulded on connectors) to make modification possible.
If you have a Radio Shack store nearby, here are some part numbers:
– Serial cable: 26-152B (Female DB9 - Female DB9)
– Null Modem adapter: 26-264B (Female DB9 - Male DB9)
Thank you Pragnesh Sampat for providing this information.
• Connect the two machines together using the two serial cables, COM1 to COM1,
COM2 to COM2.
6.3.4 Setting up the software
• Install ssh on "kernighan".
• Install sshd on "ritchie" (this is part of the ssh package for some distributions,
including Debian).
• Check you can log in via ssh to "ritchie" (note: you won't be able to log in as
root, but you can su to root once logged in as a user, should you need to).
• Give yourself read/write access to /dev/ttyS0 and /dev/ttyS1 on both machines.
• Install minicom on both machines.
6.3.5 Preparing the kernel source and compilation
• Download and unpack the kernel source to your home directory on "kernighan".
• Download the kgdb patch to the top level kernel source directory: 2.2.18, 2.4.3.
• Apply the patch, for example:
bash-2.03$ cat kgdb_2.2.18.diff | patch -p2
• make menuconfig (or copy your standard .config file and make oldconfig).
• Select the usual configuration options, adding:
o Support for console on serial port (under Character devices).
o Kernel support for gdb (NEW) (under Kernel hacking).
I would advise compiling all additional options directly into the kernel, rather
than as modules, to start with.
• If you have a keyboard directly attached to your testing machine, you may also
want to add the magic SysRq key; be sure to read (and maybe print out)
Documentation/sysrq.txt.
• Proceed with a normal compilation: make dep bzImage
• Copy the image over to "ritchie": scp arch/i386/boot/bzImage ritchie:bzImage-2.2.18-kgdb
(note: you will need to have the same user name on "kernighan" and "ritchie"
for this command to work). Alternatively you could set up and use ftp.
• ssh into "ritchie", su to root and move the image into /boot.
• Create a new entry in /etc/lilo.conf (on "ritchie"):
image=/boot/bzImage-2.2.18-kgdb
label=kgdb
root=/dev/hda1
read-only
append=" gdb gdbttyS=0 console=ttyS1"
The extra command line options tell the gdb debugging stub to listen on /dev/ttyS0,
and tell the kernel to use /dev/ttyS1 as a serial console. You can also control lilo
(i.e. choose which image to boot) from the serial console if you add serial=0,9600n8
to the top of your lilo configuration file.
• Run lilo (on "ritchie").
• If you decided not to make the debugging kernel the default kernel, run lilo -R
kgdb to make the new image boot as a "once-off". (That way, if the debugging
kernel fails to boot for whatever reason, the machine will boot a working kernel
next time to enable you to resolve the problem.)
• Create a file named .gdbinit in the top level directory of the kernel source tree
on ”kernighan”, containing the following:
define rmt
set remotebaud 38400
target remote /dev/ttyS0
end
• Run (as root) minicom -s on "kernighan", go to serial port setup and select
these options:
Serial Device         : /dev/ttyS1
Lockfile Location     : /var/lock
Callin Program        :
Callout Program       :
Bps/Par/Bits          : 9600 8N1
Hardware Flow Control : No
Software Flow Control : No
• Go to "Save setup as dfl", and save settings as default before going to Exit,
leaving minicom awaiting input.
6.4 The Debugging Session
• Shutdown and reboot ”ritchie”.
• After the BIOS Power On Self Test, the debug kernel should load on "ritchie",
giving the following output in minicom on "kernighan":
Linux version 2.2.18serialgdb (jfreak@kernighan) (gcc version 2.95.2 20000220 (Debian GNU/Linux)) #6 Fri Jun 15 17:02:55 BST 2001
Detected 167046 kHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 333.41 BogoMIPS
Memory: 63556k/65536k available (704k kernel code, 408k reserved, 824k data, 44k init)
Dentry hash table entries: 8192 (order 4, 64k)
Buffer cache hash table entries: 65536 (order 6, 256k)
Page cache hash table entries: 16384 (order 4, 64k)
CPU: Intel Pentium 75 - 200 stepping 0c
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking ’hlt’ instruction... OK.
Intel Pentium with F0 0F bug - workaround enabled.
POSIX conformance testing by UNIFIX
Trying to free free IRQ4
Waiting for connection from remote gdb on ttyS0
• On ”kernighan”, type gdb vmlinux (in the top level kernel source directory).
gdb
will start and you should see a license notice followed by a (gdb) prompt.
• Type rmt at the gdb prompt (and press enter). This reads the rmt command from
your .gdbinit file. If the serial link is working correctly, gdb should give the
following output:
(gdb) rmt
0xc010da29 in breakpoint () at gdb.c:701
701     if (initialized) BREAKPOINT();
(gdb)
Note: the hex address will be different for each kernel image. At this point, gdb
is paused awaiting user input. You may set breakpoints, watch expressions etc.
here before giving the continue command; type c.
• The debug kernel will continue to boot (giving further output in minicom on
”kernighan”).
• Once booting is finished, check that you can log into ”ritchie” using ssh.
• You may then set up a test case to cause the debug kernel to run the code to be
debugged.
• You may add breakpoints by using Ctrl-C to get a gdb prompt.
6.5 Troubleshooting
If the serial link does not appear to be working, try going down this check list:
• Check the serial ports have correct baud rate, parity settings etc.
• Double check cable wiring against pin out diagram.
• Check the continuity of the serial cable with a multimeter if you have one to
hand.
• ssh into ”ritchie” and set up a minicom to minicom session between the two
machines. Typing in one minicom session should produce output on the other.
If you have a Palm Pilot to hand, it can be used as a serial console in conjunction with
the cradle (a very useful tool for trouble shooting). You'll need a copy of ptelnet,
available from: http://netpage.em.com.br/mmand/ptelnet.htm. Use the following options
in the Options->Terminal menu:
• Mode: Serial
• Return: CR
And in Options->Serial:
• Baud: 9600
• Parity: N
• Word: 8
• StopBits: 1
• Xon/Xoff: not checked
• RTS/CTS: not checked
If you are unable to ssh into ”ritchie” when running the debug kernel, check the fol-
lowing:
• You compiled in support for your ethernet card (not as a module!).
• You compiled in sufficient network support to enable ethernet use.
• Make sure you aren’t root (or use ssh -l username to stop ssh from trying to log
in as root).
6.6 Misc. Notes
• The gdb interface defaults to 38400bps; this can be increased right up to 115200bps,
which may speed up debugging sessions, at the expense of stability.
• The serial console defaults to 9600bps; this too can be increased up to 115200bps,
which is useful if you want to run curses based programs on the console, as
redraws are painfully slow at 9600bps!
6.7 Using the logs
Of course, it is possible to debug the kernel without a debugger, simply by printing out
lots of tracing to the log files and carrying out a post-mortem afterwards. One problem
with this approach is that if your code is called a lot, you can end up flooding the log
files to the extent that the machine becomes unusable and/or you fill your disk up! One
way to avoid this problem is to put your printk statements inside conditional code that
is only executed by a "special" user (e.g. a user with a UID of 9999), then call/exercise
that code as that special user (the message below is a hypothetical example):
...
if (current->uid == 9999)
        printk(KERN_DEBUG "mymodule: entering foo()\n");
...
Like most techniques, this one is good in some situations and not so good in others; try
it out and find how/when it works best for you. Many thanks to Andrew Morton for
posting this tip to the Linux Kernel Mailing list.
Chapter 7
Profiling & Benchmarking
7.1 Why profile?
Naturally, you should aim for efficiency by design and by writing tight code. Profiling
is done to locate the sections of code that are called most often and/or consume the
most CPU time. Optimisation effort can then be focussed on those routines, giving the
best return on the time invested. If you have never profiled code before, it would be a
good idea to get acquainted with gprof, a typical userspace command line profiler. Just
as it is poor practice to rely on a debugger to catch bad code, so it is bad practice to
rely on a profiler to catch inefficient code.
7.2 Basic profiling: /proc/profile
The kernel has some built in profiling functionality. If you add profile=1 to your
kernel command line arguments, then you get a file, /proc/profile, which can be used
by readprofile to print profiling information to standard output. Example output
from readprofile:
$ readprofile
     2 stext                   0.0500
514867 default_idle        12871.6750
     1 copy_thread             0.0071
     1 restore_sigcontext      0.0030
     2 system_call             0.0312
     2 handle_IRQ_event        0.0227
    13 do_page_fault           0.0111
     1 schedule                0.0012
     1 wake_up_process         0.0132
     1 copy_mm                 0.0014
-------8<---Snip---8<-----------------
The first column contains the number of clock ticks spent in a function (second
column), whilst the third column gives the normalised load: the number of clock ticks
divided by the length of the function. See man readprofile for details and useful
examples. readprofile is usually part of the util-linux package. Sometimes
readprofile is kept in /usr/sbin, so you may need to add that directory to your
PATH or softlink it to some directory that is in your path, e.g. /usr/bin. Note that if
you want to carry out profiling on a remote machine, you will need to copy System.map
and vmlinux across to /usr/src/linux from the top level source directory where the
running kernel was compiled.
7.3 Advanced profiling: Oprofile
7.4 Linux Trace Toolkit
7.5 Benchmarking
http://euclid.nmu.edu/~benchmark/
Chapter 8
Tips
8.1 Some Basic Rules
This section is taken from the old kernel hacking HOWTO by Rusty Russell.
8.1.1 No memory protection
If you corrupt memory the whole machine will crash. Are you sure you can’t do what
you want in userspace?
8.1.2 No floating point or MMX
The FPU context is not saved; you would mess with some user process' FPU. If you
really want to do this, you would have to explicitly save/restore the full FPU state (and
avoid context switches). It is generally a bad idea; use fixed point arithmetic first.
8.1.3 A rigid stack limit
The stack is about 8K in 2.4; some is used by the process descriptor and the rest is
shared with interrupts, so you can't use it all. Avoid deep recursion and huge local
arrays on the stack (allocate them dynamically instead), as sketched below.
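A minimal sketch of the dynamic allocation alternative (the function and buffer size
are hypothetical):

#include <linux/slab.h>
#include <linux/errno.h>

static int process_block(void)
{
        /* Allocate scratch space from the heap instead of declaring
         * 'char buf[4096];' on the limited kernel stack. */
        char *buf = kmalloc(4096, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;
        /* ... use buf ... */
        kfree(buf);
        return 0;
}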
8.1.4 Portable Code
The Linux kernel is portable; let’s keep it that way. Your code should be 64-bit clean,
and endian-independent (FIXME: expand on what this means in practice). You should
also minimize CPU specific code, so inline assembly should be cleanly encapsulated
and minimized (hence put in the architecture specific parts of the source tree) to ease
porting.
8.2 Proving Code
Kernel programming is by nature challenging; you have few of the "safety nets" that
user space programmers take for granted. The write-test cycle is also longer, meaning
each line of kernel code takes longer to produce. You can save yourself a lot of time
by writing, debugging and testing as much of your code as possible in userspace before
incorporating it into the kernel. Obviously, not all code can be developed this way, but
a significant proportion can.
8.3 Coding Style
Hopefully, you will be aiming to get some of your code into the kernel at some stage.
Even if you don't, there is a good chance you will post some of your code to the Linux
Kernel Mailing List at some stage, perhaps as part of a request for help, after you have
done your homework of course. You can improve the chances of a successful
submission/help request by using the "right" coding style. Understandably, Linus wants
a consistent coding style throughout the source tree, so expects contributions that follow
his preferred conventions. Although you may not agree with them, you had better get
used to using them! Take a good look at, and inwardly digest,
Documentation/CodingStyle.
8.4 Commenting Code
Your emphasis here should be on explaining why your code does what it does, not how.
If you feel the need to explain how your code works, consider re-writing it! You may
find it helpful to deliberately over comment your code during development/debugging
then trim it back before submission. The idea here is that the comments may jog
your memory during the debugging phase, concerning logic, approach, rationale and
assumptions used as the code was written.
8.5 Protect Data Structures
During development it is often a good idea to protect important data structures with so
called "magic numbers". This is done by adding a field in the data structure
specifically to hold some random number of your choice. When accessing or modifying
the data structure, the magic number field can be checked for the expected value. If the
correct value is not found, the data structure may have been corrupted, or the pointer
to that data structure may be corrupted. Magic numbers can be used to detect critical
conditions and take appropriate action, e.g. invoke an "Oops". Suppose that the copy
of a filesystem's super block in memory has become corrupted by an array over-run.
Writing that corrupted data structure back to the disk would be a disaster. If the data
structure had been protected by a magic number, the corruption would probably have
been detected and the disastrous write prevented. For super critical data structures, it
may be worth placing two different magic numbers, one at the start and one at the end.
A sketch of the technique follows.
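A minimal sketch of the technique on a hypothetical structure (the names, magic value
and check routine are all illustrative):

#include <linux/kernel.h>

#define MYDEV_MAGIC 0x2a8f13c7          /* arbitrary random value */

struct mydev_state {
        unsigned int magic;             /* set to MYDEV_MAGIC at init time */
        int users;
        /* ... real fields ... */
};

static void mydev_check(struct mydev_state *s)
{
        /* If the magic field has changed, either the structure has been
         * corrupted or the pointer is bad; complain loudly before any
         * damage (such as a disastrous disk write) is done. */
        if (s->magic != MYDEV_MAGIC)
                panic("mydev: bad magic %#x, state corrupt?", s->magic);
}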
8.6 Keep a log book
It is good practice to keep a step by step record of your work. Eventually, you will
need all the juicy details of something you did two months ago and unless you happen
to be gifted with an exceptional memory, recalling everything correctly will be near
on impossible. A detailed work log can also be very useful when it comes to helping
another member of the kernel community with a problem that you have encountered
before. Whether you maintain a hard, written copy or an electronic copy is up to you;
both approaches have their own advantages.
Chapter 9
Choosing Your First Project
Some people will be reading this document simply out of interest or to find out if ker-
nel programming is of interest. Others may have a particular goal in mind; writing a
new device driver or filesystem perhaps. Whichever scenario is true of you, this chap-
ter contains information relevant to those thinking of embarking on their first kernel
programming project.
9.1 Does it have to be done in the kernel?
This should be the first question to ask when considering a kernel programming project.
<to do>
Chapter 10
More Information
10.1 Source code docs
• Documentation/kernel-docs.txt
• Other kernel docs
10.2 Links
• http://www.kernelnewbies.org
• http://www.kerneltrap.com
• http://www.lwn.net
• http://www.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html
10.3 Books
10.3.1 C Programming
10.3.2 OS concepts
Chapter 11
Getting Help
11.1 Mailing lists
• Kernelnewbies: http://www.kernelnewbies.org. Consists of a webpage, IRC
channel (#kernelnewbies on irc.openprojects.net) and a mailing list.
• Linux Kernel Mailing List: http://www.tux.org/lkml. A high volume list; post to
it only after all other options are exhausted. It certainly should not be your "first
port of call". Please go down this check list before posting:
– Read the FAQ at this address: http://www.tux.org/lkml
– Search the archives at: http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html.
– Read the latest kernel-traffic at: http://kt.zork.net/, to see if the problem
you are experiencing is a known issue.
– Do your homework first (e.g. read appropriate man pages, guides and
HOWTO documents).
– Read & follow this posting guide: http://www.tuxedo.org/~esr/faqs/smart-
questions.html
– Polish your message before posting; keep it short but informative.
11.2 IRC
#kernelnewbies on irc.openprojects.net. Please add to this list!
Chapter 12
FAQ
Send me questions with [KH-HOWTO QUESTION] in the subject.
Chapter 13
References & Acknowledgments
<to do>