Other major changes from the previous architecture
OS & Software Upgrade
on the EECS Research Clusters
Daniel Andrzejewski
Electrical Engineering & Computer Science Department
University of Tennessee
December 3, 2008
Itstaff
Other major changes from the previous architecture
Outline
1 What this presentation is about
2 Reasons for the upgrade
3 Basic clusters infrastructure
4 Installing a node
5 Installing software
6 Managing modules
7 Using modules to compile software
8 Running an MPI job
Introduction
Why the OS upgrade
- too much time spent on managing software
Why CentOS
- it matches our server infrastructure
Basic clusters infrastructure
affected systems:
greedo - cfengine server
slyder - package, tftp, proxy server
not affected systems:
mira, hanharr, dengar - LDAP
alex, kril, shire - NFS
dns - DNS, DHCP
zam - NTP, Nagios, Syslog
kickstart
PXE booting
How it is set up
an entry in the DHCP server points to slyder
tftp server on slyder
don’t forget to delete cfengine keys on greedo
example
greedo:~# ./delete.keys.sh frodotest2
removed /var/cfengine/ppkeys/root-172.16.0.2.pub
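The contents of delete.keys.sh are not shown in these slides. A minimal sketch of what such a cleanup could look like, assuming the keys are stored as root-<IP>.pub under the ppkeys directory (as the output above suggests). The function name and its arguments are hypothetical:

```shell
# Hypothetical helper: remove the stored cfengine public key for one node.
# $1 = ppkeys directory (e.g. /var/cfengine/ppkeys), $2 = node IP address.
delete_node_key() {
    key="$1/root-$2.pub"
    if [ -f "$key" ]; then
        rm -f "$key" && echo "removed $key"
    else
        echo "no key found for $2" >&2
        return 1
    fi
}
```

It would be called as: delete_node_key /var/cfengine/ppkeys 172.16.0.2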
Some of the important files
/tftpboot
/tftpboot/pxelinux.cfg/frodo-head
/tftpboot/pxelinux.cfg/frodo-compute
/tftpboot/pxelinux.cfg/frodo-compile
/export/kickstart/centos - CentOS 5.2 repository
/export/kickstart/centos/ks.frodo-head.cfg
/export/kickstart/centos/ks.frodo-compute.cfg
/export/kickstart/centos/ks.frodo-compile.cfg
Installing a node
Example
slyder /tftpboot/pxelinux.cfg> ./gethostip frodo2
frodo2.sinrg.local 172.16.0.2 AC100002
slyder /tftpboot/pxelinux.cfg> ls -l AC100002
lrwxrwxrwx 1 root root 13 Oct 17 11:40 AC100002 -> frodo-compute
slyder /tftpboot/pxelinux.cfg> cat frodo-compute
default 0
timeout 50
prompt 1
display msgs/boot.frodo-compute.msg
F1 msgs/boot.frodo-compute.msg
F2 msgs/general.msg
F3 msgs/expert.msg
F4 msgs/param.msg
F5 msgs/rescue.msg
F7 msgs/snake.msg
label 0
kernel centos5.2/vmlinuz
append initrd=centos5.2/initrd.img ramdisk_size=6801 ks=nfs:172.16.0.72://export/kickstart/centos/ks.frodo-compute.cfg ksdevice=eth0
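PXELINUX looks for a configuration file named after the client's IPv4 address written as eight uppercase hexadecimal digits, which is why the symlink for 172.16.0.2 is called AC100002. The conversion that gethostip performs can be reproduced with a few lines of shell. The function name ip_to_hex is an illustrative assumption:

```shell
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex
# form that PXELINUX uses as a per-host configuration file name.
ip_to_hex() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 172.16.0.2   # prints AC100002
```

A new compute node could then be enabled by creating a symlink with that name pointing at frodo-compute in /tftpboot/pxelinux.cfg.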
Installing software from an admin standpoint
Two ways:
using package management (yum, rpm)
[root@frodo-head]# dsh -g frodo 'yum -y install lapack-3.1.1-1.el5.rf'
[root@frodo-head]# dsh -g frodo 'rpm -qa | grep lapack'
compiling from source and installing under /pkgs
what if there are other versions (e.g. mpicc from mpich and openmpi)?
set prefix to /pkgs/your-software-x.y.z, where x.y.z indicates a version
after installation, create a directory in /usr/local and duplicate the above directory tree using symbolic links
example:
mkdir /usr/local/mpich
graft -i -t /usr/local/mpich /pkgs/mpich-1.2.7..2
add a module (only if you created a new directory under /usr/local)
Install software on the package server, then update the nodes from the head node using cfagent
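The from-source procedure above can be summarized as a short command plan. The sketch below only prints the commands rather than running them, and it assumes an autotools-style package (./configure, make). The function name pkg_install_plan is hypothetical:

```shell
# Print (do not run) the commands for a from-source install into /pkgs.
# $1 = package name, $2 = version. Assumes an autotools-style build.
pkg_install_plan() {
    prefix="/pkgs/$1-$2"
    target="/usr/local/$1"
    echo "./configure --prefix=$prefix"
    echo "make"
    echo "make install"
    echo "mkdir -p $target"
    echo "graft -i -t $target $prefix"
}

pkg_install_plan mpich 1.2.7p1
```

Keeping the version in the /pkgs prefix lets several builds coexist, while /usr/local/<name> stays a stable path for modulefiles to reference.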
Modules provide an easy mechanism for updating a user’s environment
How to add a module
create a file in /pkgs/Modules/modulefiles
example:
slyder ~> cat /pkgs/Modules/modulefiles/mpich-1.2.7p1
proc ModulesHelp { } {
puts stderr "Sets up environment to use mpich."
}
module-whatis "adds mpich paths to your environment variables"
set apppath /usr/local/mpich
prepend-path PATH $apppath/bin
prepend-path LD_LIBRARY_PATH $apppath/lib
prepend-path LIBRARY_PATH $apppath/lib
prepend-path MANPATH $apppath/man
append-path MANPATH /usr/local/man:/usr/share/man
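For readers unfamiliar with modulefiles, prepend-path puts a directory at the front of a colon-separated variable, so the loaded package takes precedence over system defaults. A rough shell analogue of that one directive is sketched below. The name prepend_path is illustrative, and this is not how Modules itself is implemented:

```shell
# Rough shell analogue of the modulefile directive "prepend-path VAR DIR".
# $1 = variable name, $2 = directory to put first.
prepend_path() {
    eval "pp_cur=\${$1-}"          # read the current value of the named variable
    if [ -n "$pp_cur" ]; then
        eval "$1=\$2:\$pp_cur"     # new directory goes in front
    else
        eval "$1=\$2"              # variable was empty or unset
    fi
    export "$1"
}
```

For example, prepend_path LD_LIBRARY_PATH /usr/local/mpich/lib mirrors the corresponding line of the modulefile above.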
Example
frodo2> source /usr/local/modules/3.2.6/init/tcsh
frodo2> module avail
------------------------ /pkgs/Modules/modulefiles ------------------------
j2sdk1.4.2_18        module-info
modules              mpich-1.2.7p1
mpich-mx-1.2.7..7    mpich2-1.0.8
mpich2-mx-1.0.7..2   mx-1.2.7
openmpi-1.2.8        openmpi-1.2.8-mx
frodo2> module load openmpi-1.2.8-mx
frodo2> module list
Currently Loaded Modulefiles:
1) mx-1.2.7
2) openmpi-1.2.8-mx
frodo2> which mpicc
/usr/local/openmpi-mx/bin/mpicc
Compiling software from a user standpoint
frodo-head> ssh frodo-compile
frodo-compile> source /usr/local/modules/3.2.6/init/tcsh
frodo-compile> module load openmpi-1.2.8-mx
frodo-compile> mpicc -o hello hello.c
Running an MPI job
interactive
qsub -I -l nodes=64:ppn=2
source /usr/local/modules/3.2.6/init/tcsh
module load openmpi-1.2.8-mx
mpirun -np 64 hello
batch
example of a batch script hello.sh
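#!/bin/bash
#PBS -l nodes=64:ppn=1,pmem=1gb
# one line per allocated processor slot
NODES=`cat $PBS_NODEFILE | wc -l`
# this is a bash script, so load the bash init file for modules
. /usr/local/modules/3.2.6/init/bash
module load openmpi-1.2.8-mx
mpirun -np $NODES $PBS_O_WORKDIR/hello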
command to submit a job
qsub hello.sh
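Torque writes one line per allocated processor slot to $PBS_NODEFILE, so a host appears ppn times. Counting lines therefore gives the number of MPI ranks, not the number of distinct hosts. A small sketch with a made-up node file (the helper names and the file contents are illustrative):

```shell
# Count processor slots (lines) and distinct hosts in a PBS-style node file.
count_slots() { wc -l < "$1" | tr -d ' '; }
count_hosts() { sort -u "$1" | wc -l | tr -d ' '; }

# Made-up node file for nodes=2:ppn=2; a real job would use "$PBS_NODEFILE".
nodefile=$(mktemp)
printf '%s\n' frodo2 frodo2 frodo3 frodo3 > "$nodefile"

echo "slots: $(count_slots "$nodefile")"   # slots: 4
echo "hosts: $(count_hosts "$nodefile")"   # hosts: 2
```

With nodes=64:ppn=1, as in the batch script, both counts are 64. With ppn=2 the slot count doubles while the host count stays the same.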
Other major changes from the previous architecture
proxy server - yum updates can be done on the compute nodes (private LAN)
mpich2 with mx drivers