Imagine you would like to perform a large series of calculations. Obviously, you would not run the complete series at the same time. Instead, you would like to start as many jobs as the processors in your computer can handle and start the next job in the series whenever a previous job has finished. You could write your own program for this, but many such programs are already available. In this blog post, I will show you how to install and use Torque. Although Torque can be set up to run on a computer cluster, I will show you how to install and use it on just a single machine. This tutorial is written for Debian Linux, but should in principle also work for Ubuntu and, with (hopefully) small modifications, for other distributions.
Download the tarball from the website. Extract, compile and install it on your machine.
wget 'http://www.adaptivecomputing.com/index.php?wpfb_dl=2868' -O torque-5.1.0.tar.gz
tar -xzf torque-5.1.0.tar.gz
cd torque-5.1.0
./configure --prefix=/opt/gcc-4.7.2/torque-5.1.0
make -j5
sudo make install
Torque uses four daemons that have to be loaded at boot time. Copy their init.d scripts to the /etc/init.d folder like so
sudo cp contrib/init.d/debian.trqauthd /etc/init.d/trqauthd
sudo cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
sudo cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server
sudo cp contrib/init.d/debian.pbs_sched /etc/init.d/pbs_sched
and add these to the boot procedure
sudo update-rc.d pbs_mom defaults
sudo update-rc.d pbs_server defaults
sudo update-rc.d pbs_sched defaults
sudo update-rc.d trqauthd defaults
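Before rebooting, it may be worth confirming that all four scripts really ended up in place and are executable. The helper below is only a minimal sketch: the function name check_init_scripts is my own invention (it is not part of Torque), and the folder is passed as an argument so that you can point it at /etc/init.d.

```shell
#!/bin/bash
# check_init_scripts DIR: hypothetical helper (not part of Torque).
# Reports, for each of the four Torque init scripts, whether it exists
# and is executable in DIR. Returns 0 only if all four are in place.
check_init_scripts() {
    local dir="$1" missing=0 name
    for name in trqauthd pbs_mom pbs_server pbs_sched; do
        if [ -x "$dir/$name" ]; then
            echo "ok       $name"
        else
            echo "missing  $name"
            missing=1
        fi
    done
    return "$missing"
}
```

You would call it as check_init_scripts /etc/init.d and expect four lines starting with "ok".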
Next, we set up our machine as both the server and the (only) node. First, start the trqauthd daemon.
sudo /etc/init.d/trqauthd start
Log in as root, add the binary folders of Torque to the PATH and run the setup script from the source folder.
su
export PATH=/opt/gcc-4.7.2/torque-5.1.0/sbin:/opt/gcc-4.7.2/torque-5.1.0/bin:$PATH
./torque.setup root
If you get an error like the following
qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host:
then open your /etc/hosts file and change the entries in there. Torque only reads the first two columns to match the hostname with an IP address, so the hostname has to appear in the second column, directly after its IP address.
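To see which IP address a "first two columns only" reader would find for your hostname, you can mimic that lookup with awk. This is a sketch under my own assumptions: lookup_host_ip is a made-up helper, and it only imitates the behaviour described above; it does not call into Torque.

```shell
#!/bin/bash
# lookup_host_ip NAME [HOSTS_FILE]: hypothetical helper (not part of Torque).
# Prints the IP address of the first non-comment line whose SECOND column
# equals NAME, ignoring any aliases in later columns.
# Returns 1 if no such line exists.
lookup_host_ip() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        $2 == name       { print $1; found = 1; exit }  # match column 2 only
        END              { exit !found }
    ' "$file"
}
```

For example, lookup_host_ip "$(hostname)" prints nothing and returns a non-zero status when your hostname only appears as an alias in the third column, which is the situation you would then fix by hand.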
Add your machine to the list of nodes by editing /var/spool/torque/server_priv/nodes. You have to specify the number of cores in your machine after the hostname, using the np keyword. Then edit /var/spool/torque/mom_priv/config and set your machine as the $pbsserver. Also configure the bitmap for the logging events.
$pbsserver ST-A1771
$logevent 225
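The value 225 is a bitmap, so it is the sum of a few powers of two, each of which switches on one class of log events. The snippet below only does that arithmetic; decode_bitmask is a made-up helper, and it deliberately does not try to name the event classes (for those, check the Torque documentation of the $logevent parameter).

```shell
#!/bin/bash
# decode_bitmask VALUE: hypothetical helper. Prints the individual bit
# values that are set in VALUE, lowest first, separated by spaces.
# Generic arithmetic only; it does not know what each bit means to Torque.
decode_bitmask() {
    local value="$1" bit=1 out=""
    while [ "$bit" -le "$value" ]; do
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out }$bit"
        fi
        bit=$(( bit << 1 ))
    done
    echo "$out"
}

decode_bitmask 225   # prints: 1 32 64 128
```

So 225 = 1 + 32 + 64 + 128, meaning four event classes are enabled.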
Please note that in the above file, ST-A1771 should be replaced by the name of your local machine. Moreover, this name should resolve to an IP address, which can be configured in /etc/hosts. (Thanks to danielmejia55_at_gmail_dot_com for mentioning this, see comments below.)
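If you prefer not to type the hostname by hand, the two files can also be generated. Treat the following as a sketch under assumptions rather than the official way: write_single_node_config is a name I made up, the nodes entry format "hostname np=N" is what I assume Torque expects, and the function overwrites any existing files in the target folder.

```shell
#!/bin/bash
# write_single_node_config HOST CORES [SPOOL_DIR]: hypothetical helper.
# Writes the two files discussed above for a one-machine setup:
#   SPOOL_DIR/server_priv/nodes  -> "HOST np=CORES"  (assumed entry format)
#   SPOOL_DIR/mom_priv/config    -> $pbsserver and $logevent lines
# SPOOL_DIR defaults to /var/spool/torque; writing there requires root.
# Existing files are overwritten.
write_single_node_config() {
    local host="$1" cores="$2" spool="${3:-/var/spool/torque}"
    mkdir -p "$spool/server_priv" "$spool/mom_priv" || return 1
    printf '%s np=%s\n' "$host" "$cores" > "$spool/server_priv/nodes"
    # Single quotes keep $pbsserver and $logevent literal in the output.
    printf '$pbsserver %s\n$logevent 225\n' "$host" > "$spool/mom_priv/config"
}
```

As root, write_single_node_config "$(hostname)" "$(nproc)" would fill in the name and core count of the local machine. It is safer to try it on a scratch folder first by passing a third argument.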
For every user to be able to submit jobs to the queuing system and check the current status of the queue, every user needs the bin folder of Torque in their $PATH variable. As such, add the Torque binaries folder to the PATH in /etc/profile.
echo 'export PATH=/opt/gcc-4.7.2/torque-5.1.0/bin:$PATH' >> /etc/profile
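One caveat of appending with >> is that running the command twice leaves two identical lines in the file. A small guard avoids that. This is my own sketch (append_once is a made-up name), not something Torque ships.

```shell
#!/bin/bash
# append_once LINE FILE: hypothetical helper. Appends LINE to FILE only if
# FILE does not already contain exactly that line.
append_once() {
    local line="$1" file="$2"
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string.
    # "--" stops option parsing in case LINE starts with a dash.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

As root you would call append_once 'export PATH=/opt/gcc-4.7.2/torque-5.1.0/bin:$PATH' /etc/profile, and calling it again changes nothing.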
Finally, log out as root (CTRL+D).
To check that everything is correctly configured, run
pbsnodes -a
The output should look like the listing below. If it does not, you can try to restart pbs_server with the command that follows the listing.
state = free
power_state = Running
np = 6
ntype = cluster
status = rectime=1424867568,cpuclock=OnDemand:1998MHz,varattr=,jobs=,state=free,netload=6890398942,gres=,loadave=0.00,ncpus=4,physmem=8066840kb,availmem=11324404kb,totmem=11970324kb,idletime=240,nusers=1,nsessions=2,sessions=3336 27395,uname=Linux ST-A1771 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u1 x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
sudo /etc/init.d/pbs_server restart
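If you want to script this check instead of reading the output by eye, you can pull single fields out of the node listing. The helper below is a sketch with a made-up name (node_field), and it assumes the listing is in the "name = value" layout shown above.

```shell
#!/bin/bash
# node_field FIELD: hypothetical helper. Reads a node listing in the
# "name = value" layout on stdin and prints the value of the first line
# whose name is exactly FIELD. Returns 1 if the field is not found.
node_field() {
    awk -v field="$1" '
        $1 == field && $2 == "=" { print $3; found = 1; exit }
        END { exit !found }
    '
}
```

Assuming the listing comes from pbsnodes -a, something like pbsnodes -a | node_field state should then print free on a healthy single-node setup, and pbsnodes -a | node_field np should print the core count you configured.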
Also start the scheduler
sudo /etc/init.d/pbs_sched start
And test a job by running
echo "sleep 30" | qsub
When you run
qstat
you should see something like
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.ST-A1771                STDIN            ivo                    0 R batch
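The S column holds the state of the job, for instance R for a running job as above or Q for a job that is still queued. When a script has to wait for a job, it helps to extract that single letter. The function below is a sketch of my own (job_state is a made-up name) and assumes the default column layout shown above, where the state is the second-to-last column.

```shell
#!/bin/bash
# job_state JOBID: hypothetical helper. Reads a job listing in the layout
# shown above on stdin and prints the single-letter state (column "S") of
# the job. JOBID may be the full ID ("0.ST-A1771") or just the number ("0").
# Returns 1 if the job is not listed.
job_state() {
    awk -v id="$1" '
        $1 == id || index($1, id ".") == 1 {
            print $(NF - 1)      # state is the second-to-last column
            found = 1
            exit
        }
        END { exit !found }
    '
}
```

Assuming the listing is produced by qstat, qstat | job_state 0 would print R while the test job from above is running.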
danielmejia55_at_gmail_dot_com has noted (see comments below) that if you encounter an error saying that certain queue directives are missing, you need to set these first. He has kindly provided the settings he used.
Below, an example submission file for a multiprocessor job is given. In the submission file, you specify the name of the job after the #PBS -N directive, followed by the number of nodes, the number of processors per node and finally the maximum time the job is allowed to run. Typically, you would like to run the job in the same folder as the one in which the job file resides. To do so, you can use the $PBS_O_WORKDIR variable. Furthermore, you can use the $PBS_NP variable to pass the number of processes to the mpirun command.
#!/bin/bash
#
#This is an example script example.sh
#
#These commands set up the Torque Environment for your job:
#PBS -N TestJob
#PBS -l nodes=1:ppn=4,walltime=00:12:00
pwd
cd $PBS_O_WORKDIR
pwd
#print the time and date
date
mpirun -np $PBS_NP ./testjob
#print the time and date again
date
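Coming back to the large series of calculations from the introduction: rather than writing each submission file by hand, you can generate them. The function below is a sketch and not part of Torque; make_jobscript and the file names in the usage note are made up for illustration, and the directives are simply the ones from the example above.

```shell
#!/bin/bash
# make_jobscript NAME PPN WALLTIME COMMAND: hypothetical helper. Prints a
# Torque submission script to stdout, using the same directives as the
# example script above.
make_jobscript() {
    local name="$1" ppn="$2" walltime="$3" command="$4"
    # \$PBS_O_WORKDIR is escaped so that it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#!/bin/bash
#PBS -N $name
#PBS -l nodes=1:ppn=$ppn,walltime=$walltime
cd \$PBS_O_WORKDIR
$command
EOF
}
```

For instance, make_jobscript Calc1 4 00:12:00 'mpirun -np $PBS_NP ./testjob' > calc1.sh writes one job file, which you then submit with qsub calc1.sh. Wrapping both in a for loop over your inputs queues the whole series, and Torque starts each job as soon as enough processors are free.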