A Guide to Running Jobs on the Bio-X Cluster

Many answers to questions can be found in the Bio-X Cluster FAQ.

  1. The qsub command submits jobs to the queue.
    To run a program from a script:
      qsub "script name"
    To enter an interactive session:
      qsub -I

    In practice, I use "qsub -I" when I am compiling; other than that, I rarely if ever type the qsub command directly: it is contained in scripts.
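    For concreteness, a script submitted this way is just an ordinary shell script, optionally carrying #PBS comment directives that set resource limits. Here is a minimal sketch; the one-hour walltime request and the echo line are made up for illustration, so adjust them for your own job:

```shell
#!/bin/bash
#PBS -l ncpus=1,walltime=01:00:00
# Hypothetical minimal job script: request one CPU for one hour.
# PBS sets PBS_O_WORKDIR to the directory you ran qsub from.
cd $PBS_O_WORKDIR
echo "running as job $PBS_JOBID"
```

    You would submit it with "qsub" followed by the script name, as above.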

  2. Scripts
    Some example scripts for job submission exist in the FAQ. They are instructive to look at, but you can probably skip that for now.

    Here are the shell scripts that I use. I recommend that you make copies of each of these:

    $HOME/util/myqsub.sh

      #!/bin/bash
      _SUB=$HOME/util/_qsub.sh
      ARGVPL=$HOME/util/argv.pl
      PWD=`pwd`
      ARGS=`$ARGVPL --pack $*`
      qsub -o $HOME/tmp -e $HOME/tmp -v JOBDIR=$PWD,ARGS=$ARGS $_SUB

    $HOME/util/_qsub.sh
    (Note: the 'walltime' argument in this script requests 100 hours. You can change this easily yourself. In my own $HOME/util directory, there are a number of these scripts that run for different durations, e.g. _qsub_12.sh, called by myqsub12.sh, for 12-hour runs.)
      #!/bin/bash
      #PBS -l ncpus=1,walltime=100:00:00
      # variables passed in using the -v option for qsub:
      #   JOBDIR - the directory where all the files are
      ARGVPL=$HOME/util/argv.pl
      ARGS=`$ARGVPL --unpack $ARGS`
      if [ -z "$JOBDIR" ] ; then
          echo "$PBS_JOBID: No JOBDIR"
          exit 1
      fi
      cd $JOBDIR
      $ARGS

    These scripts require the following perl script, which handles argument packing:
    $HOME/util/argv.pl
      #!/usr/bin/perl -w
      unless ($ARGV[0]) { exit; }
      $mode = shift(@ARGV);
      if ($mode eq "--pack") {
          print $ARGV[0];
          for ($i=1; $i<=$#ARGV; $i++) {
              print "?$ARGV[$i]";
          }
      } elsif ($mode eq "--unpack") {
          @list = split(/\?/, $ARGV[0]);
          foreach $i (@list) {
              print "$i ";
          }
      } elsif ($mode eq "--print") {
          foreach $i (@ARGV) {
              print "$i ";
          }
      }
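    The packing scheme is simple: arguments are joined with '?' on the way in and split on '?' on the way out, which is why your arguments must not themselves contain a '?'. The same idea can be sketched in pure bash for illustration (the function names pack and unpack are made up here; the real work is done by argv.pl):

```shell
#!/bin/bash
# Illustration of argv.pl's '?' packing scheme, in pure bash.
# pack/unpack are hypothetical names used only in this sketch.
pack() { local IFS='?'; echo "$*"; }    # join all arguments with '?'
unpack() { echo "${1//\?/ }"; }         # replace each '?' with a space

pack delphi whatever          # prints: delphi?whatever
unpack "delphi?whatever"      # prints: delphi whatever
```

    Packing matters because qsub's -v option passes ARGS as a single environment variable, so the argument list has to survive as one word.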

  3. You run these scripts with the following command from the shell:
      $HOME/util/myqsub.sh $HOME/util/your_script.sh whatever

    Usually it is most convenient to put this command inside another script. In general, you will be running many similar jobs at once, so it is common practice to put this call into a loop in a driver script that submits all your jobs at once (using the 'system' command in a perl script, for example).
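    The text above mentions a perl loop around 'system'; an equivalent driver loop in bash might look like the following dry-run sketch. The input names are made up, and 'echo' stands in for the real call so nothing is actually submitted; drop the echo to submit for real:

```shell
#!/bin/bash
# Hypothetical driver: one myqsub.sh call per input name.
# 'echo' makes this a dry run that just prints each command.
submit_all() {
    for input in "$@"; do
        echo $HOME/util/myqsub.sh $HOME/util/your_script.sh "$input"
    done
}

submit_all run_a run_b run_c
```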

  4. One more script remains: the one called 'your_script.sh' in the previous section.

    A template for this script follows. The template assumes that you have created two files:

    1. an input file, and
    2. a .tgz compressed file of all files needed for the run.

    For example, if your input file were 'whatever', the script would look for the file 'whatever.tgz', copy it over to the temporary space, unpack it, and run the program 'delphi whatever'; it would then repack the results as 'result.whatever.tgz' and copy that back to the folder you started in. You need to run the script from the folder containing the .tgz files.

    To run it, you would enter the command:

      your_script.sh whatever

    The script your_script.sh should be something like this:

      #!/bin/bash
      # THIS SCRIPT TAKES ONE ARGUMENT: THE NAME OF THE INPUT FILE
      # IT ASSUMES THAT THE NAME OF THE .TGZ FILE TO MOVE AND UNPACK IS ${1}.tgz
      #
      # set variables for data directories
      #
      # PWD is defined in myqsub.sh
      export tempdir=$TMPDIR               # temp space on local compute node
      export inputdir=$PWD                 # location of input files
      export outputdir=$PWD                # location to put results into
      export output=$TMPDIR/$1.outfile     # file for redirected standard output
      export error=$TMPDIR/$1.errfile      # file for redirected standard error

      # copy input datafiles to temporary space on compute node
      # it is best to copy a single tar file instead of many files
      # NOTE: I've used tar-zipped files throughout - you can change this to tar if you like
      cp $inputdir/${1}.tgz $tempdir

      # cd to your working directory
      cd $tempdir
      tar xzf ${1}.tgz

      # put your commands to run here
      # write output to temp space with 1> $output 2> $error
      $HOME/bin/delphi $1 1> $output 2> $error

      #
      # copy results back to fileserver
      #
      tar cfz result.$1.tgz *
      cp result.$1.tgz $outputdir/
      cd $outputdir
  • Getting files to and from the cluster

    If you are going to run opt.pl from the computer on your desk, follow these instructions:

    1. Getting files from the cluster:
      1. run opt.pl in a terminal window, if you haven't already
      2. in another window,
        scp -P #### username@bioxcluster.stanford.edu:path/to/files new/local/path/to/files
        (#### = 1000 + your user number)
    2. Putting files on the cluster:
      1. run opt.pl in a terminal window, if you haven't already
      2. in another window,
        scp -P #### local/path/to/files username@bioxcluster.stanford.edu:new/path/to/files
        (#### = 1000 + your user number)

    If you are going to run opt.pl from tree1 or cmgm or some other host, follow these instructions:
    --> substitute your hostname (e.g. tree1.stanford.edu) for all occurrences of 'host' below
    --> remember to use the right username with the right host

    1. Getting files from the cluster:
      1. ssh into your host and run opt.pl in a terminal window, if you haven't already
      2. in another window, ssh into your host, then
        scp -P #### username@bioxcluster.stanford.edu:path/to/files new/local/path/to/files
        (#### = 1000 + your user number)
      3. in another window,
        scp username@host:path/to/files new/local/path/to/files
    2. Putting files on the cluster:
      1. ssh into your host and run opt.pl in a terminal window, if you haven't already
      2. in another window,
        scp local/path/to/files username@host:new/path/to/files
      3. in another window, ssh into your host, then
        scp -P #### local/path/to/files username@bioxcluster.stanford.edu:new/path/to/files
        (#### = 1000 + your user number)