Optimizing bowtie2 read alignment on a computer cluster

More cores doesn’t always mean higher performance. Aligning paired-end Illumina reads to a reference genome can be done in bowtie2 with multiple threads using the -p option. Others haven’t found much of a performance gain above 8 cores, but results can differ between systems. Here, I’ll look at what runs fastest on the cluster I have access to: 4, 8, 16 or 32 cores (anything above 32 would probably take too long to queue, as the cluster is usually quite busy).

  1. Start by making a bash script that times a bowtie2 alignment of some reads against a reference genome. Paste the following into bowtietest.sh:
            #!/bin/bash
            # usage: bowtietest.sh /path/to/ReferenceIndexed /path/to/forwardreads.fastq.gz /path/to/reversereads.fastq.gz /path/to/output.sam <number of threads>
            reference=$1
            forward=$2
            reverse=$3
            output=$4
            nodes=$5
            # `time` writes to stderr, so redirect stderr to capture the timing
            (time /panhome/shearn/bin/bowtie2-2.2.7/bowtie2 -p "$nodes" -x "$reference" -1 "$forward" -2 "$reverse" -S "$output"."$nodes") 2> "$output"."$nodes"time.log

    Here, the reference index, read files, output path and thread count are all passed as arguments when the script is executed. The alignment is written to a SAM file as usual, alongside a logfile recording how long the alignment took. Note that the reference must already have been indexed (with bowtie2-build), and the path to the bowtie2 binary should be changed to match your system. Make sure to make the script executable with:

            chmod +x bowtietest.sh
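
    The `(time …) 2> log` pattern in the script can be sanity-checked before queueing a real alignment; in this minimal sketch, `sleep` simply stands in for the bowtie2 call so no aligner is needed:

            #!/bin/bash
            # sketch of bowtietest.sh's timing pattern, with `sleep`
            # standing in for the bowtie2 command
            output=demo
            nodes=4
            # `time` writes to the subshell's stderr, so redirect stderr to the log
            (time sleep 0.1) 2> "$output"."$nodes"time.log
            cat "$output"."$nodes"time.log

    The log should contain the familiar real/user/sys lines, confirming the timing ends up in the file rather than on the terminal.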
  2. The cluster I have access to handles job submission with Torque and Maui, so next we can write a PBS wrapper that lets us submit multiple jobs using varying numbers of cores. Paste the following into bowtietest.pbs:
            #!/bin/bash
            #PBS -l walltime=24:00:00
            #PBS -o /path/to/output/log/
            #PBS -e /path/to/error/log/
            /path/to/bowtietest.sh $reference $forward $reverse $output $nodes

    Make sure to make the script executable with:

            chmod +x bowtietest.pbs
  3. Now we can run bowtietest.pbs a number of times, passing different values to the variables we specified. In particular we want to increase the number of cores while trying to keep the total memory quota about the same (at the time I had two .fastq.gz files that were about 24 GB each).
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=4 -l nodes=1:ppn=4,mem=6gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=8 -l nodes=1:ppn=8,mem=4gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=16 -l nodes=1:ppn=16,mem=3gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=32 -l nodes=1:ppn=32,mem=3gb bowtietest.pbs
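
    Since the four submissions only differ in core count and memory, they can also be generated in a loop. This is a sketch that only echoes the commands for review; drop the `echo` to actually submit (the mem values per core count mirror the commands above):

            #!/bin/bash
            # generate the four qsub commands in a loop; `echo` prints them
            # for inspection instead of submitting
            declare -A mem=( [4]=6gb [8]=4gb [16]=3gb [32]=3gb )
            for n in 4 8 16 32; do
                echo qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=$n \
                    -l nodes=1:ppn=$n,mem=${mem[$n]} bowtietest.pbs
            done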
  4. Once the jobs have finished, there should be four sets of output files in the directory specified, each stamped with the number of cores used. For example, to see how long the 4-core job took we could open reads.sam.4time.log. Plotting all of the results shows that runtime is still decreasing even at 32 cores.
    Keep in mind that if the cluster is really busy, the queue time for a 32-core job may outweigh this improvement, and it could be better to go with 16 cores.
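
    Rather than opening each log by hand, the `real` lines can be pulled out in a loop. The first loop below fabricates placeholder logs so the sketch runs standalone; on the cluster those files already exist, so only the second loop is needed:

            #!/bin/bash
            # write placeholder time logs so the summary loop runs standalone;
            # skip this on the cluster, where bowtietest.sh already wrote them
            for n in 4 8 16 32; do
                printf 'real\t%dm30.0s\n' $((64 / n)) > reads.sam."$n"time.log
            done

            # summarize wall-clock time per core count
            for n in 4 8 16 32; do
                printf '%2d cores: %s\n' "$n" "$(awk '/^real/ {print $2}' reads.sam."$n"time.log)"
            done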
