Cancel range of job numbers with Torque

If you use Torque to submit jobs to a cluster, you will need to cancel jobs sometimes. In a previous post I showed how to cancel all jobs, but perhaps you want to leave some running. If this is the case, you could list each job number to cancel:

qdel 12240967 12240968 12240969 12240970 12240971 12240972 12240973 12240974 12240975 12240976 12240977 12240978

Not very elegant, especially if you have hundreds of jobs. Instead do something like this:

qdel {12240967..12240978}

Note: you get the job numbers by doing:

qstat -u <username>
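
Brace expansion is a bash feature, so if you are working in a shell that doesn't support it, a loop over seq does the same thing (shown here with the same example job numbers):

for job in $(seq 12240967 12240978); do
    qdel "$job"
done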

Add common commands as aliases

Got this idea from Riinu’s scripting diary. You can add or modify common commands as aliases in your .bashrc file if you are getting tired of typing them out all the time.

  • open the file with

            vi /path/to/.bashrc #often in ~/.bashrc
            
  • There should be a section within the file similar to this:

            # enable color support of ls and also add handy aliases
            if [ -x /usr/bin/dircolors ]; then
                test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
                alias ls='ls -aoh --color=auto'
                alias qstat='qstat -u shearn'
                #alias dir='dir --color=auto'
                #alias vdir='vdir --color=auto'
    
                alias grep='grep --color=auto'
                alias fgrep='fgrep --color=auto'
                alias egrep='egrep --color=auto'
            fi
            

    Here I have added aliases for qstat -u shearn (gives qstat user information after typing only qstat) and ls -aoh, where -a lists hidden files, -o uses the long format without group info, and -h gives human-readable file sizes (e.g. GB instead of bytes).

  • Something else I usually do is colour the prompt. To do this, remove the hash (#) from the start of the following line:

            #force_color_prompt=yes
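
  • Once you have saved your changes, reload the file so the new aliases take effect in the current session (or just open a new terminal):

            source ~/.bashrc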
            

LaTeX: using fancyhdr with the memoir class

The fancyhdr LaTeX package allows you to put a fancy header or footer on the pages of your book.

One of the nicest things about this package is that you can easily set the header to show the chapter name on odd pages, the section title on even pages, and page numbers everywhere (or anything else you want to display). If you use it with the memoir class, however, you can get compilation errors because both the memoir class and fancyhdr attempt to define the \footruleskip command when they load. So if you want to use fancyhdr with memoir, make sure to undefine \footruleskip after memoir is loaded but before fancyhdr is loaded, i.e.:

\documentclass[11pt,twoside]{memoir}
\let\footruleskip\undefined %undefine footruleskip
\usepackage{fancyhdr}
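
For reference, here is a minimal sketch of the kind of header described above (chapter name on odd pages, section title on even pages, page number in the footer); the mark redefinitions are just one common pattern, not something taken from the memoir manual:

\pagestyle{fancy}
\fancyhf{} % clear all header and footer fields
\renewcommand{\chaptermark}[1]{\markboth{\chaptername\ \thechapter.\ #1}{}}
\renewcommand{\sectionmark}[1]{\markright{\thesection\ #1}}
\fancyhead[RO]{\leftmark}  % chapter name on odd (right-hand) pages
\fancyhead[LE]{\rightmark} % section title on even (left-hand) pages
\fancyfoot[C]{\thepage}    % page number on every page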

Autocomplete BibLaTeX citations in TeXstudio

Autocompletion of BibLaTeX citations in TeXstudio is a convenient feature, but I found it stopped working today. Doing the following seems to have fixed the problem:

  1. As with most bibliography-related issues, it’s a good idea to start by deleting all the files in your compilation directory except the .tex and .bib files and recompiling everything.
  2. Make sure you actually have autocompletion for BibLaTeX turned on. Go to options>configure TeXstudio>Completion and under “use following completion files”, make sure the box next to biblatex.cwl is checked.
  3. Sometimes, to keep track of changes to the bibliography style, we separate the arguments to \usepackage[…]{biblatex} with line breaks, for example:

    \usepackage[
    		style=authoryear-comp,
    		sorting=nyt, %sort intext by name year then title
    		maxcitenames=2, %max intext authors
    		maxbibnames=30, %max endtext authors
    		url=false, %Don't print URL
    		sortcites=true, %sorts intext citations
    		backend=biber
    	]{biblatex}
            

    Apparently this can cause problems with the autocomplete feature. So try putting the whole command on one line, for example:

    \usepackage[style=authoryear-comp,sorting=nyt,maxcitenames=2,maxbibnames=30,url=false,sortcites=true,backend=biber]{biblatex}
            

    Recompile and you should get some citations popping up when you start typing \parencite{ for example.
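
    If you want a quick way to check that completion is working again, a minimal test document like this does the job (references.bib and the knuth1984 key are just placeholders for your own bibliography file and citation keys):

    \documentclass{article}
    \usepackage[style=authoryear-comp,backend=biber]{biblatex}
    \addbibresource{references.bib} % placeholder .bib file
    \begin{document}
    Some text \parencite{knuth1984}. % completion should trigger after typing \parencite{
    \printbibliography
    \end{document}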

Export to BibTeX format with special characters in Zotero

I use Zotero to manage my citations when writing in LaTeX. To do this, you need to export your library to a .bib file. However, if you have special characters (for example é) in your citation entries, they can cause problems when you are compiling the final document. In the past I solved this by running a find-and-replace on the exported .bib file, but I found a much better way from David Meyer. You tell Zotero to display a character encoding option when exporting the library, then export the library with Western character encoding.

Display character encoding option on export

  1. In Zotero, go to edit>preferences
  2. Click on the export tab
  3. Make sure the “Display character encoding option on export” option is checked

Export library with Western encoding

  1. In Zotero, go to file>export library…
  2. Enter the location that you want the .bib file to be saved to
  3. There should be a drop-down menu entitled “Character Encoding” where you should select “Western”

Now you shouldn’t have any issues compiling with cited authors named Thérèse for example.

Automated flow diagrams in yEd

Came across yEd today, a Java-based flowchart creator/editor. The things that caught my eye were 1) its ability to auto-arrange diagrams and 2) its ability to import data from spreadsheets. This makes things much easier if you are making slight alterations to a diagram over time.

Installation


Download, give permissions, and run the installation script:

wget http://www.yworks.com/resources/yed/demo/yEd-3.14.4_64-bit_setup.sh
chmod +x yEd-3.14.4_64-bit_setup.sh
./yEd-3.14.4_64-bit_setup.sh

If you have another OS, there are more instructions on the yEd webpage.

Usage

I had a go at importing data and auto-arranging it into a diagram, using an example they had on their website:

  1. Example data (load into OpenOffice and save as .xls):

    Source Target
    Anna Ben
    Anna Chris
    Anna Dan
    Ben Chris
    Ben Dan
    Chris Dan
    Chris Elena
    Dan Elena
  2. Open yEd
  3. Select file>open (or Ctrl+O)
  4. Select the .xls file and click Open
  5. Select the following settings (tip: you can click and drag on the spreadsheet window instead of typing in some of the fields) and click OK
  6. You should now see the output in the editing window.

    You can still make changes to the arrangement and appearance of the boxes etc. I find importing and auto-arranging is a lot less work than starting from scratch in the editor.

Optimizing bowtie2 read alignment on a computer cluster

More cores doesn’t always mean higher performance. Aligning paired-end Illumina reads to a reference genome can be performed in bowtie2 with multiple cores using the -p option. Other people haven’t found much of a performance gain above 8 cores, but results can differ between systems. Here, I’ll take a look at what seems to work fastest on the cluster I have access to: 4, 8, 16 or 32 cores (anything above 32 will probably take too long to queue on the cluster as it’s usually quite busy).

  1. Start by making a bash script to time and run bowtie2 to align some reads to a reference genome. Paste the following into bowtietest.sh
            #!/bin/bash
            # usage: bowtietest.sh /path/to/ReferenceIndexed /path/to/forwardreads.fastq.gz /path/to/reversereads.fastq.gz /path/to/output.sam <number of threads>
    
            reference=$1
            forward=$2
            reverse=$3
            output=$4
            nodes=$5
    
            # time writes its report to stderr, so redirect stderr (2>) to capture the timing in the log
            (time /panhome/shearn/bin/bowtie2-2.2.7/bowtie2 -p "$nodes" -x "$reference" -1 "$forward" -2 "$reverse" -S "$output"."$nodes") 2> "$output"."$nodes"time.log
    
            

    Here, we specify a number of arguments during execution of the script, including nodes/cores. The bowtie2 alignment will be output as usual, alongside a logfile that will tell us how long the alignment took. Note that the reference has already been indexed, and one must specify the location of bowtie2. Make sure to make the script executable with:

            chmod +x bowtietest.sh
            
  2. The cluster I have access to handles job submission with Torque and Maui. So next we can write a PBS script that will allow us to submit multiple jobs that utilize varying numbers of cores. Paste the following into bowtietest.pbs
            #PBS -l walltime=24:00:00
            #PBS -o /path/to/output/log/
            #PBS -e /path/to/error/log/
            /path/to/bowtietest.sh $reference $forward $reverse $output $nodes
            

    Make sure to make the script executable with:

            chmod +x bowtietest.pbs
            
  3. Now we can run bowtietest.pbs a number of times, passing different parameters to the variables we specified. In particular we want to increase the number of cores while trying to keep the total memory quota about the same (at the time I had two .fastq.gz files that were about 24 GB each). A loop that wraps these submissions is sketched after this list.
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=4 -l nodes=1:ppn=4,mem=6gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=8 -l nodes=1:ppn=8,mem=4gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=16 -l nodes=1:ppn=16,mem=3gb bowtietest.pbs
            qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=32 -l nodes=1:ppn=32,mem=3gb bowtietest.pbs
            
  4. Once the jobs have finished, there should be four lots of output files in the directory specified. The files for each job are stamped with the number of cores, so if we want to know how long the job using 4 cores took, we can open reads.sam.4time.log. If we plot all of the results, we can see that the time is still decreasing even at 32 cores.
    [Plot: wall-clock alignment time against number of cores (4, 8, 16, 32)]
    Keep in mind that if the cluster is really busy, the queue time for 32 cores may outweigh this improvement and it could be better to go with 16 cores.
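
If you end up running this kind of comparison more than once, the four qsub calls in step 3 can be wrapped in a loop. This is just a sketch using the same placeholder paths and the memory values from above; it assumes bash 4 or later for the associative array:

#!/bin/bash
# submit one bowtie2 timing job per core count, mirroring the qsub calls in step 3
declare -A mem=( [4]=6gb [8]=4gb [16]=3gb [32]=3gb )
for n in 4 8 16 32; do
    qsub -v forward=/path/to/forwardreads.fastq.gz,reverse=/path/to/reversereads.fastq.gz,reference=/path/to/indexedreference,output=/path/to/reads.sam,nodes=$n \
         -l nodes=1:ppn=$n,mem=${mem[$n]} bowtietest.pbs
done

Once the jobs finish, something like grep real /path/to/reads.sam.*time.log should pull the wall-clock times out of all four logs in one go.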