Remove unused references from .bib file

Sometimes in LaTeX you are working from a .bib file with a large number of reference entries, and by the time you have finished your document you might not need all of them. The following bash code generates a list of the references cited with biblatex commands (\parencite and \cite) in your .tex file; if you used BibTeX-style commands the code should be easy to adjust. In a Unix terminal, start with:

grep -oP '\\parencite\{\K[^}]+' introduction.tex > citations.txt #list contents of parencite commands
grep -oP '\\cite\{\K[^}]+' introduction.tex >> citations.txt #append contents of cite commands

At this stage you might like to go through the .tex document to find other citations with pre- or postnotes, etc., that would not match the grep calls (i.e. search for “parencite[” and add the results to citations.txt). Not the most elegant solution, I know...
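If your document uses several biblatex citation commands, or citations with pre- and postnotes, a single grep with a wider pattern may save most of that manual search. The sketch below assumes the commands you use are among \cite, \parencite, \textcite, \autocite and \footcite, each with at most two optional [...] arguments; starred variants and multicite commands such as \cites are not matched. The function name extract_keys is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the brace contents of common biblatex citation
# commands, including ones with optional arguments, e.g. \parencite[see][12]{key}.
# Assumption: the command list below covers your document; extend it as needed.
extract_keys() {
  # -o: print only the match, -h: no filename prefix with several files,
  # -P: Perl regex, so \K can discard the "\command[...]{" part of the match
  grep -ohP '\\(cite|parencite|textcite|autocite|footcite)(\[[^\]]*\]){0,2}\{\K[^}]+' "$@"
}
```

For example, `extract_keys introduction.tex > citations.txt` would stand in for both grep calls above, and it accepts several .tex files at once.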

Replace commas with new lines, strip the spaces left over from lists like \cite{a, b}, then sort, keep one copy of each key and remove empty lines with:

tr ',' '\n' < citations.txt | tr -d ' ' | sort -u | grep -v -e '^$' > citationsSorted.txt
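A small worked example on made-up keys shows why the deduplication step should use `sort -u` (or plain `uniq` after `sort`) rather than `uniq -u`: `uniq -u` prints only the lines that occur exactly once, so any key cited more than once is dropped entirely.

```shell
#!/usr/bin/env bash
# Toy input (made-up keys): jones1999 is cited twice, once inside a multi-key \cite.
toy=$(mktemp)
printf '%s\n' 'smith2001' 'jones1999, lee2005' 'jones1999' > "$toy"

# uniq -u keeps only lines that occur exactly once, so the twice-cited key is lost.
with_uniq_u=$(tr ',' '\n' < "$toy" | tr -d ' ' | sort | uniq -u)

# sort -u keeps one copy of every key; grep -v '^$' drops any blank lines.
with_sort_u=$(tr ',' '\n' < "$toy" | tr -d ' ' | sort -u | grep -v '^$')

printf 'uniq -u:\n%s\n\nsort -u:\n%s\n' "$with_uniq_u" "$with_sort_u"
rm -f "$toy"
```

Here the `uniq -u` version yields only lee2005 and smith2001, while the `sort -u` version also keeps jones1999.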

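Then use this list to cut your original .bib file down to only the required entries. This assumes every entry starts with @type{key, on one line, is at most 31 lines long, and closes with a } on a line of its own:

while read -r i; do grep -A 30 -F "{$i," main.bib | sed '/^}$/q'; done < citationsSorted.txt > mainCut.bib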
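The grep -A 30 approach cuts off entries longer than 31 lines and relies on each entry ending with a } alone on a line. A sketch of a more robust alternative, assuming a well-formed .bib file in which each entry starts on a new line with @type{key, and braces are balanced, is to let awk track brace depth instead. The function name extract_entries is hypothetical, and @string definitions are not copied, so add those by hand if your entries rely on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy complete .bib entries whose key is in a key list.
# Assumptions: entries start with "@type{key," on their own line and braces are
# balanced; quote-delimited values with unbalanced braces would confuse it.
extract_entries() {
  # $1: file with one citation key per line; $2: the .bib file
  awk '
    NR == FNR { want[$1] = 1; next }           # first file: load the wanted keys
    depth == 0 && /^[[:space:]]*@/ {           # header line of a new entry
      line = $0
      sub(/^[^{]*[{][[:space:]]*/, "", line)   # drop "@type{"
      sub(/[[:space:]]*,.*$/, "", line)        # drop everything after the key
      keep = (line in want)
    }
    {
      if (keep) print
      # count braces so entries of any length are copied whole
      opens = gsub(/[{]/, "{"); closes = gsub(/[}]/, "}")
      depth += opens - closes
      if (depth <= 0) { depth = 0; if (keep) print ""; keep = 0 }
    }
  ' "$1" "$2"
}
```

Usage would then be `extract_entries citationsSorted.txt main.bib > mainCut.bib`, with a blank line written after each copied entry.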

Or do the whole shebang in one go:

citations=$(mktemp); grep -oP '\\parencite\{\K[^}]+' introduction.tex > "$citations"; grep -oP '\\cite\{\K[^}]+' introduction.tex >> "$citations"; tr ',' '\n' < "$citations" | tr -d ' ' | sort -u | grep -v -e '^$' > citationsSorted.txt; while read -r i; do grep -A 30 -F "{$i," main.bib | sed '/^}$/q'; done < citationsSorted.txt > mainCut.bib; rm -f "$citations"
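Before swapping main.bib for mainCut.bib, it may be worth checking that every cited key actually made it into the cut file; a misspelt key or an entry the extraction missed would otherwise only surface as a LaTeX warning. A minimal sketch using sed, sort and comm, assuming entry headers of the form @type{key, at the start of a line (the function name missing_keys is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list cited keys that have no entry in a .bib file.
# Assumption: entry headers look like "@type{key," at the start of a line.
missing_keys() {
  # $1: key list, one per line; $2: the .bib file to check
  local found
  found=$(mktemp)
  # pull the key out of every entry header, sorted so comm can compare
  sed -n 's/^[[:space:]]*@[^{]*{[[:space:]]*\([^,[:space:]]*\).*/\1/p' "$2" | sort -u > "$found"
  # comm -23 prints lines found only in the first input: cited but missing
  sort -u "$1" | comm -23 - "$found"
  rm -f "$found"
}
```

Running `missing_keys citationsSorted.txt mainCut.bib` should print nothing when every cited key has an entry.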