sed: Append a text string to selected sequences

In biodiversity research, one often has genetic data before the species is known, so sequences are labelled with specimen numbers or something similar. Once the species is identified or described, it is useful to append the species name to the sequence header. Given a FASTA alignment file whose sequence headers are >RS followed by a number, this can be done with one short sed command.

>RS530
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS533
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS535
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS596
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG

## Matches header lines >RS530 to >RS533 OR >RS596 and appends the species name
## (the \| alternation in a basic regex is a GNU sed extension)
sed '/^>RS53[0-3]\|^>RS596/ s/$/_Ilyodromus_sp/' input.fasta > output.fasta

>RS530_Ilyodromus_sp
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS533_Ilyodromus_sp
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS535
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
>RS596_Ilyodromus_sp
CTAGCTGACTGACTAGCTGCAGCTGACTAGCTAG
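
Because the pattern is not anchored at the end of the line, a header such as >RS5301 would also match and get the species name. If that is not wanted, a stricter variant might look like the sketch below. It assumes each header line contains only the specimen ID (nothing after the number) and that your sed supports -E for extended regular expressions (GNU and BSD sed both do). The file names example.fasta and example_out.fasta are made up for illustration.

```shell
# Build a small test file; >RS5301 is included to show it is left alone
printf '%s\n' \
  '>RS530' 'CTAGCTGA' \
  '>RS533' 'CTAGCTGA' \
  '>RS535' 'CTAGCTGA' \
  '>RS596' 'CTAGCTGA' \
  '>RS5301' 'CTAGCTGA' > example.fasta

# ^ and $ force the whole header line to be exactly one of the listed IDs;
# -E lets us write ( | ) without backslashes
sed -E '/^>RS(53[0-3]|596)$/ s/$/_Ilyodromus_sp/' example.fasta > example_out.fasta
```

Here >RS530, >RS533 and >RS596 are renamed, while >RS535 and >RS5301 are left unchanged. If your headers carry extra text after the ID, drop the trailing $ or adjust the pattern accordingly.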