Sliding window in awk

AWK is well suited to running a sliding window (moving average) over tabular data: it is fast, and you can pipe the output of samtools or most other command-line programs straight into it. Say you have the following data:

mCoord	chr	coord	samplo5	samplo10	corrlo5	corrlo10
1	X	1	41	4	7.42585e-07	7.5852e-08
2	X	2	41	4	7.42585e-07	7.5852e-08
3	X	3	41	5	7.42585e-07	9.48149e-08
4	X	4	41	5	7.42585e-07	9.48149e-08
5	X	5	41	5	7.42585e-07	9.48149e-08
6	X	6	41	5	7.42585e-07	9.48149e-08
7	X	7	41	5	7.42585e-07	9.48149e-08
8	X	8	40	5	7.24473e-07	9.48149e-08
9	X	9	40	5	7.24473e-07	9.48149e-08
10	X	10	39	5	7.06362e-07	9.48149e-08
11	X	11	38	5	6.8825e-07	9.48149e-08

where corrlo5 and corrlo10 are the depth-of-coverage values at each coordinate for two different samples, corrected by the number of reads in each sample. You can compute a sliding-window average over these two columns with the following awk one-liner:

awk -v OFS="\t" 'BEGIN{window=4;slide=2} { if(NR==1) print "coord",$6,$7 } {mod=NR%window; if(NR<=window){count++}else{sum-=array[mod];sum2-=array2[mod]}sum+=$6;sum2+=$7;array[mod]=$6;array2[mod]=$7;} (NR%slide)==0{print NR,sum/count,sum2/count}' file.txt
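The same logic is easier to follow when spread over several lines. The sketch below is meant to behave exactly like the one-liner; the only additions are the comments and the wrapper function sliding_mean, which is a made-up name for illustration. The two arrays act as a circular buffer: each line overwrites the slot NR%window, so the value being overwritten is always the one that has just left the window.

```shell
#!/bin/sh
# sliding_mean is a hypothetical wrapper name; $1 is the input file.
sliding_mean() {
    awk -v OFS="\t" '
    BEGIN { window = 4; slide = 2 }        # window size and step, in lines

    NR == 1 { print "coord", $6, $7 }      # reuse the input header for the output columns

    {
        mod = NR % window                  # slot in the circular buffer for this line
        if (NR <= window) {
            count++                        # window is still filling up
        } else {
            sum  -= array[mod]             # drop the value that is leaving the window
            sum2 -= array2[mod]
        }
        sum  += $6; sum2 += $7             # add the incoming values
        array[mod] = $6; array2[mod] = $7  # overwrite the oldest slot
    }

    NR % slide == 0 { print NR, sum / count, sum2 / count }   # one mean every slide lines
    ' "$1"
}

# Usage: sliding_mean file.txt
```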

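Change the values in BEGIN{window=4;slide=2} to set the window size and the number of bases (lines) to slide by. Note that the header counts as line 1 and contributes 0 to the sums, so the first two averages are pulled toward zero, and the first output column is the input line number (NR), which is one more than coord. This should output: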

coord	corrlo5	corrlo10
2	3.71293e-07	3.7926e-08
4	5.56939e-07	6.16297e-08
6	7.42585e-07	9.00742e-08
8	7.42585e-07	9.48149e-08
10	7.33529e-07	9.48149e-08
12	7.10889e-07	9.48149e-08
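If you would rather keep the header out of the averages and label each window with the coord column instead of the line number, one possible variant is sketched below. It is not the one-liner above: the function name window_mean and its five positional arguments (window size, step, the two value columns, the coordinate column) are assumptions made for this example. It reads from standard input, so it can sit at the end of a pipe, and it assumes the input has exactly one header line.

```shell
#!/bin/sh
# window_mean is a hypothetical helper. Arguments (all assumptions of this sketch):
#   $1 window size, $2 step, $3 and $4 value columns, $5 coordinate column.
window_mean() {
    awk -v OFS="\t" -v window="$1" -v slide="$2" \
        -v c1="$3" -v c2="$4" -v pos="$5" '
    NR == 1 { print $pos, $c1, $c2; next }   # print the header, keep it out of the sums

    {
        n = NR - 1                           # data row number, header excluded
        slot = n % window                    # slot in the circular buffer
        if (n <= window) {
            count++                          # window is still filling up
        } else {
            sum1 -= buf1[slot]               # drop the value leaving the window
            sum2 -= buf2[slot]
        }
        sum1 += $c1; sum2 += $c2             # add the incoming values
        buf1[slot] = $c1; buf2[slot] = $c2   # overwrite the oldest slot
    }

    # Each window is labelled with the coordinate of its last row.
    n % slide == 0 { print $pos, sum1 / count, sum2 / count }
    '
}

# Usage with the table above: window_mean 4 2 6 7 3 < file.txt
```

With this version the first window (ending at coord 2) averages two real values rather than the header plus one value, so its output will differ from the table above for the first two rows.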