BedTotalSize: Difference between revisions

From genomewiki
No edit summary
(deleted the useless Python thing, Hiram's version is of course better and faster as well)
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
You can do this in awk with a single-line statement:
<pre>
<pre>
#!/usr/bin/env python
awk '{sum += $3-$2}END{printf "total size: %d\n",sum}' file.bed
 
</pre>
 
import sys


# Print a short usage message and exit when called with one argument.
if len(sys.argv)==2:
        print(" Will read bed-style features from stdin")
        print(" Will add all features-lengths together")
        print("")
        print(" SYNTAX: ")
        print(" totalSize ")
        sys.exit()


 
[[Category:User Developed Scripts]]
 
total = 0
for line in sys.stdin:
    fields = line.split()
    if len(fields) < 3:
        continue  # skip blank or malformed lines
    start = int(fields[1])
    stop = int(fields[2])
    # The +1 treats both ends as inclusive; see the note below on 0-relative coordinates.
    total += (stop-start+1)

print("Total length of all features: "+str(total))
</pre>
 
<pre>
#  You could also do this in awk with a single-line statement:
#
#  awk '{sum += $3-$2}END{printf "total size: %d\n",sum}' file.bed
#
#  Also, I don't think you want to add 1 to your stop-start calculation.
#  BED coordinates are "0-relative": the start is 0-based and the end is
#  exclusive, so a feature's length is simply stop-start. The +1 is only
#  needed in "1-relative" coordinates, where both ends are inclusive.
</pre>
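
The off-by-one point above can be checked with a small worked example. This is only an illustrative sketch: the function names `length_half_open` and `length_closed` are hypothetical and not part of the original script. It compares the length of the same stretch of sequence written in 0-relative (BED-style, end exclusive) and 1-relative (both ends inclusive) coordinates.

```python
def length_half_open(start, end):
    """Length of a 0-based interval whose end is exclusive (BED style)."""
    return end - start


def length_closed(start, end):
    """Length of a 1-based interval whose ends are both inclusive."""
    return end - start + 1


# The first 100 bases of a chromosome are written as start=0, end=100
# in BED, and as 1..100 in a 1-based, inclusive coordinate system.
print(length_half_open(0, 100))  # 100
print(length_closed(1, 100))     # 100
```

Both calls describe the same 100 bases. Applying the `+ 1` formula to BED coordinates would overcount every feature by one base.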

Latest revision as of 09:16, 15 September 2006

You can do this in awk with a single-line statement:

<pre>
awk '{sum += $3-$2}END{printf "total size: %d\n",sum}' file.bed
</pre>
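
For files that carry `track` or `browser` header lines, comments, or blank lines, a slightly more defensive version may be useful. The following is a minimal Python sketch of the same sum, not a replacement for the awk one-liner. The function name `bed_total_size` is hypothetical, and the skipping rules are an assumption about what the input might contain.

```python
def bed_total_size(lines):
    """Sum end - start over BED records, skipping headers and blank lines."""
    total = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines, "#" comments, and "track"/"browser" header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        # BED is 0-based with an exclusive end, so no +1 is needed.
        total += int(fields[2]) - int(fields[1])
    return total


example = [
    "track name=example\n",
    "chr1\t0\t10\n",
    "\n",
    "chr2\t5\t8\n",
]
print("total size: %d" % bed_total_size(example))  # total size: 13
```

To read from standard input instead of a list, pass `sys.stdin` (after `import sys`) as the `lines` argument.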