TextReplace

From genomewiki

Revision as of 12:53, 18 October 2006

Sometimes you want these darn AK... refseq names translated into something more readable. Using a tab-separated file with lines of the form <refseq>TAB<name> (from the kgXref table?), I translate them into normal names with the following script.

Unlike an SQL query, this appends ALL names for a given refseq (separated by commas), and it can be used on virtually any text file in which you want to translate words into something else. Isn't there a Unix command for this somewhere?
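Collecting every name per key is what sets this apart from a plain SQL join, which would return one row per (refseq, name) pair. A minimal sketch of that aggregation step as a standalone function; the name `load_replacements` and the accession/name values are hypothetical, made up for illustration:

```python
def load_replacements(lines):
    """Build {from: "to1,to2,..."} from <from>TAB<to> lines.

    Every target seen for a key is kept, in file order, joined with commas.
    Lines starting with '#' and blank lines are ignored.
    """
    repl = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        # only the first two tab-separated columns are used
        from_str, to_str = line.rstrip("\r\n").split("\t")[:2]
        if from_str in repl:
            repl[from_str] += "," + to_str
        else:
            repl[from_str] = to_str
    return repl


# Hypothetical mapping lines: one accession with two names, one with a single name.
example = ["#refseq\tname\n", "AK000001\tGENE_A\n", "AK000001\tGENE_A2\n", "AK000002\tGENE_B\n"]
print(load_replacements(example))
# {'AK000001': 'GENE_A,GENE_A2', 'AK000002': 'GENE_B'}
```

The comma-joined string is what ends up in the output, so a word with two names becomes `GENE_A,GENE_A2` in place.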

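import re
import sys
from optparse import OptionParser

# === COMMAND LINE INTERFACE, OPTIONS AND HELP ===
# NOTE: the usage/help wording and the whitespace default for -s are
# reconstructed (assumed); the original strings were cut off.
parser = OptionParser("%prog [options] replaceList textfile: split lines into words "
                      "and replace every word found in replaceList (textfile 'stdin' reads standard input)")
parser.add_option("-s", "--splitChars", dest="splitChars", action="store", default=r"\s+",
                  help="regular expression that separates words (default: whitespace)")

(options, args) = parser.parse_args()
splitCharsRe = re.compile(options.splitChars)

# ----------- MAIN --------------
if len(args) != 2:
    parser.print_help()
    sys.exit(1)

replFName, txtFName = args

# read repl file (<from>TAB<to>) into dict; several targets for one key are joined with ","
repl = {}
with open(replFName) as replFile:
    for line in replFile:
        if line.startswith("#") or not line.strip():
            continue
        fromStr, toStr = line.rstrip("\r\n").split("\t")[:2]
        if fromStr not in repl:
            repl[fromStr] = toStr
        else:
            repl[fromStr] += "," + toStr

def replaceWords(text):
    """Replace whole words only; the separators matched by splitCharsRe are copied unchanged."""
    out, pos = [], 0
    for m in splitCharsRe.finditer(text):
        word = text[pos:m.start()]
        out.append(repl.get(word, word))
        out.append(m.group(0))
        pos = m.end()
    word = text[pos:]
    out.append(repl.get(word, word))
    return "".join(out)

# iterate over lines of textfile and replace
if txtFName != "stdin":
    txtFile = open(txtFName, "r")
else:
    txtFile = sys.stdin
for line in txtFile:
    if line.startswith("#"):
        continue  # comment lines are dropped from the output
    # strip the line ending first so the last word can still match the dict
    sys.stdout.write(replaceWords(line.rstrip("\r\n")) + "\n")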
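Replacing a word with `str.replace` on the whole line also rewrites a matching substring inside a longer word, so a mapping for `AK1` turns `AK12` into `geneX2`. A minimal sketch contrasting that with replacing complete tokens only; `replace_substrings` and `replace_tokens` are hypothetical helper names, and the accessions and gene names are invented:

```python
import re


def replace_substrings(line, repl):
    """Naive approach: str.replace on the whole line for every known word."""
    for word in re.split(r"\s+", line.strip()):
        if word in repl:
            line = line.replace(word, repl[word])
    return line


def replace_tokens(line, repl, split_pattern=r"\s+"):
    """Replace only complete tokens; separators are copied through unchanged."""
    out, pos = [], 0
    for m in re.finditer(split_pattern, line):
        word = line[pos:m.start()]
        out.append(repl.get(word, word))  # unknown words are kept as they are
        out.append(m.group(0))            # keep the separator itself
        pos = m.end()
    word = line[pos:]
    out.append(repl.get(word, word))
    return "".join(out)


# Hypothetical mapping and input line.
repl = {"AK1": "geneX", "AK12": "geneY"}
line = "AK1\tAK12"
print(repr(replace_substrings(line, repl)))  # 'geneX\tgeneX2'
print(repr(replace_tokens(line, repl)))      # 'geneX\tgeneY'
```

Walking the separators with `re.finditer` rather than splitting with a capturing group keeps the output correct even when the user's split regex contains its own parentheses.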