TextReplace: Difference between revisions
Revision as of 16:51, 24 October 2006
Sometimes you want to get these darn AK.... refseq names translated to something more readable. Using a file of <refseq>tab<name> (from the kgxref-table?), I translate them to normal names with the following script.
Unlike an SQL query, this appends ALL names for a given refseq, and it can be used on virtually any text file where you want to translate anything into something else. Isn't there a Unix command for this somewhere? (I think you are referring to the sed command --Hiram)
Hmm. You mean I could rewrite the replacement file with gawk into something like "s/from1/from1,to1/g s/from2/from2,to2/g s/from3/from3,to3/g" etc.? That should replace and append all possible replacements. That's right, I hadn't thought of that... I kind of forgot about the possibility of generating long sed scripts and then handing them over to sed with -f... :-(
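For what it's worth, here is a minimal sketch of that idea in Python rather than gawk. The function names (`sed_escape_pattern`, `sed_escape_replacement`, `make_sed_script`) and the example IDs are made up for illustration; the only assumption about the input is the <from>tab<to> format described above.

```python
import re


def sed_escape_pattern(text):
    # Escape characters that are special in a sed basic regular expression,
    # plus the "/" used as the s command delimiter.
    return re.sub(r'([\\/.*\[\]^$])', r'\\\1', text)


def sed_escape_replacement(text):
    # In the replacement part only "\", "/" and "&" need escaping.
    return re.sub(r'([\\/&])', r'\\\1', text)


def make_sed_script(lines):
    """Turn <from>tab<to> lines into one "s/from/from,to/g" command per line."""
    commands = []
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        from_str, to_str = parts[0], parts[1]
        commands.append("s/%s/%s,%s/g" % (
            sed_escape_pattern(from_str),
            sed_escape_replacement(from_str),
            sed_escape_replacement(to_str),
        ))
    return "".join(command + "\n" for command in commands)


if __name__ == "__main__":
    # Hypothetical example input, not real kgxref data.
    print(make_sed_script(["AK000001\tgeneA\n", "NM_1.2\tgeneB\n"]), end="")
```

The output can be saved to a file and passed to sed with -f. Note that sed matches substrings, so a short key would also be rewritten inside a longer identifier; the Python script does not split into words for nothing.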
<pre>
#!/usr/bin/env python3
import re
import sys
from optparse import OptionParser

# === COMMAND LINE INTERFACE, OPTIONS AND HELP ===
parser = OptionParser("%prog [options] replaceList textfile: split lines from textfile into words and try to replace words using a replacement list-textfile (format: from tab to).")
parser.add_option("-s", "--splitChars", dest="splitChars", action="store",
                  help="use these characters to split textfile when searching for matches",
                  default="\t ")
(options, args) = parser.parse_args()

# each character in splitChars is a delimiter, so build a character class
splitCharsRe = re.compile("[" + re.escape(options.splitChars) + "]+")

# ----------- MAIN --------------
if len(args) != 2:
    parser.print_help()
    sys.exit(1)

replFName, txtFName = args

# read repl file into dict; several names for one key are joined with commas
repl = {}
with open(replFName, "r") as replFile:
    for l in replFile:
        if l.startswith("#"):
            continue
        parts = l.rstrip("\r\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        fromStr, toStr = parts[0], parts[1]
        if fromStr not in repl:
            repl[fromStr] = toStr
        else:
            repl[fromStr] += "," + toStr

# iterate over lines of textfile and replace; "stdin" reads from standard input
if txtFName != "stdin":
    txtFile = open(txtFName, "r")
else:
    txtFile = sys.stdin

for l in txtFile:
    if l.startswith("#"):
        continue
    # note: str.replace also rewrites a field where it occurs inside a longer word
    for field in splitCharsRe.split(l.strip()):
        if field in repl:
            l = l.replace(field, repl[field])
    print(l, end="")
</pre>
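One thing to watch in the script: `l.replace(field, repl[field])` replaces the field everywhere in the line, including inside longer words. A possible variant, sketched here under the assumption that only whole words should be translated, splits with a capturing group so the delimiters survive and then rebuilds the line. The helper names `load_replacements` and `replace_tokens` and the sample IDs are hypothetical, not part of the original script.

```python
import re


def load_replacements(lines):
    """Build {from: "to1,to2,..."} from <from>tab<to> lines, as the script does."""
    repl = {}
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        from_str, to_str = parts[0], parts[1]
        if from_str in repl:
            repl[from_str] += "," + to_str
        else:
            repl[from_str] = to_str
    return repl


def replace_tokens(line, repl, split_chars="\t "):
    """Replace only whole words of line, keeping the original delimiters."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    # The capturing group makes re.split return the delimiters as well.
    pieces = re.split("([" + re.escape(split_chars) + "])", body)
    return "".join(repl.get(piece, piece) for piece in pieces) + ending


if __name__ == "__main__":
    # Made-up identifiers, just to show the behaviour.
    table = load_replacements(["AK1\tgeneA\n", "AK1\tgeneB\n", "AK2\tgeneC\n"])
    print(replace_tokens("AK1\tAK10 AK2\n", table), end="")
```

Here AK10 is left alone even though it contains the key AK1, and the tab and space between the fields are preserved.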