Kent source utilities: Difference between revisions

Revision as of 22:44, 14 February 2011

More information about these tools can be found here [Kent_source_utilities].

Last updated: 2009-11-09 - 09 November 2009

aNotB:List symbols that are in a but not b
addAveMedScoreToPsls:Combines unigene pslFile and sage file into bed file
addCols:Sum columns in a text file.
affyPairsToSample:Takes a 'pairs' format file from the Affy transcriptome data set and combines it with the Affy offset.txt file to output a 'sample' file which has the contig coordinates of the result.
agpAllToFaFile:Convert all sequences in an .agp file to a .fa file
agpCloneCheck:Check that all clones in an agp file are present (and the right version too)
agpCloneList:Make simple list of all clones in agp file to stdout
agpToFa:Convert a .agp file to a .fa file
agpToGl:Convert AGP file to GL file. Some fakery involved.
agxToBed:Utility program that condenses an altGraphX record into a bed record.
agxToIntronBeds:Program to output all introns from altGraphX records as beds. Designed for use in MGC project looking for novel introns from altGraphX records transferred over from mouse.
agxToTxg:Convert from old altGraphX format to newer txGraph format.
ali2alx:produces an index file for each chromosome into an ali file.
aliGlue:tell where a cDNA is located quickly.
allenCollectSeq:Collect probe sequences for Allen Brain Atlas from a variety of sources
altAnalysis:Analyze the altSplicing in a series of altGraphX's
altPaths:Examine altGraphX graphs and discover alternative splicing events and their paths.
altSplice:-help -- Display this message.

-db -- Database (i.e. hg15) to load psl records from.

-beds -- Coordinate file to base clustering on in bed format.
altSummary:Summarize the altSplicing in a series of altGraphX's
ameme:find common patterns in DNA

usage

ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif] [rcToo=on] [controlRun=on] [startScanLimit=20] [outputLogo] [constrainer=1]
apacheMonitor:check for error 500s in the last minutes
assessLibs:Make table that assesses the percentage of library that covers 5' and 3' ends
autoDtd:Give this a XML document to look at and it will come up with a DTD to describe it.
autoSql:create SQL and C code for permanently storing a structure in database and loading it back into memory based on a specification file
autoXml:Generate structures code and parser for XML file from DTD-like spec
ave:Compute average and basic stats
aveCols:average together columns
averagExp:Average expression data within a cluster
averageZoomLevels:takes a sorted sample file and creates averaged 'zoomed-out' summaries for a few different levels.

Basic idea is to get the size of a chromosome, divide it by 2000 as that is
avgTranscriptomeExps:Averages together replicates of the affy transcriptome data set. Will skip certain experiments unless directed otherwise as they were not in the original data set.
axtAndBed:Intersect an axt with a bed file and output axt.
axtBest:Remove second best alignments
axtCalcMatrix:Calculate substitution matrix and make indel histogram
axtChain:Chain together axt alignments.
axtDropOverlap:deletes all overlapping self alignments.
axtDropSelf:Drop alignments that just align same thing to itself
axtFilter:Filter axt files. Output goes to standard out.
axtForEst:Generate file of mouse/human alignments corresponding to MGC EST's
axtIndex:build index of axt file
axtPretty:Convert axt to more human readable format.
axtQueryCount:Count bases covered on each query sequence
axtRecipBest:create file for dot plot using recip best
axtRescore:Recalculate scores in axt.
axtSort:Sort axt files
axtSplitByTarget:Split a single axt file into one file per target
axtSwap:Swap source and query in an axt file
axtToBed:Convert axt alignments to simple bed format
axtToChain:Convert axt to chain format
axtToMaf:Convert from axt to maf format
axtToPsl:Convert axt to psl format
bedClip:Remove lines from bed file that refer to off-chromosome places.
bedCons:Look at conservation of a BED track vs. a reference (nonredundant) alignment track
bedCoverage:Analyse coverage by bed files - chromosome by chromosome and genome-wide.
bedDown:Make stuff to find a BED format submission in a new version
bedExtendRanges:extend length of entries in bed 6+ data to be at least the given length, taking strand directionality into account.
bedGraphToBigWig:Convert a bedGraph file to bigWig.
bedInGraph:Program to determine if bed exons are contained in a splice graph. Original motivation to see which alt-spliced probe sets are also alt in human (in addition to mouse).
bedIntersect:Intersect two bed files
bedItemOverlapCount:count number of times a base is overlapped by the items in a bed file. Output is bedGraph 4 to stdout.
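
The per-base overlap counting that bedItemOverlapCount describes can be illustrated with a small sweep over item start and end positions. This is only a sketch of the idea, not the Kent implementation: the function name `overlap_counts` and the tuple layout are hypothetical, coordinates are assumed to be 0-based half-open as in BED, and adjacent runs with the same depth are not merged here.

```python
from collections import defaultdict

def overlap_counts(items):
    """Return (chrom, start, end, count) runs for bases covered by at least one item.

    items: iterable of (chrom, start, end) with 0-based half-open coordinates.
    """
    # For each chromosome, record +1 where an item starts and -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in items:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    runs = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            # Emit the run that ends at this position if it was covered.
            if depth > 0 and prev is not None and pos > prev:
                runs.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return runs

if __name__ == "__main__":
    for run in overlap_counts([("chr1", 0, 10), ("chr1", 5, 15)]):
        # Four tab-separated columns, in the spirit of bedGraph 4.
        print("\t".join(str(field) for field in run))
```

Each output row carries chromosome, start, end and depth, which mirrors the four columns of a bedGraph line.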
bedOrBlocks:Create a bed that is the union of all blocks of a list of beds.
bedSort:Sort a .bed file by chrom,chromStart
bedSplitOnChrom:Split bed into a directory with one file per chromosome.
bedToBigBed:Convert bed file to bigBed.
bedToExons:Split a bed up into individual beds. One for each internal exon.
bedToFrames:Makes html files for browsing custom bed track using frames. Use -pad for padding
bedToGenePred:convert bed format files to genePred format
bedToTxEdges:Convert a bed file into txEdgeBed, which can be used with txgAddEvidence.
bedUp:Load bed submissions after conversion back into new database.
bedWeedOverlapping:Filter out beds that overlap a 'weed.bed' file.
bigBedSummary:Extract summary information from a bigBed file.
bigBedToBed:Convert from bigBed to ascii bed format.
bigWigInfo:Print out information about bigWig file.
bigWigSummary:Extract summary information from a bigWig file.
bigWigToBedGraph:Convert from bigWig to bedGraph format.
binGood:convert text format alignment file to binary format
blastRecipBest:Pick out just the reciprocal best alignments.
blastToPsl:Convert blast alignments to PSLs.
blat:Standalone BLAT v. 34x5 fast sequence search command line tool
blatz:blatz version 1 - Align dna across species
blatzClient:blatzClient version 1 - Ask server to do cross-species DNA alignments and save results.
blatzServer:blatzServer version 1 - Set up in-memory server for cross-species DNA alignments
borfBig:Run Victor Solovyev's bestorf repeatedly
bottleneck:bottleneck v2 - A server that helps slow down hyperactive web robots
bptForTwoBit:Create a b+ tree index for a .2bit file. Key is the sequence name. Value is the position of the start of the compressed DNA in the .2bit file.
bptLookupStringToBits64:Given a string value look up and return associated 64 bit value if any.
bptMakeStringToBits64:Create a B+ tree index with string keys and unsigned 64-bit-integer values. In practice the 64-bit values are often offsets in a file.
bwana:do batch coarse alignment of C. briggsae and C. elegans genomes.
calc:Little command line calculator
calcGap:calculate gap scores
catDir:concatenate files in directory to stdout. For those times when there are too many files for cat to handle.
catUncomment:Concatenate input removing lines that start with '#'. Output goes to stdout.
ccCp:copy a file to cluster. usage:

ccCp sourceFile destFile [hostList]

This will copy sourceFile to destFile for all machines in
cdnaOff:creates sorted offset files that position cDNAs in chromosome.
chainAntiRepeat:Get rid of chains that are primarily the results of repeats and degenerate DNA
chainDbToFile:translate a chain's db representation back to file
chainFilter:Filter chain files. Output goes to standard out.
chainMergeSort:Combine sorted files into larger sorted file
chainNet:Make alignment nets out of chains
chainPreNet:Remove chains that don't have a chance of being netted
chainSort:Sort chains. By default sorts by score. Note this loads all chains into memory, so it is not suitable for large sets. Use chainMergeSort for that.
chainSplit:Split chains up by target or query sequence
chainStats:Stitch psls into chains
chainStitchId:Join chain fragments with the same chain ID into a single chain per ID. Chain fragments must be from same original chain but must not overlap. Chain fragment scores are summed.
chainSwap:Swap target and query in chain
chainToAxt:Convert from chain to axt file
chainToPsl:Convert chain file to psl format
checkAgpAndFa:takes a .agp file and .fa file and ensures that they are in synch
checkCardinality:reviewIndexes - check indexes
checkChain:read all chains and report if duplicate ids
checkHgFindSpec:test and describe search specs in hgFindSpec tables.
checkSgdSync:Make sure that genes and sequence are in sync for SGD yeast database
checkTableCoords:check invariants on genomic coords in table(s).
checkableBorf:Convert borfBig orf-finder output to checkable form
chopFaLines:Read in FA file with long lines and rewrite it with shorter lines
chromGraphFromBin:Convert chromGraph binary to ascii format.
chromGraphToBin:Make binary version of chromGraph.
clusterGenes:Cluster genes from genePred tracks
clusterPsl:Make clusters of mRNA alignments
clusterRna:Make clusters of mRNA and ESTs
consForBed:Takes a bed file and a conservation file and outputs conservation scores for each position in the bed file. Optionally outputs a summary file which contains every conservation value seen
convolve:perform convolution of probabilities
countChars:Count the number of occurrences of a particular char
crTreeIndexBed:Create an index for a bed file.
crTreeSearchBed:Search a crTree indexed bed file and print all items that overlap query.
ctgFaToFa:Convert from one big file with all NT contigs to one contig per file.
ctgToChromFa:convert contig level fa files to chromosome level
dbFindFieldsWith:Look through database and find fields that have elements matching a certain regular expression in the first N rows.
dbSnoop:Produce an overview of a database.
dbTrash:drop tables from a database older than specified N hours
detab:remove tabs from program
dnaMotifFind:Locate preexisting motifs in DNA sequence
eisenInput:Create input for Eisen-style cluster program
emblMatrixToMotif:Convert transfac matrix in EMBL format to dnaMotif
embossToPsl:Convert EMBOSS pair alignments to PSL format
endsInLf:Check that last letter in files is end of line
estLibStats:Calculate some stats on EST libraries given file from polyInfo
estOrient:estOrient [options] db estTable outPsl
exonAli:This program aligns cDNA with genomic sequence. Usage:

exonAli named output cdnaName(s)

exonAli in output listFile
expToRna:Make a little two column table that associates rnaClusters with expression info
faAlign:Align two fasta files
faCmp:Compare two .fa files
faCount:count base statistics and CpGs in FA files.
faFilter:Filter fa records, selecting ones that match the specified conditions
faFilterN:Get rid of sequences with too many N's
faFlyBaseToUcsc:Convert Flybase peptide fasta file to UCSC format
faFrag:Extract a piece of DNA from a .fa file.
faGapLocs:report location of gaps and sequences in a FASTA file
faGapSizes:report on gap size counts/statistics
faNcbiToUcsc:Convert FA file from NCBI to UCSC format.
faNoise:Add noise to .fa file
faOneRecord:Extract a single record from a .FA file
faPolyASizes:get poly A sizes
faRandomize:Program to create random fasta records using the same base frequency as seen in original fasta records.

Use optional -seed flag to specify seed for random number
faRc:Reverse complement a FA file
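
As an illustration of what reverse complementing a FASTA file involves, here is a minimal sketch. It is not the faRc source: the function names are hypothetical, only A, C, G, T and N (either case) are complemented, and each record's sequence is written back as a single line, which is an assumption about output layout.

```python
# Complement table for upper- and lower-case bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def rc_fasta(lines):
    """Yield each header unchanged, followed by its reverse-complemented sequence."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield reverse_complement("".join(chunks))
            header, chunks = line, []
        elif line:
            chunks.append(line)
    # Flush the final record.
    if header is not None:
        yield header
        yield reverse_complement("".join(chunks))

if __name__ == "__main__":
    for out_line in rc_fasta([">a\n", "AAGC\n", "tt\n"]):
        print(out_line)
```

The sequence lines of a record are joined before reversing, because reversing line by line would scramble the order of multi-line records.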
faSimplify:Simplify fasta record headers
faSize:print total base count in fa files.
faSomeRecords:Extract multiple fa records
faSplit:Split an fa file into several files.
faToFastq:Convert fa to fastq format, just faking quality values.
faToNib:Convert from .fa to .nib format
faToTab:convert fa file to tab separated file
faToTwoBit:Convert DNA from fasta to 2bit format
faTrans:Translate DNA .fa file to peptide
faTrimPolyA:trim poly-A tails
faTrimRead:trim reads based on qual scores - change low scoring bases to N's
fakeFinContigs:Fake up contigs for a finished chromosome
fakeOut:fake a RepeatMasker .out file based on a N's in .fa file
fastqToFa:Convert from fastq to fasta format.
fatont4:fato4nt - a program to convert .fa files to .4nt files
featureBits:Correlate tables via bitmap projections.
ffaToFa:ffaToFa convert Greg Schuler .ffa fasta files to UCSC .fa fasta files
findMotif:find specified motif in sequence
findStanAlignments:takes a Stanford microarray experiment file and tries to look up an alignment for the relevant clone in the database.

Starts by trying to look up the longest genbank clone from image id,
fixCr:strip <CR>s from ends of lines
fixHarbisonMotifs:Trim motifs that have beginning or ending columns that are degenerate.
fqToQa:convert from fq format with one big file to format with one file per clone.
fqToQac:convert from fq format with one big file to compressed format with one file per clone.
fragPart:get part of a fragment's sequence
gadPos: generate genomic positions for GAD entries
gapSplit:split sequence on gaps of size >= N
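
The idea of splitting a sequence on gaps of size >= N can be sketched as below. This is a simplified illustration that assumes gaps are runs of N characters inside an in-memory string; the real gapSplit may take its gap positions from elsewhere, and the function name `split_on_gaps` is hypothetical.

```python
import re

def split_on_gaps(seq, min_gap):
    """Split seq at runs of N/n that are at least min_gap long.

    Returns a list of (offset, piece) pairs, where offset is the 0-based
    position of the piece in the original sequence.
    """
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    gap = re.compile(r"[Nn]{%d,}" % min_gap)
    pieces = []
    pos = 0
    for match in gap.finditer(seq):
        if match.start() > pos:
            pieces.append((pos, seq[pos:match.start()]))
        pos = match.end()
    # Keep whatever follows the last qualifying gap.
    if pos < len(seq):
        pieces.append((pos, seq[pos:]))
    return pieces

if __name__ == "__main__":
    for offset, piece in split_on_gaps("ACGTNNNNACNGT", 3):
        print(offset, piece)
```

Runs of N shorter than the threshold stay inside their piece, so a single N in the middle of a contig does not cause a split.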
gapToLift:create lift file from gap table(s)
gb2cdi:convert GenBank (GB) files to .fa and cDNA Info (CDI) file.
gbGetEntries:retrieve records from a GenBank flat file.
gbOneAcc:retrieve one or a few records from a GenBank flat file.
gbSeqCheck:check that extFile references in gbSeq table are valid
gbToFaRa:Convert GenBank flat format file to an fa file containing the sequence data, an ra file containing other relevant info and a ta file containing summary statistics.
gbtofa:gbtofa converts from GenBank to fa format.
gcForBed:Calculate g/c percentage and other stats for regions covered by bed
genePredCheck:validate genePred files or tables
genePredHisto:get data for generating histograms from a genePred file.
genePredSingleCover:create single-coverage genePred files
genePredToFakePsl:Create a psl of fake-mRNA aligned to gene-preds from a file or table.
genePredToGtf:Convert genePred table or file to gtf.
genePredToMafFrames:create mafFrames tables from genePreds
genePredToPsl:Program to create fake psl alignments from genePred records. Originally designed for use with altSplice.
geniegff:makes up a gdf file from Genie gene predictions
getChroms:print chrom names
getFeatDna:Get dna for a type of feature
getRna:Get mrna for GenBank or RefSeq sequences found in a database
getRnaPred:Get virtual RNA for gene predictions
gfClient:gfClient v. 34x5 - A client for the genomic finding program that produces a .psl file
gfPcr:In silico PCR version 34x5 using gfServer index.
gfServer:gfServer v 34x5 - Make a server to quickly find where DNA occurs in genome.

To set up a server:

gfServer start host port file(s)
gff3ToGenePred:convert a GFF3 file to a genePred file
gffPeek:Look at a gff file and report some basic stats
gffgenes:creates files that store extents of genes for intronerator
gmtime:convert unix timestamp to date string
gpStats:Figure out some stats on the golden path.
gpToGtf:Convert gp table to GTF
gpcrParser:Create xml files for gpcr snakeplots.
groupSamples:Group samples together into one sample. Samples must be sorted by chromosome position (you can use bedSort first if they are not).
gsBig:Run Genscan on big input and produce GTF files and other parsed output
gtfToGenePred:convert a GTF file to a genePred
hapmapPhaseIIISummary:Make hapmapPhaseIIISummary.bed from hapmap*.bed.
headRest:Return all *but* the first N lines of a file.
hgAddLiftOverChain:Add a liftOver chain to the central database
hgAvidShortBed:Convert short form of AVID alignments to BED
hgBbiDbLink:Add table that just contains a pointer to a bbiFile to database. This program is used to add bigWigs and bigBeds.
hgBioCyc:bioCyc - Creates bioCycPathway.tab for Known Genes to link to SRI BioCyc pathways
hgCeOrfToGene:Make orfToGene table for C.elegans from GENE_DUMPS/gene_names.txt
hgChroms:print chromosomes for a genome.
hgClonePos:create clonePos table in browser database
hgClusterGenes:Cluster overlapping gene predictions
hgCountAlign:count overlapping or non-overlapping windows in an alignment.
hgCtgPos:Store contig positions ( from lift files ) in database.
hgDeleteChrom:output SQL commands to delete a chrom from the database
hgDropSplitTable:Drop a table, or drop all tables in a split table
hgEmblProtLinks:Parse EMBL flat file into protein link table
hgExonerate:Convert Exonerate modified GFF files to BED format and load in database.
hgExpDistance:Create table that measures expression distance between pairs
hgExperiment:Load data from a BED of region positions, an experiment file containing <name> [<description>]
hgExtFileCheck:check extFile or gbExtFile tables against file system
hgFakeAgp:Create fake AGP file by looking at N's
hgFiberglass:Turn Fiberglass Annotations into a BED and load into database
hgFindSpec:Create hgFindSpec table from trackDb.ra files.
hgFlyBase:Parse FlyBase genes.txt file and turn it into a couple of tables
hgGcPercent:Calculate GC Percentage in 20kb windows
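
A windowed GC percentage of the kind hgGcPercent describes can be sketched as follows. This is an illustration only: the function name is hypothetical, the 20,000-base default simply mirrors the "20kb windows" in the description, and counting only A, C, G and T in the denominator (so N bases are ignored) is an assumption, not a statement about how the real tool treats gaps.

```python
def gc_percent_windows(seq, window=20000):
    """Return (start, end, gc_percent) for consecutive fixed-size windows.

    The last window may be shorter than `window`. Windows with no
    A/C/G/T bases at all are reported as 0.0.
    """
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        percent = 100.0 * gc / acgt if acgt else 0.0
        results.append((start, start + len(chunk), percent))
    return results

if __name__ == "__main__":
    for start, end, percent in gc_percent_windows("GGCCAATTNNNN", window=4):
        print(start, end, percent)
```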
hgGeneBands:Find bands for all genes
hgGenericMicroarray:Load generic microarray file into database. A generic microarray file has the following format:
hgGetAnn:get chromosome annotation rows from database tables using browser-style position specification.
hgGnfMicroarray:Load data from (2003-style) GNF Affy Microarrays
hgGoAssociation:Load bits we care about in GO association table
hgGoldGapGl:Put chromosome .agp and .gl files into browser database.
hgJaxQtl:generate bed file for jaxQTL3 table using the table jaxQtlRaw as input. Output file is jaxQTL3.tab.
hgKegg:creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage:

hgKegg xxxx

xxxx is the genome database name
hgKegg2:creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage:

hgKegg2 kgTempDb roDb

kgTempDb is the KG build temp database name
hgKegg3:creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage:

hgKegg3 kgTempDb roDb

kgTempDb is the KG build temp database name
hgKgGetText:Get text from known genes into a file.

The file will be line oriented with the known gene ID as

the first word, and the rest of the word being a conglomeration
hgKgMrna:Load mRNA alignments and other info into refGene tables into a TEMPORARY database to build Known Genes track.
hgKnownGeneList:Generate Known Genes List HTML pages to be indexed by Google
hgKnownMore:Create the knownMore table from a variety of sources.
hgKnownToSuper:Load knownToSuperfamily table
hgLoadBed:Load a generic bed file into database
hgLoadBlastTab:Load blast table into database
hgLoadChain:Load a generic Chain file into database
hgLoadChromGraph:Load up chromosome graph.
hgLoadEranModules:Load regulatory modules from Eran Segal
hgLoadGap:Load gap table from AGP-style file containing only gaps
hgLoadGenePred:Load up a mySQL database genePred table
hgLoadItemAttr:load an itemAttr table
hgLoadMaf:Load a maf file index into the database
hgLoadMafFrames:load a mafFrames table
hgLoadNet:Load a generic net file into database
hgLoadNetDist:GS loader for interaction network path lengths.
hgLoadOut:load RepeatMasker .out files into database
hgLoadPsl:Load up a mySQL database with psl alignment tables
hgLoadRnaFold:Load a directory full of RNA fold files into database
hgLoadSample:Load a sample 9 (wiggle) file into database
hgLoadSeq:load browser database with sequence file info.
hgLoadSqlTab:Load table into database from SQL and text files.
hgLoadWiggle:Load a wiggle track definition into database
hgLsSnpPdbLoad:fetch data from LS-SNP/PDB mysql server or load an lsSnpPdb format table or file
hgMapMicroarray:Make mapped version of microarray data, merging psl in.
hgMapToGene:Map a track to a genePred track.
hgMapViaSwissProt:Make table that maps to external database via SwissProt
hgMedianMicroarray:Create a copy of microarray database that contains the median value of replicas
hgMrnaRefseq:creates xref data between mRNA and RefSeq from LocusLink data contained in 2 tables from a temporary DB. usage:

hgMrnaRefseq xxxx

xxxx is the genome database name
hgNearTest:Test hgNear web page
hgNetDist:GS loader for gene/protein interaction network distances.
hgNibSeq:convert DNA to nibble-a-base and store location in database
hgPepPred:Load peptide predictions from Ensembl or Genie
hgPhMouse:Load phMouse track
hgProtIdToGenePred:Add proteinID column to genePrediction
hgRatioMicroarray:Create a ratio form of microarray data.
hgRenameSplitTable:Rename a table, or rename all tables in a split table
hgRnaGenes:Turn RNA genes from GFF into database format (BED variant)
hgSanger20:Load extra info from Sanger Chromosome 20 annotations.
hgSanger22:Load up database with Sanger 22 annotations
hgSelect:select from genome tables, handling split tables and bin column
hgSgdGff3:Parse out SGD gff3 file into components
hgSgdGfp:Parse localization files from SGD and Load Database
hgSgdPep:Parse yeast protein fasta files into format we can load
hgSoftPromoter:Slap Softberry promoter file into database.
hgSoftberryHom:Make table storing Softberry protein homology information
hgSpeciesRna:Create fasta file with RNA from one species
hgStanfordMicroarray:Load up from Stanford Microarray Database files
hgStsAlias:Make table of STS aliases
hgStsMarkers:Load STS markers into database
hgSuperfam:Generate supfamily table for the Superfamily track.
hgTablesTest:Test hgTables web page
hgTpf:Make TPF table
hgTraceInfo:import subset of mouse trace ancillary information parsed from FASTA files
hgTrackDb:Create trackDb table from text files. Note that the browser supports multiple trackDb tables, usually
hgTracksRandom:Time default view for random position of default genome
hgWaba:load Waba alignments into database
hgWiggle:fetch wiggle data from data base or file
hgWormLinks:Create table that links worm ORF name to description and SwissProt. This works on a WormBase dump, in Ace format I believe, from Lincoln Stein.
hgYeastRegCode:Load files from the regulatory code paper (large scale CHIP-CHIP on yeast) into database
hgsql:Execute some sql code using passwords in .hg.conf
hgsqlLocal:Execute some sql code using localDb.XXX in .hg.conf
hgsqladmin:Wrapper around mysqladmin using passwords in .hg.conf
hgsqldump:Execute mysqldump using passwords from .hg.conf
hgsqldumpLocal:Execute mysqldump using passwords from .hg.conf
hgsqlimport:Execute mysqlimport using passwords from .hg.conf
hmmPfamToTab:Convert hmmPfam output to something simple and tab-delimited.
hprdP2p:Create hprd.p2p tab file using HPRD flat file for input to hgNetDist
htmlCheck:Do a little reading and verification of html file
htmlPics:create an html file from a list of pictures

usage

htmlPics picFile(s)
indexfa:This program makes an index file for a .fa file
indexgl:This program makes an index file for a .gl file
intronEnds:Gather stats on intron ends.
introns:Introns - finds the introns in a file and writes them to gff.
iriToControlTable:Convert improbizer run to simple list of control scores
iriToDnaMotif:Convert improbRunInfo to dnaMotif
isPcr:Standalone v 34x5 In-Situ PCR Program
ixIxx:Create indices for simple line-oriented file of format <symbol> <free text>
ixali:This program makes a name index file for an .ali file
ixword1:This program makes an index file for text file, indexing the first word of each line.
ixword3:This program makes an index file for text file, indexing the third word of each line.
jkUniq:remove duplicate lines from file. Lines need not be next to each other (plain Unix uniq works for that)
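
The difference from plain Unix uniq, which only collapses adjacent duplicates, can be shown with a short sketch that remembers every line seen so far. The function name is hypothetical, and keeping the first occurrence of each line in its original position is an assumption about ordering rather than documented jkUniq behaviour.

```python
def unique_lines(lines):
    """Yield each distinct line once, in order of first appearance."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

if __name__ == "__main__":
    # The duplicates here are not adjacent, so plain uniq would keep them.
    for line in unique_lines(["a", "b", "a", "c", "b"]):
        print(line)
```

Memory use grows with the number of distinct lines, which is the price of not requiring sorted input.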
joinableFields:Return list of good join targets for a table
joinerCheck:Parse and check joiner file
kgAliasKgXref:create gene alias .tab file usage:

kgAliasKgXref xxxx

xxxx is genome database name
kgAliasM:create gene alias (mRNA part) .tab files usage:

kgAliasM xxxx yyyy

xxxx is genome database name
kgAliasP:create gene alias (protein part) .tab files usage:

kgAliasP xxxx yyyy zzzz

xxxx is genome database name
kgAliasRefseq:create gene alias .tab file usage:

kgAliasRefseq xxxx

xxxx is genome database name
kgCheck:from gene candidates, go through various criteria and keep the ones that pass the criteria
kgGetCds:create a gene candidate table with CDS info
kgGetPep:generate FASTA format protein sequence file to be used for Known Genes track build.
kgPepMrna:generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. usage:

kgPepMrna tempKgDb roDb YYMMDD

tempKgDb is the temp KG build database name
kgPick:select the best representative mRNA/protein pair
kgProtAlias:create protein alias .tab files usage:

kgProtAlias xxxx yyyy

xxxx is genome database name
kgProtAliasNCBI:create gene alias (mRNA part) .tab files usage:

kgProtAliasNCBI <DB> <RO_DB>

<DB> is knownGene DB under construction
kgPutBack:from gene candidates, go through various criteria and keep the ones that pass the criteria
kgXref:create Known Gene cross reference table kgXref.tab file. usage:

kgXref <db> <proteinsYYMMDD> <ro_db>

<db> is known Genes database under construction
kgXref2:create new Known Gene cross reference table kgXref2.tab file. usage:

kgXref2 <tmpDb> <YYMMDD> <roDb>

<tmpDb> is temp KG database under construction
knownToHprd:Create knownToHprd table using HPRD flat file and kgXref
knownToVisiGene:Create knownToVisiGene table by riffling through various other knownTo tables
knownVsBlat:Categorize BLAT mouse hits to known genes
kvsSummary:Summarize output of a bunch of knownVsBlats
lavToAxt:Convert blastz lav file to an axt file (which includes sequence)
lavToPsl:Convert blastz lav to psl format
ldHgGene:load database with gene predictions from a gff file.
lfsOverlap:remove overlapping records from lfs file and retain the best scoring lfs record for each set of overlapping records. If scores are equal, the first record found is retained
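
The selection rule in the lfsOverlap description (keep the best-scoring record per set of overlapping records, first record wins on ties) can be sketched as below. This is not the tool's code. The `(chrom, start, end, score)` tuple is a hypothetical simplification of an lfs record, and treating a "set of overlapping records" as a cluster of transitively overlapping intervals on one chromosome is an assumption.

```python
def _pick_best(cluster):
    # Highest score wins; on ties the record seen first in the input wins.
    return max(cluster, key=lambda pair: (pair[1][3], -pair[0]))

def best_per_overlap_cluster(records):
    """Keep one record per cluster of overlapping records.

    records: list of (chrom, start, end, score), 0-based half-open.
    Returns the retained records in their original input order.
    """
    # Remember input positions so ties can be resolved by first appearance.
    indexed = sorted(enumerate(records), key=lambda pair: (pair[1][0], pair[1][1]))
    keep, cluster = [], []
    cluster_chrom, cluster_end = None, None
    for idx, rec in indexed:
        chrom, start, end, _score = rec
        if cluster and chrom == cluster_chrom and start < cluster_end:
            # Overlaps the current cluster: extend it.
            cluster.append((idx, rec))
            cluster_end = max(cluster_end, end)
        else:
            if cluster:
                keep.append(_pick_best(cluster))
            cluster = [(idx, rec)]
            cluster_chrom, cluster_end = chrom, end
    if cluster:
        keep.append(_pick_best(cluster))
    keep.sort(key=lambda pair: pair[0])
    return [rec for _, rec in keep]

if __name__ == "__main__":
    sample = [
        ("chr1", 0, 100, 5),
        ("chr1", 50, 150, 9),
        ("chr1", 200, 300, 7),
        ("chr1", 250, 350, 7),
    ]
    for rec in best_per_overlap_cluster(sample):
        print(rec)
```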
libScan:Scan libraries to help find g' capped ones
liftAcross:convert one coordinate system to another, no overlapping items
liftAgp: Program to lift tracks that have nearly the same .agp file,

but slightly different. Initially designed for chr21 and chr22 which

are starting to accumulate ticky-tacky changes. Currently works for files
liftFrags:This program lifts annotations on clone fragments to FPC contig coordinates
liftOver:Move annotations from one assembly to another
liftOverMerge:Merge multiple regions in BED 5 files generated by liftOver -multiple
liftPromoHits:Lift motif hits from promoter to chromosome coordinates
liftUp:change coordinates of .psl, .agp, .gap, .gl, .out, .gff, .gtf, .bscore, .tab, .gdup, .axt, .chain, .net, genePred, .wab, .bed, or .bed8 files to parent coordinate system.
lineCount:Count lines in a file
linesToRa:generate .ra format from lines with pipe-separated fields
localtime:convert unix timestamp to date string
mafAddIRows:add 'i' rows to a maf
mafAddQRows:Add quality data to a maf
mafCoverage:Analyse coverage by maf files - chromosome by chromosome and genome-wide.
mafFetch:get overlapping records from an MAF using an index table
mafFilter:Filter out maf files. Output goes to standard out
mafFrag:Extract maf sequences for a region from database
mafFrags:Collect MAFs from regions specified in a 6 column bed file
mafGene:output protein alignments using maf and genePred
mafMeFirst:Move component to top if it is one of the named ones. Useful in conjunction with mafFrags when you don't want the one with the gene name to be in the middle.
mafOrder:order components within a maf file
mafRanges:Extract ranges of target (or query) coverage from maf and output as BED 3 (e.g. for processing by featureBits).
mafSpeciesList:Scan maf and output all species used in it.
mafSpeciesSubset:Extract a maf that just has a subset of species.
mafSplit:Split multiple alignment files
mafSplitPos:Pick positions to split multiple alignment input files
mafToAxt:Convert from maf to axt format
mafToPsl:Convert maf to psl format
mafsInRegion:Extract MAFS in a genomic region
makeTableDescriptions:Add table descriptions to database.
makepgo:Make Predicted Gene Offset files. One for each chromosome.
maskOutFa:Produce a masked .fa file given an unmasked .fa and a RepeatMasker .out file, or a .bed file to mask on.
maxTranscriptomeExps:cycle through a list of affy transcriptome experiments and select the max for each position.
mdToNcbiLift:Convert seq_contig.md file to ncbi.lft
mgcFastaForBed:Take a bed file and return a fasta file with exons uppercase and introns lowercase.
mktime:convert date string to unix timestamp
moresyn:find more gene/ORF synonyms
motifLogo:Make a sequence logo out of a motif.
motifSig:Combine info from multiple control runs and main improbizer run
mousePoster:Search database info for making foldout
mrnaToGene:convert PSL alignments of mRNAs to gene annotations
netChainSubset:Create chain file with subset of chains that appear in the net
netClass:Add classification info to net
netFilter:Filter out parts of net. What passes

filter goes to standard output. Note a net is a

recursive data structure. If a parent fails to pass
netSplit:Split a genome net file into chromosome net files
netStats:Gather statistics on net
netSyntenic:Add synteny info to net.
netToAxt:Convert net (and chain) to axt.
netToBed:Convert target coverage of net to a bed file.
netToBedWithId:Convert net (and chain) to bed with base identity in score.
newProg:make a new C source skeleton.
nibFrag:Extract part of a nib file as .fa (all bases/gaps lower case by default)
nibSize:print size of nibs
nibbImageProbes:Collect image probes for NIBB Xenopus Laevis in-situs
nibbNameFix:Regularize format of NIBB sequence names
nibbParseImageDir:Look through nibb image directory and allowing for typos and the like create a table that maps a file name to clone name, developmental stage, and view of body part
nibbPrepImages:Set up NIBB frog images for VisiGene virtual microscope - copying them to a directory and making up pyramid scheme.
normalizeSampleFile:normalizeSampleFiles - calculates average value over a series of sample files and sets the average of each sample file to the global average. Optionally will also group together samples into larger groups.
nt4Frag:Extract a piece of a .nt4 file to .fa format
oligoMatch:find perfect matches in sequence.
orf:Find orf for cDNAs
orfStats:Collect stats on orfs
orthoEvaluate:Evaluate the coding potential of a bed.

(version: .c,v 1.13 2008/09/03 19:20:51 markd )

-help -- Display this message.
orthoMap:Map items from one organism to another. Must

specify one type of item using the -itemFile or -itemTable

flags. OrthoMap simply maps over the genomic coordinates discarding
orthoPickIntron:Pick best intron from orthoEval.

(version: 1.8 2008/09/03 19:20:52 markd )

-help -- Display this message.
orthoSplice:program to compare splicing in different organisms

initially human and mouse as they both have nice EST and cDNA data

still working out algorithm but options are:
orthologBySynteny:Find syntenic location for a list of gene predictions on a single chromosome
overlapSelect:usage: overlapSelect [options] selectFile inFile outFile. Select records based on overlapping chromosome ranges. The ranges are
patCount:counts up the number of occurrences of each

oligo of a fixed size (up to 13) in input. Writes out

all patterns that are overrepresented by at least factor
pbCalDist:pbCalDist- Create tab delimited data files to be used by Proteome Browser stamps.
pbCalDistGlobal:pbCalDistGlobal- Create tab delimited data files to be used by Proteome Browser stamps.
pbCalPi:Calculate pI values from a list of protein IDs
pbCalResStd:pbCalResStd calculates the avg frequency and standard deviation of each AA residue of the proteins in a specific genome
pbCalResStdGlobal:pbCalResStd calculates the avg frequency and standard deviation of each AA residue of the proteins in a protein database
pbHgnc:process HGNC data
pepPredToFa:Convert a pepPred table to fasta format
pfamXref:create pfam xref .tab file usage:

pfamXref pn pfamInput pfamOutput pfamXref

pn is protein database name
phToPsl:Convert from Pattern Hunter to PSL format
polyInfo:Collect info on polyAdenylation signals etc
positionalTblCheck:check that positional tables are sorted
promoSeqFromCluster:Get promoter regions from cluster
pslCDnaFilter:Filter cDNA alignments in psl format. Filtering criteria are
pslCat:concatenate psl files
pslCheck:validate PSL files
pslCoverage:estimate coverage from alignments. usage: pslCoverage in.sizes in.psl minPercentId endTrim out.cov misAsm.out
pslDiff:Compare queries in two or more psl files
pslDropOverlap:deletes all overlapping self alignments.
pslFilter:filter out psl file. usage: pslFilter in.psl out.psl
pslGlue:reduce a psl mRNA alignment file to only the components that might be involved in gluing
pslHisto:pslHisto [options] what inPsl outHisto
pslHitPercent:Figure out percentage of reads in FA file that hit.
pslIntronsOnly:Filter psl files to only include those with introns
pslMap:map PSL alignments to new targets using alignments of the old target to the new target. Given inPsl and mapPsl, where
pslMrnaCover:Make histogram of coverage percentage of mRNA in psl.
pslPartition:split PSL files into non-overlapping sets
pslPretty:Convert PSL to human readable output
pslRecalcMatch:Recalculate match,mismatch,repMatch columns in psl file.

This can be useful if the psl went through pslMap, or if you've added

lower-case repeat masking after the fact
pslReps:analyse repeats and generate genome wide best alignments from a sorted set of local alignments
pslSelect:select records from a PSL file.
pslSimp:create simplified version of psl file.
pslSort:merge and sort psCluster .psl output files
pslSortAcc:sort pslSort .psl output file by accession. Make one output .psl file per accession.
pslSplitOnTarget:Split psl files into one per target.
pslStats:collect statistics from a psl file.
pslSwap:usage: pslSwap [options] inPsl outPsl
pslToBed:pslToBed: transform a psl format file to a bed format file.
pslToPslx:Convert from psl to pslx format, which includes sequences
pslToXa:Convert from psl to xa alignment format
pslUnpile:Removes huge piles of alignments from sorted psl files (due to unmasked repeats presumably).
pslxToFa:convert pslx (with sequence) to fasta file
qaToQac:convert from uncompressed to compressed quality score format.
qacAgpLift:Use AGP to combine per-scaffold qac into per-chrom qac.
qacToQa:convert from compressed to uncompressed quality score format.
qacToWig:convert from compressed quality score format to wiggle format.
raToCds:Extract CDS positions from ra file
raToLines:Output .ra file stanzas as single lines, with pipe-separated fields.
raToTab:Convert ra file to table.
randomLines:Pick out random lines from file
refiAli:This program turns rough alignments into fine ones.
refreshNamedSessionCustomTracks:refreshNamedSessionCustomTracks -- scan central database's namedSessionDb

contents for custom tracks and touch any that are found, to prevent

them from being removed by the custom track cleanup process.
regionPicker:Code to pick regions to annotate deeply.

Stratifies genome based on mouse non-transcribed homology

and spliced EST density.
relPairs:extract pairs from a big pair list file that actually occur in a .psl file
reviewIndexes:check indexes
reviewSanity:Look through sanity files and make sure things are ok.
rikenBestInCluster:Find best looking in Riken cluster
rmFaDups:rmFaDup - remove duplicate records in FA file

usage

rmFaDup oldName.fa newName.fa
rmKGPepMrna:generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. usage:

rmKGPepMrna xxxx yyyy

xxxx is the genome database name
rnaFoldBig:Run RNAfold repeatedly
rowsToCols:Convert rows to columns and vice versa in a text file.
safePush:Push database tables from one machine to another. This is a

little more careful than mypush. It should be run on the machine that is

the source of the data
samHit:reads the SAM output .rdb file and produces .tab data for the protHomolog table. usage:

samHit proteinId rdbFN

proteinId is the protein ID
sanger22gtf:Convert Sanger chromosome 22 annotations to gtf
scaffoldFaToAgp:generate an AGP file, gap file, and lift file from a scaffold FA file.
scaleSampleFiles:scale all of the scores in a file by a scale factor.
scanRa:scan through ra files for info.
scopCollapse:Convert SCOP model to SCOP ID. Also make id/name converter file.
scrambleFa:scramble the order of records in an fa file
seqCheck:check that extFile references in seq table are valid
sequenceForBed:Writes sequence for beds to a fasta file. Requires database access.
sim4big:A wrapper for Sim4 that runs it repeatedly on a multi-sequence .fa file
simpleChain:Stitch psls into chains
sizeof:type bytes bits

char 1 8

unsigned char 1 8
snpException:Get exceptions to a snp invariant rule.
snpMaskAddInsertions:snpMaskAddInsertions -- Print genomic sequence plus insertion SNPs.
snpMaskCutDeletions:snpMaskCutDeletions -- Print genomic sequence with deletion SNPs removed.
snpMaskSingle:print sequence using IUPAC ambiguous nucleotide codes for single base substitutions
snpNcbiToUcsc:Reformat NCBI SNP field values into UCSC, and flag exceptions.
snpValid:Validate snp alignments
sortFilt:merge, sort, and filter patSpace .hit output.
spLoadPsiBlast:load swissprot PSI-BLAST table. This loads the results of all-against-all PSI-BLAST on Swissprot, which
spLoadRankProp:load swissprot rankProp table.
spOrganism:Extract taxonomy data from SWISS-PROT data file and produce a .tab file of SWISS-PROT display ID/NCBI taxonomy ID pairs.
spTest:Test out sp library.
spToDb:Create a relational database out of SwissProt/trEMBL flat files
spToProteins:spToProteins- Create tab delimited data files from spxxxx database for proteinsxxxx database.
spToProteinsVar:spToProteinsVar- Create tab delimited data file, spXrefVar.tab, from spYYMMDD database for proteinsYYMMDD database.
spToSpXref2:spToSpXref2- Create tab delimited data files for the spXref2 table in uniProt (spxxxxxx) database.
spXref3:get xref data of proteins in SWISS-PROT, TrEMBL, TrEMBL-NEW and HUGO. Output is placed in file spXref3.tab.
spacedToTab:Convert fixed width space separated fields to tab separated. Note this requires two passes, so it can't be done on a pipe
spideyToPsl:Convert NCBI spidey pair alignments to PSL format
splitFa:split a big FA file into smaller ones.
splitFile:Split up a file
splitFileByColumn:Split text input into files named by column value
splitSim:Simulate gapless distribution size
spm3:from all mRNAs in a genome (e.g. rn3) referenced by SWISS-PROT generate a list of proteins and a list of protein/mRNA pairs.
spm6:generates sorted.lis and knownGene0.tab for further duplicates processing
spm7:Create sorted list of mRNA-SP data file for further duplicates processing
sqlToXml:dump out all or part of a relational database to XML, guided by a dump specification. See sqlToXml.doc for additional information.
stToXao:make indices into st file, one for each chromosome.
stageMultiz:Stage input directory for Webb's multiple aligner
stanToBedAndExpRecs:takes a pslFile of alignments and a list of stanfords

expression data files and converts them into a bed file with the scores and experiment

ids. Also creates a corresponding file of expRecords which indicate what the
stitchea:joins together EA files into one big one, throwing out overlaps. Will complain if there's any missing data.
stitcher:third pass of genomic/genomic alignment. Stitches together 2000x5000 base 7-state alignments into longer contigs.
stringify:Convert file to C strings
subChar:Substitute one character for another throughout a file.
subColumn:Substitute one column in a tab-separated file.
subs:Subs - a utility to perform massive string substitutions on source
subsetAxt:Rescore alignments and output those over threshold
subsetTraces:Build subset of mouse traces that actually align
tableSum:Summarize a table somehow
tailLines:add tail to each line of file
testSearch:test search functionality.
textHist2:Make two dimensional histogram table out of a list of 2-D points, one per line.
textHistogram:Make a histogram in ascii
tfbsConsSort:a utility to sort tfbsCons files before loading them
tickToDate:Convert seconds since 1970 to time and date
timePosTable:time access to a positional table
toDev64:A program that copies data from the old hgwdev database to the new hgwdev database.
toLower:Convert upper case to lower case in file. Leave other chars alone
toUpper:Convert lower case to upper case in file. Leave other chars alone
trackDbRaFormat:Format trackDb.ra canonically.
trackOverlap:trackOverlap- Overlap how much of a track is overlapped by

other tracks and vice versa. This is done by correlating

series of bitmap projections (i.e. featureBits multiple times).
trfBig:Mask tandem repeats on a big sequence file.
twinOrf:Predict open reading frame in cDNA given a cross species alignment
twinOrf2:Predict open reading frame in cDNA given a cross species alignment
twinOrf3:Predict open reading frame in cDNA given a cross species alignment
twinOrfStats:Collect stats on refSeq cDNAs aligned to another species via axtForEst
twinOrfStats2:Collect stats on refSeq cDNAs aligned to another species via axtForEst
twinOrfStats3:Collect stats on refSeq cDNAs aligned to another species via axtForEst
twoBitInfo:get information about sequences in a .2bit file
twoBitMask:apply masking to a .2bit file, creating a new .2bit file
twoBitToFa:Convert all or part of .2bit file to fasta
txAbFragFind:Search database for what are probably antibody fragments.
txBedToGraph:Cluster together beds from txPslToBed. Make transcript graphs out of clusters.

txBedToGraph in1.bed in1Type [in2.bed in2type ...] out.txg

options:
txCdsBadBed:Create a bed file with regions that don't really have CDS, but that might look like it.
txCdsCluster:Cluster transcripts purely in the CDS regions, only putting things together if they share same frame as well as a genomic region.
txCdsEvFromBed:Make a cds evidence file (.tce) from an existing bed file. Used mostly in transferring CCDS coding regions currently.
txCdsEvFromBorf:Convert borfBig format to txCdsEvidence (tce) in an effort to annotate the coding regions.
txCdsEvFromProtein:Convert transcript/protein alignments and other evidence into a transcript CDS evidence (tce) file
txCdsEvFromRna:Convert transcript/rna alignments, genbank CDS file, and other info to transcript CDS evidence (tce) file.
txCdsGoodBed:Create positive example training set for SVM. This is

based on the refSeq reviewed genes, but we fragment a certain percentage

of them so as not to end up with an SVM that *requires* a complete
txCdsOrfInfo:Given a sequence and a putative ORF, calculate some basic information on it.
txCdsOrtho:Figure out how CDS looks in other organisms.
txCdsPick:Pick best CDS if any for transcript given evidence.
txCdsPredict:Somewhat simple-minded ORF predictor using a weighting scheme.
txCdsRaExceptions:Mine exceptional things like selenocysteine out of genbank ra file.
txCdsRefBestEvOnly:Go through a cdsEvidence file, and extract only the bits that refer to the native orf for a RefSeqReviewed transcript.
txCdsRepick:OBSOLETE program. The scheme this implemented ended up

not working so well. It's still in the source tree because it may contain

some useful routines for other programs
txCdsSuspect:Flag cases where the CDS prediction is very suspicious, including

CDSs that lie entirely in an intron or in the 3' UTR of another, better looking

transcript.
txCdsSvmInput:Create input for svm_light, a nice support vector machine classifier.
txCdsToGene:Convert transcript bed and best cdsEvidence to genePred and protein sequence.
txCdsWeed:Remove bad CDSs including NMD candidates
txGeneAccession:Assign permanent accession number to genes.
txGeneAlias:Make kgAlias and kgProtAlias tables.
txGeneAltProt:Figure out statistics on number of alternative proteins produced by alt-splicing.
txGeneCanonical:Pick a canonical version of each gene - that is the form

to use when just interested in a single splicing variant. Produces final

transcript clusters as well.
txGeneCdsMap:Create mapping between CDS region of gene and genome. This is used to build the exon track in the proteome browser.
txGeneColor:Figure out color to draw gene in.
txGeneExplainUpdate1:Make table explaining correspondence between older known genes and ucsc genes.
txGeneFromBed:Convert from bed to knownGenes format table (genePred + uniProt ID)
txGeneProtAndRna:Create fasta files with our proteins and transcripts.

These echo RefSeq when gene is based on RefSeq. Otherwise they are taken from

the genome.
txGeneSeparateNoncoding:Separate genes into four piles - coding, non-coding that overlap coding, and independent non-coding.
txGeneXref:Make kgXref type table for genes.
txInfoAssemble:Assemble information from various sources into txInfo table.
txOrtho:Produce list of shared edges between two transcription graphs in two species.
txPslFilter:Do rna/rna filter.
txPslToBed:txPslToBed - Convert a psl to a bed file by projecting it onto its target sequence. Optionally merge adjacent blocks and trim to splice sites.
txReadRa:Read ra files from genbank and parse out relevant info into some tab-separated files.
txWalk:Walk transcription graph and output transcripts.
txgAddEvidence:Add evidence from a bed file to existing transcript graph.
txgAnalyze:Analyse transcription graph for alt exons, alt 3', alt 5', retained introns, alternative promoters, etc.
txgGoodEdges:Get edges that are above a certain threshold.
txgToAgx:Convert from txg (txGraph) format to agx (altGraphX)
txgToXml:Convert txg to an XML format.
txgTrim:Trim out parts of txGraph that are not of sufficient weight.
udcCleanup:Clean up old unused files in udcCache.
undupFa:rename duplicate records in FA file

usage

undupFa faFile(s)
upper:strip numbers, spaces, and punctuation; turn to upper case
utrFa:Get UTRs as fasta files
validateFiles:Validate format of different track input files

Program exits with non-zero status if any errors detected

otherwise exits with zero status
venn:Do venn diagram calculations
wabToSt:Convert WABA output to something Intronerator understands better
weedLines:Selectively remove lines from file
whyConserved:Try and analyse why a particular thing is conserved
wigEncode:convert Wiggle ascii data to binary format
wigTestMaker:Create test wig files.
wigToBigWig:Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format.
wordLine:chop up words by white space and output them with one word to each line.
xmlCat:Concatenate xml files together, stuffing all records inside a single outer tag.
xmlToSql:Convert XML dump into a fairly normalized relational database

in the form of a directory full of tab-separated files and table

creation SQL. You'll need to run autoDtd on the XML file first to