Kent source utilities
More information about these tools can be found here: Kent source utilities.
Last updated: 2009-11-09
aNotB: | List symbols that are in a but not b |
addAveMedScoreToPsls: | Combines unigene pslFile and sage file into bed file |
addCols: | Sum columns in a text file. |
affyPairsToSample: | Takes a 'pairs' format file from the Affy transcriptome data set and combines it with the Affy offset.txt file to output a 'sample' file which has the contig coordinates of the result. |
agpAllToFaFile: | Convert all sequences in an .agp file to a .fa file |
agpCloneCheck: | Check that all clones are present in an agp file (and in the right version too) |
agpCloneList: | Make simple list of all clones in agp file to stdout |
agpToFa: | Convert a .agp file to a .fa file |
agpToGl: | Convert AGP file to GL file. Some fakery involved. |
agxToBed: | Utility program that condenses an altGraphX record into a bed record. |
agxToIntronBeds: | Program to output all introns from altGraphX records as beds. Designed for use in MGC project looking for novel introns from altGraphX records transferred over from mouse. |
agxToTxg: | Convert from old altGraphX format to newer txGraph format. |
ali2alx: | produces an index file for each chromosome into an ali file. |
aliGlue: | tell where a cDNA is located quickly. |
allenCollectSeq: | Collect probe sequences for Allen Brain Atlas from a variety of sources |
altAnalysis: | Analyze the altSplicing in a series of altGraphX's |
altPaths: | Examine altGraphX graphs and discover alternative splicing events and their paths. |
altSplice: | -help -- Display this message. -db -- Database (e.g. hg15) to load psl records from. -beds -- Coordinate file to base clustering on in bed format. |
altSummary: | Summarize the altSplicing in a series of altGraphX's |
ameme: | find common patterns in DNA. usage: ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif] [rcToo=on] [controlRun=on] [startScanLimit=20] [outputLogo] [constrainer=1] |
apacheMonitor: | check for error 500s in the last minutes |
assessLibs: | Make table that assesses the percentage of library that covers 5' and 3' ends |
autoDtd: | Give this a XML document to look at and it will come up with a DTD to describe it. |
autoSql: | create SQL and C code for permanently storing a structure in database and loading it back into memory based on a specification file |
autoXml: | Generate structures code and parser for XML file from DTD-like spec |
ave: | Compute average and basic stats |
aveCols: | average together columns |
averagExp: | Average expression data within a cluster |
averageZoomLevels: | takes a sorted sample file and creates averaged 'zoomed-out' summaries for a few different levels. Basic idea is to get the size of a chromosome, divide it by 2000 as that is |
avgTranscriptomeExps: | Averages together replicates of the affy transcriptome data set. Will skip certain experiments unless directed otherwise as they were not in the original data set. |
axtAndBed: | Intersect an axt with a bed file and output axt. |
axtBest: | Remove second best alignments |
axtCalcMatrix: | Calculate substitution matrix and make indel histogram |
axtChain: | Chain together axt alignments. |
axtDropOverlap: | deletes all overlapping self alignments. |
axtDropSelf: | Drop alignments that just align same thing to itself |
axtFilter: | Filter axt files. Output goes to standard out. |
axtForEst: | Generate file of mouse/human alignments corresponding to MGC EST's |
axtIndex: | build index of axt file |
axtPretty: | Convert axt to more human readable format. |
axtQueryCount: | Count bases covered on each query sequence |
axtRecipBest: | create file for dot plot using recip best |
axtRescore: | Recalculate scores in axt. |
axtSort: | Sort axt files |
axtSplitByTarget: | Split a single axt file into one file per target |
axtSwap: | Swap source and query in an axt file |
axtToBed: | Convert axt alignments to simple bed format |
axtToChain: | Convert axt to chain format |
axtToMaf: | Convert from axt to maf format |
axtToPsl: | Convert axt to psl format |
bedClip: | Remove lines from bed file that refer to off-chromosome places. |
bedCons: | Look at conservation of a BED track vs. a reference (nonredundant) alignment track |
bedCoverage: | Analyse coverage by bed files - chromosome by chromosome and genome-wide. |
bedDown: | Make stuff to find a BED format submission in a new version |
bedExtendRanges: | extend length of entries in bed 6+ data to be at least the given length, taking strand directionality into account. |
bedGraphToBigWig: | Convert a bedGraph file to bigWig. |
bedInGraph: | Program to determine if bed exons are contained in a splice graph. Original motivation was to see which alt-spliced probe sets are also alt in human (in addition to mouse). |
bedIntersect: | Intersect two bed files |
bedItemOverlapCount: | count number of times a base is overlapped by the items in a bed file. Output is bedGraph 4 to stdout. |
bedOrBlocks: | Create a bed that is the union of all blocks of a list of beds. |
bedSort: | Sort a .bed file by chrom,chromStart |
bedSplitOnChrom: | Split bed into a directory with one file per chromosome. |
bedToBigBed: | Convert bed file to bigBed. |
bedToExons: | Split a bed up into individual beds. One for each internal exon. |
bedToFrames: | Makes html files for browsing custom bed track using frames. Use -pad for padding |
bedToGenePred: | convert bed format files to genePred format |
bedToTxEdges: | Convert a bed file into txEdgeBed, which can be used with txgAddEvidence. |
bedUp: | Load bed submissions after conversion back into new database. |
bedWeedOverlapping: | Filter out beds that overlap a 'weed.bed' file. |
bigBedSummary: | Extract summary information from a bigBed file. |
bigBedToBed: | Convert from bigBed to ascii bed format. |
bigWigInfo: | Print out information about bigWig file. |
bigWigSummary: | Extract summary information from a bigWig file. |
bigWigToBedGraph: | Convert from bigWig to bedGraph format. |
binGood: | convert text format alignment file to binary format |
blastRecipBest: | Pick out just the reciprocal best alignments. |
blastToPsl: | Convert blast alignments to PSLs. |
blat: | Standalone BLAT v. 34x5 fast sequence search command line tool |
blatz: | blatz version 1 - Align dna across species |
blatzClient: | blatzClient version 1 - Ask server to do cross-species DNA alignments and save results. |
blatzServer: | blatzServer version 1 - Set up in-memory server for cross-species DNA alignments |
borfBig: | Run Victor Solovyev's bestorf repeatedly |
bottleneck: | bottleneck v2 - A server that helps slow down hyperactive web robots |
bptForTwoBit: | Create a b+ tree index for a .2bit file. Key is the sequence name. Value is the position of the start of the compressed DNA in the .2bit file. |
bptLookupStringToBits64: | Given a string value look up and return associated 64 bit value if any. |
bptMakeStringToBits64: | Create a B+ tree index with string keys and unsigned 64-bit-integer values. In practice the 64-bit values are often offsets in a file. |
bwana: | do batch coarse alignment of C. briggsae and C. elegans genomes. |
calc: | Little command line calculator |
calcGap: | calculate gap scores |
catDir: | concatenate files in directory to stdout. For those times when too many files for cat to handle. |
catUncomment: | Concatenate input, removing lines that start with '#'. Output goes to stdout. |
ccCp: | copy a file to cluster. usage: ccCp sourceFile destFile [hostList]. This will copy sourceFile to destFile for all machines in |
cdnaOff: | creates sorted offset files that position cDNAs in chromosome. |
chainAntiRepeat: | Get rid of chains that are primarily the results of repeats and degenerate DNA |
chainDbToFile: | translate a chain's db representation back to file |
chainFilter: | Filter chain files. Output goes to standard out. |
chainMergeSort: | Combine sorted files into larger sorted file |
chainNet: | Make alignment nets out of chains |
chainPreNet: | Remove chains that don't have a chance of being netted |
chainSort: | Sort chains. By default sorts by score. Note this loads all chains into memory, so it is not suitable for large sets. Use chainMergeSort for that. |
chainSplit: | Split chains up by target or query sequence |
chainStats: | Stitch psls into chains |
chainStitchId: | Join chain fragments with the same chain ID into a single chain per ID. Chain fragments must be from same original chain but must not overlap. Chain fragment scores are summed. |
chainSwap: | Swap target and query in chain |
chainToAxt: | Convert from chain to axt file |
chainToPsl: | Convert chain file to psl format |
checkAgpAndFa: | takes a .agp file and .fa file and ensures that they are in synch |
checkCardinality: | reviewIndexes - check indexes |
checkChain: | read all chains and report if duplicate ids |
checkHgFindSpec: | test and describe search specs in hgFindSpec tables. |
checkSgdSync: | Make sure that genes and sequence are in sync for SGD yeast database |
checkTableCoords: | check invariants on genomic coords in table(s). |
checkableBorf: | Convert borfBig orf-finder output to checkable form |
chopFaLines: | Read in FA file with long lines and rewrite it with shorter lines |
chromGraphFromBin: | Convert chromGraph binary to ascii format. |
chromGraphToBin: | Make binary version of chromGraph. |
clusterGenes: | Cluster genes from genePred tracks |
clusterPsl: | Make clusters of mRNA alignments |
clusterRna: | Make clusters of mRNA and ESTs |
consForBed: | Takes a bed file and a conservation file and outputs conservation scores for each position in the bed file. Optionally outputs a summary file which contains every conservation value seen |
convolve: | perform convolution of probabilities |
countChars: | Count the number of occurrences of a particular char |
crTreeIndexBed: | Create an index for a bed file. |
crTreeSearchBed: | Search a crTree indexed bed file and print all items that overlap query. |
ctgFaToFa: | Convert from one big file with all NT contigs to one contig per file. |
ctgToChromFa: | convert contig level fa files to chromosome level |
dbFindFieldsWith: | Look through database and find fields that have elements matching a certain regular expression in the first N rows. |
dbSnoop: | Produce an overview of a database. |
dbTrash: | drop tables from a database older than specified N hours |
detab: | remove tabs from program |
dnaMotifFind: | Locate preexisting motifs in DNA sequence |
eisenInput: | Create input for Eisen-style cluster program |
emblMatrixToMotif: | Convert transfac matrix in EMBL format to dnaMotif |
embossToPsl: | Convert EMBOSS pair alignments to PSL format |
endsInLf: | Check that last letter in files is end of line |
estLibStats: | Calculate some stats on EST libraries given file from polyInfo |
estOrient: | estOrient [options] db estTable outPsl |
exonAli: | This program aligns cDNA with genomic sequence. Usage: exonAli named output cdnaName(s); exonAli in output listFile |
expToRna: | Make a little two column table that associates rnaClusters with expression info |
faAlign: | Align two fasta files |
faCmp: | Compare two .fa files |
faCount: | count base statistics and CpGs in FA files. |
faFilter: | Filter fa records, selecting ones that match the specified conditions |
faFilterN: | Get rid of sequences with too many N's |
faFlyBaseToUcsc: | Convert Flybase peptide fasta file to UCSC format |
faFrag: | Extract a piece of DNA from a .fa file. |
faGapLocs: | report location of gaps and sequences in a FASTA file |
faGapSizes: | report on gap size counts/statistics |
faNcbiToUcsc: | Convert FA file from NCBI to UCSC format. |
faNoise: | Add noise to .fa file |
faOneRecord: | Extract a single record from a .FA file |
faPolyASizes: | get poly A sizes |
faRandomize: | Program to create random fasta records using same base frequency as seen in original fasta records. Use optional -seed flag to specify seed for random number |
faRc: | Reverse complement a FA file |
faSimplify: | Simplify fasta record headers |
faSize: | print total base count in fa files. |
faSomeRecords: | Extract multiple fa records |
faSplit: | Split an fa file into several files. |
faToFastq: | Convert fa to fastq format, just faking quality values. |
faToNib: | Convert from .fa to .nib format |
faToTab: | convert fa file to tab separated file |
faToTwoBit: | Convert DNA from fasta to 2bit format |
faTrans: | Translate DNA .fa file to peptide |
faTrimPolyA: | trim poly-A tails |
faTrimRead: | trim reads based on qual scores - change low scoring bases to N's |
fakeFinContigs: | Fake up contigs for a finished chromosome |
fakeOut: | fake a RepeatMasker .out file based on N's in .fa file |
fastqToFa: | Convert from fastq to fasta format. |
fatont4: | fato4nt - a program to convert .fa files to .4nt files |
featureBits: | Correlate tables via bitmap projections. |
ffaToFa: | ffaToFa convert Greg Schuler .ffa fasta files to UCSC .fa fasta files |
findMotif: | find specified motif in sequence |
findStanAlignments: | takes a Stanford microarray experiment file and tries to look up an alignment for the relevant clone in the database. Starts by trying to look up the longest GenBank clone from image id, |
fixCr: | strip <CR>s from ends of lines |
fixHarbisonMotifs: | Trim motifs that have beginning or ending columns that are degenerate. |
fqToQa: | convert from fq format with one big file to format with one file per clone. |
fqToQac: | convert from fq format with one big file to compressed format with one file per clone. |
fragPart: | get part of a fragment's sequence |
gadPos: | generate genomic positions for GAD entries |
gapSplit: | split sequence on gaps of size >= N |
gapToLift: | create lift file from gap table(s) |
gb2cdi: | convert GenBank (GB) files to .fa and cDna Info (CDI) file. |
gbGetEntries: | retrieve records from a GenBank flat file. |
gbOneAcc: | retrieve one or a few records from a GenBank flat file. |
gbSeqCheck: | check that extFile references in gbSeq table are valid |
gbToFaRa: | Convert GenBank flat format file to an fa file containing the sequence data, an ra file containing other relevant info and a ta file containing summary statistics. |
gbtofa: | gbtofa converts from GenBank to fa format. |
gcForBed: | Calculate g/c percentage and other stats for regions covered by bed |
genePredCheck: | validate genePred files or tables |
genePredHisto: | get data for generating histograms from a genePred file. |
genePredSingleCover: | create single-coverage genePred files |
genePredToFakePsl: | Create a psl of fake-mRNA aligned to gene-preds from a file or table. |
genePredToGtf: | Convert genePred table or file to gtf. |
genePredToMafFrames: | create mafFrames tables from genePreds |
genePredToPsl: | Program to create fake psl alignments from genePred records. Originally designed for use with altSplice. |
geniegff: | makes up a gdf file from Genie gene predictions |
getChroms: | print chrom names |
getFeatDna: | Get dna for a type of feature |
getRna: | Get mrna for GenBank or RefSeq sequences found in a database |
getRnaPred: | Get virtual RNA for gene predictions |
gfClient: | gfClient v. 34x5 - A client for the genomic finding program that produces a .psl file |
gfPcr: | In silico PCR version 34x5 using gfServer index. |
gfServer: | gfServer v 34x5 - Make a server to quickly find where DNA occurs in genome. To set up a server: gfServer start host port file(s) |
gff3ToGenePred: | convert a GFF3 file to a genePred file |
gffPeek: | Look at a gff file and report some basic stats |
gffgenes: | creates files that store extents of genes for intronerator |
gmtime: | convert unix timestamp to date string |
gpStats: | Figure out some stats on the golden path. |
gpToGtf: | Convert gp table to GTF |
gpcrParser: | Create xml files for gpcr snakeplots. |
groupSamples: | Group samples together into one sample. Samples must be sorted by chromosome position (you can use bedSort first if they are not). |
gsBig: | Run Genscan on big input and produce GTF files and other parsed output |
gtfToGenePred: | convert a GTF file to a genePred |
hapmapPhaseIIISummary: | Make hapmapPhaseIIISummary.bed from hapmap*.bed. |
headRest: | Return all *but* the first N lines of a file. |
hgAddLiftOverChain: | Add a liftOver chain to the central database |
hgAvidShortBed: | Convert short form of AVID alignments to BED |
hgBbiDbLink: | Add table that just contains a pointer to a bbiFile to database. This program is used to add bigWigs and bigBeds. |
hgBioCyc: | bioCyc - Creates bioCycPathway.tab for Known Genes to link to SRI BioCyc pathways |
hgCeOrfToGene: | Make orfToGene table for C.elegans from GENE_DUMPS/gene_names.txt |
hgChroms: | print chromosomes for a genome. |
hgClonePos: | create clonePos table in browser database |
hgClusterGenes: | Cluster overlapping gene predictions |
hgCountAlign: | count overlapping or non-overlapping windows in an alignment. |
hgCtgPos: | Store contig positions ( from lift files ) in database. |
hgDeleteChrom: | output SQL commands to delete a chrom from the database |
hgDropSplitTable: | Drop a table, or drop all tables in a split table |
hgEmblProtLinks: | Parse EMBL flat file into protein link table |
hgExonerate: | Convert Exonerate modified GFF files to BED format and load in database. |
hgExpDistance: | Create table that measures expression distance between pairs |
hgExperiment: | Load data from a BED of region positions, an experiment file containing <name> [<description>] |
hgExtFileCheck: | check extFile or gbExtFile tables against file system |
hgFakeAgp: | Create fake AGP file by looking at N's |
hgFiberglass: | Turn Fiberglass Annotations into a BED and load into database |
hgFindSpec: | Create hgFindSpec table from trackDb.ra files. |
hgFlyBase: | Parse FlyBase genes.txt file and turn it into a couple of tables |
hgGcPercent: | Calculate GC Percentage in 20kb windows |
hgGeneBands: | Find bands for all genes |
hgGenericMicroarray: | Load generic microarray file into database. A generic microarray file has the following format: |
hgGetAnn: | get chromosome annotation rows from database tables using browser-style position specification. |
hgGnfMicroarray: | Load data from (2003-style) GNF Affy Microarrays |
hgGoAssociation: | Load bits we care about in GO association table |
hgGoldGapGl: | Put chromosome .agp and .gl files into browser database. |
hgJaxQtl: | generate bed file for jaxQTL3 table using the table jaxQtlRaw as input; output file is jaxQTL3.tab. |
hgKegg: | creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage: hgKegg xxxx; xxxx is the genome database name |
hgKegg2: | creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage: hgKegg2 kgTempDb roDb; kgTempDb is the KG build temp database name |
hgKegg3: | creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. usage: hgKegg3 kgTempDb roDb; kgTempDb is the KG build temp database name |
hgKgGetText: | Get text from known genes into a file. The file will be line oriented with the known gene ID as the first word, and the rest of the word being a conglomeration |
hgKgMrna: | Load mRNA alignments and other info into refGene tables into a TEMPORARY database to build Known Genes track. |
hgKnownGeneList: | Generate Known Genes List HTML pages to be indexed by Google |
hgKnownMore: | Create the knownMore table from a variety of sources. |
hgKnownToSuper: | Load knownToSuperfamily table |
hgLoadBed: | Load a generic bed file into database |
hgLoadBlastTab: | Load blast table into database |
hgLoadChain: | Load a generic Chain file into database |
hgLoadChromGraph: | Load up chromosome graph. |
hgLoadEranModules: | Load regulatory modules from Eran Segal |
hgLoadGap: | Load gap table from AGP-style file containing only gaps |
hgLoadGenePred: | Load up a mySQL database genePred table |
hgLoadItemAttr: | load an itemAttr table |
hgLoadMaf: | Load a maf file index into the database |
hgLoadMafFrames: | load a mafFrames table |
hgLoadNet: | Load a generic net file into database |
hgLoadNetDist: | GS loader for interaction network path lengths. |
hgLoadOut: | load RepeatMasker .out files into database |
hgLoadPsl: | Load up a mySQL database with psl alignment tables |
hgLoadRnaFold: | Load a directory full of RNA fold files into database |
hgLoadSample: | Load a sample 9 (wiggle) file into database |
hgLoadSeq: | load browser database with sequence file info. |
hgLoadSqlTab: | Load table into database from SQL and text files. |
hgLoadWiggle: | Load a wiggle track definition into database |
hgLsSnpPdbLoad: | fetch data from LS-SNP/PDB mysql server or load an lsSnpPdb format table or file |
hgMapMicroarray: | Make mapped version of microarray data, merging psl in. |
hgMapToGene: | Map a track to a genePred track. |
hgMapViaSwissProt: | Make table that maps to external database via SwissProt |
hgMedianMicroarray: | Create a copy of microarray database that contains the median value of replicas |
hgMrnaRefseq: | creates xref data between mRNA and RefSeq from LocusLink data contained in 2 tables from a temporary DB. usage: hgMrnaRefseq xxxx; xxxx is the genome database name |
hgNearTest: | Test hgNear web page |
hgNetDist: | GS loader for gene/protein interaction network distances. |
hgNibSeq: | convert DNA to nibble-a-base and store location in database |
hgPepPred: | Load peptide predictions from Ensembl or Genie |
hgPhMouse: | Load phMouse track |
hgProtIdToGenePred: | Add proteinID column to genePrediction |
hgRatioMicroarray: | Create a ratio form of microarray data. |
hgRenameSplitTable: | Rename a table, or rename all tables in a split table |
hgRnaGenes: | Turn RNA genes from GFF into database format (BED variant) |
hgSanger20: | Load extra info from Sanger Chromosome 20 annotations. |
hgSanger22: | Load up database with Sanger 22 annotations |
hgSelect: | select from genome tables, handling split tables and bin column |
hgSgdGff3: | Parse out SGD gff3 file into components |
hgSgdGfp: | Parse localization files from SGD and Load Database |
hgSgdPep: | Parse yeast protein fasta files into format we can load |
hgSoftPromoter: | Slap Softberry promoter file into database. |
hgSoftberryHom: | Make table storing Softberry protein homology information |
hgSpeciesRna: | Create fasta file with RNA from one species |
hgStanfordMicroarray: | Load up from Stanford Microarray Database files |
hgStsAlias: | Make table of STS aliases |
hgStsMarkers: | Load STS markers into database |
hgSuperfam: | Generate supfamily table for the Superfamily track. |
hgTablesTest: | Test hgTables web page |
hgTpf: | Make TPF table |
hgTraceInfo: | import subset of mouse trace ancillary information parsed from FASTA files |
hgTrackDb: | Create trackDb table from text files. Note that the browser supports multiple trackDb tables, usually |
hgTracksRandom: | Time default view for random position of default genome |
hgWaba: | load Waba alignments into database |
hgWiggle: | fetch wiggle data from data base or file |
hgWormLinks: | Create table that links worm ORF name to description and SwissProt. This works on a WormBase dump, in Ace format I believe, from Lincoln Stein. |
hgYeastRegCode: | Load files from the regulatory code paper (large scale CHIP-CHIP on yeast) into database |
hgsql: | Execute some sql code using passwords in .hg.conf |
hgsqlLocal: | Execute some sql code using localDb.XXX in .hg.conf |
hgsqladmin: | Wrapper around mysqladmin using passwords in .hg.conf |
hgsqldump: | Execute mysqldump using passwords from .hg.conf |
hgsqldumpLocal: | Execute mysqldump using passwords from .hg.conf |
hgsqlimport: | Execute mysqlimport using passwords from .hg.conf |
hmmPfamToTab: | Convert hmmPfam output to something simple and tab-delimited. |
hprdP2p: | Create hprd.p2p tab file using HPRD flat file for input to hgNetDist |
htmlCheck: | Do a little reading and verification of html file |
htmlPics: | create an html file from a list of pictures. usage: htmlPics picFile(s) |
indexfa: | This program makes an index file for a .fa file |
indexgl: | This program makes an index file for a .gl file |
intronEnds: | Gather stats on intron ends. |
introns: | Introns - finds the introns in a file and writes them to gff. |
iriToControlTable: | Convert improbizer run to simple list of control scores |
iriToDnaMotif: | Convert improbRunInfo to dnaMotif |
isPcr: | Standalone v 34x5 In-Situ PCR Program |
ixIxx: | Create indices for simple line-oriented file of format <symbol> <free text> |
ixali: | This program makes a name index file for an .ali file |
ixword1: | This program makes an index file for text file, indexing the first word of each line. |
ixword3: | This program makes an index file for text file, indexing the third word of each line. |
jkUniq: | remove duplicate lines from file. Lines need not be next to each other (plain Unix uniq works for that) |
joinableFields: | Return list of good join targets for a table |
joinerCheck: | Parse and check joiner file |
kgAliasKgXref: | create gene alias .tab file. usage: kgAliasKgXref xxxx; xxxx is genome database name |
kgAliasM: | create gene alias (mRNA part) .tab files. usage: kgAliasM xxxx yyyy; xxxx is genome database name |
kgAliasP: | create gene alias (protein part) .tab files. usage: kgAliasM xxxx yyyy zzzz; xxxx is genome database name |
kgAliasRefseq: | create gene alias .tab file. usage: kgAliasRefseq xxxx; xxxx is genome database name |
kgCheck: | from gene candidates, go through various criteria and keep the ones that pass the criteria |
kgGetCds: | create a gene candidate table with CDS info |
kgGetPep: | generate FASTA format protein sequence file to be used for Known Genes track build. |
kgPepMrna: | generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. usage: kgPepMrna tempKgDb roDb YYMMDD; tempKGDb is the temp KG build database name |
kgPick: | select the best representative mRNA/protein pair |
kgProtAlias: | create protein alias .tab files. usage: kgProtAlias xxxx yyyy; xxxx is genome database name |
kgProtAliasNCBI: | create gene alias (mRNA part) .tab files. usage: kgProtAliasNCBI <DB> <RO_DB>; <DB> is knownGene DB under construction |
kgPutBack: | from gene candidates, go through various criteria and keep the ones that pass the criteria |
kgXref: | create Known Gene cross reference table kgXref.tab file. usage: kgXref <db> <proteinsYYMMDD> <ro_db>; <db> is known Genes database under construction |
kgXref2: | create new Known Gene cross reference table kgXref2.tab file. usage: kgXref2 <tmpDb> <YYMMDD> <roDb>; <tmpDb> is temp KG database under construction |
knownToHprd: | Create knownToHprd table using HPRD flat file and kgXref |
knownToVisiGene: | Create knownToVisiGene table by riffling through various other knownTo tables |
knownVsBlat: | Categorize BLAT mouse hits to known genes |
kvsSummary: | Summarize output of a bunch of knownVsBlats |
lavToAxt: | Convert blastz lav file to an axt file (which includes sequence) |
lavToPsl: | Convert blastz lav to psl format |
ldHgGene: | load database with gene predictions from a gff file. |
lfsOverlap: | remove overlapping records from lfs file and retain the best scoring lfs record for each set of overlapping records. If scores are equal, the first record found is retained |
libScan: | Scan libraries to help find 5' capped ones |
liftAcross: | convert one coordinate system to another, no overlapping items |
liftAgp: | Program to lift tracks that have nearly the same .agp file, but slightly different. Initially designed for chr21 and chr22 which are starting to accumulate ticky-tacky changes. Currently works for files |
liftFrags: | This program lifts annotations on clone fragments to FPC contig coordinates |
liftOver: | Move annotations from one assembly to another |
liftOverMerge: | Merge multiple regions in BED 5 files generated by liftOver -multiple |
liftPromoHits: | Lift motif hits from promoter to chromosome coordinates |
liftUp: | change coordinates of .psl, .agp, .gap, .gl, .out, .gff, .gtf, .bscore, .tab, .gdup, .axt, .chain, .net, genePred, .wab, .bed, or .bed8 files to parent coordinate system. |
lineCount: | Count lines in a file |
linesToRa: | generate .ra format from lines with pipe-separated fields |
localtime: | convert unix timestamp to date string |
mafAddIRows: | add 'i' rows to a maf |
mafAddQRows: | Add quality data to a maf |
mafCoverage: | Analyse coverage by maf files - chromosome by chromosome and genome-wide. |
mafFetch: | get overlapping records from an MAF using an index table |
mafFilter: | Filter out maf files. Output goes to standard out |
mafFrag: | Extract maf sequences for a region from database |
mafFrags: | Collect MAFs from regions specified in a 6 column bed file |
mafGene: | output protein alignments using maf and genePred |
mafMeFirst: | Move component to top if it is one of the named ones. Useful in conjunction with mafFrags when you don't want the one with the gene name to be in the middle. |
mafOrder: | order components within a maf file |
mafRanges: | Extract ranges of target (or query) coverage from maf and output as BED 3 (e.g. for processing by featureBits). |
mafSpeciesList: | Scan maf and output all species used in it. |
mafSpeciesSubset: | Extract a maf that just has a subset of species. |
mafSplit: | Split multiple alignment files |
mafSplitPos: | Pick positions to split multiple alignment input files |
mafToAxt: | Convert from maf to axt format |
mafToPsl: | Convert maf to psl format |
mafsInRegion: | Extract MAFS in a genomic region |
makeTableDescriptions: | Add table descriptions to database. |
makepgo: | Make Predicted Gene Offset files. One for each chromosome. |
maskOutFa: | Produce a masked .fa file given an unmasked .fa and a RepeatMasker .out file, or a .bed file to mask on. |
maxTranscriptomeExps: | cycle through a list of affy transcriptome experiments and select the max for each position. |
mdToNcbiLift: | Convert seq_contig.md file to ncbi.lft |
mgcFastaForBed: | Take a bed file and return a fasta file with exons uppercase and introns lowercase. |
mktime: | convert date string to unix timestamp |
moresyn: | find more gene/ORF synonyms |
motifLogo: | Make a sequence logo out of a motif. |
motifSig: | Combine info from multiple control runs and main improbizer run |
mousePoster: | Search database info for making foldout |
mrnaToGene: | convert PSL alignments of mRNAs to gene annotations |
netChainSubset: | Create chain file with subset of chains that appear in the net |
netClass: | Add classification info to net |
netFilter: | Filter out parts of net. What passes filter goes to standard output. Note a net is a recursive data structure. If a parent fails to pass |
netSplit: | Split a genome net file into chromosome net files |
netStats: | Gather statistics on net |
netSyntenic: | Add synteny info to net. |
netToAxt: | Convert net (and chain) to axt. |
netToBed: | Convert target coverage of net to a bed file. |
netToBedWithId: | Convert net (and chain) to bed with base identity in score. |
newProg: | make a new C source skeleton. |
nibFrag: | Extract part of a nib file as .fa (all bases/gaps lower case by default) |
nibSize: | print size of nibs |
nibbImageProbes: | Collect image probes for NIBB Xenopus Laevis in-situs |
nibbNameFix: | Regularize format of NIBB sequence names |
nibbParseImageDir: | Look through nibb image directory and, allowing for typos and the like, create a table that maps a file name to clone name, developmental stage, and view of body part |
nibbPrepImages: | Set up NIBB frog images for VisiGene virtual microscope - copying them to a directory and making up pyramid scheme. |
normalizeSampleFile: | normalizeSampleFiles - calculates average value over a series of sample files and sets the average of each sample file to the global average. Optionally will also group together samples into larger groups. |
nt4Frag: | Extract a piece of a .nt4 file to .fa format |
oligoMatch: | find perfect matches in sequence. |
orf: | Find orf for cDNAs |
orfStats: | Collect stats on orfs |
orthoEvaluate: | Evaluate the coding potential of a bed.
(version: .c,v 1.13 2008/09/03 19:20:51 markd ) -help -- Display this message. |
orthoMap: | Map items from one organism to another. Must
specify one type of item using the -itemFile or -itemTable flags. OrthoMap simply maps over the genomic coordinates discarding |
orthoPickIntron: | Pick best intron from orthoEval.
(version: 1.8 2008/09/03 19:20:52 markd ) -help -- Display this message. |
orthoSplice: | program to compare splicing in different organisms,
initially human and mouse as they both have nice EST and cDNA data. Still working out algorithm, but options are: |
orthologBySynteny: | Find syntenic location for a list of gene predictions on a single chromosome |
overlapSelect: | usage: overlapSelect [options] selectFile inFile outFile. Select records based on overlapping chromosome ranges. The ranges are |
patCount: | counts up the number of occurrences of each
oligo of a fixed size (up to 13) in input. Writes out all patterns that are overrepresented by at least factor |
pbCalDist: | pbCalDist- Create tab delimited data files to be used by Proteome Browser stamps. |
pbCalDistGlobal: | pbCalDistGlobal- Create tab delimited data files to be used by Proteome Browser stamps. |
pbCalPi: | Calculate pI values from a list of protein IDs |
pbCalResStd: | pbCalResStd calculates the avg frequency and standard deviation of each AA residue in the proteins of a specific genome |
pbCalResStdGlobal: | pbCalResStd calculates the avg frequency and standard deviation of each AA residue in the proteins of a protein database |
pbHgnc: | process HGNC data |
pepPredToFa: | Convert a pepPred table to fasta format |
pfamXref: | create pfam xref .tab file usage:
pfamXref pn pfamInput pfamOutput pfamXref pn is protein database name |
phToPsl: | Convert from Pattern Hunter to PSL format |
polyInfo: | Collect info on polyAdenylation signals etc |
positionalTblCheck: | check that positional tables are sorted |
promoSeqFromCluster: | Get promoter regions from cluster |
pslCDnaFilter: | Filter cDNA alignments in psl format. Filtering criteria are |
pslCat: | concatenate psl files |
pslCheck: | validate PSL files |
pslCoverage: | estimate coverage from alignments. usage: pslCoverage in.sizes in.psl minPercentId endTrim out.cov misAsm.out |
pslDiff: | Compare queries in two or more psl files |
pslDropOverlap: | deletes all overlapping self alignments. |
pslFilter: | filter out psl file. usage: pslFilter in.psl out.psl |
pslGlue: | reduce a psl mRNA alignment file to only the components that might be involved in gluing |
pslHisto: | pslHisto [options] what inPsl outHisto |
pslHitPercent: | Figure out percentage of reads in FA file that hit. |
pslIntronsOnly: | Filter psl files to only include those with introns |
pslMap: | map PSL alignments to new targets using alignments of the old target to the new target. Given inPsl and mapPsl, where |
pslMrnaCover: | Make histogram of coverage percentage of mRNA in psl. |
pslPartition: | split PSL files into non-overlapping sets |
pslPretty: | Convert PSL to human readable output |
pslRecalcMatch: | Recalculate match,mismatch,repMatch columns in psl file.
This can be useful if the psl went through pslMap, or if you've added lower-case repeat masking after the fact |
pslReps: | analyse repeats and generate genome wide best alignments from a sorted set of local alignments |
pslSelect: | select records from a PSL file. |
pslSimp: | create simplified version of psl file. |
pslSort: | merge and sort psCluster .psl output files |
pslSortAcc: | sort pslSort .psl output file by accession. Make one output .psl file per accession. |
pslSplitOnTarget: | Split psl files into one per target. |
pslStats: | collect statistics from a psl file. |
pslSwap: | usage: pslSwap [options] inPsl outPsl |
pslToBed: | pslToBed: transform a psl format file to a bed format file. |
pslToPslx: | Convert from psl to pslx format, which includes sequences |
pslToXa: | Convert from psl to xa alignment format |
pslUnpile: | Removes huge piles of alignments from sorted psl files (due to unmasked repeats presumably). |
pslxToFa: | convert pslx (with sequence) to fasta file |
qaToQac: | convert from uncompressed to compressed quality score format. |
qacAgpLift: | Use AGP to combine per-scaffold qac into per-chrom qac. |
qacToQa: | convert from compressed to uncompressed quality score format. |
qacToWig: | convert from compressed quality score format to wiggle format. |
raToCds: | Extract CDS positions from ra file |
raToLines: | Output .ra file stanzas as single lines, with pipe-separated fields. |
raToTab: | Convert ra file to table. |
randomLines: | Pick out random lines from file |
refiAli: | This program turns rough alignments into fine ones. |
refreshNamedSessionCustomTracks: | refreshNamedSessionCustomTracks -- scan central database's namedSessionDb
contents for custom tracks and touch any that are found, to prevent them from being removed by the custom track cleanup process. |
regionPicker: | Code to pick regions to annotate deeply.
Stratifies genome based on mouse non-transcribed homology and spliced EST density. |
relPairs: | extract pairs from a big pair list file that actually occur in a .psl file |
reviewIndexes: | check indexes |
reviewSanity: | Look through sanity files and make sure things are ok. |
rikenBestInCluster: | Find best looking in Riken cluster |
rmFaDups: | rmFaDup - remove duplicate records in FA file
usage rmFaDup oldName.fa newName.fa |
rmKGPepMrna: | generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. usage:
rmKGPepMrna xxxx yyyy xxxx is the genome database name |
rnaFoldBig: | Run RNAfold repeatedly |
rowsToCols: | Convert rows to columns and vice versa in a text file. |
safePush: | Push database tables from one machine to another. This is a
little more careful than mypush. It should be run on the machine that is the source of the data |
samHit: | reads the SAM output .rdb file and produces .tab data for the protHomolog table. usage:
samHit proteinId rdbFN proteinId is the protein ID |
sanger22gtf: | Convert Sanger chromosome 22 annotations to gtf |
scaffoldFaToAgp: | generate an AGP file, gap file, and lift file from a scaffold FA file. |
scaleSampleFiles: | scale all of the scores in a file by a scale factor. |
scanRa: | scan through ra files for info. |
scopCollapse: | Convert SCOP model to SCOP ID. Also make id/name converter file. |
scrambleFa: | scramble the order of records in an fa file |
seqCheck: | check that extFile references in seq table are valid |
sequenceForBed: | Writes sequence for beds to a fasta file. Requires database access. |
sim4big: | A wrapper for Sim4 that runs it repeatedly on a multi-sequence .fa file |
simpleChain: | Stitch psls into chains |
sizeof: | type bytes bits
char 1 8 unsigned char 1 8 |
snpException: | Get exceptions to a snp invariant rule. |
snpMaskAddInsertions: | snpMaskAddInsertions -- Print genomic sequence plus insertion SNPs. |
snpMaskCutDeletions: | snpMaskCutDeletions -- Print genomic sequence with deletion SNPs removed. |
snpMaskSingle: | print sequence using IUPAC ambiguous nucleotide codes for single base substitutions |
snpNcbiToUcsc: | Reformat NCBI SNP field values into UCSC, and flag exceptions. |
snpValid: | Validate snp alignments |
sortFilt: | merge, sort, and filter patSpace .hit output. |
spLoadPsiBlast: | load swissprot PSI-BLAST table. This loads the results of all-against-all PSI-BLAST on Swissprot, which |
spLoadRankProp: | load swissprot rankProp table. |
spOrganism: | Extract taxonomy data from SWISS-PROT data file and produce a .tab file of SWISS-PROT display ID/NCBI taxonomy ID pairs. |
spTest: | Test out sp library. |
spToDb: | Create a relational database out of SwissProt/trEMBL flat files |
spToProteins: | spToProteins- Create tab delimited data files from spxxxx database for proteinsxxxx database. |
spToProteinsVar: | spToProteinsVar- Create tab delimited data file, spXrefVar.tab, from spYYMMDD database for proteinsYYMMDD database. |
spToSpXref2: | spToSpXref2- Create tab delimited data files for the spXref2 table in uniProt (spxxxxxx) database. |
spXref3: | get xref data of proteins in SWISS-PROT, TrEMBL, TrEMBL-NEW and HUGO. Output is placed in file spXref3.tab. |
spacedToTab: | Convert fixed width space separated fields to tab separated. Note this requires two passes, so it can't be done on a pipe |
spideyToPsl: | Convert NCBI spidey pair alignments to PSL format |
splitFa: | split a big FA file into smaller ones. |
splitFile: | Split up a file |
splitFileByColumn: | Split text input into files named by column value |
splitSim: | Simulate gapless distribution size |
spm3: | from all mRNAs in a genome (e.g. rn3) referenced by SWISS-PROT generate a list of proteins and a list of protein/mRNA pairs. |
spm6: | generates sorted.lis and knownGene0.tab for further duplicates processing |
spm7: | Create sorted list of mRNA-SP data file for further duplicates processing |
sqlToXml: | dump out all or part of a relational database to XML, guided by a dump specification. See sqlToXml.doc for additional information. |
stToXao: | make indices into st file, one for each chromosome. |
stageMultiz: | Stage input directory for Webb's multiple aligner |
stanToBedAndExpRecs: | takes a pslFile of alignments and a list of stanfords
expression data files and converts them into a bed file with the scores and experiment ids. Also creates a corresponding file of expRecords which indicate what the |
stitchea: | joins together EA files into one big one, throwing out overlaps. Will complain if there's any missing data. |
stitcher: | third pass of genomic/genomic alignment. Stitches together 2000x5000 base 7-state alignments into longer contigs. |
stringify: | Convert file to C strings |
subChar: | Substitute one character for another throughout a file. |
subColumn: | Substitute one column in a tab-separated file. |
subs: | Subs - a utility to perform massive string substitutions on source |
subsetAxt: | Rescore alignments and output those over threshold |
subsetTraces: | Build subset of mouse traces that actually align |
tableSum: | Summarize a table somehow |
tailLines: | add tail to each line of file |
testSearch: | test search functionality. |
textHist2: | Make two dimensional histogram table out of a list of 2-D points, one per line. |
textHistogram: | Make a histogram in ascii |
tfbsConsSort: | a utility to sort tfbsCons files before loading them |
tickToDate: | Convert seconds since 1970 to time and date |
timePosTable: | time access to a positional table |
toDev64: | A program that copies data from the old hgwdev database to the new hgwdev database. |
toLower: | Convert upper case to lower case in file. Leave other chars alone |
toUpper: | Convert lower case to upper case in file. Leave other chars alone |
trackDbRaFormat: | Format trackDb.ra canonically. |
trackOverlap: | trackOverlap- Overlap how much of a track is overlapped by
other tracks and vice versa. This is done by correlating series of bitmap projections (i.e. featureBits multiple times). |
trfBig: | Mask tandem repeats on a big sequence file. |
twinOrf: | Predict open reading frame in cDNA given a cross species alignment |
twinOrf2: | Predict open reading frame in cDNA given a cross species alignment |
twinOrf3: | Predict open reading frame in cDNA given a cross species alignment |
twinOrfStats: | Collect stats on refSeq cDNAs aligned to another species via axtForEst |
twinOrfStats2: | Collect stats on refSeq cDNAs aligned to another species via axtForEst |
twinOrfStats3: | Collect stats on refSeq cDNAs aligned to another species via axtForEst |
twoBitInfo: | get information about sequences in a .2bit file |
twoBitMask: | apply masking to a .2bit file, creating a new .2bit file |
twoBitToFa: | Convert all or part of .2bit file to fasta |
txAbFragFind: | Search database for what are probably antibody fragments. |
txBedToGraph: | Cluster together beds from txPslToBed. Make transcript graphs out of clusters.
txBedToGraph in1.bed in1Type [in2.bed in2type ...] out.txg options: |
txCdsBadBed: | Create a bed file with regions that don't really have CDS, but that might look like it. |
txCdsCluster: | Cluster transcripts purely in the CDS regions, only putting things together if they share same frame as well as a genomic region. |
txCdsEvFromBed: | Make a cds evidence file (.tce) from an existing bed file. Used mostly in transferring CCDS coding regions currently. |
txCdsEvFromBorf: | Convert borfBig format to txCdsEvidence (tce) in an effort to annotate the coding regions. |
txCdsEvFromProtein: | Convert transcript/protein alignments and other evidence into a transcript CDS evidence (tce) file |
txCdsEvFromRna: | Convert transcript/rna alignments, genbank CDS file, and other info to transcript CDS evidence (tce) file. |
txCdsGoodBed: | Create positive example training set for SVM. This is
based on the refSeq reviewed genes, but we fragment a certain percentage of them so as not to end up with a SVM that *requires* a complete |
txCdsOrfInfo: | Given a sequence and a putative ORF, calculate some basic information on it. |
txCdsOrtho: | Figure out how CDS looks in other organisms. |
txCdsPick: | Pick best CDS if any for transcript given evidence. |
txCdsPredict: | Somewhat simple-minded ORF predictor using a weighting scheme. |
txCdsRaExceptions: | Mine exceptional things like selenocysteine out of genbank ra file. |
txCdsRefBestEvOnly: | Go through a cdsEvidence file, and extract only the bits that refer to the native orf for a RefSeqReviewed transcript. |
txCdsRepick: | OBSOLETE program. The scheme this implemented ended up
not working so well. It's still in the source tree because it may contain some useful routines for other programs |
txCdsSuspect: | Flag cases where the CDS prediction is very suspicious, including
CDSs that lie entirely in an intron or in the 3' UTR of another, better looking transcript. |
txCdsSvmInput: | Create input for svm_light, a nice support vector machine classifier. |
txCdsToGene: | Convert transcript bed and best cdsEvidence to genePred and protein sequence. |
txCdsWeed: | Remove bad CDSs including NMD candidates |
txGeneAccession: | Assign permanent accession number to genes. |
txGeneAlias: | Make kgAlias and kgProtAlias tables. |
txGeneAltProt: | Figure out statistics on number of alternative proteins produced by alt-splicing. |
txGeneCanonical: | Pick a canonical version of each gene - that is the form
to use when just interested in a single splicing variant. Produces final transcript clusters as well. |
txGeneCdsMap: | Create mapping between CDS region of gene and genome. This is used to build the exon track in the proteome browser. |
txGeneColor: | Figure out color to draw gene in. |
txGeneExplainUpdate1: | Make table explaining correspondence between older known genes and ucsc genes. |
txGeneFromBed: | Convert from bed to knownGenes format table (genePred + uniProt ID) |
txGeneProtAndRna: | Create fasta files with our proteins and transcripts.
These echo RefSeq when gene is based on RefSeq. Otherwise they are taken from the genome. |
txGeneSeparateNoncoding: | Separate genes into four piles - coding, non-coding that overlap coding, antisense to coding, and independent non-coding. |
txGeneXref: | Make kgXref type table for genes. |
txInfoAssemble: | Assemble information from various sources into txInfo table. |
txOrtho: | Produce list of shared edges between two transcription graphs in two species. |
txPslFilter: | Do rna/rna filter. |
txPslToBed: | txPslToBed - Convert a psl to a bed file by projecting it onto its target sequence. Optionally merge adjacent blocks and trim to splice sites. |
txReadRa: | Read ra files from genbank and parse out relevant info into some tab-separated files. |
txWalk: | Walk transcription graph and output transcripts. |
txgAddEvidence: | Add evidence from a bed file to existing transcript graph. |
txgAnalyze: | Analyse transcription graph for alt exons, alt 3', alt 5', retained introns, alternative promoters, etc. |
txgGoodEdges: | Get edges that are above a certain threshold. |
txgToAgx: | Convert from txg (txGraph) format to agx (altGraphX) |
txgToXml: | Convert txg to an XML format. |
txgTrim: | Trim out parts of txGraph that are not of sufficient weight. |
udcCleanup: | Clean up old unused files in udcCache. |
undupFa: | rename duplicate records in FA file
usage undupFa faFile(s) |
upper: | strip numbers, spaces, and punctuation; turn to upper case |
utrFa: | Get UTRs as fasta files |
validateFiles: | Validate format of different track input files
Program exits with non-zero status if any errors are detected; otherwise exits with zero status |
venn: | Do venn diagram calculations |
wabToSt: | Convert WABA output to something Intronerator understands better |
weedLines: | Selectively remove lines from file |
whyConserved: | Try and analyse why a particular thing is conserved |
wigEncode: | convert Wiggle ascii data to binary format |
wigTestMaker: | Create test wig files. |
wigToBigWig: | Convert ascii format wig file (in fixedStep, variableStep or bedGraph format) to binary big wig format. |
wordLine: | chop up words by white space and output them with one word to each line. |
xmlCat: | Concatenate xml files together, stuffing all records inside a single outer tag. |
xmlToSql: | Convert XML dump into a fairly normalized relational database
in the form of a directory full of tab-separated files and table creation SQL. You'll need to run autoDtd on the XML file first to |
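Several of the simpler text utilities listed above (rowsToCols, wordLine, tickToDate) describe transformations that are easy to picture in a few lines of code. The following is a minimal Python sketch of what those one-line descriptions say, not the Kent tools' actual implementation. The function names are hypothetical, and the real programs' command-line options, time zone handling, and exact output formats are not covered by this listing and are assumed here.

```python
from datetime import datetime, timezone


def rows_to_cols(rows):
    """Swap rows and columns of a table (cf. the rowsToCols entry).

    Assumes every row has the same number of fields; zip() would
    silently drop the extra fields of longer rows otherwise.
    """
    return [list(col) for col in zip(*rows)]


def word_line(text):
    """Split on white space and emit one word per line (cf. wordLine)."""
    return "\n".join(text.split())


def tick_to_date(seconds):
    """Convert seconds since 1970 to a date and time (cf. tickToDate).

    UTC and this output format are assumptions made for the sketch.
    """
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    table = [["chrom", "start", "end"], ["chr1", "100", "200"]]
    for row in rows_to_cols(table):
        print("\t".join(row))
    print(word_line("one two\tthree"))
    print(tick_to_date(0))
```

For real work the tools themselves are the reference; each prints its own usage message when run without arguments.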