Kent source utilities: Difference between revisions

Revision as of 15:24, 20 May 2006

aNotB:	List symbols that are in a but not b
addAveMedScoreToPsls:	Combines unigene pslFile and sage file into bed file
addCols:	Sum columns in a text file.
affyPairsToSample:	Takes a 'pairs' format file from the Affy transcriptome data set and combines it with the Affy offset.txt file to output a 'sample' file which has the contig coordinates of the result.
agpAllToFaFile:	Convert all sequences in an .agp file to a .fa file
agpCloneCheck:	Check that all clones in an agp file are present (and are the right version too)
agpCloneList:	Make simple list of all clones in agp file to stdout
agpToFa:	Convert a .agp file to a .fa file
agpToGl:	Convert AGP file to GL file. Some fakery involved.
agxToIntronBeds:	Program to output all introns from altGraphX records as beds. Designed for use in MGC project looking for novel introns from altGraphX records transferred over from mouse.
ali2alx:	produces an index file for each chromosome into an ali file.
aliGlue:	tell where a cDNA is located quickly.
ameme:	find common patterns in DNA. Usage: ameme good=goodIn.fa [bad=badIn.fa] [numMotifs=2] [background=m1] [maxOcc=2] [motifOutput=fileName] [html=output.html] [gif=output.gif] [rcToo=on] [controlRun=on] [startScanLimit=20] [outputLogo] [constrainer=1]
assessLibs:	Make table that assesses the percentage of library that covers 5' and 3' ends
autoDtd:	Give this an XML document to look at and it will come up with a DTD to describe it.
autoSql:	create SQL and C code for permanently storing a structure in database and loading it back into memory based on a specification file
autoXml:	Generate structures code and parser for XML file from DTD-like spec
ave:	Compute average and basic stats
aveCols:	average together columns
averagExp:	Average expression data within a cluster
averageZoomLevels:	takes a sorted sample file and creates averaged 'zoomed-out' summaries for a few different levels. Basic idea is to get the size of a chromosome, divide it by 2000 as that is
avgTranscriptomeExps:	Averages together replicates of the affy transcriptome data set. Will skip certain experiments unless directed otherwise as they were not in the original data set.
axtAndBed:	Intersect an axt with a bed file and output axt.
axtBest:	Remove second best alignments
axtCalcMatrix:	Calculate substitution matrix and make indel histogram
axtChain:	Chain together axt alignments.
axtDropOverlap:	deletes all overlapping self alignments.
axtDropSelf:	Drop alignments that just align same thing to itself
axtFilter:	Filter axt files. Output goes to standard out.
axtForEst:	Generate file of mouse/human alignments corresponding to MGC EST's
axtIndex:	build index of axt file
axtPretty:	Convert axt to more human readable format.
axtQueryCount:	Count bases covered on each query sequence
axtRecipBest:	create file for dot plot using recip best
axtRescore:	Recalculate scores in axt.
axtSort:	Sort axt files
axtSplitByTarget:	Split a single axt file into one file per target
axtSwap:	Swap source and query in an axt file
axtToBed:	Convert axt alignments to simple bed format
axtToChain:	Convert axt to chain format
axtToMaf:	Convert from axt to maf format
axtToPsl:	Convert axt to psl format
bedCons:	Look at conservation of a BED track vs. a reference (nonredundant) alignment track
bedCoverage:	Analyse coverage by bed files - chromosome by chromosome and genome-wide.
bedDown:	Make stuff to find a BED format submission in a new version
bedIntersect:	Intersect two bed files
bedItemOverlapCount:	count number of times a base is overlapped by the items in a bed file. Output is bedGraph 4 to stdout.
bedSort:	Sort a .bed file by chrom,chromStart
bedToFrames:	Makes html files for browsing custom bed track using frames. Use -pad for padding
bedToGenePred:	convert bed format files to genePred format
bedUp:	Load bed submissions after conversion back into new database.
binGood:	convert text format alignment file to binary format
blastToPsl:	Convert blast alignments to PSLs.
blat:	Standalone BLAT v. 33x3 fast sequence search command line tool
blatz:	blatz version 1 - Align dna across species
blatzClient:	blatzClient version 1 - Ask server to do cross-species DNA alignments and save results.
blatzServer:	blatzServer version 1 - Set up in-memory server for cross-species DNA alignments
borfBig:	Run Victor Solovyev's bestorf repeatedly
bwana:	do batch coarse alignment of C. briggsae and C. elegans genomes.
calc:	Little command line calculator
calcGap:	calculate gap scores
catDir:	concatenate files in directory to stdout. For those times when too many files for cat to handle.
catUncomment:	Concatenate input removing lines that start with '#'. Output goes to stdout
ccCp:	copy a file to cluster. Usage: ccCp sourceFile destFile [hostList]. This will copy sourceFile to destFile for all machines in
cdnaOff:	creates sorted offset files that position cDNAs in chromosome.
chainAntiRepeat:	Get rid of chains that are primarily the results of repeats and degenerate DNA
chainDbToFile:	translate a chain's db representation back to file
chainFilter:	Filter chain files. Output goes to standard out.
chainMergeSort:	Combine sorted files into larger sorted file
chainNet:	Make alignment nets out of chains
chainPreNet:	Remove chains that don't have a chance of being netted
chainSort:	Sort chains. By default sorts by score. Note this loads all chains into memory, so it is not suitable for large sets. Use chainMergeSort for that
chainSplit:	Split chains up by target or query sequence
chainStats:	Stitch psls into chains
chainSwap:	Swap target and query in chain
chainToPsl:	Convert chain file to psl format
checkAgpAndFa:	takes a .agp file and .fa file and ensures that they are in synch
checkHgFindSpec:	test and describe search specs in hgFindSpec tables.
checkTableCoords:	check invariants on genomic coords in table(s).
checkableBorf:	Convert borfBig orf-finder output to checkable form
chopFaLines:	Read in FA file with long lines and rewrite it with shorter lines
cluster:
clusterGenes:	Cluster genes from genePred tracks
clusterRna:	Make clusters of mRNA and ESTs
convolve:	perform convolution of probabilities
countChars:	Count the number of occurrences of a particular char
createSageSummary:
ctgFaToFa:	Convert from one big file with all NT contigs to one contig per file.
ctgToChromFa:	convert contig level fa files to chromosome level
dbSnoop:	Produce an overview of a database.
detab:	remove tabs from program
dnaMotifFind:	Locate preexisting motifs in DNA sequence
eisenInput:	Create input for Eisen-style cluster program
emblMatrixToMotif:	Convert transfac matrix in EMBL format to dnaMotif
embossToPsl:	Convert EMBOSS pair alignments to PSL format
endsInLf:	Check that last letter in files is end of line
est2genomeToPsl:	Convert EMBOSS est2genome and WUSTL pairgon alignments to PSL format
estLibStats:	Calculate some stats on EST libraries given file from polyInfo
estOrient:	estOrient [options] db estTable outPsl
exonAli:	This program aligns cDNA with genomic sequence. Usage: exonAli named output cdnaName(s) exonAli in output listFile
expToRna:	Make a little two column table that associates rnaClusters with expression info
faAlign:	Align two fasta files
faCmp:	Compare two .fa files
faCount:	count base statistics and CpGs in FA files.
faFilter:	Filter fa records, selecting ones that match the specified conditions
faFilterN:	Get rid of sequences with too many N's
faFlyBaseToUcsc:	Convert Flybase peptide fasta file to UCSC format
faFrag:	Extract a piece of DNA from a .fa file.
faGapSizes:	report on gap size counts/statistics
faNcbiToUcsc:	Convert FA file from NCBI to UCSC format.
faNoise:	Add noise to .fa file
faOneRecord:	Extract a single record from a .FA file
faPolyASizes:	get poly A sizes
faRc:	Reverse complement a FA file
faSimplify:	Simplify fasta record headers
faSize:	print total base count in fa files.
faSomeRecords:	Extract multiple fa records
faSplit:	Split an fa file into several files.
faToNib:	Convert from .fa to .nib format
faToTab:	hgFaToTab - convert fa file to tab separated file
faToTwoBit:	Convert DNA from fasta to 2bit format
faTrans:	Translate DNA .fa file to peptide
faTrimPolyA:	trim poly-A tails
faTrimRead:	trim reads based on qual scores - change low scoring bases to N's
fakeFinContigs:	Fake up contigs for a finished chromosome
fakeOut:	fake a RepeatMasker .out file based on a N's in .fa file
fatont4:	fato4nt - a program to convert .fa files to .4nt files
featureBits:	Correlate tables via bitmap projections.
ffaToFa:	ffaToFa convert Greg Schuler .ffa fasta files to UCSC .fa fasta files
findCdna:
findMotif:	find specified motif in sequence
findStanAlignments:	takes a stanford microarray experiment file and tries to look up an alignment for the relevant clone in the database. Starts by trying to look up the longest genbank clone from image id,
fishClones:
fixCr:	strip <CR>s from ends of lines
fixcr:	removes trailing carriage returns from files.
fqToQa:	convert from fq format with one big file to format with one file per clone.
fqToQac:	convert from fq format with one big file to compressed format with one file per clone.
fragPart:	get part of a fragment's sequence
gb2cdi:	convert GenBank (GB) files to .fa and cDna Info (CDI) file.
gbGetEntries:	retrieve records from a GenBank flat file.
gbOneAcc:	retrieve one or a few records from a GenBank flat file.
gbToFaRa:	Convert GenBank flat format file to an fa file containing the sequence data, an ra file containing other relevant info and a ta file containing summary statistics.
gbtofa:	gbtofa converts from GenBank to fa format.
gcForBed:	Calculate g/c percentage and other stats for regions covered by bed
genePredCheck:	validate genePred files or tables
genePredHisto:	get data for generating histograms from a genePred file.
genePredSingleCover:	create single-coverage genePred files
genePredToFakePsl:	Create a psl of fake-mRNA aligned to gene-preds from a file or table.
genePredToGtf:	Convert genePred table or file to gtf.
genePredToMafFrames:	create mafFrames tables from genePreds
geniegff:	makes up a gdf file from Genie gene predictions
getFeatDna:	Get dna for a type of feature
getRna:	Get mrna for GenBank or RefSeq sequences found in a database
getRnaPred:	Get virtual RNA for gene predictions
gfClient:	gfClient v. 33x3 - A client for the genomic finding program that produces a .psl file
gfPcr:	In silico PCR version 33x3 using gfServer index.
gfServer:	gfServer v 33x3 - Make a server to quickly find where DNA occurs in genome. To set up a server: gfServer start host port file(s)
gffPeek:	Look at a gff file and report some basic stats
gffgenes:	creates files that store extents of genes for intronerator
gpStats:	Figure out some stats on the golden path.
gpToGtf:	Convert gp table to GTF
gpcrParser:	Create xml files for gpcr snakeplots.
groupSamples:	Group samples together into one sample. Samples must be sorted by chromosome position (you can use bedSort first if they are not).
gsBig:	Run Genscan on big input and produce GTF files and other parsed output
gtfToGenePred:	convert a GTF file to a genePred
headRest:	Return all but the first N lines of a file.
hgAddLiftOverChain:	Add a liftOver chain to the central database
hgAvidShortBed:	Convert short form of AVID alignments to BED
hgBioCyc:	bioCyc - Creates bioCycPathway.tab for Known Genes to link to SRI BioCyc pathways
hgCeOrfToGene:	Make orfToGene table for C.elegans from GENE_DUMPS/gene_names.txt
hgChroms:	print chromosomes for a genome.
hgClonePos:	create clonePos table in browser database
hgClusterGenes:	Cluster overlapping gene predictions
hgCountAlign:	count overlapping or non-overlapping windows in an alignment.
hgCtgPos:	Store contig positions ( from lift files ) in database.
hgDeleteChrom:	output SQL commands to delete a chrom from the database
hgEmblProtLinks:	Parse EMBL flat file into protein link table
hgExonerate:	Convert Exonerate modified GFF files to BED format and load in database.
hgExpDistance:	Create table that measures expression distance between pairs
hgExperiment:	Load data from a BED of region positions, an experiment file containing <name> [<description>]
hgExtFileCheck:	check extFile or gbExtFile tables against file system
hgFakeAgp:	Create fake AGP file by looking at N's
hgFiberglass:	Turn Fiberglass Annotations into a BED and load into database
hgFindSpec:	Create hgFindSpec table from trackDb.ra files.
hgFlyBase:	Parse FlyBase genes.txt file and turn it into a couple of tables
hgGcPercent:	Calculate GC Percentage in 20kb windows
hgGeneBands:	Find bands for all genes
hgGetAnn:	get chromosome annotation rows from database tables using browser-style position specification.
hgGnfMicroarray:	Load data from (2003-style) GNF Affy Microarrays
hgGoAssociation:	Load bits we care about in GO association table
hgGoldGapGl:	Put chromosome .agp and .gl files into browser database.
hgKegg:	creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. Usage: hgKegg xxxx, where xxxx is the genome database name
hgKegg2:	creates keggPathway.tab and keggMapDesc.tab files for KG links to KEGG Pathway Map. Usage: hgKegg2 kgTempDb roDb, where kgTempDb is the KG build temp database name
hgKgGetText:	Get text from known genes into a file. The file will be line oriented with the known gene ID as the first word, and the rest of the word being a conglomeration
hgKgMrna:	Load mRNA alignments and other info into refGene tables into a TEMPORARY database to build Known Genes track.
hgKnownMore:	Create the knownMore table from a variety of sources.
hgKnownToSuper:	Load knownToSuperfamily table
hgLoadBed:	Load a generic bed file into database
hgLoadBlastTab:	Load blast table into database
hgLoadChain:	Load a generic Chain file into database
hgLoadGap:	Load gap table from AGP-style file containing only gaps
hgLoadGenePred:	Load up a mySQL database genePred table
hgLoadItemAttr:	load an itemAttr table
hgLoadMaf:	Load a maf file index into the database
hgLoadMafFrames:	load an mafFrames table
hgLoadMafSummary:	Load a summary table of pairs in a maf into a database
hgLoadNet:	Load a generic net file into database
hgLoadOut:	load RepeatMasker .out files into database
hgLoadPsl:	Load up a mySQL database with psl alignment tables
hgLoadSample:	Load a sample 9 (wiggle) file into database
hgLoadSeq:	load browser database with sequence file info.
hgLoadSqlTab:	Load table into database from SQL and text files.
hgLoadWiggle:	Load a wiggle track definition into database
hgMapMicroarray:	Make mapped version of microarray data, merging psl in.
hgMapToGene:	Map a track to a genePred track.
hgMapViaSwissProt:	Make table that maps to external database via SwissProt
hgMedianMicroarray:	Create a copy of microarray database that contains the median value of replicas
hgMrnaRefseq:	creates xref data between mRNA and RefSeq from LocusLink data contained in 2 tables from a temporary DB. Usage: hgMrnaRefseq xxxx, where xxxx is the genome database name
hgNearTest:	Test hgNear web page
hgNetDist:	GS loader for gene/protein interaction network distances.
hgNibSeq:	convert DNA to nibble-a-base and store location in database
hgPepPred:	Load peptide predictions from Ensembl or Genie
hgPhMouse:	Load phMouse track
hgProtIdToGenePred:	Add proteinID column to genePrediction
hgRatioMicroarray:	Create a ratio form of microarray data.
hgRnaGenes:	Turn RNA genes from GFF into database format (BED variant)
hgSanger20:	Load extra info from Sanger Chromosome 20 annotations.
hgSanger22:	Load up database with Sanger 22 annotations
hgSoftPromoter:	Slap Softberry promoter file into database.
hgSoftberryHom:	Make table storing Softberry protein homology information
hgSpeciesRna:	Create fasta file with RNA from one species
hgStanfordMicroarray:	Load up from Stanford Microarray Database files
hgStsAlias:	Make table of STS aliases
hgStsMarkers:	Load STS markers into database
hgTablesTest:	Test hgTables web page
hgTpf:	Make TPF table
hgTraceInfo:	import subset of mouse trace ancillary information parsed from FASTA files
hgTrackDb:	Create trackDb table from text files. Note that the browser supports multiple trackDb tables, usually
hgWaba:	load Waba alignments into database
hgWiggle:	fetch wiggle data from data base or file
hgWormLinks:	Create table that links worm ORF name to description and SwissProt. This works on a WormBase dump, in Ace format I believe, from Lincoln Stein.
hgsql:	Execute some sql code using passwords in .hg.conf
hgsqladmin:	Wrapper around mysqladmin using passwords in .hg.conf
hgsqldump:	Execute mysqldump using passwords from .hg.conf
hgsqlimport:	Execute mysqlimport using passwords from .hg.conf
htmlCheck:	Do a little reading and verification of html file
htmlPics:	create an html file from a list of pictures. Usage: htmlPics picFile(s)
indexfa:	This program makes an index file for a .fa file
indexgl:	This program makes an index file for a .gl file
intronEnds:	Gather stats on intron ends.
introns:	Introns - finds the introns in a file and writes them to gff.
iriToControlTable:	Convert improbizer run to simple list of control scores
iriToDnaMotif:	Convert improbRunInfo to dnaMotif
isPcr:	Standalone v 33x3 In-Situ PCR Program
ixIxx:	Create indices for simple line-oriented file of format <symbol> <free text>
ixali:	This program makes a name index file for an .ali file
ixword1:	This program makes an index file for text file, indexing the first word of each line.
ixword3:	This program makes an index file for text file, indexing the third word of each line.
jkUniq:	remove duplicate lines from file. Lines need not be next to each other (plain Unix uniq works for that)
joinableFields:	Return list of good join targets for a table
joinerCheck:	Parse and check joiner file
jp2Info:
jp2ToJpg:
jp2ToJpgTiles:
kgAliasKgXref:	create gene alias .tab file. Usage: kgAliasKgXref xxxx, where xxxx is genome database name
kgAliasM:	create gene alias (mRNA part) .tab files. Usage: kgAliasM xxxx yyyy, where xxxx is genome database name
kgAliasP:	create gene alias (protein part) .tab files. Usage: kgAliasM xxxx yyyy zzzz, where xxxx is genome database name
kgAliasRefseq:	create gene alias .tab file. Usage: kgAliasRefseq xxxx, where xxxx is genome database name
kgCheck:	from gene candidates, go through various criteria and keep the ones that pass the criteria
kgGetCds:	create a gene candidate table with CDS info
kgGetPep:	generate FASTA format protein sequence file to be used for Known Genes track build.
kgPepMrna:	generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. Usage: kgPepMrna tempKgDb roDb YYMMDD, where tempKGDb is the temp KG build database name
kgPick:	select the best representative mRNA/protein pair
kgPrepBestMrna2:
kgPrepBestRef2:
kgProtAlias:	create protein alias .tab files. Usage: kgProtAlias xxxx yyyy, where xxxx is genome database name
kgProtAliasNCBI:	create gene alias (mRNA part) .tab files. Usage: kgProtAliasNCBI <DB> <RO_DB>, where <DB> is knownGene DB under construction
kgPutBack:	from gene candidates, go through various criteria and keep the ones that pass the criteria
kgResultBestMrna2:
kgResultBestRef2:
kgXref:	create Known Gene cross reference table kgXref.tab file. Usage: kgXref <db> <proteinsYYMMDD> <ro_db>, where <db> is known Genes database under construction
kgXref2:	create new Known Gene cross reference table kgXref2.tab file. Usage: kgXref2 <tmpDb> <YYMMDD> <roDb>, where <tmpDb> is temp KG database under construction
knownVsBlat:	Categorize BLAT mouse hits to known genes
kvsSummary:	Summarize output of a bunch of knownVsBlats
lavToAxt:	Convert blastz lav file to an axt file (which includes sequence)
lavToPsl:	Convert blastz lav to psl format
ldHgGene:	load database with gene predictions from a gff file.
lfsOverlap:	remove overlapping records from lfs file and retain the best scoring lfs record for each set of overlapping records. If scores are equal, the first record found is retained
libScan:	Scan libraries to help find g' capped ones
liftAgp:	Program to lift tracks that have nearly the same .agp file, but slightly different. Initially designed for chr21 and chr22 which are starting to accumulate ticky-tacky changes. Currently works for files
liftFrags:	This program lifts annotations on clone fragments to FPC contig coordinates
liftOver:	Move annotations from one assembly to another
liftOverMerge:	Merge multiple regions in BED 5 files generated by liftOver -multiple
liftPromoHits:	Lift motif hits from promoter to chromosome coordinates
liftUp:	change coordinates of .psl, .agp, .gap, .gl, .out, .gff, .gtf .bscore .tab .gdup .axt .chain .net, genePred, .wab or .bed files to parent coordinate system.
lineCount:	Count lines in a file
lineFileSplit:	Split up a line oriented file into parts
mafAddIRows:	add 'i' rows to a maf
mafCoverage:	Analyse coverage by maf files - chromosome by chromosome and genome-wide.
mafFetch:	get overlapping records from an MAF using an index table
mafFilter:	Filter out maf files. Output goes to standard out
mafFrag:	Extract maf sequences for a region from database
mafFrags:	Collect MAFs from regions specified in a 6 column bed file
mafOrder:	order components within a maf file
mafRanges:	Extract ranges of target (or query) coverage from maf and output as BED 3 (e.g. for processing by featureBits).
mafSplit:	Split multiple alignment files
mafSplitPos:	Pick positions to split multiple alignment input files
mafToAxt:	Convert from maf to axt format
mafsInRegion:	Extract MAFS in a genomic region
makeTableDescriptions:	Add table descriptions to database.
makepgo:	Make Predicted Gene Offset files. One for each chromosome.
maskOutFa:	Produce a masked .fa file given an unmasked .fa and a RepeatMasker .out file, or a .bed file to mask on.
maxTranscriptomeExps:	cycle through a list of affy transcriptome experiments and select the max for each position.
mdToNcbiLift:	Convert seq_contig.md file to ncbi.lft
mgcFastaForBed:	Take a bed file and return a fasta file with exons uppercase and introns lowercase.
mmUnmix:	Help identify human contamination in mouse and vice versa.
moresyn:	find more gene/ORF synonyms
motifLogo:	Make a sequence logo out of a motif.
motifSig:	Combine info from multiple control runs and main improbizer run
mousePoster:	Search database info for making foldout
mrnaToGene:	convert PSL alignments of mRNAs to gene annotations
netChainSubset:	Create chain file with subset of chains that appear in the net
netClass:	Add classification info to net
netFilter:	Filter out parts of net. What passes filter goes to standard output. Note a net is a recursive data structure. If a parent fails to pass
netSplit:	Split a genome net file into chromosome net files
netStats:	Gather statistics on net
netSyntenic:	Add synteny info to net.
netToAxt:	Convert net (and chain) to axt.
netToBed:	Convert target coverage of net to a bed file.
netToBedWithId:	Convert net (and chain) to bed with base identity in score.
newProg:	make a new C source skeleton.
nibFrag:	Extract part of a nib file as .fa (all bases/gaps lower case by default)
nibSize:	print size of nibs
normalizeSampleFile:	normalizeSampleFiles - calculates average value over a series of sample files and sets the average of each sample file to the global average. Optionally will also group together samples into larger groups.
nt4Frag:	Extract a piece of a .nt4 file to .fa format
orf:	Find orf for cDNAs
orfStats:	Collect stats on orfs
orthoEvaluate:	Evaluate the coding potential of a bed. (version: .c,v 1.12 2003/09/14 23:15:02 sugnet ) -help -- Display this message.
orthoMap:	Map items from one organism to another. Must specify one type of item using the -itemFile or -itemTable flags. OrthoMap simply maps over the genomic coordinates discarding
orthoPickIntron:	Pick best intron from orthoEval. (version: 1.7 2003/09/14 23:15:02 sugnet ) -help -- Display this message.
orthologBySynteny:	Find syntenic location for a list of gene predictions on a single chromosome
overlapSelect:	Select records based on overlapping chromosome ranges. Usage: overlapSelect [options] selectFile inFile outFile
patCount:	counts up the number of occurrences of each oligo of a fixed size (up to 13) in input. Writes out all patterns that are overrepresented by at least factor
pepPredToFa:	Convert a pepPred table to fasta format
pfamXref:	create pfam xref .tab file. Usage: pfamXref pn pfamInput pfamOutput pfamXref pn is protein database name
phToPsl:	Convert from Pattern Hunter to PSL format
polyInfo:	Collect info on polyAdenylation signals etc
promoSeqFromCluster:	Get promoter regions from cluster
pslCDnaFilter:	Filter cDNA alignments in psl format. Filtering criteria are
pslCat:	concatenate psl files
pslCheck:	validate PSL files
pslCoverage:	estimate coverage from alignments. Usage: pslCoverage in.sizes in.psl minPercentId endTrim out.cov misAsm.out
pslDiff:	Compare queries in two or more psl files
pslDropOverlap:	deletes all overlapping self alignments.
pslFilter:	filter out psl file. Usage: pslFilter in.psl out.psl
pslFilterPrimers:
pslGlue:	reduce a psl mRNA alignment file to only the components that might be involved in gluing
pslHisto:	pslHisto [options] what inPsl outHisto
pslHitPercent:	Figure out percentage of reads in FA file that hit.
pslIntronsOnly:	Filter psl files to only include those with introns
pslMap:	map PSLs alignments to new targets using alignments of the old target to the new target. Given inPsl and mapPsl, where
pslMrnaCover:	Make histogram of coverage percentage of mRNA in psl.
pslPairs:
pslPartition:	split PSL files into non-overlapping sets
pslPretty:	Convert PSL to human readable output
pslQuickFilter:
pslRecalcMatch:	Recalculate match,mismatch,repMatch columns in psl file. This can be useful if the psl went through pslMap, or if you've added lower-case repeat masking after the fact
pslReps:	analyse repeats and generate genome wide best alignments from a sorted set of local alignments
pslSelect:	select records from a PSL file.
pslSimp:	create simplified version of psl file.
pslSort:	merge and sort psCluster .psl output files
pslSortAcc:	sort pslSort .psl output file by accession Make one output .psl file per accession.
pslStats:	collect statistics from a psl file.
pslSwap:	Usage: pslSwap [options] inPsl outPsl
pslToBed:	pslToBed: transform a psl format file to a bed format file.
pslToXa:	Convert from psl to xa alignment format
pslUniq:
pslUnpile:	Removes huge piles of alignments from sorted psl files (due to unmasked repeats presumably).
pslxToFa:	convert pslx (with sequence) to fasta file
qaToQac:	convert from uncompressed to compressed quality score format.
qacAgpLift:	Use AGP to combine per-scaffold qac into per-chrom qac.
qacToQa:	convert from compressed to uncompressed quality score format.
qacToWig:	convert from compressed quality score format to wiggle format.
raToCds:	Extract CDS positions from ra file
randomLines:	Pick out random lines from file
randomPlacement:	run placement trials on a set of elements
refiAli:	This program turns rough alignments into fine ones.
regionPicker:	Code to pick regions to annotate deeply. Stratifies genome based on mouse non-transcribed homology and spliced EST density.
relPairs:	extract pairs from a big pair list file that actually occur in a .psl file
reviewSanity:	Look through sanity files and make sure things are ok.
rikenBestInCluster:	Find best looking in Riken cluster
rmFaDups:	rmFaDup - remove duplicate records in FA file. Usage: rmFaDup oldName.fa newName.fa
rmKGPepMrna:	generate new .tab files with unused mRNA and protein sequences from known genes db tables removed. Usage: rmKGPepMrna xxxx yyyy, where xxxx is the genome database name
rowsToCols:	Convert rows to columns and vice versa in a text file.
safePush:	Push database tables from one machine to another. This is a little more careful than mypush. It should be run on the machine that is the source of the data
sanger22gtf:	Convert Sanger chromosome 22 annotations to gtf
scaffoldFaToAgp:	generate an AGP file, gap file, and lift file from a scaffold FA file.
scaleSampleFiles:	scale all of the scores in a file by a scale factor.
scanRa:	scan through ra files for info.
scrambleFa:	scramble the order of records in an fa file
selectTrainedHits:	Select only hits from sites that we've trained on
semiNorm:
sim4big:	A wrapper for Sim4 that runs it repeatedly on a multi-sequence .fa file
simpleChain:	Not supported on x86_64
snpException:	Get exceptions to a snp invariant rule.
snpValid:	Validate snp alignments
sortFilt:	merge, sort, and filter patSpace .hit output.
spLoadPsiBlast:	load swissprot PSI-BLAST table. This loads the results of all-against-all PSI-BLAST on Swissprot, which
spLoadRankProp:	load swissprot rankProp table.
spOrganism:	Extract taxonomy data from SWISS-PROT data file and produce a .tab file of SWISS-PROT display ID/NCBI taxonomy ID pairs.
spTest:	Test out sp library.
spToDb:	Create a relational database out of SwissProt/trEMBL flat files
spToProteins:	spToProteins- Create tab delimited data files from spxxxx database for proteinsxxxx database.
spToProteinsVar:	spToProteinsVar- Create tab delimited data file, spXrefVar.tab, from spYYMMDD database for proteinsYYMMDD database.
spXref3:	get xref data of proteins in SWISS-PROT, TrEMBL, TrEMBL-NEW and HUGO. Output is placed in file spXref3.tab.
spacedToTab:	Convert fixed width space separated fields to tab separated. Note this requires two passes, so it can't be done on a pipe
spideyToPsl:	Convert NCBI spidey pair alignments to PSL format
splitFa:	split a big FA file into smaller ones.
splitFaIntoContigs:
splitFile:	Split up a file
splitSim:	Simulate gapless distribution size
spm3:	from all mRNAs in a genome (e.g. rn3) referenced by SWISS-PROT generate a list of proteins and a list of protein/mRNA pairs.
spm6:	generates sorted.lis and knownGene0.tab for further duplicates processing
spm7:	Create sorted list of mRNA-SP data file for further duplicates processing
sqlToXml:	dump out all or part of a relational database to XML, guided by a dump specification. See sqlToXml.doc for additional information.
stToXao:	make indices into st file, one for each chromosome.
stageMultiz:	Stage input directory for Webb's multiple aligner
stanToBedAndExpRecs:	takes a pslFile of alignments and a list of stanfords expression data files and converts them into a bed file with the scores and experiment ids. Also creates a corresponding file of expRecords which indicate what the
stitchea:	joins together EA files into one big one, throwing out overlaps. Will complain if there's any missing data.
stitcher:	third pass of genomic/genomic alignment. Stitches together 2000x5000 base 7-state alignments into longer contigs.
stringify:	Convert file to C strings
subChar:	Substitute one character for another throughout a file.
subs:	Subs - a utility to perform massive string substitutions on source
subsetAxt:	Rescore alignments and output those over threshold
subsetTraces:	Build subset of mouse traces that actually align
tableSum:	Summarize a table somehow
tailLines:	add tail to each line of file
textHist2:	Make two dimensional histogram table out of a list of 2-D points, one per line.
textHistogram:	Make a histogram in ascii
tfbsConsLoc:
tfbsConsSort:	a utility to sort tfbsCons files before loading them
tickToDate:	Convert seconds since 1970 to time and date
toDev64:	A program that copies data from the old hgwdev database to the new hgwdev database.
toLower:	Convert upper case to lower case in file. Leave other chars alone
toUpper:	Convert lower case to upper case in file. Leave other chars alone
trackOverlap:	trackOverlap- Overlap how much of a track is overlapped by other tracks and vice versa. This is done by correlating series of bitmap projections (i.e. featureBits multiple times).
trfBig:	Mask tandem repeats on a big sequence file.
twinOrf:	Predict open reading frame in cDNA given a cross species alignment
twinOrf2:	Predict open reading frame in cDNA given a cross species alignment
twinOrf3:	Predict open reading frame in cDNA given a cross species alignment
twinOrfStats:	Collect stats on refSeq cDNAs aligned to another species via axtForEst
twinOrfStats2:	Collect stats on refSeq cDNAs aligned to another species via axtForEst
twinOrfStats3:	Collect stats on refSeq cDNAs aligned to another species via axtForEst
twoBitInfo:	get information about sequences in a .2bit file
twoBitToFa:	Convert all or part of .2bit file to fasta
undupFa:	rename duplicate records in FA file. Usage: undupFa faFile(s)
updateStsInfo:
upper:	strip numbers, spaces, and punctuation; turn to upper case
vegaBuildInfo:
venn:	Do venn diagram calculations
vgPrepImage:	Create thumbnail and image pyramid scheme for image, also link in full sized image
wabToSt:	Convert WABA output to something Intronerator understands better
whyConserved:	Try and analyse why a particular thing is conserved
wigEncode:	convert Wiggle ascii data to binary format
wordLine:	chop up words by white space and output them with one word to each line.
xmlCat:	Concatenate xml files together, stuffing all records inside a single outer tag.
xmlToSql:	Convert XML dump into a fairly normalized relational database in the form of a directory full of tab-separated files and table creation SQL. You'll need to run autoDtd on the XML file first to
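
Several of the one-line descriptions above name simple text transformations. The following is a minimal Python sketch of three of them, written only from the descriptions of faRc, bedSort and jkUniq in this list. It is not the Kent source code; the function names are hypothetical, and details such as keeping the first occurrence of a duplicate line or passing N through unchanged are assumptions.

```python
# Illustrative sketches only; none of this is taken from the Kent source tree.

# Translation table for the four DNA bases in both cases (assumption: any other
# character, such as N, is passed through unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string, the operation faRc is described as doing."""
    return seq.translate(_COMPLEMENT)[::-1]


def sort_bed_lines(lines):
    """Sort BED lines by chrom, then by numeric chromStart (as bedSort is described)."""
    def key(line):
        fields = line.split()
        return (fields[0], int(fields[1]))
    return sorted(lines, key=key)


def unique_lines(lines):
    """Remove duplicate lines even when they are not adjacent (as jkUniq is described).

    Assumption: the first occurrence of each line is the one that is kept.
    """
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


if __name__ == "__main__":
    print(reverse_complement("ACGTN"))
    print(sort_bed_lines(["chr2\t5\t10", "chr1\t100\t200", "chr1\t20\t30"]))
    print(unique_lines(["a", "b", "a", "c", "b"]))
```

Sorting on the integer value of chromStart rather than on the raw text matters here: a plain string sort would place 100 before 20.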

@@ Line 704: / Line 704: @@
     creation SQL.  You'll need to run autoDtd on the XML file first to</TD></TR>
 </TABLE>
+[[Category:Technical FAQ]]
