Frequently asked mailing list questions
Latest revision as of 07:57, 1 September 2018
This FAQ page collects previously answered questions from the Genome and Genome-Mirror mailing lists that are useful for answering repeat questions.
Genome
Helpful Items
If a user is looking for human or mouse genome updates, point them to:
To report errors in the human or mouse assemblies:
For users looking for help identifying effects of their novel SNPs, send them to (thanks Angie):
To check a user's Browser session in hgcentral:
- From a shell prompt, enter a command similar to the following (depending on the username or session name you are searching for):
$ hgsql -h genome-centdb -e "select userName,sessionName,shared,firstUse,lastUse,useCount from namedSessionDb where userName like 'Gunnar%'" hgcentral
+------------------------+--------------------+--------+---------------------+---------------------+----------+
| userName               | sessionName        | shared | firstUse            | lastUse             | useCount |
+------------------------+--------------------+--------+---------------------+---------------------+----------+
| Gunnar%20H.            | 2Lfullgenom        |      1 | 2011-07-19 23:37:37 | 2011-09-27 13:42:47 |       43 |
| Gunnar%20H.            | 2Rfullgenom        |      1 | 2011-07-20 01:42:25 | 2011-08-20 05:17:27 |       28 |
| Gunnar%20H.            | 3Lfullgenom        |      1 | 2011-07-20 03:43:01 | 2011-09-27 06:43:31 |       31 |
| Gunnar%20H.            | 3Rfullgenom        |      1 | 2011-07-19 13:22:43 | 2011-08-20 05:18:21 |       31 |
| Gunnar%20H.            | 4fullgenom         |      1 | 2011-07-20 03:49:44 | 2011-09-27 07:34:00 |       25 |
| Gunnar%20H.            | dm3cov             |      1 | 2011-07-05 17:22:22 | 2011-08-25 05:02:24 |       22 |
| Gunnar%20H.            | dm3sub             |      1 | 2011-07-19 11:12:51 | 2011-07-27 00:22:07 |        4 |
| Gunnar%20H.            | Split              |      1 | 2011-09-03 05:00:42 | 2011-09-05 08:18:29 |        4 |
| Gunnar%20H.            | Xfullgenom         |      1 | 2011-07-20 05:21:20 | 2011-09-27 07:33:21 |       23 |
| Gunnar.thor.sigurdsson | mm9_test_session_1 |      1 | 2010-09-29 03:34:38 | 2010-09-29 03:34:38 |        0 |
+------------------------+--------------------+--------+---------------------+---------------------+----------+
To save a user's session to a file:
- From a shell prompt, enter a command similar to the following:
$ hgsql -h genome-centdb -Ne "select contents from namedSessionDb where userName like 'Gunnar%' and sessionName='2Lfullgenom'" hgcentral > gunnarSession
- The session can then be restored from this file on the hgSession page in the "Restore Settings" section
Questions
I have a list of Gene Symbols and I would like to get corresponding sequences for them.
Sharing custom tracks
Help me create a Custom Track
- ...with shades of grey.
- ...with
- http://www.soe.ucsc.edu/pipermail/genome/2006-July/011100.html
- http://www.soe.ucsc.edu/pipermail/genome/2006-September/011672.html
- http://www.soe.ucsc.edu/pipermail/genome/2006-September/011712.html
- http://www.soe.ucsc.edu/pipermail/genome/2006-December/012312.html
- http://www.soe.ucsc.edu/pipermail/genome/2006-February/009847.html
- http://www.soe.ucsc.edu/pipermail/genome/2005-October/008822.html
Is there a size limit for custom tracks?
How do I find non-protein-coding genes?
I have a list of identifiers, how do I find the coordinates?
Format of chain, chainLink and net tables
How do I get a table of restriction enzymes?
Note that the utility findCutters is better than oligoMatch: oligoMatch has no good way of finding AsuI (G'GnC_C). It's possible but it would need to be run four times: GGACC, GGCCC, GGGCC, GGTCC and the output combined. findCutters does it all in one command.
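The four sequences listed above are the AsuI site (GGNCC) with the ambiguous N replaced by each base in turn. As a small sketch of that expansion only (it does not run oligoMatch or findCutters, and the variable name is made up for illustration):

```shell
#!/bin/sh
# Expand the ambiguous base N in GGNCC into the four concrete sequences
# that would each need a separate oligoMatch run.
expanded=$(for base in A C G T; do printf 'GG%sCC\n' "$base"; done)
echo "$expanded"
```

This prints GGACC, GGCCC, GGGCC and GGTCC, one per line, which is the list a user would have to search for one at a time and then merge.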
GO
How do I find orthologous genes (using TransMap)
How do I find telomeres and centromeres?
Questions about SNPs?
- Angie SNP mega-answer
- Downloading snp#.txt.gz and searching for rs#s from the command line: zcat snp137.txt.gz | grep -wf my_IDs.txt > my_SNPs.txt
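To show a user what the zcat | grep command above does, here is a self-contained sketch on toy data. The file contents and the four-column layout are invented for illustration (the real snp137 table has more columns), and gzip -dc is used in place of zcat, which should behave the same way on most systems.

```shell
#!/bin/sh
# Build a tiny gzipped stand-in for snp137.txt.gz (hypothetical rows).
workdir=$(mktemp -d)
printf 'chr1\t100\t101\trs12\nchr1\t200\t201\trs123\nchr2\t300\t301\trs999\n' \
  | gzip -c > "$workdir/snp_toy.txt.gz"

# One rs ID per line, as in my_IDs.txt.
printf 'rs12\nrs999\n' > "$workdir/my_IDs.txt"

# -f reads the patterns from a file; -w requires whole-word matches,
# so the pattern rs12 does not also pull in rs123.
gzip -dc "$workdir/snp_toy.txt.gz" \
  | grep -wf "$workdir/my_IDs.txt" > "$workdir/my_SNPs.txt"

cat "$workdir/my_SNPs.txt"
```

Only the rs12 and rs999 rows end up in my_SNPs.txt. Without -w, the rs123 row would be matched as well.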
Instructions for downloading jksrc
I want to compare species A with species B
To tell a user we would be willing to add a permanent custom track
Is multiwig functionality available for custom tracks
Why do some genes have startCodon = stopCodon (thickStart = thickEnd)?
How do I get a list of SNPs that correspond to my gene?
How do I cross-reference UCSC gene names to RefSeq gene names?
Genome-Mirror
ENCODE
Suggest checking out / referring person to ENCODE Resources and FAQ page
http://genome.ucsc.edu/ENCODE/FAQ/index.html
Helpful resources
- Cell Types page
- Experiment matrix
- Track and File Search
- Publications page
- Nature microsite
- File Formats page
Questions
1) How do I display ENCODE data at GEO in the genome browser?
- Not by loading a custom track! Basically all ENCODE data at GEO are already hosted in tracks at UCSC. Use Track Search and enter the GEO sample accession (GSM). A great answer by Pauline is here: http://redmine.soe.ucsc.edu/issues/10037
2) Which cell types are used by ENCODE? Did XXX ENCODE track use standard ENCODE cell protocols? What was the ENCODE growth protocol for cell type YYY?
- See Cell Types page on portal. All ENCODE tracks use protocols registered on this page. Click 'Documents' link to see growth protocol. If you have further questions, contact the lab that registered the protocol: http://redmine.soe.ucsc.edu/issues/9689 http://redmine.soe.ucsc.edu/issues/9656
3) Has transcription factor XXX been mapped by ENCODE? How do I find overlaps between my own ChIP-seq regions and ENCODE transcription factors?
- Use ChIP-seq Experiment Matrix to show mapped TFs. Use Table Browser to intersect ENCODE Regulation Txn Factor clusters with custom track of user regions. http://redmine.soe.ucsc.edu/issues/9660
4) What is represented by field NN in ENCODE bed files?
- See File Formats page for descriptions of ENCODE 'peak' file formats. See track descriptions for how scores and values were derived.
5) What is the difference between file XX and file XXV2? Why is file XX not displayed in the browser?
- Versioned files are often revoked and so not viewable in the browser, though still available for download. Revoke status is shown in metadata (files.txt).
6) How do I extract information about an ENCODE experiment from the filename?
- Don't do it! Filenames have some metadata embedded, but can only be relied on to be unique. Use file metadata, available in the following places:
- Downloads directories: files.txt file
- Track UI: down-arrow next to subtrack
- Track/File search
- genome-mysql, table browser: metaDb table
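As a rough sketch of reading metadata from files.txt rather than from the filename: the example below assumes each line is a filename, a tab, then "key=value" pairs separated by "; ". That layout, the filenames and the values are assumptions made for illustration, so check a real files.txt before relying on it.

```shell
#!/bin/sh
# Toy files.txt (hypothetical filenames and metadata values).
workdir=$(mktemp -d)
printf 'toyFileRep1.bam\tproject=wgEncode; cell=GM12878; view=Alignments\n' >  "$workdir/files.txt"
printf 'toyFileRep2.bam\tproject=wgEncode; cell=K562; view=Alignments\n'    >> "$workdir/files.txt"

# Print each filename with the value of its "cell" key.
cell_table=$(awk -F '\t' '{
  n = split($2, pairs, "; *")          # split the metadata into key=value pairs
  for (i = 1; i <= n; i++) {
    split(pairs[i], kv, "=")           # kv[1] is the key, kv[2] the value
    if (kv[1] == "cell") print $1 "\t" kv[2]
  }
}' "$workdir/files.txt")

echo "$cell_table"
```

Changing "cell" to another key (for example "view") pulls out that field instead, without parsing anything from the filename itself.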
Example Galaxy Answers
- Introduce Galaxy, Refer user to Galaxy Help
The bioinformatics tools at Galaxy may be of help to you: http://usegalaxy.org. The "Galaxy 101" tutorial features getting data from the UCSC table browser: http://wiki.galaxyproject.org/Learn/Screencasts. If you plan to use Galaxy's tools, please address questions to Galaxy: http://wiki.galaxyproject.org/Support. They should be able to help you with whatever questions you may have about their website.
- Example of Galaxy Join
If you don't have a good way to accomplish a join of the tables, you could use Galaxy: https://main.g2.bx.psu.edu/. You would need to first fetch each of the tables separately using the "UCSC Main table browser" link (under "Get Data"), and then join them on the xxxrefGene.name/gbStatus.accxxx fields using the "Join two Datasets" link (under "Join, Subtract and Group").
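For users comfortable on the command line, the same kind of join can be sketched with the standard sort and join utilities. This is only an illustration of what the Galaxy join does: the two toy files, their contents, and the assumption that the key is in column 1 of both are hypothetical, not the real refGene or gbStatus layouts.

```shell
#!/bin/sh
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical tab-separated tables keyed on an accession in column 1.
printf 'NM_000002\tchr2\t-\nNM_000001\tchr1\t+\n'         > "$workdir/genes_toy.txt"
printf 'NM_000001\ttoyStatusA\nNM_000002\ttoyStatusB\n'   > "$workdir/status_toy.txt"

# join needs both inputs sorted on the key, using the same collation.
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/genes_toy.txt"  > "$workdir/genes.sorted"
LC_ALL=C sort -t "$tab" -k1,1 "$workdir/status_toy.txt" > "$workdir/status.sorted"

# Join on column 1 of each file; output is key, rest of file 1, rest of file 2.
joined=$(LC_ALL=C join -t "$tab" "$workdir/genes.sorted" "$workdir/status.sorted")
echo "$joined"
```

Each output line carries the shared key once, followed by the remaining columns of the first table and then those of the second.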
- Using Galaxy to find SNPs near genes
One way to identify the distance from each SNP to its nearest gene is to send the gene track of interest over to Galaxy via the table browser. Then use the "Fetch closest non-overlapping feature for every interval" tool under the "Operate on Genomic Intervals" menu.
- Using Galaxy to filter RNA from MAF
Galaxy has GTF/GFF tools under the "Filter and Sort" menu. If you have questions about these tools you will need to contact the Galaxy team.
- Using Galaxy to reverse-complement MAF data
According to this page: http://g2.trac.bx.psu.edu/wiki/MAFanalysis there is a tool to "Reverse complement a MAF file". https://lists.soe.ucsc.edu/pipermail/genome/2008-November/017512.html
- Using Galaxy to convert a FASTA file for analysis in Excel and Access
Galaxy (http://main.g2.bx.psu.edu/) has some data manipulation tools that should be of help. Go to their site, and on the left select "FASTA manipulation" and select "FASTA-to-Tabular converter": <http://main.g2.bx.psu.edu/tool_runner?tool_id=fasta2tab>
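For reference, the result of a FASTA-to-tabular conversion looks like the output of the awk sketch below: one line per record, with the header and the concatenated sequence separated by a tab. This is a local illustration on a made-up toy file, not Galaxy's own implementation.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Toy FASTA with a multi-line first record (hypothetical sequences).
printf '>seq1 toy record\nACGT\nGGCC\n>seq2\nTTAA\n' > "$workdir/toy.fa"

awk '
  /^>/ {                                  # header line starts a new record
    if (name != "") print name "\t" seq   # flush the previous record
    name = substr($0, 2)                  # drop the leading ">"
    seq = ""
    next
  }
  { seq = seq $0 }                        # append sequence lines
  END { if (name != "") print name "\t" seq }
' "$workdir/toy.fa" > "$workdir/toy.tsv"

cat "$workdir/toy.tsv"
```

The resulting tab-separated file can be opened directly in a spreadsheet or imported into a database table.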
- Using Galaxy to create your own multiple sequence alignments
Galaxy has several tools that look like they might be useful to you. See "Filter MAF blocks by Species," "Extract MAF blocks given a set of genomic intervals," and "Stitch Gene blocks given a set of coding exon intervals" on the left-hand side of the page under the "Fetch Alignments" header. https://lists.soe.ucsc.edu/pipermail/genome/2011-May/026067.html
- Using Galaxy to convert wig to bigWig
We don't supply any executables for running wigToBigWig on Windows. An easier solution would be to use the Galaxy website (usegalaxy.org). In the Tools menu on the left-hand side of the page, select "Convert Formats" and then "Wig/BedGraph-to-bigWig converter."
- Using Galaxy to divide whole genome into 1 Mb regions
One way you could do this would be to make a second custom track consisting of 1 Mb regions and then intersect that with your first custom track. Or select your custom track in the Table Browser and then check the box to "Send output to Galaxy". Galaxy (http://main.g2.bx.psu.edu/) has some additional tools to help manipulate data. The "Regional Variation" menu on the left-hand side of the page has a "Make windows" tool and a "Feature coverage" tool that look especially useful.
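The "second custom track consisting of 1 Mb regions" can be sketched as a BED file generated from a list of chromosome sizes. The chromosome names and sizes below are invented for illustration; a real run would start from the chrom sizes of the assembly in question.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical chrom sizes: name <tab> length in bases.
printf 'chrToy1\t2500000\nchrToy2\t1000000\n' > "$workdir/toy.chrom.sizes"

# Emit BED intervals (0-based start, exclusive end) of at most 1 Mb each.
awk -v win=1000000 'BEGIN { OFS = "\t" }
{
  for (start = 0; start < $2; start += win) {
    end = start + win
    if (end > $2) end = $2       # clip the last window at the chromosome end
    print $1, start, end
  }
}' "$workdir/toy.chrom.sizes" > "$workdir/windows_1mb.bed"

cat "$workdir/windows_1mb.bed"
```

The last window on each chromosome is shorter than 1 Mb whenever the length is not an exact multiple of the window size. The output can be loaded as a custom track and intersected with the user's own track.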
- Using Galaxy to join, example GTF files with geneSymbol
There is not a way to alter GTF output in the Table Browser, but Galaxy offers an extensive set of tools that work in conjunction with the Genome Browser and can help with data manipulations like this. Use the "Join two Queries side by side on a specified field" tool under the "Join, Subtract and Group" header on the left-hand side of the page, and perhaps the "Text Manipulation" tools.
- Using Galaxy to join, example derive Ensembl name from GNF atlas data
Use Galaxy's "Get Data" to load knownToGnf1m table and then load the ensGene table. Once you have loaded those tables into Galaxy, click on the "Join, Subtract and Group" link on the left-hand side of the page. Then click on "join two queries" on column 1.