Talk:Browser Agreement Action Plan: Difference between revisions

From genomewiki

Revision as of 22:02, 23 September 2008

Sign your comments with four tildes ~~~~ to provide a signature

Comments from Hiram

The primary files we work with are the AGP files, the FASTA files, and the quality files. Component-to-scaffold and scaffold-to-chromosome AGP files are a good set to have. Components are less important to UCSC than scaffolds.

The mouse and human assemblies also have cytogenetic map information. There has been confusion about who makes those. In recent times UCSC has become the default source for that. I don't know if we want to keep that job. It is a bit of an obscure procedure involving ancient code from Terry and some other unusual files from NCBI.

Alternate alleles are most likely going to proliferate. They should certainly be supplied. UCSC needs to decide on how to display the alternate alleles.

Quality scores can be supplied in either component or scaffold coordinate systems. As long as there are AGP files to relate them to chromosomes, we can convert the quality scores to chromosome coordinates. We do not need multiple copies.
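As an illustration of the coordinate conversion mentioned above, here is a minimal Python sketch, not UCSC's actual tooling. It assumes the standard tab-separated AGP column layout (object, object begin, object end, part number, component type, component id, component begin, component end, orientation) and 1-based inclusive coordinates. The names `parse_agp`, `lift`, and the sample scaffold and chromosome names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class AgpPart(NamedTuple):
    obj: str          # object being built, e.g. a chromosome
    obj_beg: int      # 1-based start on the object
    obj_end: int      # 1-based inclusive end on the object
    comp_id: str      # component placed there, e.g. a scaffold
    comp_beg: int     # 1-based start on the component
    comp_end: int     # 1-based inclusive end on the component
    orientation: str  # "+" or "-" (other values treated as "+")


def parse_agp(text: str) -> List[AgpPart]:
    """Parse AGP text, keeping sequence lines and skipping gaps and comments."""
    parts = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if f[4] in ("N", "U"):  # gap lines carry no component coordinates
            continue
        parts.append(AgpPart(f[0], int(f[1]), int(f[2]),
                             f[5], int(f[6]), int(f[7]), f[8]))
    return parts


def lift(parts: List[AgpPart], comp_id: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 1-based component position to (object, 1-based position)."""
    for p in parts:
        if p.comp_id == comp_id and p.comp_beg <= pos <= p.comp_end:
            offset = pos - p.comp_beg
            if p.orientation == "-":
                # Reverse-strand placement counts back from the object end.
                return p.obj, p.obj_end - offset
            return p.obj, p.obj_beg + offset
    return None


# Hypothetical sample: two scaffolds on chr1 separated by a 100-base gap.
SAMPLE_AGP = "\n".join("\t".join(row) for row in [
    ("chr1", "1", "1000", "1", "W", "scaffold_1", "1", "1000", "+"),
    ("chr1", "1001", "1100", "2", "N", "100", "scaffold", "yes", ""),
    ("chr1", "1101", "1600", "3", "W", "scaffold_2", "1", "500", "-"),
])
PARTS = parse_agp(SAMPLE_AGP)
```

A quality score at position 1 of `scaffold_2` would be placed at `lift(PARTS, "scaffold_2", 1)`. Applying the same mapping with a component-to-scaffold AGP first would carry component-level scores up to chromosomes in two steps, which is why a single copy of the scores is enough.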

Updates (implying versioning) could be a big headache. Once we build a browser on any one particular version, we don't want to hear about it again until there is a significant release to the next version. Therefore, I would expect releases to happen on whole assemblies. I wouldn't want to deal with partial updates to an existing release. Way too much trouble with that.

For handshake communication, I would recommend a limited email list made up of the primary representatives at each center and each browser build team.

Given XML data structures for assembly metadata, UCSC will most likely convert that XML into simple tag=value .ra text files, which we find most convenient. As long as we can parse the XML we should be OK.
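A rough Python sketch of that kind of conversion follows. The XML layout, the element names, and the function name `xml_to_ra` are all made up for illustration, since the metadata schema is not defined here. The separator between tag and value is left as a parameter because the exact .ra layout is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical assembly metadata; the real schema is not specified here.
SAMPLE_XML = """<assembly>
  <name>exampleAsm1</name>
  <organism>Example organism</organism>
  <releaseDate>2008-09-23</releaseDate>
</assembly>"""


def xml_to_ra(xml_text: str, sep: str = " ") -> str:
    """Flatten the direct children of the root element into tag/value lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        value = (child.text or "").strip()
        if value:  # skip empty elements rather than emit a bare tag
            lines.append(f"{child.tag}{sep}{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(xml_to_ra(SAMPLE_XML, sep="="), end="")
```

This only handles a flat record. Nested elements or attributes would need a naming rule (for example, joining parent and child tag names), which is a design decision to settle once the schema is known.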

Hiram 21:02, 23 September 2008 (UTC)

Comments from Jim Kent

In general I think it is a good thing to agree on a standard genome sequence format for submission to the international sequence databases, and to make sure that genome sequences are submitted before building annotation browsers on top of them. It will help as many annotations as possible end up in the same place. I generally like this proposal. I think it is certainly complex enough, though, and from here on I would be more inclined to drop features than to add new ones. Hiram has outlined the features that we consider essential.