Track metadata handling
I have a topic prompted specifically by the ENCODE grant proposal, but one that I think could have broad applicability -- how to store and use track 'metadata'. What capabilities do you think we can or should provide for metadata? Are there helpful examples at other bioinformatics sites (e.g. NIH DCCs) that you have seen?
For ENCODE, metadata typically includes which cell lines were used for an experiment, which antibodies were used for ChIP-chip, and sometimes the time course of an experiment (e.g. 0, 8, and 24 hrs). ENCODE users may want to locate, for example, all datasets on HeLa cells. More generally at our site, we get mailing-list questions asking whether we have a particular type of experimental data on any organism/assembly. We currently keep metadata in trackDb settings and the track description, and have no explicit search mechanism. --Kate 12:38, 12 February 2007 (PST)
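To make the search use case concrete, here is a minimal sketch of what a metadata lookup could look like if each track carried its metadata as key/value pairs. Everything in it is an assumption for illustration: the `TrackMetadata` record, the attribute names (`cellLine`, `antibody`, `timepoint`), the track names, and the `find_tracks` helper are hypothetical and do not describe existing trackDb settings or browser code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrackMetadata:
    """Hypothetical per-track metadata record (not an existing trackDb structure)."""
    track: str
    assembly: str
    attributes: Dict[str, str] = field(default_factory=dict)


def find_tracks(tracks: List[TrackMetadata], **criteria: str) -> List[TrackMetadata]:
    """Return the tracks whose attributes match every criterion, ignoring case."""
    return [
        t for t in tracks
        if all(t.attributes.get(key, "").lower() == value.lower()
               for key, value in criteria.items())
    ]


# Made-up example records; the track names and attribute values are placeholders.
TRACKS = [
    TrackMetadata("exampleChipA", "hg18",
                  {"cellLine": "HeLa", "antibody": "antibodyA", "timepoint": "0h"}),
    TrackMetadata("exampleChipB", "hg18",
                  {"cellLine": "HeLa", "antibody": "antibodyB", "timepoint": "8h"}),
    TrackMetadata("exampleChipC", "hg18",
                  {"cellLine": "otherLine", "antibody": "antibodyA", "timepoint": "24h"}),
]

if __name__ == "__main__":
    # "All datasets on HeLa cells"
    for t in find_tracks(TRACKS, cellLine="HeLa"):
        print(t.track, t.attributes)
```

Several criteria can be combined, e.g. `find_tracks(TRACKS, cellLine="HeLa", timepoint="8h")`. Whether such pairs would live in trackDb, a separate table, or somewhere else is exactly the open question here; the sketch only shows the kind of query users seem to want.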
The HapMap DCC site has great metadata examples. See the Downloads|Documentation section here (the 'Bulk Data Download' link from the main page): http://www.hapmap.org/downloads/index.html.en. The Protocols (including versioning) map directly to what we need. We'll also need a mechanism for tracking reagents -- individual cell lines, antibodies, etc. We should also keep track of chip designs in GEO/ArrayExpress. The HapMap DCC uses XML to communicate the metadata, and has gone through many updates of its formats (http://www.hapmap.org/downloads/xml_docs/). There are more metadata examples on the HapMap DCC internal site, but it is down at the moment. I can send the access info later. The main difference between the HapMap and ENCODE DCCs is going to be the expansion in data types. The output of the HapMap project was primarily diploid genotypes, which provided a fixed point that allowed many inputs (different genotyping platforms and protocols, different populations and individual samples) and many outputs (analyses -- genotype/allele frequencies, phasing, LD, etc.). The ENCODE DCC will need to be quite a bit more flexible to handle all of the various data types.
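As a rough illustration of the pieces mentioned above (versioned protocols, tracked reagents, and XML as the exchange format), here is a small sketch. It is not the HapMap DCC schema: the `Protocol`, `Reagent`, and `Dataset` records, the element and attribute names, and the example values are all hypothetical, and the open-ended `data_type` string is just one assumed way to accommodate many ENCODE data types.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Protocol:
    """Hypothetical versioned protocol reference."""
    name: str
    version: int


@dataclass(frozen=True)
class Reagent:
    """Hypothetical reagent record; kind might be 'cellLine' or 'antibody'."""
    kind: str
    name: str


@dataclass
class Dataset:
    """Hypothetical dataset record with a free-form data type."""
    name: str
    data_type: str
    protocol: Protocol
    reagents: List[Reagent]


def dataset_to_xml(ds: Dataset) -> str:
    """Serialize one dataset record to an XML string (made-up element names)."""
    root = ET.Element("dataset", name=ds.name, dataType=ds.data_type)
    ET.SubElement(root, "protocol",
                  name=ds.protocol.name, version=str(ds.protocol.version))
    for reagent in ds.reagents:
        ET.SubElement(root, "reagent", kind=reagent.kind, name=reagent.name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = Dataset(
        name="exampleChipA",
        data_type="ChIP-chip",
        protocol=Protocol("exampleChipProtocol", 2),
        reagents=[Reagent("cellLine", "HeLa"), Reagent("antibody", "antibodyA")],
    )
    print(dataset_to_xml(example))
```

Keeping the protocol version on the dataset record means a revised protocol does not silently change the description of data submitted earlier, which seems to be the point of the versioning on the HapMap side.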