Medical Sequencing Data

Some combination of phenotype and genotype ... Details are unknown. Is it worthwhile to design for this now?

Major Types of information

Subjects
- Anonymous individuals
  - Family history
  - Age
  - Sex
  - Ethnicity
- Pools of people

Environmental data
- Location (zip code would be great)
- Medications they are taking
- Exercise, nutrition ...

Genotype info
- Microarray based
  - SNPs
  - Haplotype blocks
  - Copy number polymorphism
- Sequence based
  - Random reads
  - PCR products
  - Larger clones
  - Single haplotype vs. diploid

Phenotype
- Disease presence/absence or severity
- Adverse drug reaction (ADR)
- Single physiological measure
  - Enzyme activity, measure of amount of substance
- Parallel Measures
  - Microarray measurements, etc.

Some Other Database Entities

GenotypeTest - records which regions of the genome were probed.
Study
- External URL
- Publications
- Contacts
- A group of subjects
- A set of phenotype and genotype tests
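The entities above could be sketched as simple record types. This is only an illustration of how the pieces might relate, not a proposed schema: every class name other than GenotypeTest and Study, every field name, and the (chromosome, start, end) representation of a probed region are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Subject:
    """An anonymous individual; every descriptive attribute is optional."""
    subject_id: str                       # hypothetical anonymous identifier
    age: Optional[int] = None
    sex: Optional[str] = None
    ethnicity: Optional[str] = None
    family_history: Optional[str] = None


@dataclass
class GenotypeTest:
    """Records which regions of the genome were probed."""
    name: str
    # Each region is (chromosome, start, end); the coordinate
    # convention is an assumption of this sketch.
    regions: List[Tuple[str, int, int]] = field(default_factory=list)


@dataclass
class Study:
    """A group of subjects plus the tests run on them."""
    name: str
    external_url: Optional[str] = None
    publications: List[str] = field(default_factory=list)
    contacts: List[str] = field(default_factory=list)
    subjects: List[Subject] = field(default_factory=list)
    phenotype_tests: List[str] = field(default_factory=list)
    genotype_tests: List[GenotypeTest] = field(default_factory=list)


def regions_probed(study: Study) -> List[Tuple[str, int, int]]:
    """Return every region probed by any genotype test in the study, sorted."""
    return sorted(r for test in study.genotype_tests for r in test.regions)
```

Pools of people are not modeled here; they would probably need their own record type, since per-individual attributes such as age do not apply to a pool.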

Existing Genotype/Phenotype Web Databases

http://www.pharmgkb.org/ - Requires registration for much data. Fan & Jim registered
http://globin.bx.psu.edu/genphen/ - Belinda, Ross and Webb's work, mostly covers hemoglobins

Mock Up of Phenotype Sorter

The Phenotype Sorter would be a web-based application aimed at presenting the full details of phenotype and genotype. The sorter has a row for each individual, a column for each phenotype assayed in a study, and a column for each genomic locus assayed in a study. The rows are sorted according to the value of a selected phenotype, phe3 in the mock-up image. The genotype columns are divided into a subcolumn for each allele, and at least for the simple nucleotide polymorphisms the alleles are labeled with the associated nucleotide. The number in the second row of the genotype label represents the strength of the locus as a marker for the phenotype. Possibly when sorting the rows by phenotype we should also sort the columns based on this number, though initially I was thinking of sorting the genotype columns just by position in the genome.
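The two orderings described for the sorter can be sketched in a few lines. This is a minimal illustration under assumed data shapes: the dictionary keys, the sample subjects and loci, and the choice to place subjects with no value for the selected phenotype at the end are all hypothetical. Chromosome names are compared as plain strings here, which a real implementation would need to replace with a proper genome ordering.

```python
def sort_rows(rows, phenotype):
    """Sort subject rows by the value of one selected phenotype.

    Subjects with no value for that phenotype are placed last
    (an assumption of this sketch).
    """
    def key(row):
        value = row["phenotypes"].get(phenotype)
        return (value is None, 0 if value is None else value)
    return sorted(rows, key=key)


def order_loci(loci, by="position"):
    """Order genotype columns by genome position or by marker strength."""
    if by == "position":
        # Plain string comparison of chromosome names: a simplification.
        return sorted(loci, key=lambda locus: (locus["chrom"], locus["pos"]))
    if by == "strength":
        # Strongest marker for the phenotype first.
        return sorted(loci, key=lambda locus: locus["strength"], reverse=True)
    raise ValueError("by must be 'position' or 'strength'")


# Hypothetical sample data: one row per individual, two alleles per locus.
rows = [
    {"subject": "s1", "phenotypes": {"phe3": 2.5}, "genotypes": {"snpA": ("A", "G")}},
    {"subject": "s2", "phenotypes": {"phe3": 0.7}, "genotypes": {"snpA": ("G", "G")}},
    {"subject": "s3", "phenotypes": {}, "genotypes": {"snpA": ("A", "A")}},
]
loci = [
    {"name": "snpB", "chrom": "chr2", "pos": 500, "strength": 0.9},
    {"name": "snpA", "chrom": "chr1", "pos": 1200, "strength": 0.3},
]

sorted_rows = sort_rows(rows, "phe3")
columns_by_position = order_loci(loci)
columns_by_strength = order_loci(loci, by="strength")
```

Keeping the row sort and the column sort as separate functions leaves open the question raised above of whether the columns should follow marker strength or stay in genome order.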

Mock Up of Genotype/Phenotype Track

The Genotype/Phenotype track would show information from a variety of studies. It would not (at least in the default modes) show subject-by-subject information. Instead it would show the 'marker association probability' at each of the positions assayed in the study. As depicted here, it shows the probability in a little bar graph. The horizontal baseline of the graph serves to link together all the positions assayed. I've grouped studies together using the background, but I'm not sure whether this will actually work in the genome browser context, and perhaps we could dispense with it. I'm sure it will be a struggle, as it has been with ENCODE, to come up with good 16-letter labels, but if we are able to do this, it should be clear enough.
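As a rough sketch of the data such a track would need, the function below turns one study's markers into a baseline span plus one bar per assayed position. Everything here is an assumption for illustration: the function name, the (chromosome, position, probability) input tuples, the treatment of the probability as a number between 0 and 1, and the bar height scale. It is not based on any existing genome browser code.

```python
from collections import defaultdict

MAX_LABEL_LENGTH = 16  # the label length limit mentioned in the notes


def build_track_items(label, markers, max_bar_height=20):
    """Build drawable items for one study.

    markers is an iterable of (chrom, position, probability) tuples,
    with probability assumed to lie in [0, 1].
    """
    if len(label) > MAX_LABEL_LENGTH:
        raise ValueError("label is longer than %d characters" % MAX_LABEL_LENGTH)

    bars_by_chrom = defaultdict(list)
    for chrom, position, probability in markers:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        # Bar height is proportional to the association probability.
        bars_by_chrom[chrom].append((position, round(probability * max_bar_height)))

    items = []
    for chrom in sorted(bars_by_chrom):
        bars = sorted(bars_by_chrom[chrom])
        items.append({
            "label": label,
            "chrom": chrom,
            # The baseline links the first and last assayed positions.
            "baseline": (bars[0][0], bars[-1][0]),
            "bars": bars,
        })
    return items
```

No subject-level data appears in the output, which matches the default mode described above: only positions and their association values are carried into the display.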
