Cell Browser filters

From Genecats

The Cell Browser uses a set of filters that let people narrow the dataset list down to only the datasets of interest:

[Figure: the Cell Browser dataset filters (Cellbrowser filters.png)]

The values in these filters are determined based on tags in a dataset's cellbrowser.conf:

  • body_parts
  • diseases
  • projects
  • organisms
  • sources
  • life_stages
  • domains

This page walks you through curating these tag values for a single dataset. To update filter tag values for many datasets, you can combine some of the steps here with the information on the Managing cellbrowser.conf tag values for multiple datasets page.

Tag/value conventions

This section covers our internal conventions for each set of tag/value pairs.

body_parts

At the Cell Browser, we require this tag for every dataset (as determined by the 'reqTags' setting in your ~/.cellbrowser.conf).
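As a rough illustration of what a required-tag check involves, the sketch below compares a dataset's tags against a list of required tag names. It assumes the dataset's cellbrowser.conf has already been loaded into a Python dict; the function name missing_required_tags and the REQ_TAGS list are hypothetical stand-ins for 'reqTags', not the Cell Browser's own validation code.

```python
from typing import Iterable


def missing_required_tags(conf: dict, req_tags: Iterable[str]) -> list[str]:
    """Return the required tags that are absent or empty in a dataset config.

    `conf` is assumed to map tag names to values (e.g. loaded from a
    dataset's cellbrowser.conf); `req_tags` plays the role of 'reqTags'.
    """
    missing = []
    for tag in req_tags:
        value = conf.get(tag)
        # Treat a missing key, None, or an empty list/string as "not set".
        if not value:
            missing.append(tag)
    return missing


# Hypothetical required-tag list; this page only confirms body_parts as required.
REQ_TAGS = ["body_parts"]

example_conf = {"name": "example-dataset", "diseases": ["Healthy"]}
print(missing_required_tags(example_conf, REQ_TAGS))  # ['body_parts']
```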

Values in this field are always lowercase.

If you use a very broad value, also include a more specific one. For example, if you include 'brain', you should also include a more specific brain region such as 'cortex' or 'hippocampus'.
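The two body_parts conventions (lowercase values, and a broad value paired with a more specific one) can be checked mechanically. A minimal sketch, assuming a hand-maintained set of "broad" values; BROAD_BODY_PARTS and body_parts_issues are hypothetical names, and only 'brain' comes from this page.

```python
# Hypothetical set of broad, high-level values; extend it to match your curation.
BROAD_BODY_PARTS = {"brain"}


def body_parts_issues(body_parts: list[str]) -> list[str]:
    """Return human-readable problems with a body_parts list (empty if none)."""
    issues = []
    for value in body_parts:
        # Convention: body_parts values are always lowercase.
        if value != value.lower():
            issues.append(f"'{value}' should be lowercase ('{value.lower()}')")
    lowered = {v.lower() for v in body_parts}
    broad = lowered & BROAD_BODY_PARTS
    # Convention: a broad value should come with at least one more specific value.
    if broad and not (lowered - BROAD_BODY_PARTS):
        issues.append(f"add a more specific value alongside {sorted(broad)}")
    return issues


print(body_parts_issues(["brain", "cortex"]))  # []
print(body_parts_issues(["Brain"]))
```

The second call reports both problems: the capitalized value and the missing specific region.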

diseases

If the data comes only from non-diseased samples, use the value 'Healthy'.

Since the Cell Browser displays data at the cellular level, the diseases tag describes the disease state of the specimen the data was generated from, not the health of the donor. Some examples:

  • A donor has type 2 diabetes but donates skin for research. The skin is sequenced, so diseases = ["Healthy"], even though the donor has diabetes. Donor-level health can be mentioned in the abstract or methods section.
  • A donor has pulmonary fibrosis and donates lung tissue. The lung is sequenced, so diseases = ["Pulmonary Fibrosis"], since the sequenced specimen was not in a healthy state.
  • Sometimes samples are taken from tissue adjacent to a tumor. If the dataset includes both the tumor sample and the adjacent tissue, then diseases = ["<disease/cancer type here>","Healthy"]. A dataset can also list multiple diseases.

If the data covers a disease, look it up in the MONDO Disease Ontology so that all datasets for that disease use the same label. If the disease is not listed in MONDO, try the Human Phenotype Ontology (HP). If a disease dataset also includes healthy samples, include the value 'Healthy Control' as well.

The distinction between 'Healthy' and 'Healthy Control' lets people who want only healthy datasets find them without the list being cluttered by disease datasets. (Healthy control samples are often mixed in with the disease samples, and separating them out is non-trivial.)
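The 'Healthy' versus 'Healthy Control' rule can be expressed as a small function over per-specimen disease states. This is a sketch under the assumption that you already have one entry per sequenced specimen (a MONDO or HP label, or None for a healthy specimen); diseases_tag is a hypothetical helper, not part of the Cell Browser tools.

```python
from typing import Optional


def diseases_tag(specimen_diseases: list[Optional[str]]) -> list[str]:
    """Build a diseases value from the disease state of each sequenced specimen.

    Each entry describes one specimen (not the donor): a disease label from
    MONDO (or HP), or None if that specimen was healthy.
    """
    if not specimen_diseases:
        raise ValueError("need at least one specimen")
    diseases = sorted({d for d in specimen_diseases if d is not None})
    has_healthy = any(d is None for d in specimen_diseases)
    if not diseases:
        # Only non-diseased specimens: the dataset is simply "Healthy".
        return ["Healthy"]
    if has_healthy:
        # A disease dataset that also contains healthy specimens.
        diseases.append("Healthy Control")
    return diseases


print(diseases_tag([None, None]))                  # ['Healthy']
print(diseases_tag(["Pulmonary Fibrosis", None]))  # ['Pulmonary Fibrosis', 'Healthy Control']
```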

projects

This setting groups datasets that belong to a particular project. Datasets in the same project may share a funding agency or fall under the same initiative or grant. Good places to check whether the data is associated with a project are the acknowledgments and data availability sections of the paper. Projects are also sometimes mentioned in press articles and Twitter announcements.

Common projects that we have worked with:

  • Human Cell Atlas (HCA)
  • California Institute of Regenerative Medicine (CIRM)
  • Tabula Muris Consortium
  • GTEx
  • Allen Brain Atlas
  • GSA for Human
  • Mouse Cell Atlas
  • The Alexandria Project
  • Fly Cell Atlas
  • EvoCell

organisms

List all species included in the dataset (or subdatasets).

For vertebrate species, use the form Common name (G. species), e.g. Human (H. sapiens) or Mouse (M. musculus).

For non-vertebrates, use the form G. species, e.g. C. robusta.
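Both naming forms can be produced by one helper, which keeps labels consistent across datasets. A minimal sketch: organism_label is a hypothetical name, and it does not decide whether a species is a vertebrate; the caller passes a common name only for vertebrates.

```python
from typing import Optional


def organism_label(genus: str, species: str, common_name: Optional[str] = None) -> str:
    """Format an organisms value.

    Vertebrates (pass common_name): "Common name (G. species)".
    Non-vertebrates (common_name=None): "G. species".
    """
    binomial = f"{genus[0].upper()}. {species.lower()}"
    if common_name:
        return f"{common_name} ({binomial})"
    return binomial


print(organism_label("Homo", "sapiens", "Human"))  # Human (H. sapiens)
print(organism_label("Mus", "musculus", "Mouse"))  # Mouse (M. musculus)
print(organism_label("Ciona", "robusta"))          # C. robusta
```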

life_stages

Human
  • embryonic stage: 0-8 weeks
  • fetal stage: week 9 until birth
  • newborn stage: 0-1 month
  • infant stage: 1-24 months
  • child stage: 2-12 years old
  • adolescent stage: 13-18 years old
  • adult stage: 19+ years

Mouse
  • embryonic stage: 1-15 days
  • fetal stage: day 15 until birth
  • early immature stage: 1-7 days
  • infant stage: 1-5 weeks
  • adolescent stage: 6 weeks-2 months
  • adult stage: 2+ months

Drosophila melanogaster
  • embryo stage: 0-20 hours
  • larva stage: 1-3 days
  • pupa stage: 4-8 days
  • adult stage: 9+ days
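For human samples, the age ranges above can be turned into a lookup. This is a sketch, not an official mapping: human_life_stage is a hypothetical helper, the returned strings are the stage names from the table (the exact value strings used in real cellbrowser.conf files may differ), and ages that fall exactly on a boundary are assigned by treating each range as half-open, which the table itself does not specify.

```python
from typing import Optional


def human_life_stage(gestational_weeks: Optional[float] = None,
                     postnatal_months: Optional[float] = None) -> str:
    """Map a human sample's age onto the human life_stages ranges.

    Pass exactly one of gestational_weeks (prenatal) or postnatal_months.
    Ranges are treated as half-open intervals (an assumption).
    """
    if (gestational_weeks is None) == (postnatal_months is None):
        raise ValueError("pass exactly one of gestational_weeks or postnatal_months")
    if gestational_weeks is not None:
        # embryonic: weeks 0-8; fetal: week 9 until birth
        return "embryonic" if gestational_weeks < 9 else "fetal"
    months = postnatal_months
    if months < 1:
        return "newborn"      # 0-1 month
    if months < 24:
        return "infant"       # 1-24 months
    if months < 13 * 12:
        return "child"        # 2-12 years
    if months < 19 * 12:
        return "adolescent"   # 13-18 years
    return "adult"            # 19+ years


print(human_life_stage(gestational_weeks=20))    # fetal
print(human_life_stage(postnatal_months=40 * 12))  # adult
```

The same pattern extends to mouse and Drosophila by swapping in their ranges and time units.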

domains

Please specify each of the following domains whose criteria apply to your dataset.

Development
- Dataset uses donors from multiple life stages
- Mentions ‘development’ in the description
- Looks at samples across a timecourse
- Organoid growth experiments
- Fetal/embryonic datasets
Aging
- Age is a major component of the study (Adult Pancreas, Aging Brain, Aging Human Skin)
- Timecourse component (Tabula Muris, Tabula Muris Senis)
Neurodegeneration
- Multiple sclerosis (MS), Parkinson's disease, Alzheimer's disease
Cancer
- Mention of cancer in the project description
- Tumor samples
Atlas
- Keywords in the description of the project like “Atlas” and “Landscape”
- Multi organ (Tabula Sapiens)
- Multi organ_parts (Immune Cell Atlas, Heart Cell Atlas)
Disease Model
- Samples compared to a healthy control (Lung in Pulmonary Fibrosis vs Control, Mouse Skin Stretch Response, Mouse DRG Injury)
COVID-19
- Mention of COVID-19
Stem Cells
- Mention of organoids
- Mention of stem cells
- Mention of stem cell lines (H1, H9, etc.)
- Mention of induced pluripotent stem cells (aka iPSC or iPS)
Evolution
- May include species other than mouse and human
- Looking at heritable changes over time
- Mentions evolution in the description, paper, or grant (EvoCell Project)
Survey
- Focus on a large number of cells and their connections (Mouse Nervous System)
- Mention of “survey” in description and/or paper (Gut Cell Survey)
- Keywords like “Heterogeneity” (Mouse Oligodendrocyte Heterogeneity)
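Many of the domain criteria above are keyword mentions in the description, so a keyword scan can suggest candidate domains for a curator to review. A rough sketch: DOMAIN_KEYWORDS and suggest_domains are hypothetical, the keyword lists are distilled from the criteria above, and criteria that are not simple keywords (multiple life stages, multiple organs, comparison to a healthy control for Disease Model) are left to the curator.

```python
import re

# Hypothetical keyword lists distilled from the domain criteria above.
DOMAIN_KEYWORDS = {
    "Development": ["development", "timecourse", "time course", "fetal", "embryonic"],
    "Aging": ["aging", "ageing"],
    "Neurodegeneration": ["multiple sclerosis", "parkinson", "alzheimer"],
    "Cancer": ["cancer", "tumor", "tumour"],
    "Atlas": ["atlas", "landscape"],
    "COVID-19": ["covid-19"],
    "Stem Cells": ["organoid", "stem cell", "ipsc", "induced pluripotent"],
    "Evolution": ["evolution"],
    "Survey": ["survey", "heterogeneity"],
}


def suggest_domains(description: str) -> set[str]:
    """Suggest candidate domains for a dataset description (review by hand)."""
    text = description.lower()
    suggested = set()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        for kw in keywords:
            # \b anchors the keyword at the start of a word, so "aging" does not
            # match inside "imaging", while plurals such as "tumors" still match.
            if re.search(r"\b" + re.escape(kw), text):
                suggested.add(domain)
                break
    return suggested


print(suggest_domains("An atlas of human fetal development"))
```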

sources

Where you got the data from.

Adding tags to cellbrowser.conf
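The filter tags are set as list values in the dataset's cellbrowser.conf, which the Cell Browser tools read as Python. A minimal sketch of what the tags might look like for a single healthy adult human brain dataset is below. The values are illustrative rather than taken from a real dataset, the exact strings expected for life_stages and domains (for example capitalization) are assumptions, and projects and sources are left out because their value formats are not specified here.

```python
# Hypothetical filter tags for one dataset's cellbrowser.conf (read as Python).
body_parts = ["brain", "cortex"]    # lowercase; a broad value plus a specific one
diseases = ["Healthy"]              # the sequenced specimens were non-diseased
organisms = ["Human (H. sapiens)"]  # vertebrate form: Common name (G. species)
life_stages = ["adult"]             # stage name from the life_stages ranges (assumed spelling)
domains = ["Atlas"]                 # assumed exact spelling of the domain value
```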