Managing cellbrowser.conf tag values for multiple datasets

From Genecats
Revision as of 18:00, 18 July 2022 by Mspeir (talk | contribs) (First draft of page, more to come)

The cellbrowser.conf file is made up of a series of tag/value pairs. For example, in the line "body_parts=['brain']", "body_parts" is the tag and "['brain']" is the value. This page covers how to manage (add, update, and query) these tags across some or all of the datasets we host.
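To make the tag/value structure concrete, here is a minimal sketch of splitting one such line into its tag and value. The helper name parse_conf_line is hypothetical and not part of the cell browser code, and the sketch assumes one tag per line with the value written as a plain Python literal, as in the example above.

```python
import ast


def parse_conf_line(line):
    """Split one 'tag=value' line into (tag, parsed value).

    Hypothetical helper for illustration only. Assumes the value is a
    plain Python literal, such as a string or a list of strings.
    """
    tag, _, raw_value = line.partition("=")
    return tag.strip(), ast.literal_eval(raw_value.strip())


tag, value = parse_conf_line("body_parts=['brain']")
print(tag, value)
```

Using ast.literal_eval rather than eval keeps the sketch safe for untrusted text, at the cost of rejecting values that are not simple literals.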

Adding/updating tags

Most of the tag/value pairs point the cell browser to the various input files. However, some of them control the filter options. These include:

  • body_parts
  • diseases
  • projects
  • organisms
  • sources
  • life_stages
  • domains

More may be added in the future. For new datasets, you should configure these settings as you add the datasets. If new tags are added, it may be necessary to add and backfill these settings in the cellbrowser.conf files of hundreds of datasets. This type of mass update can be managed using the addTags script.

The input for addTags is an N-column, tab-separated file. The first column is the dataset name and the following columns hold the values you want added or updated. A header line is required, and each column needs an entry in the header line that tells the script which tag its values belong to. Here are the first few lines of the file that was used to add the diseases, projects, and organisms tags when they were introduced:

dataset            body_parts organisms           projects                diseases
adultPancreas      pancreas   Human (H. sapiens)  CIRM                    Healthy
aging-brain        brain      Mouse (M. musculus)                         Healthy
aging-human-skin   skin       Human (H. sapiens)                          Healthy
mouse-esophagus    esophagus  Mouse (M. musculus)                         Healthy
tabula-muris-senis all        Mouse (M. musculus) Tabula Muris Consortium Healthy
tabulamuris        all        Mouse (M. musculus) Tabula Muris Consortium Healthy
tabula-sapiens     all        Human (H. sapiens)  Tabula Muris Consortium Healthy
gtex8              all        Human (H. sapiens)  GTEx                    Healthy
adult-brain-vasc   brain      Human (H. sapiens)                          Healthy

As you can see, the labels in the header are the names of the tags that the values in each column are associated with. For example, when addTags is run, it will add an 'organisms' tag line to the cellbrowser.conf for the aging-brain dataset and set the value to ["Mouse (M. musculus)"], meaning the final line would be: organisms=["Mouse (M. musculus)"].
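The header-driven layout described above can be sketched with Python's csv module. This is not the addTags implementation, only an illustration of how such a file maps onto datasets and tags. The function name read_tag_table is hypothetical, and skipping empty cells is an assumption about how blank entries (such as the missing projects value for aging-brain) would be treated.

```python
import csv
import io

# Two rows from the example file, written with real tab separators.
EXAMPLE_TSV = (
    "dataset\tbody_parts\torganisms\tprojects\tdiseases\n"
    "adultPancreas\tpancreas\tHuman (H. sapiens)\tCIRM\tHealthy\n"
    "aging-brain\tbrain\tMouse (M. musculus)\t\tHealthy\n"
)


def read_tag_table(handle):
    """Return {dataset: {tag: raw cell text}} from a tab-separated file.

    Hypothetical helper. Assumes the first column is headed 'dataset'
    and treats empty cells as "no value for this tag" (an assumption).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        dataset = row.pop("dataset")
        table[dataset] = {tag: cell for tag, cell in row.items() if cell}
    return table


tags_by_dataset = read_tag_table(io.StringIO(EXAMPLE_TSV))
print(tags_by_dataset["aging-brain"])
```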

If you have multiple values for a tag, separate the items with commas:

dental-cells         teeth                           Human (H. sapiens), Mouse (M. musculus)                       Healthy
lepto-metastasis     brain, spinal cord              Human (H. sapiens)                                            Leptomeningeal Melanoma
stanford-czb-hlca    lung                            Human (H. sapiens)                                            Lung Cancer, Healthy Control
teichmann-asthma     lung                            Human (H. sapiens)                                            Asthma, Healthy Control
gut-cell-atlas       gut, colon, ileum, duojejunum   Human (H. sapiens)                      Human Cell Atlas, hca Crohn's Disease, Healthy Control, Healthy
lifespan-nasal-atlas respiratory system, nasal, lung Human (H. sapiens)                                            Influenza, Healthy Control

The script will translate the sets of comma-separated values to Python lists.
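As a rough illustration of that translation, the sketch below turns one cell of the input file into a tag line. The function name cell_to_conf_line is hypothetical, and the use of json.dumps to produce the double-quoted list is a choice made for this sketch, not a statement about how addTags does it.

```python
import json


def cell_to_conf_line(tag, cell):
    """Turn one comma-separated cell into a 'tag=[...]' line.

    Hypothetical helper for illustration only.
    """
    # Split on commas and drop surrounding whitespace and empty items.
    values = [item.strip() for item in cell.split(",") if item.strip()]
    # For plain strings, the JSON list text is also a valid Python list literal.
    return f"{tag}={json.dumps(values, ensure_ascii=False)}"


print(cell_to_conf_line("organisms", "Mouse (M. musculus)"))
print(cell_to_conf_line("body_parts", "brain, spinal cord"))
```

The first call reproduces the organisms line shown earlier for aging-brain; the second shows a two-item list built from the lepto-metastasis row.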

If a tag already exists in the cellbrowser.conf for a listed dataset, then nothing will be changed for that tag. If you want the values in the sheet to replace those currently in the cellbrowser.conf, then you will need to add the '-u/--update' option when running addTags.
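The difference between the default behavior and the update option can be sketched as follows. This is a simplified model, not the addTags code: the function apply_tags and its update parameter are hypothetical, the "cortex" value is made up for the demonstration, and the sketch assumes each tag sits on a single line of the form tag=value.

```python
def apply_tags(conf_text, new_lines, update=False):
    """Add 'tag=value' lines to conf text.

    Hypothetical helper. Existing tags are left alone unless update is
    True, in which case their line is replaced. Assumes one tag per line.
    """
    lines = conf_text.splitlines()
    # Map each existing tag name to the index of its line, ignoring comments.
    existing = {
        line.split("=", 1)[0].strip(): index
        for index, line in enumerate(lines)
        if "=" in line and not line.lstrip().startswith("#")
    }
    for new_line in new_lines:
        tag = new_line.split("=", 1)[0].strip()
        if tag not in existing:
            lines.append(new_line)
            existing[tag] = len(lines) - 1
        elif update:
            lines[existing[tag]] = new_line
    return "\n".join(lines) + "\n"


conf = 'body_parts=["brain"]\n'
wanted = ['body_parts=["cortex"]', 'diseases=["Healthy"]']  # "cortex" is a made-up value
print(apply_tags(conf, wanted))               # keeps the existing body_parts line
print(apply_tags(conf, wanted, update=True))  # replaces the existing body_parts line
```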

Here’s an example that you can run for 5 datasets:


Getting tag values

If you want to see the values