Managing cellbrowser.conf tag values for multiple datasets
The cellbrowser.conf file is made up of a series of tag/value pairs. For example, in the line "body_parts=['brain']", "body_parts" is the tag and "['brain']" is the value. This page covers how to manage (add, update, and query) these tags across some or all of the datasets we host.
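To make the tag/value split concrete, here is a minimal sketch that parses one such line. It assumes the value is a plain Python literal, as in the example above; parse_tag_line is a hypothetical helper written for this page and is not part of the cell browser tools.

```python
import ast


def parse_tag_line(line):
    """Split one 'tag=value' line into (tag, parsed value).

    Hypothetical helper: assumes the value is a Python literal
    such as a list or a string.
    """
    tag, _, raw_value = line.partition("=")
    return tag.strip(), ast.literal_eval(raw_value.strip())


tag, value = parse_tag_line("body_parts=['brain']")
print(tag, value)
```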
Adding/updating tags
Most of the tag/value pairs point the cell browser to the various input files (e.g. ). However, some of them are needed to control the filter options, and include:
body_parts
diseases
projects
organisms
sources
life_stages
domains
More may be added in the future. For new datasets, you should configure these settings as you add the datasets. If new tags are added, it may be necessary to add and backfill these settings in the cellbrowser.conf files of hundreds of datasets. This type of mass update can be managed using the script addTags.
The input for addTags is an N-column, tab-separated file. The first column is the dataset name and the following columns are the values you want added or updated. A header line is required, and each column needs an entry in the header line that tells the script which tag its values belong to. Here are the first few lines of the file that was used to add the diseases, projects, and organisms tags when they were introduced:
dataset	body_parts	organisms	projects	diseases
adultPancreas	pancreas	Human (H. sapiens)	CIRM	Healthy
aging-brain	brain	Mouse (M. musculus)		Healthy
aging-human-skin	skin	Human (H. sapiens)		Healthy
mouse-esophagus	esophagus	Mouse (M. musculus)		Healthy
tabula-muris-senis	all	Mouse (M. musculus)	Tabula Muris Consortium	Healthy
tabulamuris	all	Mouse (M. musculus)	Tabula Muris Consortium	Healthy
tabula-sapiens	all	Human (H. sapiens)	Tabula Muris Consortium	Healthy
gtex8	all	Human (H. sapiens)	GTEx	Healthy
adult-brain-vasc	brain	Human (H. sapiens)		Healthy
As you can see, the labels in the header are the names of the tags that the values in each column are associated with. When addTags is run on this file, it will, for example, add an 'organisms' tag line to the cellbrowser.conf for the aging-brain dataset and set the value to ["Mouse (M. musculus)"], meaning the final line would be: organisms=["Mouse (M. musculus)"].
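The mapping from a row of the input file to cellbrowser.conf lines can be sketched as follows. This is an illustration only, not the addTags source: rows_to_conf_lines is a hypothetical helper, the projects field of aging-brain is assumed to be empty, and skipping empty cells is an assumption about how blanks are treated.

```python
import csv
import io
import json

# Two rows from the example file; fields are tab-separated.
# Assumption: the projects field of aging-brain is empty.
EXAMPLE_TSV = (
    "dataset\tbody_parts\torganisms\tprojects\tdiseases\n"
    "adultPancreas\tpancreas\tHuman (H. sapiens)\tCIRM\tHealthy\n"
    "aging-brain\tbrain\tMouse (M. musculus)\t\tHealthy\n"
)


def rows_to_conf_lines(tsv_text):
    """Return {dataset: [conf lines]} built from the header and each row.

    Hypothetical helper, not part of addTags.
    """
    result = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        dataset = row.pop("dataset")
        lines = []
        for tag, cell in row.items():
            if not cell:  # assumption: an empty cell produces no tag line
                continue
            # json.dumps writes the list with double quotes, e.g. ["brain"]
            lines.append(f"{tag}={json.dumps([cell], ensure_ascii=False)}")
        result[dataset] = lines
    return result


for dataset, lines in rows_to_conf_lines(EXAMPLE_TSV).items():
    print(dataset, lines)
```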
If you have multiple values for a tag, separate the items with commas:
dental-cells	teeth	Human (H. sapiens), Mouse (M. musculus)		Healthy
lepto-metastasis	brain, spinal cord	Human (H. sapiens)		Leptomeningeal Melanoma
stanford-czb-hlca	lung	Human (H. sapiens)		Lung Cancer, Healthy Control
teichmann-asthma	lung	Human (H. sapiens)		Asthma, Healthy Control
gut-cell-atlas	gut, colon, ileum, duojejunum	Human (H. sapiens)	Human Cell Atlas, hca	Crohn's Disease, Healthy Control, Healthy
lifespan-nasal-atlas	respiratory system, nasal, lung	Human (H. sapiens)		Influenza, Healthy Control
The script will translate each set of comma-separated values into a Python list.
If a tag already exists in the cellbrowser.conf for a listed dataset, nothing will be changed. If you want the values in the sheet to replace those currently in the cellbrowser.conf, you will need to add the '-u/--update' option when running addTags.
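The comma splitting and the skip-versus-replace behaviour can be illustrated with a short sketch. It is not the addTags implementation: split_values and apply_tags are hypothetical names, and the sketch assumes the "already exists" check is made per tag rather than per dataset.

```python
def split_values(cell):
    """Turn 'a, b' into ['a', 'b'], trimming whitespace around each item."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def apply_tags(existing, new_cells, update=False):
    """Return a copy of `existing` with the tags from `new_cells` applied.

    existing:  {tag: list of values} already present for a dataset
    new_cells: {tag: raw comma-separated cell from the input file}
    update:    stands in for the -u/--update option described above
    Hypothetical helper; the per-tag check is an assumption.
    """
    merged = dict(existing)
    for tag, cell in new_cells.items():
        if tag in merged and not update:
            continue  # tag already present: leave it untouched
        merged[tag] = split_values(cell)
    return merged


existing = {"diseases": ["Healthy"]}
new_cells = {"diseases": "Asthma, Healthy Control", "body_parts": "lung"}
print(apply_tags(existing, new_cells))
print(apply_tags(existing, new_cells, update=True))
```

Without update, only the missing body_parts tag is added; with update=True, the existing diseases value is replaced as well.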
Here’s an example that you can run for 5 datasets:
Getting tag values
If you want to see the values