Cell Browser wrangling guided examples

From Genecats

This page walks you through the basics of wrangling two cell browsers, one with each of the two main import tools: cbImportScanpy and cbImportSeurat. Each example is divided into three parts, roughly corresponding to the stages of wrangling a dataset for the Cell Browser. All command-line steps are run on hgwdev.

Using cbImportScanpy

This section is intended to teach you the basics of using cbImportScanpy and how it fits into the wrangling process in general. To do so, we will import the liver subset of Tabula Sapiens from an h5ad file. h5ad files are written out by the Python package AnnData and are (almost) always compatible with cbImportScanpy.

Part 1: Directory setup and data export

In this first part, we will set up a directory for the dataset, copy the data into it, and then export the data for the cell browser.

Ensure that you are in the proper conda environment:

conda activate scanpyenv

Change into a good working directory:

cd /hive/users/${hgwdev_username}/cb

Create a directory for this dataset:

mkdir -p tabula-sapiens-liver/orig/

This command also makes an ‘orig’ directory. When wrangling for the Cell Browser, we use this directory to store the unchanged files obtained from the submitter or downloaded from GEO, etc.

Change into that directory:

cd tabula-sapiens-liver/orig/

Copy over the h5ad file we’ll be working with:

cp /hive/data/inside/cells/exampleDatasets/TS_Liver.h5ad .

Determine which metadata field to use for the --clusterField option:

h5adMetaInfo TS_Liver.h5ad

The cell_ontology_class field appears to contain cell type names derived from an ontology (a standardized, controlled set of names), which makes it a good choice for the default cluster labels.

Go up a directory and export the data from that file:

cd ../
cbImportScanpy -i orig/TS_Liver.h5ad -o . --clusterField=cell_ontology_class

The options we've specified for cbImportScanpy are:

  • -i: the name of the input h5ad file
  • -o: the output directory (with '.' indicating the current directory)
  • --clusterField: the metadata field to use for the default cluster labels (markers are also calculated for these clusters)

(You can run cbImportScanpy with no arguments to see the full usage message.)
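If you find yourself importing several datasets, you may want to script this step. Below is a minimal Python sketch that assembles the same command from the three options described above and runs it only if asked. The helpers build_import_cmd and run_import are hypothetical (they are not part of the Cell Browser tools), and the sketch assumes the import tool is on your PATH in the active conda environment.

```python
import shlex
import subprocess

# Hypothetical helper: uses only the three options described above
# (-i, -o, --clusterField); no other flags are assumed.
def build_import_cmd(tool, infile, outdir, cluster_field):
    if tool not in ("cbImportScanpy", "cbImportSeurat"):
        raise ValueError(f"unexpected tool: {tool}")
    return [tool, "-i", infile, "-o", outdir, f"--clusterField={cluster_field}"]

def run_import(tool, infile, outdir, cluster_field, dry_run=True):
    cmd = build_import_cmd(tool, infile, outdir, cluster_field)
    # Show the command exactly as it would be typed in a shell.
    print(shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the import fails.
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    # Dry run: prints the command without executing it.
    run_import("cbImportScanpy", "orig/TS_Liver.h5ad", ".", "cell_ontology_class")
```

As written, it only prints cbImportScanpy -i orig/TS_Liver.h5ad -o . --clusterField=cell_ontology_class; call run_import(..., dry_run=False) to actually run it. The same helper accepts cbImportSeurat, which is covered later on this page.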

This export should take no more than 3 or 4 minutes. After it completes, run ls; you should see files such as meta.tsv and markers.tsv:

(Screenshot: files produced by the cbImportScanpy export)

These and other files will be used as input to cbBuild in the next section of this guide.

Part 2: cellbrowser.conf and cbBuild

Now we will modify the cellbrowser.conf and build a cell browser for this dataset into your public_html directory.

Open the cellbrowser.conf file using vim:

vim cellbrowser.conf

Edit the name and shortLabel fields of your cellbrowser.conf so that they match the following:

name='tabula-sapiens-liver'
shortLabel='Liver - Tabula Sapiens'
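Editing in vim works well for one-off changes. If you would rather script them, here is a minimal Python sketch that assumes each simple setting sits on its own line in the form key='value', as in the lines above. The helpers set_conf_value and update_conf_file are hypothetical; they rewrite the first uncommented line that starts with the key and append the setting if no such line exists. They are not meant for multi-line settings such as coords.

```python
import re

def set_conf_value(text, key, value):
    """Return text with the line `key=...` set to a quoted string value.

    Assumes one setting per line (e.g. name='old-name'). Lines starting
    with '#' never match, so commented-out settings are left alone.
    """
    # The key must start the line and be followed by optional spaces and '='.
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    new_line = f"{key}={value!r}"  # repr() gives a quoted Python string
    new_text, count = pattern.subn(lambda m: new_line, text, count=1)
    if count == 0:
        # No existing setting: append it on a new line.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += new_line + "\n"
    return new_text

def update_conf_file(path, **settings):
    """Apply several set_conf_value edits to a file in place."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    for key, value in settings.items():
        text = set_conf_value(text, key, value)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
```

For example, update_conf_file("cellbrowser.conf", name="tabula-sapiens-liver", shortLabel="Liver - Tabula Sapiens") makes the same two edits as above.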

Build the cell browser into your public_html directory:

cbBuild -o ~/public_html/cb

Look at your cell browser! It should be at https://hgwdev.gi.ucsc.edu/~${hgwdev_username}/cb/. It should look something like this:

(Screenshot: the cell browser after the first cbImportScanpy build)

When looking at the cell browser for this dataset, do you notice any changes that would make the data more understandable for the average user? Maybe ‘layouts’ that should be removed because they're uninformative? Or sample text that needs to be changed? We’ll talk more about polishing up the dataset in the next part.

Part 3: desc.conf and final polish

Finally, we’ll cover filling out a desc.conf with some basic information about this dataset as well as polishing up any last visual details for this dataset.

Open the desc.conf file using vim:

vim desc.conf

Edit the following lines in your desc.conf to read:

 
title = "Liver Subset - Tabula Sapiens"
abstract = """
Liver subset of the Tabula Sapiens dataset covering over 5000 cells.
"""
paper_url="https://www.science.org/doi/10.1126/science.abl4896 The Tabula Sapiens Consortium. Science. 2022."
other_url="https://tabula-sapiens-portal.ds.czbiohub.org/ Tabula Sapiens Website"
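The two URL fields above appear to hold a URL followed by a space and the text to show for the link. A minimal Python sketch for sanity-checking such a value before rebuilding follows; the helper split_url_label is hypothetical, and treating the first space as the separator is an assumption based on the examples on this page.

```python
from urllib.parse import urlparse

def split_url_label(value):
    """Split 'URL label text' into (url, label); the label may be empty.

    Assumption: the URL and label are separated by the first space.
    """
    url, _, label = value.strip().partition(" ")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"does not start with an http(s) URL: {value!r}")
    return url, label.strip()

print(split_url_label("https://tabula-sapiens-portal.ds.czbiohub.org/ Tabula Sapiens Website"))
```

A value that does not begin with an http(s) URL raises ValueError, which catches the common mistake of putting the label first.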

Let's check if there are any 'colors' in the 'uns' slot of the h5ad:

h5ad orig/TS_Liver.h5ad

The output should look like this:

AnnData object with n_obs × n_vars = 5007 × 58870
    obs: 'organ_tissue', 'method', 'donor', 'anatomical_information', 'n_counts_UMIs', 'n_genes', 'cell_ontology_class', 'free_annotation', 'manually_annotated', 'compartment', 'gender'
    var: 'gene_symbol', 'feature_type', 'ensemblid', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: '_scvi', '_training_mode', 'cell_ontology_class_colors', 'dendrogram_cell_type_tissue', 'dendrogram_computational_compartment_assignment', 'dendrogram_consensus_prediction', 'dendrogram_tissue_cell_type', 'donor_colors', 'donor_method_colors', 'hvg', 'method_colors', 'neighbors', 'sex_colors', 'tissue_colors', 'umap'
    obsm: 'X_pca', 'X_scvi', 'X_scvi_umap', 'X_umap'
    layers: 'decontXcounts', 'raw_counts'
    obsp: 'connectivities', 'distances'

There are: several uns keys end in _colors, and their names match metadata field names in obs (e.g. cell_ontology_class and cell_ontology_class_colors). Now export these colors to a file:

colorExporter -i orig/TS_Liver.h5ad -o colors.tsv
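By AnnData/Scanpy convention, an uns key named <field>_colors holds one color per category of the obs field <field>. The minimal Python sketch below pairs the two by name, using the key lists printed above; the helper color_fields is hypothetical and does not open the h5ad file (with the anndata package you could pass list(adata.obs.columns) and list(adata.uns.keys()) instead).

```python
def color_fields(obs_fields, uns_keys):
    """Map each obs field to its matching '<field>_colors' key in uns."""
    obs = set(obs_fields)
    pairs = {}
    for key in uns_keys:
        if key.endswith("_colors"):
            field = key[: -len("_colors")]
            if field in obs:
                pairs[field] = key
    return pairs

# Key lists copied from the AnnData summary printed above.
obs_fields = ['organ_tissue', 'method', 'donor', 'anatomical_information',
              'n_counts_UMIs', 'n_genes', 'cell_ontology_class',
              'free_annotation', 'manually_annotated', 'compartment', 'gender']
uns_keys = ['_scvi', '_training_mode', 'cell_ontology_class_colors',
            'dendrogram_cell_type_tissue',
            'dendrogram_computational_compartment_assignment',
            'dendrogram_consensus_prediction', 'dendrogram_tissue_cell_type',
            'donor_colors', 'donor_method_colors', 'hvg', 'method_colors',
            'neighbors', 'sex_colors', 'tissue_colors', 'umap']

print(color_fields(obs_fields, uns_keys))
```

For this file it reports cell_ontology_class, donor, and method. The keys sex_colors, tissue_colors, and donor_method_colors have no obs field of the same name (the metadata uses gender and organ_tissue), so this sketch does not pair them.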

Annotate the marker genes with links out to other resources:

cbMarkerAnnotate markers.tsv markers.annotated.tsv

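Make these changes to the cellbrowser.conf: comment out the scvi_umap, scvi, and pca entries in the coords list so those layouts are no longer shown, then point markers to the annotated markers file and set colors to the exported colors file: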

#    {
#        "file": "scvi_umap_coords.tsv",
#        "shortLabel": "scvi_umap"
#    },
#    {
#        "file": "scvi_coords.tsv",
#        "shortLabel": "scvi"
#    },
#    {
#        "file": "pca_coords.tsv",
#        "shortLabel": "pca"
#    }

markers = [{"file": "markers.annotated.tsv", "shortLabel":"Cluster Markers"}]
colors="colors.tsv"

Rebuild the dataset:

cbBuild -o ~/public_html/cb

Check it out: https://hgwdev.gi.ucsc.edu/~${hgwdev_username}/cb/. It should look something like this:

(Screenshot: the final cbImportScanpy cell browser)

What next?

Make other changes to the cellbrowser.conf and desc.conf files to see how they affect the display. (Don't forget to rebuild the dataset between those changes!)

Using cbImportSeurat

In this section, we'll walk through creating a cell browser from a Seurat RDS file. The process is quite similar to using cbImportScanpy.

Part 1: Directory setup and data export

In this section, we'll set up the required directory structure for this new dataset and export the data from the RDS file.

Ensure that you are in the proper conda environment:

conda activate seuratenv

Change into a good working directory:

cd /hive/users/${hgwdev_username}/cb

Create a directory for this dataset:

mkdir -p mouse-dev-neocortex/orig/

This command also makes an ‘orig’ directory which we use to store the unchanged files obtained from the submitter or downloaded from GEO/etc.

Change into that directory:

cd mouse-dev-neocortex/orig/

Copy over the RDS file we’ll be working with:

cp /hive/data/inside/cells/exampleDatasets/Li_et_al_2020_UCSC_seurat_object.rds .

Go up a directory so that you are in the mouse-dev-neocortex directory, then export the data from the RDS file:

cd ../
cbImportSeurat -i orig/Li_et_al_2020_UCSC_seurat_object.rds -o . --clusterField=clusters

The options we've specified for cbImportSeurat are:

  • -i: the name of the input RDS file
  • -o: the output directory (with '.' indicating the current directory)
  • --clusterField: the metadata field to use for the default cluster labels (markers are also calculated for these clusters)

(You can run cbImportSeurat with no arguments to see the full usage message.)

This export may take up to 30 minutes. After it completes, run ls; you should see files such as meta.tsv and markers.tsv:

(Screenshot: files produced by the cbImportSeurat export)

These and other files will be used as input to cbBuild in the next section of this guide.

Part 2: cellbrowser.conf and cbBuild

Next up is modifying the cellbrowser.conf and building a cell browser for this dataset.

First, open the cellbrowser.conf file using vim:

vim cellbrowser.conf

Edit the name and shortLabel fields of your cellbrowser.conf so that they match the following:

name='mouse-dev-neocortex'
shortLabel='Developing Mouse Neocortex'

Build the cell browser into your public_html directory:

cbBuild -o ~/public_html/cb

Look at your cell browser! It should be at https://hgwdev.gi.ucsc.edu/~${hgwdev_username}/cb/. It should look something like this:

(Screenshot: the cell browser after the first cbImportSeurat build)

As with the dataset imported in the cbImportScanpy example above, do you notice any changes that would make the data more understandable to a user? Maybe ‘layouts’ that should be removed because they're uninformative? Or sample text that needs to be changed? We’ll go through some of those changes in the next part.

Part 3: desc.conf and final polish

Finally, we’ll cover filling out a desc.conf with some basic information about this dataset as well as polishing up any last visual details for this dataset.

Open the desc.conf file using vim:

vim desc.conf

For this dataset, we can take the title, abstract, and methods from the paper itself (Li et al. 2020). Edit the following lines in your desc.conf to read:

 
title = "Transcriptional priming as a conserved mechanism of lineage diversification in the developing mouse and human neocortex"
abstract = """
<p>
From
<a href="https://www.science.org/doi/10.1126/sciadv.abd2068"
target="_blank">Li et al</a>:
<p>
How the rich variety of neurons in the nervous system arises from neural stem
cells is not well understood. Using single-cell RNA-sequencing and in vivo
confirmation, we uncover previously unrecognized neural stem and progenitor
cell diversity within the fetal mouse and human neocortex, including multiple
types of radial glia and intermediate progenitors. We also observed that
transcriptional priming underlies the diversification of a subset of
ventricular radial glial cells in both species; genetic fate mapping confirms
that the primed radial glial cells generate specific types of basal progenitors
and neurons. The different precursor lineages therefore diversify streams of
cell production in the developing murine and human neocortex. These data show
that transcriptional priming is likely a conserved mechanism of mammalian
neural precursor lineage specialization.
"""

methods="""
<section>Dimension reduction and clustering</section>
<p>
We used PCA and t-distributed SNE as our main dimension reduction
approaches. PCA was performed with RunPCA function (Seurat) using HVGs.
Following PCA, we conducted JACKSTRAW analysis with 100 iterations to identify
statistically significant (P < 0.01) PCs that were driving systematic
variation. We used t-SNE to present data in 2D coordinates, generated by
RunTSNE function in Seurat. Significant PCs identified by JACKSTRAW analysis
were used as input. Perplexity was set to 30. t-SNE plots were generated using
R package ggplot2. Clustering was done with the Louvain-Jaccard algorithm
using t-SNE coordinates by FindClusters function from Seurat with default
setting.
"""

paper_url="https://advances.sciencemag.org/content/6/45/eabd2068 Li et al. 2020. Sci Adv."
pmid = "33158872"
geo_series = "GSE143949"
sra_study = "SRP243456"
bioproject = "PRJNA602313"
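Typos in these identifiers are easy to miss. The minimal Python sketch below checks each value against the general shape of its accession type: digits for a PubMed ID, GSE plus digits for a GEO series, SRP, ERP, or DRP plus digits for an SRA study, and PRJNA, PRJEB, or PRJDB plus digits for a BioProject. The helper bad_accessions is hypothetical, the patterns are simplified, and it does not check that an accession actually exists.

```python
import re

# Simplified accession shapes (assumption: a fixed prefix plus digits).
ACCESSION_PATTERNS = {
    "pmid": r"\d+",
    "geo_series": r"GSE\d+",
    "sra_study": r"[SED]RP\d+",
    "bioproject": r"PRJ(NA|EB|DB)\d+",
}

def bad_accessions(settings):
    """Return the keys whose values do not match the expected shape."""
    bad = []
    for key, pattern in ACCESSION_PATTERNS.items():
        value = settings.get(key)
        if value is not None and not re.fullmatch(pattern, value.strip()):
            bad.append(key)
    return bad

# Values copied from the desc.conf lines above.
desc = {"pmid": "33158872", "geo_series": "GSE143949",
        "sra_study": "SRP243456", "bioproject": "PRJNA602313"}
print(bad_accessions(desc))
```

With the values above it prints an empty list, while a GEO sample accession entered by mistake (one starting with GSM) would be flagged under geo_series.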

Make these changes to the cellbrowser.conf: replace the coords setting with a single entry for the Seurat t-SNE layout, and add body_parts:

coords=[{"file": "tsne.coords.tsv", "shortLabel": "Seurat tsne"}]
body_parts=["brain","neocortex"]

Rebuild the dataset:

cbBuild -o ~/public_html/cb

Check it out: https://hgwdev.gi.ucsc.edu/~${hgwdev_username}/cb/. It should look something like this:

(Screenshot: the final cbImportSeurat cell browser)

What next?

Make other changes to the cellbrowser.conf and desc.conf files to see how they affect the display. What does changing radius or alpha in cellbrowser.conf do? What about adding information for the lab or institution to the desc.conf? (Don't forget to rebuild the dataset between those changes!)