Advanced Cell Browser Topics

From Genecats
Revision as of 20:21, 16 June 2022 by Mspeir (talk | contribs) (First very rough pass at this page)

Generating coordinates using cbScanpy

Rarely, you will be wrangling a dataset where you need to generate the layout coordinates. This is most easily done with cbScanpy.

Setting up your scanpy.conf

Create one with the default values by running:

cbScanpy --init

We recommend turning off most of the cell filtering steps. We assume that the authors/submitters have already done the appropriate filtering, and the default settings for these filters can be overzealous (e.g. removing 75% or more of the cells in some cases). Make the following changes to the scanpy.conf:

doTrimCells=False

doFilterMito=False

doFilterGenes=False
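If you would rather not edit the file by hand, settings like the ones above can be changed from the command line. The following is a minimal sketch, not part of cbScanpy: set_conf_value is a hypothetical helper, and it assumes each setting sits on its own line at the start of the line, in KEY = VALUE form.

```shell
# Hypothetical helper (not part of the Cell Browser tools):
# set KEY=VALUE in a scanpy.conf-style file.
set_conf_value() {
  conf="$1"; key="$2"; value="$3"
  if grep -Eq "^${key}[[:space:]]*=" "$conf"; then
    # Replace the existing assignment in place.
    # -i.bak leaves a backup copy (scanpy.conf.bak) and works with GNU and BSD sed.
    sed -i.bak -E "s|^${key}[[:space:]]*=.*|${key}=${value}|" "$conf"
  else
    # The key is not in the file yet, so append it.
    printf '%s=%s\n' "$key" "$value" >> "$conf"
  fi
}

# Usage, from the directory holding scanpy.conf:
#   set_conf_value scanpy.conf doFilterMito False
#   set_conf_value scanpy.conf doFilterGenes False
```

Check the result with a quick grep of scanpy.conf before running cbScanpy.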

Context-dependent changes

Some changes only make sense depending on the particulars of your dataset.

Are the values in your matrix already normalized/logged? If the values include decimals and the max value is low (e.g. 6.0-10.0), then they probably are. If so, set:

doExp=True

Does your dataset have more than 20,000 cells? Only run UMAP:

doLayouts=["umap"]
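Both questions above can be answered with a quick pass over the expression matrix. This is a rough sketch only: matrix_summary is a hypothetical helper, and it assumes a tab-separated matrix with one header row of cell IDs and gene names in the first column. Adjust it if your matrix is laid out differently.

```shell
# Hypothetical helper: summarize a tab-separated expression matrix read from stdin.
# Assumes row 1 is a header of cell IDs and column 1 holds gene names.
matrix_summary() {
  awk -F'\t' '
    NR == 1 { cells = NF - 1; next }      # header row: one column per cell
    {
      for (i = 2; i <= NF; i++) {
        v = $i + 0
        if (!seen || v > max) { max = v; seen = 1 }
        # A non-zero digit after the decimal point suggests non-integer values.
        if ($i ~ /\.[0-9]*[1-9]/) decimals = 1
      }
    }
    END { printf "cells=%d max=%s decimals=%s\n", cells, max + 0, (decimals ? "yes" : "no") }
  '
}

# Usage:
#   matrix_summary < orig/<expr_mat_file>
#   gzip -cd orig/<expr_mat_file> | matrix_summary    # if the matrix is gzipped
```

A low max with decimals=yes points towards already normalized/logged values, and the cell count tells you whether you are past the 20,000-cell mark.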

Running cbScanpy

Once you have your scanpy.conf set up, it’s time to actually run cbScanpy.

cbScanpy -e orig/<expr_mat_file> -m orig/<meta_file> -o . -n <short_name> --skipMatrix --inCluster=<field_name>

If your scanpy.conf is not in the directory where you’re running cbScanpy, you’ll need to specify its location with the ‘-c’ option.

After that completes, run cbBuild and check out the results in the Cell Browser. Hopefully things separate out into relatively distinct clusters. If not, you can try adjusting the settings in scanpy.conf and trying again or asking the submitters/authors for input.

Setting up rclone

Installation

You can install this using conda in a new environment or one of your existing ones (e.g. scanpyenv). Here we’ll set it up in a separate environment.

conda create -n <env_name>

conda activate <env_name>

conda install -c conda-forge rclone

Get it working with…

Google Drive

Box

Link to full list?

Downloading a file

Other gotchas for cb work?

   File has to be in your drive/box/whatever. 
   Can't remember: is there a way to download a public file?

Wrangling a bulk RNA dataset

Renaming a dataset

Note: a dataset’s shortname should (almost) never be changed after being pushed to the main site. People bookmark things and URLs make their way into publications and we want to try our hardest not to break those.

These steps allow you to change a dataset’s shortname without having to go through the often lengthy process of rebuilding a dataset from scratch.

First, rename the directory in datasets:

cd /hive/data/inside/cells/datasets/ 

mv {old_name} {new_name}

Then, rename the directory in htdocs-cells:

cd /usr/local/apache/htdocs-cells/

mv {old_name} {new_name}

Best to do this soon after you rename the other dir so that the same mv command isn’t too far back in your history.
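The two mv steps above can also be wrapped in a small function, which avoids hunting through your history and checks both locations before touching anything. This is a sketch under assumptions: rename_dataset_dirs is a hypothetical helper, not an existing tool, and the default paths are simply the two directories used above.

```shell
# Hypothetical helper: rename a dataset directory in both locations.
# Arguments: OLD_NAME NEW_NAME [DATASETS_DIR] [HTDOCS_DIR]
rename_dataset_dirs() {
  old="$1"; new="$2"
  datasets_dir="${3:-/hive/data/inside/cells/datasets}"
  htdocs_dir="${4:-/usr/local/apache/htdocs-cells}"
  # Check both locations first to reduce the chance of a half-finished rename.
  for base in "$datasets_dir" "$htdocs_dir"; do
    if [ ! -d "$base/$old" ]; then
      echo "missing directory: $base/$old" >&2
      return 1
    fi
    if [ -e "$base/$new" ]; then
      echo "already exists: $base/$new" >&2
      return 1
    fi
  done
  mv "$datasets_dir/$old" "$datasets_dir/$new" &&
    mv "$htdocs_dir/$old" "$htdocs_dir/$new"
}

# Usage:
#   rename_dataset_dirs {old_name} {new_name}
```

The function only moves directories; the dataset still needs to be rebuilt afterwards.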

Finally, rebuild the dataset:

cd - # Note that this will take you back to the last directory you were in

cd {new_name} # Move into the renamed dataset directory, where cellbrowser.conf lives

cbBuild -o alpha

This is necessary because the old name is still present in various dataset.json files, so rebuilding will replace the old names with the new ones.
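To confirm that the rebuild replaced every occurrence, you can search the dataset.json files for the old name. This is a sketch, not an official check: find_stale_name is a hypothetical helper, and which directory to search (for example the renamed directory under htdocs-cells) is an assumption you should adapt.

```shell
# Hypothetical helper: list dataset.json files under DIR that still contain OLD_NAME.
# Prints nothing when no file mentions the old name.
find_stale_name() {
  old="$1"; dir="$2"
  # -F treats the name as a fixed string, -l prints only the matching file names.
  find "$dir" -name dataset.json -exec grep -lF -- "$old" {} +
}

# Usage:
#   find_stale_name {old_name} /usr/local/apache/htdocs-cells/{new_name}
```

Any file it lists still carries the old shortname, which suggests the rebuild did not reach that part of the dataset.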