Assembly QA Jupyter

From Genecats
Revision as of 18:17, 27 June 2019 by Lrnassar

QAing Assemblies with Jupyter Notebook

This page documents the Jupyter Notebook that streamlines the steps in the Assembly QA wiki (http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_1_DEV_Steps). Jupyter Notebook provides a live web environment that can run code, such as bash shell commands or Python. First, copy the latest version of the notebook from its current location to a safe location in your hive directory:

mkdir -p /hive/users/$user/jupyter/
cp /cluster/home/lrnassar/Random_Test_Files/Assembly_QA_Streamline.ipynb /hive/users/$user/jupyter/

Starting the notebook

Start Jupyter from the directory that contains your notebook, since the notebook server only lists files in and below the directory it was started from. Following the example above, first change into that directory:

cd /hive/users/$user/jupyter/

Then start the notebook server with the following command:

jupyter-notebook --ip 128.114.198.32 --no-browser --port 8085 

You can specify any port between 8081 and 8090. The command starts Jupyter and prints a URL to open in your web browser. At that URL you should see a list of the files in the directory, including the copied notebook, Assembly_QA_Streamline. Click it to open the notebook.
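Because any port in that range may already be in use by another user, it can help to check before launching. As a minimal sketch, assuming Python 3 run on the same machine that will host the notebook, the following tries ports 8081 to 8090 in order and prints the launch command from this page for the first port it can bind. The helper names (`port_is_free`, `first_free_port`, `notebook_command`) are hypothetical and not part of the notebook.

```python
import socket

# Port range given on this page for the notebook server (8081..8090 inclusive).
PORT_RANGE = range(8081, 8091)
# IP address taken from the example jupyter-notebook command above.
HOST_IP = "128.114.198.32"


def port_is_free(port, host="0.0.0.0"):
    """Return True if this machine lets us bind `port` right now.

    This is only a point-in-time check: another user could still claim
    the port between this call and starting Jupyter.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(ports=PORT_RANGE):
    """Return the first bindable port in `ports`, or None if none are free."""
    for port in ports:
        if port_is_free(port):
            return port
    return None


def notebook_command(port, ip=HOST_IP):
    """Build the jupyter-notebook command line shown on this page."""
    return f"jupyter-notebook --ip {ip} --no-browser --port {port}"


if __name__ == "__main__":
    port = first_free_port()
    if port is None:
        print("All ports 8081-8090 appear to be in use.")
    else:
        print(notebook_command(port))
```

Copy the printed command into your shell from the notebook directory. If Jupyter still reports the port as busy, someone took it after the check; simply run the script again.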

Specifying your variables

The notebook is organized into five sections, each following one of the Assembly QA wiki pages:

  1. Auto Dev Steps - http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_1_DEV_Steps
  2. Manual Dev Steps - http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_1_DEV_Steps
  3. Track Steps - http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_2_Track_Steps
  4. Beta Steps - http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_3_BETA_Steps
  5. RR Steps - http://genomewiki.ucsc.edu/genecats/index.php/Assembly_QA_Part_4_RR_Steps

Each 'cell' is run independently. The first one, 'Auto Dev Steps', mostly performs automatic checks, such as checking the minimum browser criteria and whether a BLAT server exists.

The notebook currently takes two variables, and the later sections take a third. These variables are set at the top of each cell. Currently they are:

  1. assembly
  2. prev_assembly
  3. RedmineNumber

Fill these in at the top of each cell: because each Jupyter cell works independently, each cell requires its own copy of the variables. Enter the assembly in UCSC syntax, e.g. 'equCab3'; the cells also include examples. If this is a new assembly, set 'prev_assembly' to "None". When ready, run the cell by clicking the Run (play) button in the notebook toolbar or pressing Ctrl+Enter.
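For illustration only, a cell's variable block plus a quick sanity check might look like the sketch below. It assumes the cells are Python and that a new assembly's "None" may be written either as the string "None" or as Python's None. The regular expression is a loose, assumed approximation of UCSC database names (hg38, mm10, equCab3), the example values are placeholders, and `check_cell_variables` is a hypothetical helper, not part of the notebook.

```python
import re

# Hypothetical top-of-cell variables; the real notebook's cells may differ.
assembly = "equCab3"        # assembly to QA, in UCSC database-name syntax
prev_assembly = "equCab2"   # previous assembly, or "None" for a new assembly
RedmineNumber = None        # hypothetical placeholder; only the later cells use it

# Loose, assumed pattern for UCSC database names such as hg38, mm10, equCab3.
UCSC_DB_NAME = re.compile(r"^[a-z]{2,}(?:[A-Z][a-z]+)?\d+$")


def check_cell_variables(assembly, prev_assembly):
    """Return a list of problems with the cell's variables (empty if none found)."""
    problems = []
    if not UCSC_DB_NAME.match(assembly or ""):
        problems.append(f"assembly {assembly!r} does not look like a UCSC name, e.g. 'equCab3'")
    # Accept either the string "None" or Python's None for a brand-new assembly.
    if prev_assembly not in (None, "None") and not UCSC_DB_NAME.match(prev_assembly):
        problems.append(f'prev_assembly {prev_assembly!r} should be a UCSC name or "None"')
    return problems


print(check_cell_variables(assembly, prev_assembly))
```

An empty list means both values look plausible; because each cell needs its own variables, a check like this would have to be repeated at the top of every cell.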