Cell Browser data flow and architecture

From Genecats

How does data flow between the different machines?



How does building a cell browser work?

   What files are copied over?
   Which ones are transformed into another format?

1. Data is first deposited in a dataset directory inside /hive/data/inside/cells on hgwdev. It is then built onto cells-test using one of the following commands:

# For datasets with no additional subsets
cbBuild -o alpha

# For dataset collections you will use the recursive option "-r"
cbBuild -r -o alpha

2. The output files from cbBuild are placed inside /usr/local/apache/htdocs-cells. In the process, the original configuration files and expression matrices in the dataset directory are converted into either JSON or binary (BIN) files, which the Cell Browser website uses to display the visualization. The original files are human-readable, whereas the converted files are meant for faster access by the browser.
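As a quick sanity check after a build, you can count how many converted files ended up in the output directory. The sketch below is only an illustration: the helper name count_by_ext and the dataset name my-dataset are hypothetical, and it assumes the converted files carry .json and .bin extensions and that each dataset has its own subdirectory under htdocs-cells.

```shell
# Hypothetical helper: count files with a given extension under a directory.
# $1 = directory to search (recursively), $2 = extension without the dot.
count_by_ext() {
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Example calls (my-dataset is a placeholder, not a real dataset):
# count_by_ext /usr/local/apache/htdocs-cells/my-dataset json
# count_by_ext /usr/local/apache/htdocs-cells/my-dataset bin
```

A count of zero for both extensions would suggest the build did not write to the directory you expected.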

3. Once the dataset is on cells-test, the next destination is cells-beta. Push the dataset's directory and files from htdocs-cells to /usr/local/apache/htdocs-cells-beta using the command:

# Push single dataset
cbPush dir-name

# To push multiple datasets, place all of the dataset names inside a single pair of quotes
cbPush "dir-name-1 dir-name-2 dir-name-3"

Note that cbPush requires you to input a directory name.
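Because multiple datasets have to be passed to cbPush as one quoted, space-separated string, a small helper can build that string from separate arguments. This is a sketch only: join_datasets is a hypothetical name, not part of the Cell Browser tools, and it assumes dataset names contain no spaces.

```shell
# Hypothetical helper: join all arguments into one space-separated string,
# the form cbPush expects when pushing several datasets at once.
# The body runs in a subshell so the IFS change does not leak out.
join_datasets() (
    IFS=' '
    printf '%s\n' "$*"
)

# Usage sketch (not executed here):
# cbPush "$(join_datasets dir-name-1 dir-name-2 dir-name-3)"
```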

A useful alias to have in your .bashrc pushes the directory you are currently in onto beta:

alias cbPushDir='cbPush "${PWD##*/}"'

You could name this alias whatever you prefer.
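The alias relies on the shell parameter expansion ${PWD##*/}, which strips everything up to and including the last slash and so leaves only the name of the current directory. The short sketch below shows the same expansion on an ordinary variable; the path is a made-up example.

```shell
# Made-up example path; any absolute path behaves the same way.
path="/hive/data/inside/cells/my-dataset"

# "##*/" removes the longest leading match of "*/", i.e. everything
# through the final slash, leaving just the last path component.
name="${path##*/}"

echo "$name"   # prints: my-dataset
```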

4. Once your dataset is on beta, you are almost there! After the dataset has been checked for potential bugs, use the command:

sudo cellsPush

You will be prompted for a password; use your hgwdev password. Once you do that, the datasets will be built onto the hgw0, hgw1, and hgw2 machines. Voila! Note that sudo cellsPush pushes out ALL of the changes that are on beta, so make sure everything there is ready to be pushed out. You can use datasetDiffs -r to double-check whether any additional changes would be pushed out along with your new dataset.
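Since cellsPush releases everything on beta, it can help to put an explicit confirmation step between reviewing the diffs and pushing. The following is a minimal sketch of that idea: confirm is a hypothetical helper, not an existing Cell Browser command, and the commented usage line simply chains the two commands named above.

```shell
# Hypothetical helper: ask a yes/no question and succeed only on "y" or "Y".
# The prompt goes to stderr so it does not mix with captured output.
confirm() {
    local reply
    printf '%s [y/N] ' "$1" >&2
    read -r reply
    [ "$reply" = "y" ] || [ "$reply" = "Y" ]
}

# Usage sketch (not executed here): review the pending changes first,
# then push only after an explicit "y".
# datasetDiffs -r; confirm "Push ALL beta changes to hgw0, hgw1, and hgw2?" && sudo cellsPush
```

Any answer other than y or Y, including just pressing Enter, leaves the production machines untouched.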

System Architecture Map

Cb sysarchmap.png