User talk:Hiram

From genomewiki

What I need to know about Git

1. git status says:

# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.

The question is: what is now different between what I have and master?

git diff origin/master

How about just the file names:

git diff origin/master | grep "^+++"

or, more directly, with git's own option:

git diff --name-only origin/master
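The grep trick works because each file section of `git diff` output has a `+++` header line naming the file on the new side. Below is the same idea as a small Python function. It is a sketch that assumes git's default `a/` and `b/` path prefixes, and the name `changed_files` is hypothetical. It also reads the `---` line, so a deleted file (whose `+++` line is `/dev/null`) is still listed. Like the grep, it is a text heuristic: an added line whose own content begins with `++ ` would be mistaken for a header.

```python
from typing import List, Optional


def changed_files(diff_text: str) -> List[str]:
    """Return file paths named in the ---/+++ headers of unified diff text."""
    files: List[str] = []
    old_path: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            # Remember the old-side path; it is the only name a deleted file has.
            old_path = line[4:]
        elif line.startswith("+++ "):
            new_path = line[4:]
            # A deleted file shows "+++ /dev/null", so fall back to the old side.
            path = old_path if new_path == "/dev/null" else new_path
            if path is not None and path != "/dev/null":
                # Assumes git's default prefixes: "a/" (old side), "b/" (new side).
                if path.startswith(("a/", "b/")):
                    path = path[2:]
                files.append(path)
            old_path = None
    return files
```

Feed it the text captured from `git diff origin/master` to get one path per changed file.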

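2. What is the difference between a file I have and the current version of that file on master, without altering anything I have here now?

git fetch
git diff origin/master -- <file>

git fetch only updates the remote-tracking branch origin/master. It does not touch your working files or local branches. Leave off "-- <file>" to see every file that differs.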

3. How does one fetch a previous beta version, for example v235_branch? Clone the repository, then check out the branch:

git clone
cd kent
git checkout origin/v235_branch

This leaves you in a "detached HEAD" state rather than on a branch. If you want an actual local branch at that version:

git checkout -b my_v235 origin/v235_branch

4. How do you diff a file between different versions of itself?
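One common answer to question 4 is to give `git diff` two revisions, then `--`, then the path, e.g. `git diff origin/v235_branch origin/master -- path/to/file.c` (the path is a placeholder). The sketch below makes the same call from Python. It assumes Python 3.7+ and `git` on the `PATH`, and the function names are hypothetical. A related command, `git log -p -- <file>`, walks through every change to one file in turn.

```python
import subprocess
from typing import List


def file_diff_command(rev_a: str, rev_b: str, path: str) -> List[str]:
    """Build the git command that diffs one file between two revisions."""
    # "--" separates revisions from paths, so a file name that looks like a
    # branch name is still treated as a path.
    return ["git", "diff", rev_a, rev_b, "--", path]


def diff_file_between(rev_a: str, rev_b: str, path: str, repo: str = ".") -> str:
    """Run that command inside `repo` and return the unified diff text."""
    result = subprocess.run(
        file_diff_command(rev_a, rev_b, path),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return result.stdout
```

For example, `diff_file_between("origin/v235_branch", "origin/master", "path/to/file.c")` returns the diff as a string. With `check=True`, a git failure such as an unknown revision raises `subprocess.CalledProcessError` instead of returning empty output.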

ISMB 2008, Toronto, Ontario, Canada

Metagenomics approaches 1: Science. 2006 Jun 2;312(5778):1355-9
Metagenomic analysis of the human distal gut microbiome.
Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE.
Fraser-Liggett

Rodrigo Lopez, EBI

Estimating True Evolutionary Distances under the DCJ Model
Yu Lin and Bernard M.E. Moret
Laboratory for Computational Biology and Bioinformatics,
Swiss Federal Institute of Technology (EPFL), EPFL-IIS-LCBB,
INJ 230, Station 14, CH-1015 Lausanne, Switzerland

Mailing list about managing the large volumes of data from the new sequencing technologies: bioinfo-core

VisANT - pathway visualization tool

All Proceedings published in Bioinformatics

Gabor Marth's lab at Boston College works on the difficulties raised by the new sequencing technologies, developing toolsets for aligning reads and adjusting quality scores.

myExperiment and EBI workflow manager site


  • hm... thinking Gangleri 12:46, 7 April 2006 (PDT)
copying this code and making a preview at a Wikimedia Foundation wiki ...
both variants will work; please try to find out what Tidy is, where you can download it and how you can install it; I have never operated a wiki myself, but I did a lot of tests; I understand that Tidy is a kind of wizard that corrects HTML code / generates error-free HTML code
note: commons:wikt:yi:user:Gangleri/tests/bugzilla#test wiki's lists a lot of test wikis; if you make a preview at [1] you will see that the behaviour there is the same as here; please do not get angry; I had been searching for the answer for months
Good luck! You can reach me on the IRC channel #wiktionary or skype me 'irelgnag'. Best regards Gangleri 13:04, 7 April 2006 (PDT)
Works OK. Best regards Gangleri · T · m: Th · T 16:12, 7 April 2006 (PDT)

Genomics and Justice - 2007-05-18

What is important and what we want to carry forward from session to session

What is important but needs to be backgrounded

What do we know

Morning session, 09:30

Participatory forms

What are the underlying concepts of justice when a particular "performance" is seen in a public forum? Could there be a philosophical advisory team on the side to serve as a check and balance and to advise on questions of justice as they come up in a research situation?

Example from an experience in Tuscany, Italy: don't mince words when asking for samples; make it clear that the samples are needed for the research. Offer something valuable in exchange for the samples. It does little good to promote some kind of community-wide involvement as if you were passing responsibility off to that "group". The researcher must have the courage to recognize that the responsibility remains theirs no matter how well the group has been informed about the research.


Public domain, collective commons. Copyleft. Foster innovation in complete openness by innovating in public. Against what do we need to secure open access to this new body of biological information? "Creative Commons is not communism." Illicit copies. The "generic" can exist only if the private patented item exists. By definition, a public-domain commons is possible if and only if a private domain exists for that same business.

Patent system: one mechanism for answering the question of how to make innovation happen. It sidesteps the problem of how these innovations get applied to the actual problems that exist.

The benefits of the public domain: it can solve problems that markets do not care about. For example, the knowledge that smoking is dangerous was in the public domain in the 1950s, and the problem was alleviated by behavioral changes. Penicillin development has an interesting history. Interesting problems can be solved efficiently by taking public-domain knowledge and turning it into a private resource, which makes it valuable and thereby leads to its becoming available.

Groups without power have notions of identity and property that need to be protected when external forces walk in with different ideas of identity and property.


Aboriginal title, rights of occupancy: what native peoples have. What is present in native culture (definitions, meanings, stories), and how does it fit into the structures being imposed upon them? DNA on "loan" contracts.

Test Custom Track Link

Custom track: Hg18 small exons and introns on the UCSC Genes track

Rat Genome Meeting CSHL 09 Dec 2007

new assembly due about end of Jan 2008

cDNAs being worked on from other strains.

Perhaps about 4,000 to 5,000 EuraTools

Sequencing other strains: Brown Norway on 454 to 4X by the end of 2008, maybe whole genome, depending on technology.

Check with the Japan folks; they've been doing BAC ends on two rats. The data should be in the trace archives by about April.

Baylor is currently working on SNP calls: want one set, no dups please, about 8 strains, perhaps several million; trouble with how far apart the strains are.

sequencing, or sampling, other strains, best to improve QTL, don't worry about the exons.

The SHR strain may be done with the Solexa procedure if the paired-end protocol can work. Might be $30K Canadian for a Solexa run at 3X coverage.

need list of strains and the relative value in their sampling/sequencing.

Ewan proposes RGD identifiers as the primary naming, certainly with xrefs to the others. Should there be an RGD ID for all gene models? The trouble is when things disappear. The idea is to have as much as possible. Beware of naming schemes that are too tentative to begin with.

Entrez genes, Ensembl

request to have all the genes done by Sept.

coordinate between Ensembl, RGD and UCSC to cull the best coincident predictions, and have something to say about the differences.

Ontologies at RGD - Mary Shimoyama

Gene Ontology

Mammalian Phenotype Ontology - developed at MGD

Pathway Ontology - developed at RGD



Rat genes with GO 17,000

Trying to keep up with GO on a weekly basis

Ewan suggests that strain names are used only if they are in the RGD Strain Ontology DB

There is a big mouse ontology meeting in Europe every year; good to coordinate with them. EuroPhenome

eQTL tracks are new to Ensembl and will be available via the DAS source menus in about a week.

Ewan would like UCSC to be a DAS client.

Can the rat folks agree on a set of cell lines to use for genome-wide surveys? Do rat folks think of genome-wide surveys?

Next rat meeting like this: next year in Hinxton, then CSHL again in 2009.

Top 10 items to worry about

It is important to keep your list straight when worrying about things. A proper perspective needs to be considered at all times.

  1. insect population decline
  2. climate change
  3. billions of refugees wandering about looking for food and water
  4. fresh water supply
  5. population stress exacerbates all other problems
  6. declining fish populations
  7. antibiotic resistant bacterial infections
  8. mono-culture crop and livestock strains
  9. USA public debt
  10. the end of oil
  11. deep sea oil well blow out
  12. nuclear power plant meets tsunami

Links to remember

NY Times 2007-06-29: The Cult of the Amateur, by Michiko Kakutani

Amazon AWS services 2009-06-16

Amazon Bioinformatics Seattle 2010-06-08

  • James Hamilton power and infrastructure cost analysis
  • BioTeam services
  • BioTeam blog Chris Dagdigian, scriptable infrastructure
  • Cloud Management open source software
  • Next Gen Sequence cloud management, Andreas Sundquist
  • Pacific Biosciences fast sequencing: a 15-minute genome sequencer by 2014. At that speed the analysis must be done as the sequence is produced, to reduce the amount of data to store.
  • Complete Genomics sequencing and analysis results delivered via S3, ready now to do 500 human genomes per month, may need a genome browser.
  • Galaxy high throughput sequencing analysis with Galaxy instances in the cloud
  • Chef! cookbooks for cloud compute management
  • Brad Chapman, Mass. General Hospital, promoting open-source collaboration for bioinformatics tools. His GitHub blog
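The Pacific Biosciences note describes streaming analysis: each read is processed as it arrives and then discarded, so only the results need to be stored. Below is a minimal Python sketch. It assumes reads arrive as an iterable of plain sequence strings. The statistics it keeps (read count, base count, G+C count) and the name `summarize_stream` are hypothetical stand-ins for whatever analysis is actually required.

```python
from typing import Dict, Iterable


def summarize_stream(reads: Iterable[str]) -> Dict[str, int]:
    """Consume reads one at a time, keeping only running totals.

    Each read is examined and then dropped, so memory use stays constant
    no matter how many reads arrive; only the summary needs storing.
    """
    summary = {"reads": 0, "bases": 0, "gc": 0}
    for read in reads:
        seq = read.upper()
        summary["reads"] += 1
        summary["bases"] += len(seq)
        summary["gc"] += seq.count("G") + seq.count("C")
    return summary
```

For example, `summarize_stream(iter(["ACGT", "GGCC", "ATAT"]))` returns `{'reads': 3, 'bases': 12, 'gc': 6}`. Because the function only iterates, it works the same on a list or on a generator fed by an instrument.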

ISCA 3rd International meeting, Bethesda MD, 2010-06-29,30

Notes 2010-09-30

  • analyzing web traffic: http