Genbank updates

From Genecats
Revision as of 19:07, 7 March 2011 by Vanessa

To disable genbank updates to an assembly:

In addition to removing the assembly name from the hgwbeta.dbs and rr.dbs files, the assembly's directory under this path must also be removed:

/cluster/data/genbank/data/ftp/

Otherwise the genbank files will keep reappearing.

So, to disable a genome:

   -> on hgwdev remove the assembly name from these files: 
          ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs
          ~/kent/src/hg/makeDb/genbank/etc/rr.dbs
      and commit.  Please remove the name rather than commenting it out; git keeps the file's edit history.
   -> ssh hgwbeta
   -> cd ~/kent/src/hg/makeDb/genbank
   -> git pull
   -> make etc-update-rr
   -> cd to /cluster/data/genbank/data/ftp/${assembly} 
   -> remove /cluster/data/genbank/data/ftp/${assembly} 
   -> remove files on hgdownload by sending a push request to drop this directory and its contents:
          /usr/local/apache/htdocs/goldenPath/${assembly}
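The first step above (deleting the assembly name from the two .dbs files) can be sketched as a small shell helper. This is only an illustration: `remove_assembly` is a hypothetical function, not part of the genbank tools, and it assumes each .dbs file lists one assembly name per line.

```shell
# Hypothetical helper, not part of the genbank tools.
# Assumes the .dbs file holds one assembly name per line.
# Usage: remove_assembly <assembly> <dbs-file>
remove_assembly() {
    assembly=$1
    dbs_file=$2
    tmp=$(mktemp) || return 1
    status=0
    # -x: match whole lines only, -F: treat the name as a fixed string,
    # -v: keep every line that does NOT match the assembly name.
    grep -v -x -F -- "$assembly" "$dbs_file" > "$tmp" || status=$?
    # grep exits 1 when it selects no lines (the file would become empty);
    # only an exit status above 1 is a real error.
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi
    # Overwrite the original in place so its permissions are preserved.
    cat "$tmp" > "$dbs_file"
    rm -f "$tmp"
}
```

On hgwdev it might be called once per file, for example `remove_assembly hg18 ~/kent/src/hg/makeDb/genbank/etc/hgwbeta.dbs`, followed by a `git diff` to check the change before committing. Matching whole lines means that removing hg1 would not also remove hg18 or hg19.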

Some extra notes about Genbank tables

The current list of Genbank tables (curated by Mark Diekhans) is located at hgwdev:/cluster/data/genbank/etc/genbank.tbls (also located at hgwbeta:/genbank/etc/genbank.tbls). All tables in the list up to 'gbLoaded' must exist; those after 'gbLoaded' are optional. To list which of those tables are present in a database (using hg18 as an example), run:

 hgsql -N -e 'SHOW TABLES' hg18 | egrep -f /cluster/data/genbank/etc/genbank.tbls  # on hgwdev
 hgsql -N -e 'SHOW TABLES' hg18 | egrep -f /genbank/etc/genbank.tbls  # on hgwbeta
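The commands above show which Genbank tables are present; the reverse check (which required tables are missing) can be sketched as follows. `missing_required` is a hypothetical helper, and the sketch rests on several assumptions: genbank.tbls holds one egrep pattern per line, lines starting with `#` are comments, and the 'gbLoaded' entry itself counts as required.

```shell
# Hypothetical helper: print each required pattern that matches no table.
# $1 = path to genbank.tbls (assumed: one egrep pattern per line)
# $2 = file holding one table name per line
missing_required() {
    tbls=$1
    tables=$2
    # sed prints every line up to and including the first line that
    # mentions gbLoaded, then quits, so optional entries are skipped.
    sed '/gbLoaded/q' "$tbls" | while IFS= read -r pattern; do
        # Skip blank lines and (assumed) comment lines.
        case $pattern in
            ''|'#'*) continue ;;
        esac
        # Report the pattern if no table name matches it.
        grep -E -q -- "$pattern" "$tables" || printf '%s\n' "$pattern"
    done
}
```

A possible use on hgwdev would be to save the table list with `hgsql -N -e 'SHOW TABLES' hg18 > hg18.tables` and then run `missing_required /cluster/data/genbank/etc/genbank.tbls hg18.tables`; no output would mean every required pattern matched at least one table.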

The two tables 'gbCdnaInfo' and 'gbStatus' are the main tables and should contain all entries for a database.