UdcFuse

From genomewiki
Revision as of 19:13, 12 October 2009 by AngieHinrichs (talk | contribs) (intro)

The udc (URL Data Cache - kent/src/lib/udc.c) module is the URL random access and sparse-file caching mechanism underlying the bigBed and bigWig custom track implementation. Each bigBed/bigWig custom track's track line includes the bigDataUrl parameter, which is set to the URL of the user's bigBed/bigWig file, e.g. "track name=myBB type=bigBed bigDataUrl=http://my.edu/myBigBed.bb".
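To make the idea of sparse-file caching concrete, here is a minimal sketch in C. It is not taken from udc.c and does not use udc's API: the names remoteFile, sparseCache, fetchBlock and cachedRead, the tiny block size, and the in-memory stand-in for the remote file are all assumptions made for illustration. It shows only the general technique: keep one "present" flag per block, fetch a block from the remote source the first time a read touches it, and serve later reads of that block from the local copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8    /* hypothetical, tiny so the sketch is easy to trace */
#define MAX_BLOCKS 64   /* hypothetical upper bound on cached blocks */

/* Hypothetical stand-in for a remote file reachable by URL. */
struct remoteFile {
    const char *data;   /* the "remote" bytes */
    size_t size;        /* total size in bytes */
    int fetchCount;     /* number of simulated range requests */
};

/* Hypothetical local cache: a buffer plus one "present" flag per block. */
struct sparseCache {
    char data[BLOCK_SIZE * MAX_BLOCKS];
    unsigned char present[MAX_BLOCKS];
};

/* Simulate one range request that copies a single block into the cache. */
void fetchBlock(struct remoteFile *rf, struct sparseCache *c, size_t block)
{
    assert(block < MAX_BLOCKS);
    size_t start = block * BLOCK_SIZE;
    size_t len = BLOCK_SIZE;
    if (start + len > rf->size)
        len = rf->size - start;     /* last block may be short */
    memcpy(c->data + start, rf->data + start, len);
    c->present[block] = 1;
    rf->fetchCount++;
}

/* Read up to len bytes starting at offset, fetching only the blocks that
 * are not cached yet. Returns the number of bytes copied into out. */
size_t cachedRead(struct remoteFile *rf, struct sparseCache *c,
                  size_t offset, size_t len, char *out)
{
    if (rf->size > sizeof c->data || offset >= rf->size || len == 0)
        return 0;
    if (len > rf->size - offset)
        len = rf->size - offset;    /* clip the read at end of file */
    size_t first = offset / BLOCK_SIZE;
    size_t last = (offset + len - 1) / BLOCK_SIZE;
    for (size_t b = first; b <= last; b++)
        if (!c->present[b])
            fetchBlock(rf, c, b);
    memcpy(out, c->data + offset, len);
    return len;
}
```

In this sketch, a caller zero-initializes a sparseCache and then calls cachedRead repeatedly. A second read of the same region does not increase fetchCount, which is the property that lets repeated track loads avoid repeated downloads.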

Similar to bigBed/bigWig, the BAM alignment format (binary compressed flavor of [SAM]) is indexed for random access, which makes it suitable for track display. The [samtools-C] library includes code to do HTTP and FTP random access using the BAM index, so it is easy to implement basic custom track support by simply passing the bigDataUrl to samtools-C access functions.

However, samtools-C lacks SSL (https/ftps) support and caching. For each access, the entire index file is downloaded to a file in the current directory. SSL is a valuable feature for users who want to display unpublished data (one alpha-tester is waiting for SSL support), and the lack of caching slows down the genome browser display (a constant 4-second track load time for a 1000 Genomes test BAM file whose index file is smaller than average).

MarkD made the most excellent suggestion of placing udc underneath the file handles used by samtools-C, as a Filesystem in Userspace ([FUSE]) module. FUSE provides an efficient kernel interface that lets userspace code implement a fully functional file system. udcFuse is a userspace program built on FUSE; the filesystem it mounts is a wrapper around udc functionality.
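One piece such a wrapper needs is a way to turn a path under the mount point back into the URL that udc should open. The sketch below illustrates that step only, and it rests on an assumption: the layout "/protocol/host/path" (so that "/http/my.edu/myBigBed.bb" stands for "http://my.edu/myBigBed.bb") is hypothetical and is not confirmed by this page. The function name fusePathToUrl is likewise invented for illustration, and the code calls neither FUSE nor udc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Convert a path relative to the mount point, in the assumed (hypothetical)
 * form "/protocol/host/rest", into "protocol://host/rest".
 * Returns 0 on success, -1 if the path is malformed or url is too small. */
int fusePathToUrl(const char *path, char *url, size_t urlSize)
{
    assert(url != NULL && urlSize > 0);
    if (path == NULL || path[0] != '/')
        return -1;
    const char *protocol = path + 1;
    const char *slash = strchr(protocol, '/');
    /* Require a non-empty protocol and something after it. */
    if (slash == NULL || slash == protocol || slash[1] == '\0')
        return -1;
    int protoLen = (int)(slash - protocol);
    int n = snprintf(url, urlSize, "%.*s://%s", protoLen, protocol, slash + 1);
    if (n < 0 || (size_t)n >= urlSize)
        return -1;                  /* output would have been truncated */
    return 0;
}
```

Under this assumed layout, a program such as samtools-C would open an ordinary file path below the mount point, and the filesystem's open and read handlers would translate that path to a URL before handing it to udc.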