Selecting a graphing track data format

Introduction

Several data submission formats are available for drawing graphs on the genome browser. Consider the structure of the data when selecting a format: the right choice avoids very large data submission files and allows efficient display in the genome browser. Choosing appropriate graphing options is equally critical to portraying accurately the intended meaning of the data.
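The selection criteria given in the sections that follow can be condensed into a small decision function. This is only a sketch of the rules of thumb stated on this page: the function name `suggest_format` and its parameters are hypothetical, and the thresholds are the approximate figures quoted here, not hard limits enforced by the browser.

```python
def suggest_format(n_points, min_gap_bases, constant_span, regular_step):
    """Suggest a graphing format from the rough guidelines on this page.

    n_points      -- number of data points in the whole data set
    min_gap_bases -- smallest distance, in bases, between neighboring points
    constant_span -- True if every region covers the same number of bases
    regular_step  -- True if regions start at precisely regular intervals
    """
    # Above roughly 100 million points, loading is likely to time out.
    if n_points > 100_000_000:
        return "too large"
    # Sparse data: fewer than 300,000 points, none closer than 100,000 bases.
    if n_points < 300_000 and min_gap_bases >= 100_000:
        return "Genome Graphs"
    # Constant region size at regular intervals is the most compact case.
    if constant_span and regular_step:
        return "Wiggle Fixed Step"
    # Constant region size at irregular positions.
    if constant_span:
        return "Wiggle Variable Step"
    # Irregular positions and varying region sizes.
    return "Bed Graph"
```

The sketch does not model the case where variable step spacing is so irregular that Bed Graph becomes the better choice; that still needs a judgment call.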

Genome Graphs

Draws line graph through specified chromosome positions
Best used for sparse genome-wide data points (sparse: fewer than three hundred thousand points)
Not recommended for dense data sets (dense: values closer than 100,000 bases to each other)
See also: Genome Graphs

Bed Graph

Draws bar graph at specified chromosome segment region
Best used for genome-wide data sets on the order of several million to perhaps 10 million positions
Best used when data is not spaced at regular intervals and the size of the specified regions is not constant
See also: Bed Graph
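As an illustration, the following sketch writes regions of varying size in Bed Graph form. It assumes the UCSC bedGraph layout of four columns (chromosome, start, end, value) with zero-based, half-open coordinates and a `track type=bedGraph` line. The function name `bedgraph_lines` and the track name are made up for the example.

```python
def bedgraph_lines(regions, name="example"):
    """Yield Bed Graph text lines for (chrom, start, end, value) tuples.

    Coordinates are assumed to be zero-based and half-open, so a region
    covers the bases start .. end-1.
    """
    yield f'track type=bedGraph name="{name}"'
    for chrom, start, end, value in regions:
        if end <= start:
            raise ValueError(f"empty or reversed region: {chrom}:{start}-{end}")
        yield f"{chrom}\t{start}\t{end}\t{value}"
```

Because each line carries its own start and end, regions can differ in size and spacing, at the cost of repeating the chromosome name and both coordinates on every line.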

Wiggle Variable Step

Draws bar graph at specified chromosome segment region
Best used for genome-wide data sets on the order of tens of millions of data points
Specified regions must be a constant size (specified by the span argument)
Chromosome positions can be at irregular intervals, subject to the caution noted below
This is the second most space-efficient format for wiggle data input
This format can be inefficient during encoding and display if the spacing of the data points is extremely irregular. In that case, Bed Graph is the fallback format.
See also: Wiggle Formats
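A minimal sketch of producing variable step input is shown below. It assumes the UCSC wiggle layout: a `track type=wiggle_0` line, a `variableStep` declaration carrying `chrom` and `span`, then one "position value" pair per line with one-based positions. The function name `variable_step_lines` is hypothetical.

```python
def variable_step_lines(chrom, span, points):
    """Yield wiggle variableStep lines for (position, value) pairs.

    Every region has the same size (span); only the start positions vary.
    Positions are assumed to be one-based.
    """
    yield "track type=wiggle_0"
    yield f"variableStep chrom={chrom} span={span}"
    # Emit the points in ascending position order.
    for position, value in sorted(points):
        yield f"{position} {value}"
```

Each data line holds only a position and a value, which is why this form is more compact than Bed Graph when the span is constant.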

Wiggle Fixed Step

Draws bar graph at specified chromosome segment region
Best used for genome-wide data sets on the order of tens of millions of data points
Specified regions must be a constant size (specified by the span argument)
Chromosome positions are precisely at regular intervals (specified by step argument)
This is the most space-efficient format for wiggle data input
See also: Wiggle Formats
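The fixed step form can be sketched in the same way. The example assumes the UCSC wiggle layout: a `track type=wiggle_0` line, then a `fixedStep` declaration with `chrom`, `start`, `step` and `span`, followed by one value per line. The function name `fixed_step_lines` is hypothetical, and the check that `span` does not exceed `step` is a precaution added for this example to keep regions from overlapping.

```python
def fixed_step_lines(chrom, start, step, span, values):
    """Yield wiggle fixedStep lines: one declaration, then one value per line.

    The i-th value (counting from zero) covers span bases beginning at
    start + i * step.
    """
    if span > step:
        # Consecutive regions would cover the same bases.
        raise ValueError("span larger than step would make regions overlap")
    yield "track type=wiggle_0"
    yield f"fixedStep chrom={chrom} start={start} step={step} span={span}"
    for value in values:
        yield str(value)
```

Since positions are implied by the declaration, each data line holds a single number, which accounts for the space efficiency of this format.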

Wiggle Bed Graph

Obsolete data format; use Bed Graph instead

Notes

At the current time, data sets with more than 100 million data points are impractical due to network transmission time, data transformation time and database loading time. Larger data sets can be attempted, but are not guaranteed to survive the various time-out mechanisms in the pipeline. The visible symptom of a timeout during loading is a blank web browser screen.
Compressing (gzip) the submitted data file helps by reducing network transmission time.
Pseudo line graphs can be drawn with the wiggle tracks by setting the optional drawing parameters of the track to draw points instead of bars, with smoothing turned on to smear the points together into a line.
Beware of optional data graphing parameters when viewing the resulting data track. The selection of windowingFunction, viewLimits, autoScaling and similar options can dramatically change the apparent meaning of the data display.
There is no graphing format that draws multiple data values at identical chromosome locations. The loading mechanism does attempt to prevent this situation, but not all cases of overlapping data values can be detected. If multiple data values at identical chromosome locations escape detection, graphing behavior is not guaranteed.
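Two of these notes lend themselves to a check before submission: looking for overlapping data values and compressing the file. The sketch below is not the loader's own detection mechanism. The helper names `find_overlaps` and `write_gzipped` are hypothetical, and regions are assumed to be (chrom, start, end, value) tuples with half-open coordinates.

```python
import gzip


def find_overlaps(regions):
    """Return (earlier, later) pairs of regions that cover the same bases."""
    overlaps = []
    widest = None  # region reaching furthest right on the current chromosome
    for region in sorted(regions, key=lambda r: (r[0], r[1], r[2])):
        chrom, start, end = region[0], region[1], region[2]
        same_chrom = widest is not None and widest[0] == chrom
        if same_chrom and start < widest[2]:
            overlaps.append((widest, region))
        if not same_chrom or end > widest[2]:
            widest = region
    return overlaps


def write_gzipped(lines, path):
    """Write text lines to a gzip-compressed file, one line per record."""
    with gzip.open(path, "wt", encoding="ascii") as handle:
        for line in lines:
            handle.write(line + "\n")
```

Running `find_overlaps` on the parsed regions and submitting only when it returns an empty list avoids relying on the loader to catch every duplicate location.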
