This simple program compresses wig files (genomic data for the UCSC Genome Browser). It removes long stretches of zeros, which carry no information, and inserts a new fixedStep line after each removed stretch so that large parts of the genome can be skipped. It runs in O(n) time and uses constant memory. It really is terribly simple, but I've uploaded it here in the hope that somebody can avoid reinventing the wheel.
In particular, if you need to create a bigWig file, you want the input wig file to be as small as possible; otherwise you can run out of memory on your machine (wigToBigWig may use 25 GB of RAM or more). wigShrink substantially decreases the size of wig files from data such as ChIP-seq or ChIP-chip.
./wigShrink 3 test.wig testout.wig
In this case, it searches the included test.wig, removes every run of three or more consecutive zeros, inserts new fixedStep lines where needed, and writes the result to testout.wig. It currently supports only single-value wig files in this format:
fixedStep chrom=chr2 start=40 step=1
0
0
fixedStep chrom=chr2 start=42 step=1
0
0
0
0
1
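Note: Setting wigShrink to remove runs of 3 or more zeros is not recommended for real use! Each inserted fixedStep line is longer than the few zeros it replaces, so a very low threshold can make the file larger. 100 is a reasonable setting to use.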
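To make the idea concrete, here is a minimal sketch of the zero-stripping pass. It is not the wigShrink source. The names shrinkWig, shrinkWigText and isZeroValue are made up for illustration, and the sketch assumes the input contains only fixedStep header lines and single numeric values, one per line. It also assumes that a header line ends the current run of zeros, which may differ from what wigShrink itself does. Only a counter of pending zeros is kept, so memory use stays constant.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: true if the line parses as a number equal to zero.
static bool isZeroValue(const std::string& line) {
    char* end = nullptr;
    const double value = std::strtod(line.c_str(), &end);
    return end != line.c_str() && value == 0.0;
}

// Sketch only: copy a fixedStep wig stream, dropping every run of at least
// minRun zeros and writing a fresh fixedStep line before the next kept value.
void shrinkWig(std::istream& in, std::ostream& out, long minRun) {
    assert(minRun > 0);
    std::string line, chrom;
    long step = 1;
    long pos = 0;            // coordinate of the next value to be read
    long zeros = 0;          // zeros seen but not yet written or dropped
    bool headerDue = false;  // a fixedStep line must precede the next output

    auto emitHeaderIfDue = [&](long start) {
        if (!headerDue) return;
        out << "fixedStep chrom=" << chrom << " start=" << start
            << " step=" << step << '\n';
        headerDue = false;
    };

    // Decide the fate of the pending zeros: drop a long run, keep a short one.
    auto flushZeros = [&]() {
        if (zeros >= minRun) {
            headerDue = true;  // positions jumped, so a new header is needed
        } else if (zeros > 0) {
            emitHeaderIfDue(pos - zeros * step);
            for (long i = 0; i < zeros; ++i) out << "0\n";  // written as plain "0"
        }
        zeros = 0;
    };

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line.compare(0, 9, "fixedStep") == 0) {
            flushZeros();  // assumption: a header ends the current run
            step = 1;      // step defaults to 1 when the header omits it
            std::istringstream fields(line);
            std::string field;
            while (fields >> field) {
                if (field.compare(0, 6, "chrom=") == 0) chrom = field.substr(6);
                else if (field.compare(0, 6, "start=") == 0) pos = std::stol(field.substr(6));
                else if (field.compare(0, 5, "step=") == 0) step = std::stol(field.substr(5));
            }
            headerDue = true;  // written lazily, only if a value follows
        } else if (isZeroValue(line)) {
            ++zeros;
            pos += step;
        } else {
            flushZeros();
            emitHeaderIfDue(pos);
            out << line << '\n';
            pos += step;
        }
    }
    flushZeros();  // trailing zeros: dropped if long enough, otherwise kept
}

// Convenience wrapper for trying the sketch on an in-memory string.
std::string shrinkWigText(const std::string& text, long minRun) {
    std::istringstream in(text);
    std::ostringstream out;
    shrinkWig(in, out, minRun);
    return out.str();
}
```

Run on the example above with a threshold of 3, this sketch keeps the first block (two zeros at positions 40 and 41), drops the four zeros at positions 42 to 45, and writes "fixedStep chrom=chr2 start=46 step=1" before the final 1. With a threshold of 100 the example passes through unchanged.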
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
- WigShrink 0.1 C++ Source - Tested on Mac OS X and Linux