Supplementary Data:
Genome-wide in silico prediction of gene expression

These pages contain supplementary data and scripts to recreate the experiments described in the paper below:

The Paper:

The paper is currently under review. In the meantime, the citation is:

McLeay, R. C., Cuellar Partida, G., and Bailey, T. L. (2011a). Genome-wide in silico prediction of gene expression. Submitted.


Motivation: Modelling the regulation of gene expression can provide insight into the regulatory roles of individual transcription factors (TFs) and histone modifications. Recently, Ouyang et al. (2009) modelled gene expression levels in mouse embryonic stem (mES) cells using in vivo ChIP-seq measurements of TF binding. ChIP-seq TF binding data, however, is tissue-specific and relatively difficult to obtain. This limits the applicability of gene expression models that rely on ChIP-seq TF binding data.

Results: In this study, we build regression-based models that relate gene expression to the binding of 12 different transcription factors, seven histone modifications, and chromatin accessibility (DNase I hypersensitivity). We find that expression models based on computationally predicted TF binding can achieve similar accuracy to those using in vivo TF binding data, and that including binding at weak sites is critical for accurate prediction of gene expression. We also find that incorporating histone modification and chromatin accessibility data further improves accuracy. Surprisingly, we find that models that use no TF binding data at all, but only histone modification and chromatin accessibility data, are essentially as accurate as those based on in vivo TF binding data.
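
Illustration: the general idea of a regression-based expression model can be sketched as below. This is a minimal sketch and not the scripts provided on these pages. It assumes ordinary least squares, one row per gene, and one column per feature (for example, 12 TF binding scores, seven histone modification scores, and one chromatin accessibility score). The data are randomly generated placeholders, and the function names (fit_linear_model, predict, pearson_r) are hypothetical. The paper's actual model form, features, and evaluation may differ.

```python
import numpy as np


def fit_linear_model(X, y):
    """Fit ordinary least squares with an intercept; return the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict expression from features using fitted coefficients."""
    return coef[0] + X @ coef[1:]


def pearson_r(a, b):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic placeholder data (assumption: NOT taken from the paper).
rng = np.random.default_rng(0)
n_genes, n_features = 1000, 20  # e.g. 12 TFs + 7 histone marks + 1 DNase score
X = rng.normal(size=(n_genes, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + rng.normal(scale=1.0, size=n_genes)

# Fit on one set of genes and evaluate on held-out genes.
train, test = slice(0, 800), slice(800, None)
coef = fit_linear_model(X[train], y[train])
r = pearson_r(predict(coef, X[test]), y[test])
print(f"held-out Pearson r = {r:.2f}")
```

Comparing models then amounts to changing which columns go into X (for example, only the histone modification and accessibility columns) and comparing the held-out correlation.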