From pottery styles to mouse development: a method for delineating mammalian transcriptional regulatory networks
Olena Morozova, Vyacheslav Morozov, Mikhail Bilenky, Gordon Robertson, Marco Marra
Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, British Columbia Cancer Agency, Suite 100, 570 West 7th Avenue, Vancouver, BC V5Z 4E6, Canada
Transcription factors (TFs) are key regulators of gene expression that presumably account for the coordinated regulation of functionally related genes. Identifying and characterizing functional associations of TFs on a global scale is therefore essential to understanding the transcriptional control mechanisms that govern development and organogenesis. The Mouse Atlas of Gene Expression project (http://www.mouseatlas.org/) provides a unique resource of SAGE gene expression data derived from more than 200 different mouse tissues representing various developmental stages, and thus can be used to infer functional associations of TFs expressed throughout mouse development.
In contrast to other high-throughput gene expression techniques, SAGE has been relatively under-exploited due to the lack of appropriate clustering methods. Here, we adopted a seriation algorithm originally developed for temporally ordering archaeological deposits, and demonstrated that it can be used for the analysis of SAGE data. We applied the algorithm to order 948 mouse TFs based on the similarity of their SAGE expression profiles, producing a TF co-expression matrix. By applying different cutoffs to the correlation coefficients in the matrix, we reconstructed a series of TF genetic interaction networks that identify groups of TFs with similar expression profiles. At the most conservative threshold (correlation coefficient >= 0.9), two main clusters of the most highly related TFs remained. We validated the final clusters by examining the upstream regions of the co-expressed TFs with the cisRED pipeline (http://www.cisred.org/), which uses multiple tools to identify conserved regulatory regions that may account for the observed co-expression. We demonstrate that the network clusters obtained by seriation analysis consist of TFs that share a higher-than-random proportion of transcription factor binding sites and modules (transcription factor binding sites within 300 bp of each other). The network hubs uncovered by our analysis identify potential major regulators of particular importance to mammalian development and organogenesis.
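The thresholding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Pearson correlation coefficient (the abstract does not say which coefficient was used), it does not reproduce the seriation ordering itself, and the TF names, tag counts, and function names are all hypothetical. It links two TFs whenever their profile correlation meets the cutoff and then reports the connected groups.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A flat profile has no defined correlation; treat it as uncorrelated.
    return 0.0 if denom == 0 else sum(a * b for a, b in zip(dx, dy)) / denom


def coexpression_network(profiles, cutoff=0.9):
    """Link two TFs when their profile correlation is >= cutoff."""
    edges = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= cutoff:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def clusters(edges):
    """Return connected components that contain at least two TFs."""
    seen, found = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        if len(component) > 1:
            found.append(sorted(component))
    return found


# Hypothetical tag counts across five SAGE libraries (illustrative only).
profiles = {
    "TF_A": [1, 2, 3, 4, 5],
    "TF_B": [2, 4, 6, 8, 11],
    "TF_C": [5, 4, 3, 2, 1],
    "TF_D": [10, 8, 6, 4, 1],
    "TF_E": [3, 1, 4, 1, 5],
}
network = coexpression_network(profiles, cutoff=0.9)
print(clusters(network))
```

Lowering `cutoff` in this sketch adds edges and merges groups, which is the sense in which a series of networks can be read off a single co-expression matrix. With real data, the number of links per TF in `network` would give a rough indication of candidate hubs.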