With the rapid development of high-throughput genomic experiments, a tremendous amount of data is generated annually and accumulated in the public domain. Integrating these data effectively poses new statistical challenges. Two types of information integration are often considered: (1) horizontal meta-analysis, which combines multiple genomic studies of the same type (e.g. multiple microarray studies, methylation studies or GWAS; see Figure A) to increase statistical power and accuracy and to yield validated conclusions; and (2) vertical integrative analysis, which combines multi-omics data (e.g. gene expression, CNV, genotyping, methylation, somatic mutation, miRNA and clinical variables) from the same patient cohort to investigate disease subtypes, disease-associated or disease-driving genes and the related regulatory networks. The ultimate goal is translational "personalized medicine" that better diagnoses and treats patients.
A good example of data integration is The Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov/). This project investigates more than 20 cancers, each with 200-500 tumor samples and some paired adjacent normal samples. For each sample, multiple types of omics data have been generated, including gene expression, SNPs, somatic mutation, CNV, methylation, miRNA and clinical variables. When the analysis is restricted to one type of omics data (e.g. gene expression or methylation) and multiple cancers are combined for pan-cancer analysis, horizontal genomic meta-analysis should be used. When a single cancer (e.g. breast cancer) is considered and its multi-omics data are combined, vertical integrative analysis should be used.
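The distinction between the two directions can be pictured as two ways of slicing a catalogue of datasets indexed by cancer type and omics type. The sketch below is only an illustration: the catalogue layout, the dataset labels and the function names `horizontal_slice` and `vertical_slice` are assumptions made for this example and are not part of TCGA or of any MetaOmics package.

```python
from typing import Dict, Tuple

# Hypothetical catalogue: (cancer type, omics type) -> dataset label.
# The labels are placeholders, not real TCGA file names.
Catalogue = Dict[Tuple[str, str], str]


def horizontal_slice(catalogue: Catalogue, omics_type: str) -> Dict[str, str]:
    """Fix one omics type and collect it across all cancers (pan-cancer view)."""
    return {
        cancer: dataset
        for (cancer, omics), dataset in catalogue.items()
        if omics == omics_type
    }


def vertical_slice(catalogue: Catalogue, cancer_type: str) -> Dict[str, str]:
    """Fix one cancer and collect all of its omics types (multi-omics view)."""
    return {
        omics: dataset
        for (cancer, omics), dataset in catalogue.items()
        if cancer == cancer_type
    }


catalogue: Catalogue = {
    ("breast", "expression"): "breast_expr",
    ("breast", "methylation"): "breast_meth",
    ("lung", "expression"): "lung_expr",
    ("lung", "methylation"): "lung_meth",
}

# Input for horizontal meta-analysis: same data type, several cancers.
pan_cancer_expression = horizontal_slice(catalogue, "expression")

# Input for vertical integrative analysis: one cancer, several data types.
breast_multi_omics = vertical_slice(catalogue, "breast")
```

In this picture, horizontal meta-analysis operates on one column of the catalogue and vertical integrative analysis on one row.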
The MetaOmics project is a software suite for horizontal meta-analysis and data integration across multiple genomic studies. The methods currently focus on combining multiple transcriptomic data sets, but most of them can be extended to other types of genomic data, such as GWAS, eQTL, methylation and copy number variation. The packages below are publicly available or under development.
- MetaQC: Quality control to determine inclusion and exclusion of studies in the meta-analysis.
- MetaDE: Meta-analysis to detect differentially expressed genes.
- MetaPath: Meta-analysis to detect pathways associated with condition changes (i.e. gene set analysis).
- MetaPCA: Meta-analysis for PCA dimension reduction.
- MetaClust: Meta-analysis for gene clustering (gene module identification) and sample clustering (subtype discovery).
- MetaNetwork: Meta-analysis for network analysis.
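
To make the idea of horizontal meta-analysis concrete, the sketch below combines per-study p-values for a single gene using Fisher's method, a classical p-value combination technique. It is a minimal illustration under the assumption that the studies are independent; it is written in plain Python, does not reproduce the interface of MetaDE or any other package listed above, and the function name `fisher_combined_pvalue` and the example p-values are invented for this example.

```python
import math
from typing import Sequence


def fisher_combined_pvalue(pvalues: Sequence[float]) -> float:
    """Combine independent p-values from K studies with Fisher's method.

    The statistic X = -2 * sum(log p_k) follows a chi-square distribution
    with 2K degrees of freedom when every study-level null hypothesis holds.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # Closed-form chi-square survival function for even degrees of freedom 2K:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{K-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Made-up p-values for two hypothetical genes, each tested in three studies.
per_gene_pvalues = {
    "gene_a": [0.04, 0.03, 0.20],
    "gene_b": [0.60, 0.45, 0.80],
}
combined = {
    gene: fisher_combined_pvalue(pvals)
    for gene, pvals in per_gene_pvalues.items()
}
```

A gene with moderately small p-values in several studies receives a smaller combined p-value than any single study supports, which is the gain in statistical power that motivates combining studies. In a genome-wide setting the combined p-values would still need a multiple-testing correction.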