Mega2 has been enhanced to use a SQLite database as an intermediate data representation. Mega2 also now stores biallelic genotype data in a highly compressed form, much like the GenABEL R package and the PLINK binary format do. Meanwhile, the R and Bioconductor communities have developed a variety of genetic analysis programs that complement those available through Mega2. We have now made it easy to load SQLite3 Mega2 databases directly into R as data frames, so that these R facilities can be used. In addition, we have developed C++ functions for R that decompress needed subsets of the genotype data on the fly, in a memory-efficient manner.

We have also created several more R functions that illustrate how to use the data frames and that perform useful tasks: running the 'pedgene' R package to carry out gene-based association tests on family data using selected marker subsets, writing the 'mega2r' data out as a VCF file plus related files (for phenotype and family data), and converting the data frames into 'GenABEL' R gwaa.data-class objects. The Mega2R package enhances GenABEL by supporting additional input data formats (such as PLINK, VCF and IMPUTE2) that GenABEL does not currently support.
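To illustrate why a compressed biallelic representation saves memory, and what "decompressing a subset on the fly" means, here is a minimal base-R sketch of generic 2-bit genotype packing in the spirit of the PLINK binary format. It is an assumption-laden illustration, not Mega2's actual storage layout and not the Mega2R C++ code: the function names `pack_genotypes` and `unpack_genotypes` are hypothetical, and the code assignment (0, 1, 2 copies of allele 2, with 3 meaning missing) is chosen here for simplicity and differs from PLINK's own bit codes.

```r
# Hypothetical sketch of 2-bit genotype packing (NOT Mega2's real on-disk layout).
# Each genotype is the count of allele 2 (0, 1, 2) or NA; code 3 marks missing.
# One byte holds four genotypes, versus four bytes per genotype in an R integer vector.
pack_genotypes <- function(g) {
  codes <- ifelse(is.na(g), 3L, as.integer(g))
  n <- length(codes)
  nbytes <- ceiling(n / 4)
  length(codes) <- nbytes * 4            # pad up to a multiple of 4 (pads with NA)
  codes[is.na(codes)] <- 3L              # treat padding as missing
  packed <- integer(nbytes)
  for (slot in 0:3) {
    # genotypes slot+1, slot+5, ... go into bits 2*slot .. 2*slot+1 of each byte
    idx <- seq(slot + 1, by = 4, length.out = nbytes)
    packed <- bitwOr(packed, bitwShiftL(codes[idx], 2L * slot))
  }
  list(bytes = as.raw(packed), n = n)
}

# Decode only the requested genotype positions, leaving the rest packed.
unpack_genotypes <- function(packed, idx = seq_len(packed$n)) {
  stopifnot(all(idx >= 1 & idx <= packed$n))
  byte <- as.integer(packed$bytes[(idx - 1) %/% 4 + 1])  # byte holding each genotype
  slot <- as.integer((idx - 1) %% 4)                      # position within that byte
  code <- bitwAnd(bitwShiftR(byte, 2L * slot), 3L)
  ifelse(code == 3L, NA_integer_, code)
}
```

For example, `pack_genotypes(c(0, 1, 2, NA, 2, 0))` stores six genotypes in two bytes, and `unpack_genotypes(p, c(2, 4, 5))` on that result decodes just those three positions as `1, NA, 2` without expanding the whole vector.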
The Mega2R package is available from the Comprehensive R Archive Network (CRAN):
Mega2R on CRAN.
To install it from within R, issue the R command:
install.packages("Mega2R")
To learn how to use the Mega2R package, please see this tutorial:
Mega2R Tutorial text (html): mega2rtutorial.html
For an overview of Mega2R, see:
The Mega2R poster (PDF), which was presented at the 2017 American Society of Human Genetics meeting.
Mega2R uses SQLite databases produced by Mega2; documentation for Mega2 can be found here.