About

I am an assistant professor in the Department of Biostatistics at the University of Pittsburgh. I received my Ph.D. in Biostatistics from the University of Michigan. Before that, I earned my B.A. in Mathematics and M.S. in Statistics from the University of Virginia. I was also an undergraduate student at Sun Yat-sen University.

My research lies at the intersection of biostatistics and machine learning, with the broad goal of advancing health data science. I am particularly interested in developing statistical methods for integrative data analysis that combines data sets from multiple sources or knowledge of different types to achieve higher precision and power. With this in mind, my current research program focuses on developing methods that support regression, prediction, and decision making based on large-scale (distributed) data sets. I also develop data processing tools for analyzing high-dimensional data. Most of my work is inspired by and closely related to applications in bioinformatics, clinical trials, electronic health records, environmental health sciences, fairness and disparity, and health policy.

  • Data integration and meta-analysis
  • Causal inference and precision medicine
  • Longitudinal data analysis
  • Subgroup analysis
  • High-dimensional data analysis

Major grant support
PI (2023-2025) NIH R21DA055672 Federated learning methods for heterogeneous and distributed Medicaid data
PI (2023-2026) NSF DMS 2310217 Fusion pursuit for pattern-mixture models with application to longitudinal studies with nonignorable missing data
Co-I (2023-2026) NIH R01LM014142 Disease subtyping guided by clinical phenotype for precision medicine
Co-I (2022-2026) NIH R01DA055585 Improving racial equity in opioid use disorder treatment in Medicaid
Co-I (2021-2025) NIH R01GM141081 Precision medicine approach to glucocorticosteroids in sepsis

Recent News

  • I am currently looking for highly motivated students with a strong background in statistics, computation, or causal inference to join my group. Please get in touch with me if you are interested.
  • I'm honored to receive an NSF award to develop novel analysis approaches for longitudinal data with nonignorable missingness.
  • I'm honored to receive an NIH award to develop one-shot federated learning methods for causal inference on distributed Medicaid claims data.
  • I'm honored to receive the 2023 IMS New Researchers Travel Award.
  • Haoyi Fu received Honorable Mention in the ASA 2023 Student Paper Competition (MDD Section).
  • Haoyi Fu successfully defended his dissertation and will join Novartis in Spring 2023.
  • Xiaoqing (Ellen) Tan received the ENAR 2023 Distinguished Student Paper Award.
  • A paper on robust decision making was accepted to NeurIPS 2022.
  • A collaborative work on landmark analysis was published in Addiction.
  • Xiaoqing (Ellen) Tan successfully defended her dissertation and will join Meta in Fall 2022.
  • A paper on causal model averaging was accepted to ICML 2022.

Selected Publications

See my Google Scholar page for the complete list and citation metrics.
* student as first author, + as corresponding or last author

Methods
  • RISE: robust individualized decision learning with sensitive variables
    [Link] -- Tan, X.*, Qi, Z., Seymour, C.W., and Tang, L.+
    2022 -- Neural Information Processing Systems (NeurIPS 2022)
  • A tree-based model averaging approach for personalized treatment effect estimation from heterogeneous data sources
    [Link] -- Tan, X.*, Chang, C.H., Zhou, L., and Tang, L.+
    2022 -- International Conference on Machine Learning (ICML 2022)
  • Method of contraction-expansion (MOCE) for simultaneous inference in linear models
    [Link] -- Wang, F., Zhou, L., Tang, L., and Song, P.X.
    2021 -- Journal of Machine Learning Research
  • A sparse negative binomial mixture model for clustering RNA-seq count data
    [Link] -- Li, Y., Rahman, T., Ma, T., Tang, L., and Tseng, G.
    2021 -- Biostatistics
  • Post‐stratification fusion learning in longitudinal data analysis
    [Link] -- Tang, L.+, and Song, P.X.
    2021 -- Biometrics
  • An epidemiological forecast model and software assessing interventions on COVID-19 epidemic in China (with discussion)
    [Link] -- Wang, L., Zhou, Y., He, J., Zhu, B., Wang, F., Tang, L., Kleinsasser, M., Barker, D., Eisenberg, M.C., and Song, P.X.
    2020 -- Journal of Data Science
  • Distributed simultaneous inference in generalized linear models via confidence distribution
    [Link] -- Tang, L., Zhou, L., and Song, P.X.
    2020 -- Journal of Multivariate Analysis
  • Fused lasso approach in regression coefficients clustering -- learning parameter heterogeneity in data integration
    [Link] -- Tang, L., and Song, P.X.
    2016 -- Journal of Machine Learning Research
Applications
  • Development and validation of an overdose risk prediction tool using prescription drug monitoring program data
    [Link] -- Gellad, W.F., Yang, Q., Adamson, K.M., Kuza, C.C., Buchanich, J.M., Bolton, A.L., Murzynski, S.M., Thomas Goetz, C., Washington, T., Lann, M.F., Chang, C.H., Suda, K.J., and Tang, L.+
    2023 -- Drug and Alcohol Dependence
  • Duration of medication treatment for opioid-use disorder and risk of overdose among Medicaid enrollees in eleven states: A retrospective cohort study
    [Link] -- Burns, M., Tang, L., Chang, C.H., Kim, J.Y., Ahrens, K., Lindsay, A., Cunningham, P., Gordon, A., Jarlenski, M.P., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Talbert, J., Zivin, K., and Donohue, J.
    2022 -- Addiction
  • Use of medications for treatment of opioid use disorder among US Medicaid enrollees in 11 states, 2014-2018
    [Link] -- Donohue, J.M., Jarlenski, M., Kim, J.Y., Tang, L., Ahrens, K., Allen, L., Austin, A., Barnes, A.J., Burns, M., Chang, C.H., Clark, S., Cole, E., Crane, D., Cunningham, P., Idala, D., Junker, S., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Pauly, N., Sheets, L., Talbert, J., Zivin, K., Gordon, A.J., and Kennedy, S.
    2021 -- Journal of the American Medical Association
  • Integrative analysis of gene-specific DNA methylation and untargeted metabolomics data from the ELEMENT cohort
    [Link] -- Goodrich, J.M., Hector, E.C., Tang, L., LaBarre, J.L., Dolinoy, D.C., Mercado-Garcia, A., Cantoral, A., Song, P.X., Téllez-Rojo, M.M., and Peterson, K.E.
    2020 -- Epigenetics Insights
  • Mitochondrial nutrient utilization underlying the association between metabolites and insulin resistance in adolescents
    [Link] -- LaBarre, J.L., Peterson, K.E., Kachman, M.T., Perng, W., Tang, L., Hao, W., Zhou, L., Karnovsky, A., Cantoral, A., Téllez-Rojo, M.M., Song, P.X., and Burant, C.F.
    2020 -- The Journal of Clinical Endocrinology & Metabolism
  • Urate and nonanoate mark the relationship between sugar-sweetened beverage intake and blood pressure in adolescent girls: A metabolomics analysis in the ELEMENT cohort
    [Link] -- Perng, W., Tang, L., Song, P.X., Goran, M., Tellez-Rojo, M.M., Cantoral, A., and Peterson, K.E.
    2019 -- Metabolites
  • Metabolomic profiles and development of metabolic risk during the pubertal transition: a prospective study in the ELEMENT project
    [Link] -- Perng, W., Tang, L., Song, P.X., Tellez-Rojo, M.M., Cantoral, A., and Peterson, K.E.
    2019 -- Pediatric Research
  • Lipid metabolism is a key mediator of developmental epigenetic programming
    [Link] -- Marchlewicz, E.H., Dolinoy, D.C., Tang, L., Milewski, S., Jones, T.R., Goodrich, J.M., Soni, T., Domino, S.E., Song, P.X., Burant, C., and Padmanabhan, V.
    2016 -- Scientific Reports

Software

  • RISE: deriving robust individualized decision (e.g. treatment) rule
    The Python package can be used to learn robust individualized decisions to improve the worst-case outcomes of individuals caused by sensitive variables that are unavailable at the time of decision. [GitHub]
  • ifedtree: tree-based federated learning for heterogeneous model harmonization
    The R package harmonizes models derived from heterogeneous data sources to boost the power of a target study, without the need for individual-level data from the other sources. It also provides visualization of the heterogeneous association structure across studies. [GitHub]
  • metafuse: fused lasso approach for data integration
    The package allows detection of heterogeneous effects across multiple independent datasets when analyzed jointly. It provides visualization of covariate-specific effect subgrouping via dendrograms, and enables variable selection. [CRAN]
  • modac: method of divide-and-combine for penalized GLM
    Map-reduce functions in Python for fitting GLMs when a dataset is large and stored on distributed Hadoop clusters. The method provides stable inference. [GitHub]
  • eSIR: extended SIR (Susceptible-Infectious-Removed) model
    R package of an epidemiological forecast model for assessing interventions based on COVID-19 data. [GitHub]
  • pgee: R implementation of penalized GEE with LASSO, SCAD and MCP [GitHub]
  • R Shiny app for a quick random effect meta-analysis
    Based on metafor, it yields visualization and summary statistics to help understand between-site heterogeneity. [Link]

Students

Current Students
Past Students
  • [PhD] Haoyi Fu (co-advised with Robert Krafty)
  • [PhD] Xiaoqing (Ellen) Tan (co-advised with Gong Tang)
  • [PhD] Peng Liu (co-advised with George Tseng)
  • [MS] Liling Lu
  • [MS] Jason N. Kennedy (co-advised with Jeanine Buchanich)
  • [MS] Zhuxuan Fu
  • [MS] Ruishen Lyu

Miscellaneous

  • Outside of work, I like to swim, run, and spend time with my family.
  • I got into the hobby of woodworking during the pandemic. Check here for some of my work.

This page was last modified on: 6/15/2023