About

I am an associate professor at the Department of Biostatistics & Health Data Science, University of Pittsburgh. I received my Ph.D. in Biostatistics from the University of Michigan. Before that, I earned my B.A. in Mathematics and M.S. in Statistics from the University of Virginia. I was once an undergraduate student at Sun Yat-sen University.

My research lies at the intersection of biostatistics and machine learning, with a broad goal of promoting and propelling health data science. I am particularly interested in developing statistical methods for integrative data analysis that combines data sets from multiple sources or knowledge of different types to achieve higher precision and power. With this in mind, my current research program focuses on developing methods that support regression, prediction, and decision making based on large-scale (distributed) data sets. I also develop data processing tools for analyzing high-dimensional data. Most of my work is inspired by and closely related to applications in bioinformatics, clinical trials, electronic health records, environmental health sciences, and health policies.

  • Data integration and transfer learning
  • Causal inference and precision medicine
  • High-dimensional data and subgroup analysis
  • Longitudinal data analysis

Major grant support
PI (2025-2027) NIH R56LM014522 Improving safety and trustworthiness in data-driven decision learning for sepsis
PI (2023-2026) NSF DMS 2310217 Fusion pursuit for pattern-mixture models with application to longitudinal studies with nonignorable missing data
PI (2023-2025) NIH R21DA055672 Federated learning methods for heterogeneous and distributed Medicaid data
Co-I (2023-2026) NIH R01LM014142 Disease subtyping guided by clinical phenotype for precision medicine
Co-I (2022-2026) NIH R01DA055585 Improving racial equity in opioid use disorder treatment in Medicaid
Co-I (2021-2025) NIH R01GM141081 Precision medicine approach to glucocorticosteroids in sepsis

News

  • I am honored to receive an NIH award to develop safe and trustworthy decision learning methods for sepsis.
  • I am honored to become an Elected Member of the International Statistical Institute (ISI).
  • Crystal Zang received a Poster Award (Section on Text Analysis) at the 2025 Joint Statistical Meetings.
  • Xinlei Chen received an ASA 2025 Student Paper Award (Statistical Computing and Statistical Graphics Sections).
  • I am honored to be selected as one of the speakers at the Pitt Senior Vice Chancellor’s Research Seminar 2025.

Selected Publications

See my Google Scholar page for the complete list and citation metrics.
__ student as first/second author; * as corresponding or last author

Methods
  • Harmony-based data integration for distributed single-cell multi-omics data
    [Link] -- Yuan, R., Rong, Z., Hu, H., Liu, T., Tao, S., Chen, W.*, and Tang, L.*
    2025 -- PLOS Computational Biology
  • Robust transfer learning for individualized treatment rules in the presence of missing data
    [Link] -- Sui, Z., Ding, Y., and Tang, L.*
    2025 -- Biostatistics
  • Federated learning of robust individualized decision rules with application to heterogeneous multi-hospital sepsis population
    [Link] -- Chen, X., Talisa, V.B., Tan, X., Qi, Z., Kennedy, J.N., Chang, C.H., Seymour, C.W., and Tang, L.*
    2025 -- Annals of Applied Statistics
  • RISE: robust individualized decision learning with sensitive variables
    [Link] -- Tan, X., Qi, Z., Seymour, C.W., and Tang, L.*
    2022 -- Neural Information Processing Systems (NeurIPS)
  • A tree-based model averaging approach for personalized treatment effect estimation from heterogeneous data sources
    [Link] -- Tan, X., Chang, C.H., Zhou, L., and Tang, L.*
    2022 -- International Conference on Machine Learning (ICML)
Applications
  • Development and evaluation of a machine learning model to predict acute care for opioid use disorder among Medicaid enrollees engaged in a community‐based treatment program
    [Link] -- Xue, L., Yin, R., Cole, E.S., Lo-Ciganic, W.H., Gellad, W.F., Donohue, J.M., and Tang, L.*
    2025 -- Addiction
  • Heterogeneity in the effect of early goal-directed therapy for septic shock: A secondary analysis of two multicenter international trials
    [Link] -- Shah, F.A., Talisa, V.B., Chang, C.H., Triantafyllou, S., Tang, L., Mayr, F.B., Higgins, A.M., Peake, S.L., Mouncey, P., Harrison, D.A., DeMerle, K.M., Kennedy, J.N., Cooper, G.F., Bellomo, R., Rowan, K., Yealy, D.M., Seymour, C.W., Angus, D.C., and Yende, S.P.
    2025 -- Critical Care Medicine
  • Development and validation of an overdose risk prediction tool using prescription drug monitoring program data
    [Link] -- Gellad, W.F., Yang, Q., Adamson, K.M., Kuza, C.C., Buchanich, J.M., Bolton, A.L., Murzynski, S.M., Thomas Goetz, C., Washington, T., Lann, M.F., Chang, C.H., Suda, K.J., and Tang, L.*
    2023 -- Drug and Alcohol Dependence
  • Duration of medication treatment for opioid-use disorder and risk of overdose among Medicaid enrollees in eleven states: A retrospective cohort study
    [Link] -- Burns, M., Tang, L., Chang, C.H., Kim, J.Y., Ahrens, K., Lindsay, A., Cunningham, P., Gordon, A., Jarlenski, M.P., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Talbert, J., Zivin, K., and Donohue, J.
    2022 -- Addiction
  • Use of medications for treatment of opioid use disorder among US Medicaid enrollees in 11 states, 2014-2018
    [Link] -- Donohue, J.M., Jarlenski, M., Kim, J.Y., Tang, L., Ahrens, K., Allen, L., Austin, A., Barnes, A.J., Burns, M., Chang, C.H., Clark, S., Cole, E., Crane, D., Cunningham, P., Idala, D., Junker, S., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Pauly, N., Sheets, L., Talbert, J., Zivin, K., Gordon, A.J., and Kennedy, S.
    2021 -- Journal of the American Medical Association

Students

Current Students
Past Students

Software

  • Federated Harmony: the classic Harmony but for distributed data [GitHub]
    Python package for implementing privacy-preserving batch-effect correction for single-cell expression matrices. It simulates the collaboration between multiple data-holding institutions and a coordinating center to harmonize latent representations without centralizing raw data.
  • RTL: transfer learning of individualized treatment rules with missing data [GitHub]
    Python code for robust transfer learning of individualized treatment rules in the presence of missing data. It implements a quantile-based optimization framework to handle covariate shift and missing covariates.
  • FLoRI: federated learning of robust individualized treatment rules [GitHub]
    Python code for federated learning of robust individualized decision rules across heterogeneous multi-hospital networks without sharing patient-level data. It implements a privacy-preserving training method while accounting for cross-site heterogeneity.
  • RISE: learning robust individualized treatment rules with sensitive variables [GitHub]
    Python package to learn robust individualized decisions to improve the worst-case outcomes of individuals caused by sensitive variables that are unavailable at the time of decision.
  • ifedtree: joint approach for learning heterogeneous individualized treatment effects [GitHub]
    R package for harmonizing individualized treatment rules derived from heterogeneous data sources to boost the power of a target study, without the need for individual-level data from the other sources. It provides visualization of the heterogeneous association structure across studies.
  • GEEfuse: joint approach for fitting heterogeneous GEEs [Paper]
    R code for the detection of heterogeneous effects across multiple independent datasets when fitting GEE models.
  • metafuse: joint approach for fitting heterogeneous GLMs [CRAN]
    R package allowing detection of heterogeneous effects across multiple independent datasets when analyzed jointly. It provides visualization of covariate-specific effect subgrouping via dendrograms, and enables variable selection.
  • modac: divide-and-conquer for fitting penalized GLMs [GitHub]
    Python map-reduce functions for fitting penalized GLMs when a dataset is large and stored on distributed Hadoop clusters. The method provides stable inference.
  • eSIR: an extension of the SIR infectious disease model [GitHub]
    R package of an epidemiological forecast model for assessing interventions based on COVID-19 data.
  • Calculator for quick random effects meta-analysis [Link]
    R Shiny app based on metafor. It yields visualizations and summary statistics to help understand between-site heterogeneity.

Miscellaneous

  • Outside of work, I like to swim, run, and spend time with my family.
  • I got into the hobby of woodworking during the pandemic. Check here for some of my work.

This page was last modified on: 9/23/2025