About

I am an associate professor in the Department of Biostatistics & Health Data Science at the University of Pittsburgh. I received my Ph.D. in Biostatistics from the University of Michigan. Before that, I earned my B.A. in Mathematics and M.S. in Statistics from the University of Virginia. I was once an undergraduate student at Sun Yat-sen University.

My research lies at the intersection of biostatistics and machine learning, with the broad goal of advancing health data science. I am particularly interested in developing statistical methods for integrative data analysis, which combines data sets from multiple sources or knowledge of different types to achieve higher precision and power. With this in mind, my current research program focuses on developing methods that support regression, prediction, and decision making based on large-scale (distributed) data sets. I also develop data processing tools for analyzing high-dimensional data. Most of my work is inspired by and closely related to applications in bioinformatics, clinical trials, electronic health records, environmental health sciences, fairness and disparity, and health policy.

  • Data integration and meta-analysis
  • Causal inference and precision medicine
  • Longitudinal data analysis
  • Subgroup analysis
  • High-dimensional data analysis

Major grant support
PI (2023-2025) NIH R21DA055672 Federated learning methods for heterogeneous and distributed Medicaid data
PI (2023-2026) NSF DMS 2310217 Fusion pursuit for pattern-mixture models with application to longitudinal studies with nonignorable missing data
Co-I (2023-2026) NIH R01LM014142 Disease subtyping guided by clinical phenotype for precision medicine
Co-I (2022-2026) NIH R01DA055585 Improving racial equity in opioid use disorder treatment in Medicaid
Co-I (2021-2025) NIH R01GM141081 Precision medicine approach to glucocorticosteroids in sepsis

Recent News

  • Xinlei Chen received the ASA 2025 Student Paper Award (Statistical Computing and Statistical Graphics Sections).
  • I am honored to be selected as a speaker for the Pitt Senior Vice Chancellor’s Research Seminar 2025.
  • Crystal Zang received the Mihaela Serban Award for Best Poster Presentation at the 2024 ASA Pittsburgh Chapter Banquet.
  • I am honored to receive an NSF award to develop novel analysis approaches for longitudinal data with nonignorable missingness.
  • I am honored to receive an NIH award to develop one-shot federated learning methods for causal inference on distributed Medicaid claims data.
  • I am honored to receive the 2023 IMS New Researchers Travel Award.

Recent Publications

See my Google Scholar page for the complete list and citation metrics.
* student first author; + corresponding or last author

Methods
  • Federated learning of robust individualized decision rules with application to heterogeneous multi-hospital sepsis population
    [In press] -- Chen, X.*, Talisa, V.B., Tan, X., Qi, Z., Kennedy, J.N., Chang, C.H., Seymour, C.W., and Tang, L.+
    2025 -- Annals of Applied Statistics
  • Transfer learning via random forests: a one-shot federated approach
    [Link] -- Xiang, P., Zhou, L., and Tang, L.+
    2024 -- Computational Statistics & Data Analysis
  • RISE: robust individualized decision learning with sensitive variables
    [Link] -- Tan, X.*, Qi, Z., Seymour, C.W., and Tang, L.+
    2022 -- Neural Information Processing Systems (NeurIPS)
  • A tree-based model averaging approach for personalized treatment effect estimation from heterogeneous data sources
    [Link] -- Tan, X.*, Chang, C.H., Zhou, L., and Tang, L.+
    2022 -- International Conference on Machine Learning (ICML)
  • Method of contraction-expansion (MOCE) for simultaneous inference in linear models
    [Link] -- Wang, F., Zhou, L., Tang, L., and Song, P.X.
    2021 -- Journal of Machine Learning Research
Applications
  • Heterogeneity in the effect of early goal-directed therapy for septic shock: A secondary analysis of two multicenter international trials
    [Link] -- Shah, F.A., Talisa, V.B., Chang, C.H., Triantafyllou, S., Tang, L., Mayr, F.B., Higgins, A.M., Peake, S.L., Mouncey, P., Harrison, D.A., DeMerle, K.M., Kennedy, J.N., Cooper, G.F., Bellomo, R., Rowan, K., Yealy, D.M., Seymour, C.W., Angus, D.C., and Yende, S.P.
    2025 -- Critical Care Medicine
  • Development and validation of an overdose risk prediction tool using prescription drug monitoring program data
    [Link] -- Gellad, W.F., Yang, Q., Adamson, K.M., Kuza, C.C., Buchanich, J.M., Bolton, A.L., Murzynski, S.M., Thomas Goetz, C., Washington, T., Lann, M.F., Chang, C.H., Suda, K.J., and Tang, L.+
    2023 -- Drug and Alcohol Dependence
  • Duration of medication treatment for opioid-use disorder and risk of overdose among Medicaid enrollees in eleven states: A retrospective cohort study
    [Link] -- Burns, M., Tang, L., Chang, C.H., Kim, J.Y., Ahrens, K., Lindsay, A., Cunningham, P., Gordon, A., Jarlenski, M.P., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Talbert, J., Zivin, K., and Donohue, J.
    2022 -- Addiction
  • Use of medications for treatment of opioid use disorder among US Medicaid enrollees in 11 states, 2014-2018
    [Link] -- Donohue, J.M., Jarlenski, M., Kim, J.Y., Tang, L., Ahrens, K., Allen, L., Austin, A., Barnes, A.J., Burns, M., Chang, C.H., Clark, S., Cole, E., Crane, D., Cunningham, P., Idala, D., Junker, S., Lanier, P., Mauk, R., McDuffie, M.J., Mohamoud, S., Pauly, N., Sheets, L., Talbert, J., Zivin, K., Gordon, A.J., and Kennedy, S.
    2021 -- Journal of the American Medical Association

Software

  • RISE: deriving robust individualized decision (e.g., treatment) rules
    The Python package learns robust individualized decision rules that improve the worst-case outcomes of individuals arising from sensitive variables that are unavailable at the time of decision. [GitHub]
  • ifedtree: tree-based federated learning for heterogeneous model harmonization
    The R package harmonizes models derived from heterogeneous data sources to boost the power of a target study, without the need for individual-level data from the other sources. It also provides visualization of the heterogeneous association structure across studies. [GitHub]
  • metafuse: fused lasso approach for data integration
    The package allows detection of heterogeneous effects across multiple independent datasets when analyzed jointly. It provides visualization of covariate-specific effect subgrouping via dendrograms, and enables variable selection. [CRAN]
  • modac: method of divide-and-combine for penalized GLM
    Map-reduce functions in Python for fitting GLMs when a dataset is large and stored on distributed Hadoop clusters. The method provides stable inference. [GitHub]
  • eSIR: extended SIR (Susceptible-Infectious-Removed) model
    R package of an epidemiological forecast model for assessing interventions based on COVID-19 data. [GitHub]
  • pgee: R implementation of penalized GEE with LASSO, SCAD and MCP [GitHub]
  • R Shiny app for a quick random-effects meta-analysis
    Based on metafor, it provides visualization and summary statistics to help understand between-site heterogeneity. [Link]

Students

Current Students
  • [PhD] Jinhong Li (co-advised with Guan Yu)
  • [PhD] Zhiyu Sui (co-advised with Ying Ding)
  • [PhD] Xinlei Chen
  • [PhD] Crystal (Ziwei) Zang (co-advised with Rebecca Deek)
  • [PhD] Ruizhi Yuan (co-advised with Wei Chen)
  • [MS] Yaxin Lin
  • [MS] Tyler J. Kelly

Past Students
  • [PhD] Haoyi Fu (co-advised with Robert Krafty) @Novartis
  • [PhD] Xiaoqing (Ellen) Tan (co-advised with Gong Tang) @Meta
  • [PhD] Peng Liu (co-advised with George Tseng) @Merck
  • [MS] Liling Lu @UPMC
  • [MS] Jason N. Kennedy (co-advised with Jeanine Buchanich) @Pitt
  • [MS] Zhuxuan Fu @St. Luke's Health System
  • [MS] Ruishen Lyu @Cleveland Clinic

Miscellaneous

  • Outside of work, I like to swim, run, and spend time with my family.
  • I took up woodworking as a hobby during the pandemic. Check here for some of my work.

This page was last modified on: 2/11/2025