Daniel R. Jiang

Chinese: 姜若凌, Korean: 강 다니엘

Hello! I am a Research Scientist at Meta (formerly Facebook) Core Data Science and an Adjunct Professor at the University of Pittsburgh. My research interests are in reinforcement learning, approximate dynamic programming, and Bayesian optimization. I am also interested in taking an operations research perspective on application domains such as energy markets, ride-sharing, and public health. I received my Ph.D. in Operations Research and Financial Engineering from Princeton University in May 2016.

drjiang@pitt.edu   CV (PDF)

Publications and Papers Under Review

Dynamic Inventory Repositioning in On-Demand Rental Networks

Saif Benjaafar, Daniel R. Jiang, Xiang Li, and Xiaobo Li

Management Science (Articles in Advance), 2022.

Brief Description: We consider a product rental network with a fixed number of rental units distributed across multiple locations. We show convexity of the value function and that the optimal policy can be described in terms of a well-specified region over the state space. We leverage these results in an infinite-horizon, cutting-plane-based ADP algorithm and prove its asymptotic optimality, improving upon previous convergence results in the literature.

Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs

Raul Astudillo, Daniel R. Jiang, Max Balandat, Eytan Bakshy, Peter Frazier

Advances in Neural Information Processing Systems, NeurIPS 2021.

Brief Description: Most Bayesian optimization algorithms ignore how evaluation costs, which are often unknown, may change over the optimization domain. An unknown cost function with a budget constraint introduces a new dimension to the exploration-exploitation trade-off, where learning about the cost incurs the cost itself. We propose a new dynamic programming-based acquisition function for this problem setting.

Structured Actor-Critic for Managing Public Health Points-of-Dispensing

Yijia Wang and Daniel R. Jiang

Submitted, 2021.

Brief Description: We consider the setting of public health medical inventory control/dispensing and propose a new actor-critic algorithm that tracks both policy and value function approximations. The algorithm utilizes structure in both the policy and value to improve the empirical convergence rate. We also provide a case study for the problem of dispensing naloxone (an overdose reversal drug) amidst the ongoing opioid crisis.

Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees

Shali Jiang*, Daniel R. Jiang*, Max Balandat*, Brian Karrer, Jacob R. Gardner, Roman Garnett

Advances in Neural Information Processing Systems, NeurIPS 2020.

Brief Description: Bayesian optimization is a sequential decision making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a stochastic dynamic program, which is highly intractable. Instead, we propose a multi-step scenario tree formulation and a one-shot optimization approach that operates by differentiating through the entire decision tree. (* equal contribution).

Lookahead-Bounded Q-Learning

Ibrahim El-Shar and Daniel R. Jiang

International Conference on Machine Learning, ICML 2020.

Brief Description: We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to make better use of collected experience through noisy "lookahead" upper and lower bounds that constrain the Q-iterates. The algorithm operates via a "feedback loop": approximate Q-values are used to estimate the bounds, and the bounds are subsequently used to improve the Q-values (and repeat).
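To illustrate just the core idea (this is a minimal sketch, not the paper's LBQL algorithm), here is a tabular Q-learning update whose iterate is projected onto an interval of lower and upper bounds. The bound tables below are fixed placeholders standing in for the lookahead estimates that LBQL computes:

```python
# Sketch: a Q-learning step followed by projection onto [lower, upper].
# In LBQL the bounds themselves are estimated from lookahead procedures;
# here they are given as fixed nested dicts, purely for illustration.

def clipped_q_update(Q, lower, upper, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One standard Q-learning update at (s, a), then clip the new
    iterate into the bound interval [lower[s][a], upper[s][a]]."""
    target = r + gamma * max(Q[s_next].values())      # sampled Bellman target
    Q[s][a] += alpha * (target - Q[s][a])             # usual Q-learning step
    Q[s][a] = min(max(Q[s][a], lower[s][a]), upper[s][a])  # the projection
    return Q[s][a]
```

The projection is the whole point: even a crude bound interval keeps noisy Q-iterates from drifting far from plausible values early in training.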

BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

M. Balandat, B. Karrer, D. R. Jiang, S. Daulton, B. Letham, A. G. Wilson, E. Bakshy

Advances in Neural Information Processing Systems, NeurIPS 2020.

Brief Description: Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, molecular chemistry, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization, along with a new "one-shot" approach to optimizing the Knowledge Gradient acquisition function.

Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Daniel R. Jiang, Lina Al-Kanj, and Warren B. Powell

Operations Research, 68(6), pp. 1678-1697, 2020.

Brief Description: MCTS is a well-known strategy for solving sequential decision problems, particularly in the area of game-play AI. We propose a new technique called Primal-Dual MCTS that utilizes sampled information relaxation (Brown et al., 2010) bounds on potential actions in order to make tree expansion decisions. The approach shows promise when used to optimize the behavior of a driver navigating a graph while operating on a ride-sharing platform.
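For readers unfamiliar with MCTS, here is a bare-bones UCT-style sketch on a toy deterministic game. Note that this is plain UCT with the standard UCB selection rule, eventually expanding every action; Primal-Dual MCTS differs precisely in using dual bounds to decide which expansions to skip. The game dynamics and terminal reward below are arbitrary assumptions for illustration:

```python
import math
import random

class Node:
    def __init__(self, state, depth):
        self.state, self.depth = state, depth
        self.children = {}            # action -> Node (actions are 0 and 1)
        self.visits, self.value = 0, 0.0

def step(state, action):
    return 2 * state + action         # toy deterministic dynamics

def rollout(state, depth, horizon):
    while depth < horizon:            # random playout to the horizon
        state, depth = step(state, random.randrange(2)), depth + 1
    return state % 7                  # arbitrary terminal reward

def uct_search(root_state, horizon, iters=2000, c=1.4):
    root = Node(root_state, 0)
    for _ in range(iters):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while node.depth < horizon and len(node.children) == 2:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value + c * math.sqrt(
                           math.log(parent.visits) / ch.visits))
            path.append(node)
        # Expansion: add one untried action (UCT expands all, eventually).
        if node.depth < horizon:
            a = len(node.children)
            node.children[a] = Node(step(node.state, a), node.depth + 1)
            node = node.children[a]
            path.append(node)
        # Simulation + backpropagation (value tracks a running mean).
        reward = rollout(node.state, node.depth, horizon)
        for n in path:
            n.visits += 1
            n.value += (reward - n.value) / n.visits
    return max(root.children, key=lambda a: root.children[a].visits)
```

Running `uct_search(1, 3)` returns the most-visited first action of a depth-3 game tree.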

Subgoal-based Exploration via Bayesian Optimization

Yijia Wang, Matthias Poloczek, Daniel R. Jiang

Preliminary version at Task-Agnostic Reinforcement Learning Workshop at ICLR 2019.

Brief Description: We consider problems where an agent faces an unknown task (drawn from a distribution of MDPs) in the future and is given prior opportunities to "practice" on related tasks where the interactions are still expensive. We propose a one-step Bayes-optimal algorithm for selecting subgoal designs, along with the number of episodes and the episode length during training, to efficiently maximize the expected performance of the agent at test time.

Practicality of Nested Risk Measures for Dynamic Electric Vehicle Charging

Daniel R. Jiang and Warren B. Powell

Major revision at Manufacturing & Service Operations Management, 2019.

Brief Description: Risk-averse MDPs formulated with nested (dynamic) risk measures are often used as a tool for solving problems with predefined "practical" risk and reward metrics. In this paper, we study the extent to which the two sides of this framework are compatible with each other in the setting of dynamic EV charging: roughly speaking, does a "more risk-averse" MDP provide lower risk in the practical sense as well?

Feedback-Based Tree Search for Reinforcement Learning

Daniel R. Jiang, Emmanuel Ekwedike, and Han Liu

International Conference on Machine Learning, ICML 2018.

Brief Description: We describe a technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon MDP. We show that a deep neural network implementation of the technique can create a competitive AI agent for a popular multi-player online battle arena (MOBA) game.

Shape Constraints in Economics and Operations Research

Andrew L. Johnson and Daniel R. Jiang

Statistical Science, 33(4), pp. 527-546, 2018.

Brief Description: This paper reviews an illustrative set of research on shape constrained estimation in the economics and operations research literature. We highlight the methodological innovations and applications, with a particular emphasis on utility functions, production economics, and sequential decision making applications.

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

Daniel R. Jiang and Warren B. Powell

Mathematics of Operations Research, 43(2), pp. 554-579, 2018.

Brief Description: We propose a new Q-learning algorithm and a companion sampling procedure to solve risk-averse Markov decision processes under a class of dynamic quantile-based risk measures. Convergence results are proven and an application to energy storage is shown.

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Daniel R. Jiang and Warren B. Powell

Operations Research, 63(6), pp. 1489-1511, 2015.

Brief Description: We describe a provably convergent algorithm to exploit the structural property of monotonicity that arises in many applications in operations research, finance, and economics. We show via simulations that near optimal solutions can be obtained using the proposed method when the exact approach is computationally intractable.
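The projection step behind this idea can be sketched in a few lines (a simplified illustration of the monotonicity projection, not the paper's full algorithm): after a stochastic update to one entry of a value array that is known to be nondecreasing in the state index, any violating neighbors are projected onto the new value.

```python
def monotone_projection(V, s):
    """After V[s] has been updated, restore the structural property
    that V is nondecreasing in the state index by projecting any
    violating entries onto the updated value V[s]."""
    for i in range(s):                # states below s must not exceed V[s]
        if V[i] > V[s]:
            V[i] = V[s]
    for i in range(s + 1, len(V)):    # states above s must not fall below V[s]
        if V[i] < V[s]:
            V[i] = V[s]
    return V
```

Enforcing the known structure after every update is what lets the approximation converge quickly even when most states are rarely visited.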

Optimal Hour-Ahead Bidding in the Real-Time Electricity Market with Battery Storage using Approximate Dynamic Programming

Daniel R. Jiang and Warren B. Powell

INFORMS Journal on Computing, 27(3), pp. 525-543, 2015.

Brief Description: We formulate a mathematical model for bidding in the real-time market with the goal of performing energy arbitrage (i.e., exploiting variations in spot prices to profit) in the presence of storage. We train and test an approximate dynamic programming policy on real spot price data from the NYISO and show its value over heuristic policies used in industry.
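As a toy illustration of energy arbitrage itself, here is a naive threshold heuristic, not the paper's ADP bidding policy; the thresholds and the lossless, unit-power storage model are all simplifying assumptions:

```python
def threshold_arbitrage(prices, buy_below, sell_above, capacity=1.0):
    """Charge one unit when the spot price drops below buy_below and
    discharge one unit when it rises above sell_above. Returns total
    profit, assuming lossless storage and unit power per step."""
    level, profit = 0.0, 0.0
    for p in prices:
        if p < buy_below and level < capacity:
            level += 1.0
            profit -= p       # pay the spot price to charge
        elif p > sell_above and level > 0.0:
            level -= 1.0
            profit += p       # earn the spot price by discharging
    return profit
```

A policy like this is the kind of industry heuristic the paper's ADP policy is benchmarked against; the hard part, of course, is choosing bids under price uncertainty rather than fixed thresholds in hindsight.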


@ University of Pittsburgh

Approximate Dynamic Programming, Ph.D. Level

Instructor, Spring 2017, Fall 2018

Course Description: ADP refers to a broad set of computational methods used for finding approximately optimal policies of intractable sequential decision problems (MDPs). We'll begin with an overview of classical methods and transition to a survey of state-of-the-art developments. The lectures will focus on mathematical proofs and underlying theory, while the course project will give students practice with numerical implementations. All lecture notes, based on a number of papers and texts (primarily Bertsekas and Tsitsiklis), are available online.

Decision Models, Undergraduate/Master's Level

Instructor, Fall 2016, Fall 2017, Fall 2018

Course Description: Decision making is key to understanding a variety of problems in industry, including inventory control, revenue management, pricing, energy, healthcare, logistics, and finance. In this course, we focus on stochastic decision models (i.e., "decision making under uncertainty") and discuss the fundamental methodology and models in conjunction with applications to real-world problems. Students should have a basic understanding of probability and optimization (linear programming).

Ride-sharing Analytics Game, Undergraduate/Master's Level

Instructor, Fall 2018

Description: This was an event designed for the Decision Models course, where teams of students (1) analyze a data set, (2) design a pricing, manufacturing, advertising, and repositioning strategy to operate a ride-sharing company, and (3) compete in a live competition where decisions are submitted periodically to a simulator designed specifically for the course. If you are an instructor and interested in running this event in your course, please let me know. See the description and data for more information.

Reinforcement Learning, Master's Level

Instructor, Summer 2018

Course Description: This is an introductory course on reinforcement learning (RL). The basics of MDPs necessary for RL will be covered, along with a wide range of methods (e.g., TD learning, Q-learning, policy gradients) for evaluation and control. The focus of this course is on applications, implementation, intuition, and some theory. All lecture notes, based on the Sutton & Barto RL textbook, are available online.

Other Projects

What Would I Say: Read the New Yorker, CNN, and Telegraph articles about the project we created at Hack Princeton 2013, which uses Markov chains to simulate a user's social media posts. The site has attracted over 9 million unique users and generated over 17 million page views.
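The underlying mechanism is easy to sketch as a word-level Markov chain built from observed bigrams (illustrative only; the site's actual model and chain order are its own):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for w1, w2 in zip(words, words[1:]):
        chain[w1].append(w2)          # duplicates preserve frequencies
    return chain

def generate(chain, start, length=10):
    """Walk the chain from `start`, sampling successors at random."""
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:            # dead end: no observed successor
            break
        out.append(random.choice(successors))
    return " ".join(out)
```

Training on a user's posts and sampling walks is enough to produce text that is locally plausible and globally absurd, which is most of the joke.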

Fantasy Football: Use this app to quantify the role of luck in (Yahoo!) Fantasy Football by generating probability distributions of your record over randomized season schedules.
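The schedule-randomization idea can be sketched as follows (a simplified illustration, not the app's implementation): fix every team's actual weekly scores, re-draw a random opponent each week, and tally wins over many simulated seasons.

```python
import random

def simulate_records(scores, team, trials=10000):
    """scores: {team_name: [weekly scores]}. Returns a histogram
    mapping win counts to how often they occur for `team` when each
    week's opponent is drawn uniformly at random."""
    others = [t for t in scores if t != team]
    weeks = len(scores[team])
    hist = {}
    for _ in range(trials):
        wins = sum(
            scores[team][w] > scores[random.choice(others)][w]
            for w in range(weeks)
        )
        hist[wins] = hist.get(wins, 0) + 1
    return hist
```

Comparing your actual record to this distribution shows how much of it was the schedule rather than the roster.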