Topic: Bridging Online and Offline Learning Towards Improved Data-Driven Decision Making
Speaker: Yunzong Xu, MIT
Abstract
Machine learning is playing an increasingly important role in decision making, with key applications ranging from dynamic pricing and recommendation systems to personalized medicine and clinical trials. While supervised machine learning traditionally excels at making predictions from i.i.d. offline data, many modern decision-making tasks require making sequential decisions based on data collected online. This discrepancy gives rise to the important challenge of bridging offline supervised learning and online interactive learning to unlock the full potential of data-driven decision making.
In the main part of this talk, we consider the challenge of reducing difficult online decision-making problems to well-understood offline supervised learning problems. Focusing on contextual bandits, a core class of online decision-making problems, we present the first optimal and efficient reduction from contextual bandits to offline regression. A remarkable consequence of our results is that advances in offline regression immediately translate to contextual bandits, both statistically and computationally. We illustrate the advantages of our results through new guarantees in complex operational environments and experiments on real-world datasets. We also discuss extensions of our results to more challenging setups, including reinforcement learning in large state spaces.
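To make the reduction concrete, below is a minimal sketch of inverse-gap weighting, the action-selection rule used in the regression-based contextual bandit literature this talk builds on: an offline regression oracle predicts a reward for each action, and the gaps between the best prediction and the rest determine an exploration distribution. The function name and the choice of the learning-rate parameter `gamma` are illustrative, not the speaker's exact implementation.

```python
# Inverse-gap weighting: convert an offline regression oracle's reward
# predictions into a probability distribution over actions.
# (Function name and gamma value are hypothetical, for illustration.)

def inverse_gap_weighting(predicted_rewards, gamma):
    """Return sampling probabilities over actions.

    predicted_rewards: estimated reward per action, from any
        offline regression oracle.
    gamma: learning-rate parameter; larger gamma means the
        distribution concentrates more on the greedy action.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Actions with a larger estimated gap to the best
            # predicted action are sampled less often.
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    probs[best] = 1.0 - sum(probs)  # remaining mass to the greedy action
    return probs

# Example: three actions, the second has the highest predicted reward.
probs = inverse_gap_weighting([0.5, 0.8, 0.3], gamma=10.0)
```

Because any regression method can supply `predicted_rewards`, statistical or computational improvements to the offline regression step carry over directly to the online decision-making loop, which is the sense in which the reduction is "oracle-efficient."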
After the main part, I will provide an overview of my additional work and broader research agenda on bridging online and offline learning towards improved data-driven decision making. I will highlight the importance of problem structure and discuss exciting opportunities for the operations research community.
Speaker Bio
Yunzong Xu is a fifth-year PhD student in the Institute for Data, Systems, and Society at MIT, advised by Prof. David Simchi-Levi. His research lies at the intersection of machine learning and operations research. His current research interests include data-driven decision making, reinforcement learning, bandit learning, statistical learning, econometrics, and causal inference, with applications to e-commerce, supply chains, and healthcare. Over the course of his PhD, his research has been recognized with multiple paper awards from the INFORMS George Nicholson Student Paper Competition, the Applied Probability Society, and other competitions and organizations. His industry experience includes an internship at Microsoft Research on reinforcement learning, as well as an ongoing research collaboration with IBM and Boston Scientific on healthcare inventory management. Prior to joining MIT, he received dual bachelor's degrees in information systems and mathematics from Tsinghua University in 2018.