publications
content in reverse chronological order.
2024
- Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery. Patrick Emedom-Nnamdi, Timothy R. Smith, Jukka-Pekka Onnela, and Junwei Lu. Annals of Applied Statistics, to appear, 2024.
We propose a nonparametric additive model for estimating interpretable value functions in reinforcement learning. Learning effective adaptive clinical interventions that rely on digital phenotyping features is a major concern for medical practitioners. With respect to spine surgery, different post-operative recovery recommendations concerning patient mobilization can lead to significant variation in patient recovery. While reinforcement learning has achieved widespread success in domains such as games, recent methods rely heavily on black-box approaches, such as neural networks. Unfortunately, these methods hinder the ability to examine the contribution each feature makes in producing the final suggested decision. While such interpretations are easily provided in classical algorithms such as Least Squares Policy Iteration, basic linearity assumptions prevent learning higher-order flexible interactions between features. In this paper, we present a novel method that offers a flexible technique for estimating action-value functions without making explicit parametric assumptions regarding their additive functional form. This nonparametric estimation strategy incorporates local kernel regression and basis expansion to obtain a sparse, additive representation of the action-value function. Under this approach, we are able to locally approximate the action-value function and retrieve the nonlinear, independent contribution of select features as well as joint feature pairs. We validate the proposed approach with a simulation study and, in an application to spine disease, uncover recovery recommendations that are in line with related clinical knowledge.
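A minimal sketch of the core idea, using a polynomial basis expansion with a ridge penalty as a stand-in for the paper's kernel/basis machinery (all function names and parameters here are hypothetical illustrations, not the paper's implementation):

```python
import numpy as np

def additive_q_fit(S, y, degree=3, lam=1e-2):
    """Fit an additive approximation Q(s) ~ b0 + sum_j f_j(s_j), where each
    f_j is a polynomial basis expansion of feature j, via ridge regression.
    (Illustrative stand-in for the paper's local kernel/basis estimator.)"""
    n, d = S.shape
    # One basis block per feature: columns [s_j, s_j^2, ..., s_j^degree]
    blocks = [np.vander(S[:, j], degree + 1, increasing=True)[:, 1:]
              for j in range(d)]
    X = np.hstack([np.ones((n, 1))] + blocks)   # intercept + additive blocks
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return beta

def feature_contribution(beta, j, s_j, degree=3):
    """Recover the independent contribution f_j evaluated at value s_j."""
    start = 1 + j * degree                      # skip intercept and earlier blocks
    coefs = beta[start:start + degree]
    basis = np.array([s_j ** k for k in range(1, degree + 1)])
    return basis @ coefs
```

Because the fit is additive, each feature's contribution can be plotted on its own, which is exactly the kind of interpretability the abstract describes; pairwise interaction terms could be added as extra basis blocks in the same way.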
2023
- Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning. Patrick Emedom-Nnamdi, Abram L. Friesen, Bobak Shahriari, Nando de Freitas, and Matt W. Hoffman. International Conference on Learning Representations (ICLR) – Reincarnating RL Workshop, 2023.
Standard approaches to sequential decision-making exploit an agent’s ability to continually interact with its environment and improve its control policy. However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in many real-world domains such as healthcare and robotics. Instead, control policies in these domains are typically trained offline from previously logged data or in a growing-batch manner. In this setting, a fixed policy is deployed to the environment and used to gather an entire batch of new data before being aggregated with past batches and used to update the policy. This improvement cycle can then be repeated multiple times. While a limited number of such cycles is feasible in real-world domains, the quality and diversity of the resulting data are much lower than in the standard continually-interacting approach. However, data collection in these domains is often performed in conjunction with human experts, who are able to label or annotate the collected data. In this paper, we first explore the trade-offs present in this growing-batch setting, and then investigate how information provided by a teacher (i.e., demonstrations, expert actions, and gradient information) can be leveraged at training time to mitigate the sample complexity and coverage requirements for actor-critic methods. We validate our contributions on tasks from the DeepMind Control Suite.
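The growing-batch cycle described in the abstract — deploy a frozen policy, gather a whole batch, aggregate, then update — can be sketched as a loop; the toy environment, policies, and `teacher_label` hook below are hypothetical illustrations, not the paper's setup:

```python
import random

def growing_batch_loop(env_step, policy_update, n_cycles=3, batch_size=100,
                       teacher_label=None):
    """Growing-batch RL skeleton: a fixed policy gathers an entire batch,
    the batch is aggregated with past data, and only then is the policy
    updated. `teacher_label` (optional) annotates logged transitions with
    expert actions, mimicking teacher-provided information at training time."""
    dataset = []
    policy = lambda s: random.choice([0, 1])    # initial behavior policy
    for cycle in range(n_cycles):
        batch = []
        for _ in range(batch_size):             # policy stays frozen in-cycle
            s = random.random()                  # toy one-dimensional state
            a = policy(s)
            r, s_next = env_step(s, a)
            t = {"s": s, "a": a, "r": r, "s2": s_next}
            if teacher_label is not None:        # expert annotation of the log
                t["a_teacher"] = teacher_label(s)
            batch.append(t)
        dataset.extend(batch)                    # aggregate with past batches
        policy = policy_update(dataset)          # improvement step between cycles
    return dataset, policy
```

The key contrast with the standard online setting is visible in the structure: the policy only changes `n_cycles` times, so data quality and coverage hinge on what each frozen policy (plus any teacher annotations) can provide.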
- Interpretable Statistical Learning for Real-World Behavioral Data. Patrick Ugochukwu Emedom-Nnamdi. Harvard University, 2023.
The rapid development of data collection methods and analysis techniques has revolutionized our understanding of human behavior and its relationship to health outcomes. However, despite the increasing availability of real-world behavioral data, the effective use of this information for real-time prediction and intervention remains a significant challenge. This dissertation explores interpretable statistical learning methods for real-world behavioral data, with a focus on overcoming limitations in episodic data collection by leveraging smartphone-based digital phenotyping. The approaches explored ultimately provide a scalable method for utilizing historical real-world data on human behavior to inform decision-making and interventions, while improving current standards of care. Chapter 1 presents a novel method for estimating interpretable value functions in reinforcement learning. By incorporating local kernel regression and basis expansion, we develop a sparse, additive representation of the action-value function. This allows us to approximate the action-value function and retrieve the nonlinear, independent contributions of select features and joint feature pairs. We validate this approach through a simulation study and an application to spine disease, uncovering recovery recommendations in line with clinical knowledge. Chapter 2 explores the trade-offs of learning in the growing-batch reinforcement learning setting and investigates how information provided by a teacher (i.e., demonstrations, expert actions, and gradient information) can be leveraged during training to mitigate the sample complexity and coverage requirements for actor-critic methods. We validate our contributions on tasks from the DeepMind Control Suite. Chapter 3 introduces an approach where we use hidden semi-Markov models on smartphone activity logs to identify key patterns of differentiation in smartphone usage among adolescents with bipolar disorder and their typically-developing peers.
This analysis enables the identification of latent constructs that correspond to resting and active smartphone usage, providing insights into the long-term behavioral trends in adolescents with bipolar disorder. Chapter 4 presents the Digital Assessment in Neuro-Oncology (DANO) pilot, which leverages smartphone-based digital phenotyping to monitor post-operative recovery in glioblastoma patients. We analyze passive GPS and accelerometer data to construct mobility patterns and compare these patterns with a control group of non-operative spine disease patients. Our findings reveal significant changes in mobility among glioblastoma patients during the first six months following surgery and between subsequent cycles of chemotherapy.
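The defining feature of the hidden semi-Markov models used in Chapter 3 — latent states that persist for an explicitly modeled duration, rather than the geometric sojourns of an ordinary HMM — can be illustrated with a toy generator (the "rest"/"active" durations and emission rates below are invented for illustration, not taken from the dissertation):

```python
import random

def sample_hsmm(n_steps=200, seed=0):
    """Generate a toy two-state hidden semi-Markov sequence ('rest' vs
    'active' smartphone use). Unlike an HMM, each state persists for an
    explicitly sampled duration before transitioning (illustrative only)."""
    rng = random.Random(seed)
    durations = {"rest": lambda: rng.randint(20, 60),    # long resting bouts
                 "active": lambda: rng.randint(2, 10)}   # short usage bursts
    emit = {"rest": lambda: 0,                            # no taps while resting
            "active": lambda: rng.randint(1, 5)}          # taps per minute
    states, obs, state = [], [], "rest"
    while len(states) < n_steps:
        d = durations[state]()                  # explicit sojourn time
        for _ in range(d):
            states.append(state)
            obs.append(emit[state]())
        state = "active" if state == "rest" else "rest"   # alternate states
    return states[:n_steps], obs[:n_steps]
```

Fitting such a model to real activity logs runs this logic in reverse: the observed event stream is used to infer the latent rest/active bouts and their duration distributions.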
- Stasis: Reinforcement Learning Simulators for Human-Centric Real-World Environments. Georgios Efstathiadis, Patrick Emedom-Nnamdi, Arinbjörn Kolbeinsson, Jukka-Pekka Onnela, and Junwei Lu. ICLR 2023 – Workshop on Trustworthy Machine Learning for Healthcare, 2023.
We present ongoing work toward building Stasis, a suite of reinforcement learning (RL) environments that aim to maintain realism for human-centric agents operating in real-world settings. Through representation learning and alignment with real-world offline data, Stasis allows for the evaluation of RL algorithms in offline environments with adjustable characteristics, such as observability, heterogeneity, and levels of missing data. We aim to introduce environments that encourage training RL agents capable of maintaining a level of performance and robustness comparable to agents trained in real-world online environments, while avoiding the high cost and risks associated with making mistakes during online training. We provide examples of two environments that will be part of Stasis and discuss its implications for the deployment of RL-based systems in sensitive and high-risk areas of application.
2017
- Queueing Analysis of a Chagas Disease Control Campaign. Maria T. Rieders, Patrick Emedom-Nnamdi, and Michael Z. Levy. 2017.
A critical component of preventing the spread of vector-borne diseases such as Chagas disease is the door-to-door campaign, in which public health officials apply insecticide to eradicate household vector infestations. The success of such campaigns depends on adequate household participation during the active phase, as well as on sufficient follow-up during the surveillance phase, when newly infested houses, or infested houses that did not participate in the active phase, receive treatment. Queueing models, which are widely used in operations management, give us a mathematical representation of the operational effort needed to contain the spread of infestation. By modeling the queue as consisting of all infested houses in a given locality, we capture the dynamics of the insect population due to the prevalence of infestation and to the additional growth of infestation by redispersion, i.e., the spread of infestation to previously uninfested houses during the wait time for treatment. In contrast to traditional queueing models, houses waiting for treatment are not known in advance but must be identified through a search process by public health workers. Thus, both the arrival rate of houses to the queue and the removal rate from the queue depend on the current level of infestation. We incorporate these dependencies through a load-dependent queueing model, which allows us to estimate the long-run average rate of removing houses from the queue and therefore the cost associated with a given surveillance program. The model is motivated by and applied to an ongoing Chagas disease control campaign in Arequipa, Peru.
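The load-dependence described in the abstract — arrivals driven by redispersion from the current infestation level, removals limited by how quickly infested houses can be found — can be illustrated with a small Gillespie-style simulation (the rate forms and parameter values below are invented for illustration, not the paper's calibrated model):

```python
import math
import random

def simulate_load_dependent_queue(n0=50, beta=0.005, mu=1.0, search_eff=0.05,
                                  horizon=10_000.0, seed=0):
    """Toy load-dependent queue of infested houses. Arrival rate beta*n grows
    with the number of infested houses n (redispersion); removal rate
    mu*(1 - exp(-search_eff*n)) saturates, reflecting that houses must first
    be located by searchers. Returns the long-run average removal rate."""
    rng = random.Random(seed)
    t, n, removals = 0.0, n0, 0
    while t < horizon and n > 0:
        lam_arr = beta * n                            # redispersion arrivals
        lam_rem = mu * (1.0 - math.exp(-search_eff * n))  # search-limited removals
        total = lam_arr + lam_rem
        t += rng.expovariate(total)                   # time to next event
        if rng.random() < lam_arr / total:
            n += 1                                    # a new house is infested
        else:
            n -= 1                                    # a house is found and treated
            removals += 1
    return removals / max(t, 1e-9)
```

Varying `beta` against `mu` in this sketch reproduces the qualitative trade-off the model is built to quantify: if redispersion outpaces the search-limited removal capacity, the queue of infested houses never drains and surveillance costs grow without bound.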