Statistics research seminar


Spring Semester 2024

Date / Time Speaker Title Location
7 March 2024
Elliot Young
The University of Cambridge

Research Seminar in Statistics

Title Sandwich Boosting for accurate estimation in partially linear models for grouped data
Speaker, Affiliation Elliot Young, The University of Cambridge
Date, Time 7 March 2024, 16:15-17:15
Location HG G 19.1
Abstract We study partially linear models in settings where observations are arranged in independent groups but may exhibit within-group dependence. Existing approaches estimate linear model parameters through weighted least squares, with optimal weights (given by the inverse covariance of the response, conditional on the covariates) typically estimated by maximising a (restricted) likelihood from random effects modelling or by using generalised estimating equations. We introduce a new ‘sandwich loss’ whose population minimiser coincides with the weights of these approaches when the parametric forms for the conditional covariance are well-specified, but can yield arbitrarily large improvements in linear parameter estimation accuracy when they are not. Under relatively mild conditions, our weighted least squares (within a double machine learning framework) estimated coefficients are asymptotically Gaussian and enjoy minimal variance among estimators with weights restricted to a given class of functions, when user-chosen regression methods are used to estimate nuisance functions. We further expand the class of functional forms for the weights that may be fitted beyond parametric models by leveraging the flexibility of modern machine learning methods within a new gradient boosting scheme for minimising the sandwich loss. We demonstrate the effectiveness of both the sandwich loss and what we call ‘sandwich boosting’ in a variety of settings with simulated and real-world data.
21 March 2024
Bryon Aragam
The University of Chicago Booth School of Business

Research Seminar in Statistics

Title Research Seminar on Statistics - FDS Seminar joint talk: Statistical aspects of nonparametric latent variable models and causal representation learning
Speaker, Affiliation Bryon Aragam, The University of Chicago Booth School of Business
Date, Time 21 March 2024, 16:15-17:15
Location HG D 1.2
Abstract One of the key paradigm shifts in statistical machine learning over the past decade has been the transition from handcrafted features to automated, data-driven representation learning. A crucial step in this pipeline is to identify latent representations from observational data along with their causal structure. In many applications, the causal variables are not directly observed, and must be learned from data, often using flexible, nonparametric models such as deep neural networks. These settings present new statistical and computational challenges that will be the focus of this talk. We will re-visit the statistical foundations of nonparametric latent variable models as a lens into the problem of causal representation learning. We discuss our recent work on developing methods for identifying and learning causal representations from data with rigorous guarantees, and discuss how even basic statistical properties are surprisingly subtle. Along the way, we will explore the connections between causal graphical models, deep generative models, and nonparametric mixture models, and how these connections lead to a useful new theory for causal representation learning.
26 April 2024
Richard De Veaux
Williams College

Research Seminar in Statistics

Title The Seven Deadly Sins of Data Science
Speaker, Affiliation Richard De Veaux, Williams College
Date, Time 26 April 2024, 15:15-16:15
Location HG G 19.1
Abstract As we are all too aware, organizations accumulate vast amounts of data from a variety of sources nearly continuously. Big data and data science advocates promise the moon and the stars as you harvest the potential of all these data. And now, AI threatens our jobs and perhaps our very existence. There is certainly a lot of hype. There’s no doubt that some savvy organizations are fueling their strategic decision making with insights from big data, but what are the challenges? Much can go wrong in the data science process, even for trained professionals. In this talk I'll discuss a wide variety of case studies from a range of industries to illustrate the potential dangers and mistakes that can frustrate problem solving and discovery -- and that can unnecessarily waste resources. My goal is that by seeing some of the mistakes I (and others) have made, you will learn how to better take advantage of data insights without committing the "Seven Deadly Sins."
7 May 2024
Zijian Guo
Rutgers University

Research Seminar in Statistics

Title Adversarially Robust Learning: Identification, Estimation, and Uncertainty Quantification
Speaker, Affiliation Zijian Guo, Rutgers University
Date, Time 7 May 2024, 15:15-16:15
Location HG G 19.2
Abstract Empirical risk minimization may lead to poor prediction performance when the target distribution differs from the source populations. This talk discusses leveraging data from multiple sources and constructing more generalizable and transportable prediction models. We introduce an adversarially robust prediction model to optimize a worst-case reward concerning a class of target distributions and show that our introduced model is a weighted average of the source populations' conditional outcome models. We leverage this identification result to robustify arbitrary machine learning algorithms, including, for example, high-dimensional regression, random forests, and neural networks. In our adversarial learning framework, we propose a novel sampling method to quantify the uncertainty of the adversarially robust prediction model. Moreover, we introduce guided adversarially robust transfer learning (GART) that uses a small amount of target domain data to guide adversarial learning. We show that GART achieves a faster convergence rate than the model fitted with the target data. Our comprehensive simulation studies suggest that GART can substantially outperform existing transfer learning methods, attaining higher robustness and accuracy. Short Bio: Zijian Guo is an associate professor at the Department of Statistics at Rutgers University. He obtained a Ph.D. in Statistics in 2017 from the Wharton School, University of Pennsylvania. His research interests include causal inference, multi-source and transfer learning, high-dimensional statistics, and nonstandard statistical inference.
16 May 2024
Jiwei Zhao
University of Wisconsin–Madison

Research Seminar in Statistics

Title A Semiparametric Perspective on Unsupervised Domain Adaptation
Speaker, Affiliation Jiwei Zhao, University of Wisconsin–Madison
Date, Time 16 May 2024, 15:15-16:15
Location HG G 19.1
Abstract In studies ranging from clinical medicine to policy research, complete data are usually available from a population P, but the quantity of interest is often sought for a related but different population Q. In this talk, we consider the unsupervised domain adaptation setting under the label shift assumption. In the first part, we estimate a parameter of interest in population Q by leveraging information from P, where three ingredients are essential: (a) the common conditional distribution of X given Y, (b) the regression model of Y given X in P, and (c) the density ratio of the outcome Y between the two populations. We propose an estimation procedure that needs only a standard nonparametric technique to approximate the conditional expectations with respect to (a), and requires no estimate or model for (b) or (c); that is, it is doubly flexible with respect to misspecification of both (b) and (c). In the second part, we pay special attention to the case where the outcome Y is categorical. In this scenario, traditional label shift adaptation methods either suffer from large estimation errors or require cumbersome post-prediction calibrations. To address these issues, we propose a moment-matching framework for adapting to the label shift, and an efficient label shift adaptation method where the adaptation weights can be estimated by solving linear systems. We rigorously study the theoretical properties of our proposed methods. Empirically, we illustrate our proposed methods on the MIMIC-III database as well as on some benchmark datasets including MNIST, CIFAR-10, and CIFAR-100.

Note: you can subscribe to the iCal/ics calendar.
