Seminar overview


Spring Semester 2025

Date & Time Speaker Title Location
Thu 20.02.2025
15:15-16:00
Rafael M. Frongillo
CU Boulder
Abstract
Machine learning and data science competitions, wherein contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with intuition for the incentive issues and what sort of strategic behavior one would expect---and when. One takeaway is that, despite conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them, as we confirm with formal results. We will then discuss a new mechanism which is approximately truthful, in the sense that rational contestants will submit predictions which are close to their best guess. If time permits, we will see how the same mechanism solves an open question for online learning from strategic experts.

Bio: Rafael (Raf) Frongillo is an Associate Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, primarily focusing on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science at UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.
ZueKoSt: Seminar on Applied Statistics
Incentive problems in data science competitions, and how to fix them
HG G 19.2
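As a minimal illustration of the traditional mechanism described in the abstract above (score each prediction on each held-out data point; the highest total score wins), here is a toy Python sketch. All numbers, names, and the choice of the Brier score are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n contestants each submit a probability that
# each of m held-out binary outcomes equals 1.
n_contestants, m_points = 5, 20
truth = rng.integers(0, 2, size=m_points)          # held-out labels
preds = rng.random((n_contestants, m_points))      # submitted probabilities

# Traditional mechanism: score every prediction on every held-out
# point (here with the negative Brier score, higher = better) and
# declare the contestant with the highest total the winner.
brier = -(preds - truth) ** 2                      # per-point scores, all <= 0
totals = brier.sum(axis=1)
winner = int(np.argmax(totals))
```

The incentive problem the talk addresses arises because a contestant maximizes the probability of *winning*, not the expected score, which can make extreme, inaccurate submissions rational.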
Thu 27.02.2025
16:15-17:15
David M. Blei
Columbia University
Abstract
A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics. After quickly reviewing the basics, I will discuss two lines of research in VI. I first describe stochastic variational inference, an approximate inference algorithm for handling massive datasets, and demonstrate its application to probabilistic topic models of millions of articles. Then I discuss black box variational inference, a more generic algorithm for approximating the posterior. Black box inference applies to many models but requires minimal mathematical work to implement. I will demonstrate black box inference on deep exponential families---a method for Bayesian deep learning---and describe how it enables powerful tools for probabilistic programming. Finally, I will highlight some more recent results in variational inference, including statistical theory, score-based objective functions, and interpolating between mean-field and fully dependent variational families.
Research Seminar in Statistics
Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Scaling and Generalizing Approximate Bayesian Inference"
HG D 1.2
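As a minimal sketch of the kind of optimization-based approximate inference reviewed in the talk, the following fits a Gaussian variational family q(z) = N(mu, s^2) to a toy Gaussian target by stochastic gradient ascent on the ELBO, using the reparameterization z = mu + s*eps. The target, step size, and iteration count are illustrative assumptions; the talk's black box variational inference uses more general gradient estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target posterior (unnormalized): p(z) ∝ N(z; 3, 1).
def grad_log_p(z):
    return -(z - 3.0)

# Variational parameters of q(z) = N(mu, s^2), with s = exp(log_s).
mu, log_s, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    eps = rng.standard_normal(64)                 # base noise
    s = np.exp(log_s)
    z = mu + s * eps                              # reparameterized sample
    g = grad_log_p(z)
    mu += lr * g.mean()                           # d ELBO / d mu
    log_s += lr * ((g * s * eps).mean() + 1.0)    # d ELBO / d log_s (+1 from entropy)
```

For this conjugate toy problem the optimum is mu = 3, s = 1, so the stochastic iterates should settle near those values.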
Fri 07.03.2025
16:15-17:15
David M. Blei
Columbia University
Abstract
Analyzing nested data with hierarchical models is a staple of Bayesian statistics, but causal modeling remains largely focused on "flat" models. In this talk, we will explore how to think about nested data in causal models, and we will consider the advantages of nested data over aggregate data (such as data means) for causal inference. We show that disaggregating your data---replacing a flat causal model with a hierarchical causal model---can provide new opportunities for identification and estimation. As examples, we will study how to identify and estimate causal effects under unmeasured confounders, interference, and instruments. This is joint work with Eli Weinstein. Preprint: https://arxiv.org/abs/2401.05330
ETH-FDS seminar
Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Hierarchical Causal Models"
HG D 7.2
Thu 13.03.2025
16:15-17:15
Yinyu Ye
Stanford, CUHKSZ, HKUST, and SJTU
Abstract
This talk presents several mathematical optimization problems and algorithms for AI, such as LLM training, tuning, and inference. In particular, we describe how classic optimization models and theories can be applied to accelerate and improve the training, tuning, and inference algorithms popularly used in LLMs. On the other hand, we show breakthroughs in classical optimization (LP and SDP) solvers aided by AI-related techniques such as first-order and ADMM methods, low-rank SDP theories, and GPU implementations.

Bio: Yinyu Ye is currently the K.T. Li Professor of Engineering at the Department of Management Science and Engineering and the Institute of Computational and Mathematical Engineering, Stanford University, and a visiting chair professor at Shanghai Jiao Tong University. His current research topics include Continuous and Discrete Optimization, Data Science and Applications, Algorithm Design and Analysis, Algorithmic Game/Market Equilibrium, and Operations Research and Management Science. He was one of the pioneers of Interior-Point Methods, Conic Linear Programming, Distributionally Robust Optimization, Online Linear Programming and Learning, Algorithm Analysis for Reinforcement Learning and Markov Decision Processes, and nonconvex optimization. He and his students have received numerous scientific awards, including the 2006 INFORMS Farkas Prize (inaugural recipient) for fundamental contributions to optimization, the 2009 John von Neumann Theory Prize for fundamental sustained contributions to theory in Operations Research and the Management Sciences, the inaugural 2012 ISMP Tseng Lectureship Prize for outstanding contributions to continuous optimization (awarded every three years), and the 2014 SIAM Optimization Prize (awarded every three years).
ETH-FDS seminar
Mathematical Optimization in the Era of AI
HG G 19.1
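A toy sketch of a first-order primal-dual (PDHG / Chambolle-Pock style) iteration for a tiny standard-form LP, the matrix-vector-only kind of method the abstract alludes to for GPU-friendly LP solving. The problem data and step sizes are illustrative assumptions, not the speaker's solver.

```python
import numpy as np

# LP in standard form: min c^T x  s.t.  A x = b, x >= 0.
c = np.array([-2.0, -1.0, 0.0])      # i.e. maximize 2*x1 + x2
A = np.array([[1.0, 1.0, 1.0]])      # x1 + x2 + slack = 1
b = np.array([1.0])

x = np.zeros(3)
y = np.zeros(1)
tau = sigma = 0.3                    # step sizes; need tau*sigma*||A||^2 < 1

for _ in range(20000):
    # Primal step: gradient of the Lagrangian in x, then project onto x >= 0.
    x_new = np.maximum(0.0, x - tau * (c - A.T @ y))
    # Dual ascent step with the standard extrapolated primal iterate.
    y = y + sigma * (b - A @ (2 * x_new - x))
    x = x_new
# Unique optimum of this toy LP: x = (1, 0, 0) with objective value -2.
```

The appeal of such iterations is that they touch A only through matrix-vector products, which parallelize well on GPUs.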
Fri 14.03.2025
15:15-16:00
Matteo Fontana
Royal Holloway, University of London
Abstract
Quantifying uncertainty in multivariate regression is crucial across many real-world applications. However, existing approaches for constructing prediction regions often struggle to capture complex dependencies, lack formal coverage guarantees, or incur high computational costs. Conformal prediction addresses these challenges by providing a robust, distribution-free framework with finite-sample coverage guarantees. In this study, we offer a unified comparison of multi-output conformal techniques, highlighting their properties and interrelationships. Leveraging these insights, we propose two families of conformity scores that achieve asymptotic conditional coverage: one can be paired with any generative model, while the other reduces computational overhead by utilizing invertible generative models. We then present a large-scale empirical analysis on 32 tabular datasets, comparing all methods under a consistent code base to ensure fairness and reproducibility.
Research Seminar in Statistics
Multi-Output Conformal Regression: A Unified View with Comparisons
HG G 19.1
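As a warm-up to the multi-output methods compared in the talk, here is a minimal single-output split-conformal regression sketch on simulated linear data. The data-generating process and the no-intercept least-squares fit are illustrative assumptions; the finite-sample quantile rule is what delivers the distribution-free 90% marginal coverage guarantee.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: y = 2x + Gaussian noise.
x = rng.uniform(-1, 1, 1000)
y = 2 * x + rng.normal(0, 0.3, 1000)

# Split: fit the model on one half, calibrate on the other.
fit, cal = slice(0, 500), slice(500, 1000)
slope = (x[fit] @ y[fit]) / (x[fit] @ x[fit])   # least squares, no intercept
scores = np.abs(y[cal] - slope * x[cal])        # conformity scores

# Finite-sample 90% quantile: the ceil((n+1)*0.9)-th smallest score.
n = scores.size
q = np.sort(scores)[int(np.ceil((n + 1) * 0.9)) - 1]

# Marginal coverage on fresh data is >= 90% in expectation.
x_new = rng.uniform(-1, 1, 2000)
y_new = 2 * x_new + rng.normal(0, 0.3, 2000)
covered = np.abs(y_new - slope * x_new) <= q
```

The multi-output setting discussed in the talk replaces the absolute residual with conformity scores that capture dependence across outputs.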
Thu 20.03.2025
16:15-17:15
Stefan Wager
Stanford University
Abstract
The time at which renewable (e.g., solar or wind) energy resources produce electricity cannot generally be controlled. In many settings, consumers have some flexibility in their energy consumption needs, and there is growing interest in demand-response programs that leverage this flexibility to shift energy consumption to better match renewable production -- thus enabling more efficient utilization of these resources. We study optimal demand response in a model where consumers operate home energy management systems (HEMS) that can compute the "indifference set" of energy-consumption profiles that meet pre-specified consumer objectives, receive demand-response signals from the grid, and control consumer devices within the indifference set. For example, if a consumer asks for the indoor temperature to remain between certain upper and lower bounds, a HEMS could time use of air conditioning or heating to align with high renewable production when possible. Here, we show that while price-based mechanisms do not in general achieve optimal demand response, i.e., dynamic pricing cannot induce HEMS to choose optimal demand consumption profiles within the available indifference sets, pricing is asymptotically optimal in a mean-field limit with a growing number of consumers. Furthermore, we show that large-sample optimal dynamic prices can be efficiently derived via an algorithm that only requires querying HEMS about their planned consumption schedules given different prices. We demonstrate our approach in a grid simulation powered by OpenDSS, and show that it achieves meaningful demand response without creating grid instability. Paper by Mohammad Mehrabi, Omer Karaduman, and Stefan Wager: https://arxiv.org/abs/2409.07655
ETH-FDS seminar
Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Optimal Mechanisms for Demand Response: An Indifference Set Approach"
HG E 3
Tue 01.04.2025
17:15-18:30
Hyunju Kwon

Yuansi Chen
ETH Zurich
HG F 30
Wed 02.04.2025
15:15-16:00
Linbo Wang
University of Toronto
Abstract
In many observational studies, researchers are often interested in studying the effects of multiple exposures on a single outcome. Standard approaches for high-dimensional data such as the lasso assume the associations between the exposures and the outcome are sparse. These methods, however, do not estimate the causal effects in the presence of unmeasured confounding. In this paper, we consider an alternative approach that assumes the causal effects in view are sparse. We show that with sparse causation, the causal effects are identifiable even with unmeasured confounding. At the core of our proposal is a novel device, called the synthetic instrument, that in contrast to standard instrumental variables, can be constructed using the observed exposures directly. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an ℓ0-penalization problem, and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset.
Research Seminar in Statistics
The synthetic instrument: From sparse association to sparse causation
HG G 19.1
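A toy simulation of the unmeasured-confounding problem that motivates the talk: in a linear structural equation model where the true causal effect of the exposure is zero, plain regression recovers the confounded association instead. This illustrates the motivation only; it is not the synthetic-instrument construction, and all coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear SEM with an unmeasured confounder u.
n = 100_000
u = rng.normal(size=n)                       # unmeasured confounder
x = u + rng.normal(size=n)                   # exposure, driven partly by u
y = 0.0 * x + 2 * u + rng.normal(size=n)     # true causal effect of x is 0

# Regressing y on x alone estimates the association, not the causal effect:
# cov(x, y) / var(x) = 2 / 2 = 1, even though the causal effect is 0.
beta_hat = (x @ y) / (x @ x)
```

Association-based sparsity assumptions (as in the lasso) would treat this nonzero coefficient as signal; the talk's sparse-causation viewpoint instead targets the zero causal effect.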
Fri 11.04.2025
15:15-16:15
Victoria Stodden
University of Southern California
HG G 19.1
Thu 08.05.2025
15:15-16:15
Toby Hocking

Abstract
data.table is an R package with C code that is one of the most efficient open-source in-memory database packages available today. First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity, and now over 1500 other CRAN packages depend on data.table. This talk will discuss basic and advanced data manipulation topics, and end with a discussion about how you can contribute to data.table.
ZueKoSt: Seminar on Applied Statistics
Using and contributing to the data.table package for efficient big data analysis
HG
Mon 12.05.2025
17:15-18:15
Andrew Stuart
Caltech
Abstract
ETH-FDS Stiefel Lectures
Stiefel Lecture 2025
HG F 30
Wed 25.06.2025
HG F 3
Thu 26.06.2025
HG F 3
Fri 27.06.2025
HG F 3