Seminar overview
Spring Semester 2025
Date & Time | Speaker | Title | Location |
---|---|---|---|
Thu 20.02.2025 15:15-16:00 |
Rafael M. Frongillo CU Boulder |
Abstract
Machine learning and data science competitions, wherein contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with intuition for the incentive issues and what sort of strategic behavior one would expect---and when. One takeaway is that, despite conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them, as we confirm with formal results. We will then discuss a new mechanism which is approximately truthful, in the sense that rational contestants will submit predictions which are close to their best guess. If time permits, we will see how the same mechanism solves an open question for online learning from strategic experts.
Bio: Rafael (Raf) Frongillo is an Associate Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, primarily focusing on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science at UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.
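The traditional winner-selection mechanism the abstract describes can be sketched in a few lines. This is an illustrative toy, not the speaker's code; the quadratic (Brier-style) score is used purely as an example scoring rule, and all names and data are made up.

```python
# Sketch of the traditional competition mechanism: score every submitted
# prediction on each held-out point, and declare the contestant with the
# highest total score the winner. The quadratic (Brier-style) score below
# is just an example of a proper scoring rule.

def quadratic_score(prediction, outcome):
    """Higher is better; maximized in expectation by truthful reporting."""
    return 1.0 - (prediction - outcome) ** 2

def pick_winner(submissions, held_out_outcomes):
    """submissions: {name: [p_1, ..., p_n]}; outcomes: [y_1, ..., y_n]."""
    totals = {
        name: sum(quadratic_score(p, y) for p, y in zip(preds, held_out_outcomes))
        for name, preds in submissions.items()
    }
    return max(totals, key=totals.get), totals

winner, totals = pick_winner(
    {"alice": [0.9, 0.2, 0.8], "bob": [0.6, 0.5, 0.5]},
    [1, 0, 1],
)
```

The talk's point is that exactly this "sum of proper scores, winner takes all" rule, despite each per-point score being truthful in isolation, can reward contestants for gambling on extreme predictions.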
ZueKoSt: Seminar on Applied Statistics: "Incentive problems in data science competitions, and how to fix them" |
HG G 19.2 |
Thu 27.02.2025 16:15-17:15 |
David M. Blei Columbia University |
Abstract
A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics.
After quickly reviewing the basics, I will discuss two lines of research in VI. I first describe stochastic variational inference, an approximate inference algorithm for handling massive datasets, and demonstrate its application to probabilistic topic models of millions of articles. Then I discuss black box variational inference, a more generic algorithm for approximating the posterior. Black box inference applies to many models but requires minimal mathematical work to implement. I will demonstrate black box inference on deep exponential families---a method for Bayesian deep learning---and describe how it enables powerful tools for probabilistic programming.
Finally, I will highlight some more recent results in variational inference, including statistical theory, score-based objective functions, and interpolating between mean-field and fully dependent variational families.
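The "approximation through optimization" idea, in its black-box form, can be sketched in a few lines. This is a minimal one-dimensional illustration (not the speaker's code): fit a Gaussian variational family to a target density using only log-density evaluations, via the score-function gradient estimator.

```python
import random

# Minimal black-box VI sketch: fit q(x) = N(mu, 1) to the target
# p(x) = N(3, 1) by stochastic gradient ascent on the ELBO, using the
# score-function gradient estimator
#   grad_mu ELBO ~= mean[(log p(x) - log q(x)) * d/dmu log q(x)],  x ~ q.
# Only log-density evaluations of p are needed, hence "black box".

random.seed(0)

def log_p(x):          # target log-density, N(3, 1), up to a constant
    return -0.5 * (x - 3.0) ** 2

def log_q(x, mu):      # variational log-density, N(mu, 1), up to a constant
    return -0.5 * (x - mu) ** 2

mu, lr, n_samples = 0.0, 0.05, 100
for _ in range(2000):
    xs = [random.gauss(mu, 1.0) for _ in range(n_samples)]
    # d/dmu log q(x; mu) = (x - mu) for this family
    grad = sum((log_p(x) - log_q(x, mu)) * (x - mu) for x in xs) / n_samples
    mu += lr * grad
```

After training, `mu` sits near the target mean 3. Nothing about the update used the structure of `p` beyond the ability to evaluate its log-density, which is what lets the same recipe scale to models like deep exponential families.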
Research Seminar in Statistics: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Scaling and Generalizing Approximate Bayesian Inference" |
HG D 1.2 |
Thu 27.02.2025 16:15-17:15 |
David M. Blei Columbia University |
Abstract
A core problem in statistics and machine learning is to approximate difficult-to-compute probability distributions. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation about a conditional distribution. In this talk I review and discuss innovations in variational inference (VI), a method that approximates probability distributions through optimization. VI has been used in myriad applications in machine learning and Bayesian statistics.
After quickly reviewing the basics, I will discuss two lines of research in VI. I first describe stochastic variational inference, an approximate inference algorithm for handling massive datasets, and demonstrate its application to probabilistic topic models of millions of articles. Then I discuss black box variational inference, a more generic algorithm for approximating the posterior. Black box inference applies to many models but requires minimal mathematical work to implement. I will demonstrate black box inference on deep exponential families---a method for Bayesian deep learning---and describe how it enables powerful tools for probabilistic programming.
Finally, I will highlight some more recent results in variational inference, including statistical theory, score-based objective functions, and interpolating between mean-field and fully dependent variational families.
ETH-FDS seminar: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Scaling and Generalizing Approximate Bayesian Inference" |
HG D 1.2 |
Fri 07.03.2025 16:15-17:15 |
David M. Blei Columbia University |
Abstract
Analyzing nested data with hierarchical models is a staple of Bayesian statistics, but causal modeling remains largely focused on "flat" models. In this talk, we will explore how to think about nested data in causal models, and we will consider the advantages of nested data over aggregate data (such as data means) for causal inference. We show that disaggregating your data---replacing a flat causal model with a hierarchical causal model---can provide new opportunities for identification and estimation. As examples, we will study how to identify and estimate causal effects under unmeasured confounders, interference, and instruments.
Preprint: https://arxiv.org/abs/2401.05330
This is joint work with Eli Weinstein.
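The "disaggregation helps" point can be illustrated with a classic toy example. Note this is a standard within-group adjustment, not the paper's hierarchical causal model; the simulated effect sizes are made up.

```python
import random

random.seed(1)

# Toy illustration: units are nested in groups, and an unmeasured
# group-level confounder u affects both treatment x and outcome y.
# The true causal effect of x on y is 2. A "flat" regression on pooled
# data is biased by u, while exploiting the nested structure (demeaning
# within groups) removes u and recovers the effect.

xs, ys, x_dm, y_dm = [], [], [], []
for _ in range(300):                    # 300 groups
    u = random.gauss(0, 1)              # unmeasured group-level confounder
    gx, gy = [], []
    for _ in range(20):                 # 20 units per group
        x = u + random.gauss(0, 1)
        y = 2.0 * x + 3.0 * u + random.gauss(0, 1)
        gx.append(x); gy.append(y)
    mx, my = sum(gx) / len(gx), sum(gy) / len(gy)
    xs += gx; ys += gy
    x_dm += [x - mx for x in gx]        # within-group deviations
    y_dm += [y - my for y in gy]

def slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

flat_slope = slope(xs, ys)       # biased upward by the confounder
nested_slope = slope(x_dm, y_dm) # close to the true effect 2
```

The flat regression lands near 3.5 here, while the within-group estimate recovers 2: the subunit-level (nested) data carries identifying information that the pooled view destroys.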
ETH-FDS seminar: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Hierarchical Causal Models" |
HG D 7.2 |
Fri 07.03.2025 16:15-17:15 |
David M. Blei Columbia University |
Abstract
Analyzing nested data with hierarchical models is a staple of Bayesian statistics, but causal modeling remains largely focused on "flat" models. In this talk, we will explore how to think about nested data in causal models, and we will consider the advantages of nested data over aggregate data (such as data means) for causal inference. We show that disaggregating your data---replacing a flat causal model with a hierarchical causal model---can provide new opportunities for identification and estimation. As examples, we will study how to identify and estimate causal effects under unmeasured confounders, interference, and instruments.
Preprint: https://arxiv.org/abs/2401.05330
This is joint work with Eli Weinstein.
Research Seminar in Statistics: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Hierarchical Causal Models" |
HG D 7.2 |
Thu 13.03.2025 16:15-17:15 |
Yinyu Ye Stanford, CUHKSZ, HKUST, and SJTU |
Abstract
This talk presents several mathematical optimization problems and algorithms for AI, such as LLM training, tuning, and inference. In particular, we describe how classic optimization models and theories can be applied to accelerate and improve the training, tuning, and inference algorithms popularly used in LLMs. Conversely, we show breakthroughs in classical optimization (LP and SDP) solvers aided by AI-related techniques, such as first-order and ADMM methods, low-rank SDP theories, and GPU implementations.
Bio: Yinyu Ye is currently the K.T. Li Professor of Engineering in the Department of Management Science and Engineering and the Institute for Computational and Mathematical Engineering, Stanford University, and a visiting chair professor at Shanghai Jiao Tong University. His current research topics include continuous and discrete optimization, data science and applications, algorithm design and analysis, algorithmic game/market equilibrium, and operations research and management science. He was one of the pioneers of interior-point methods, conic linear programming, distributionally robust optimization, online linear programming and learning, and algorithm analyses for reinforcement learning, Markov decision processes, and nonconvex optimization. He and his students have received numerous scientific awards, including the 2006 INFORMS Farkas Prize (inaugural recipient) for fundamental contributions to optimization, the 2009 John von Neumann Theory Prize for fundamental sustained contributions to theory in operations research and the management sciences, the inaugural 2012 ISMP Tseng Lectureship Prize for outstanding contributions to continuous optimization (awarded every three years), and the 2014 SIAM Optimization Prize (awarded every three years).
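The first-order LP methods the abstract alludes to can be sketched on a toy instance. This is a generic primal-dual hybrid gradient (PDHG) iteration of the kind used in recent matrix-free, GPU-friendly LP solvers, not the speaker's solver; the instance and step sizes are illustrative.

```python
# Generic PDHG sketch for a linear program in standard form:
#   minimize c.x  subject to  A x = b,  x >= 0.
# Toy instance: min x1 + 2*x2 s.t. x1 + x2 = 1, whose optimum is x = (1, 0).
# Each iteration needs only matrix-vector products, which is what makes
# this family of methods attractive on GPUs.

c = [1.0, 2.0]
A = [1.0, 1.0]          # single constraint row
b = 1.0
tau = sigma = 0.5       # step sizes with tau * sigma * ||A||^2 = 0.5 < 1

x = [0.0, 0.0]          # primal iterate
y = 0.0                 # dual iterate for the equality constraint
avg = [0.0, 0.0]        # ergodic (averaged) primal iterate
iters = 20000
for _ in range(iters):
    # primal step: gradient step on the Lagrangian, then project onto x >= 0
    x_new = [max(0.0, x[i] - tau * (c[i] - A[i] * y)) for i in range(2)]
    # dual step uses the extrapolated point 2*x_new - x
    y += sigma * (b - sum(A[i] * (2 * x_new[i] - x[i]) for i in range(2)))
    x = x_new
    avg = [avg[i] + x[i] / iters for i in range(2)]
```

The averaged iterate converges to the optimal vertex (1, 0); production solvers add restarts, preconditioning, and adaptive step sizes on top of this basic loop.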
ETH-FDS seminar: "Mathematical Optimization in the Era of AI" |
HG G 19.1 |
Fri 14.03.2025 15:15-16:00 |
Matteo Fontana Royal Holloway, University of London |
Abstract
Quantifying uncertainty in multivariate regression is crucial across many real-world applications. However, existing approaches for constructing prediction regions often struggle to capture complex dependencies, lack formal coverage guarantees, or incur high computational costs. Conformal prediction addresses these challenges by providing a robust, distribution-free framework with finite-sample coverage guarantees. In this study, we offer a unified comparison of multi-output conformal techniques, highlighting their properties and interrelationships. Leveraging these insights, we propose two families of conformity scores that achieve asymptotic conditional coverage: one can be paired with any generative model, while the other reduces computational overhead by utilizing invertible generative models. We then present a large-scale empirical analysis on 32 tabular datasets, comparing all methods under a consistent code base to ensure fairness and reproducibility.
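The distribution-free, finite-sample coverage guarantee can be seen in the simplest scalar-output case. This is the standard split conformal construction, not one of the talk's multi-output methods; the data-generating process is made up for illustration.

```python
import random, math

random.seed(2)

# Split conformal prediction, scalar-output case: fit a model on one half
# of the data, compute absolute residuals (conformity scores) on a held-out
# calibration half, and take the appropriate empirical quantile q. The
# interval [f(x) - q, f(x) + q] then covers a fresh response with
# probability >= 1 - alpha, assuming only exchangeability.

def make_data(n):
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0, 0.3) for x in xs]
    return xs, ys

# "fit" a slope-only model on training data via least squares
xtr, ytr = make_data(500)
beta = sum(x * y for x, y in zip(xtr, ytr)) / sum(x * x for x in xtr)

# conformity scores on the calibration set
xcal, ycal = make_data(500)
scores = sorted(abs(y - beta * x) for x, y in zip(xcal, ycal))
alpha = 0.1
k = math.ceil((len(scores) + 1) * (1 - alpha))   # finite-sample rank
q = scores[min(k, len(scores)) - 1]

# empirical coverage of [beta*x - q, beta*x + q] on new data
xte, yte = make_data(2000)
covered = sum(abs(y - beta * x) <= q for x, y in zip(xte, yte))
coverage = covered / len(yte)
```

Empirical coverage lands near the nominal 90%. The multi-output setting of the talk replaces the absolute residual with richer conformity scores that shape a joint prediction region rather than a per-coordinate interval.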
Research Seminar in Statistics: "Multi-Output Conformal Regression: A Unified View with Comparisons" |
HG G 19.1 |
Thu 20.03.2025 16:15-17:15 |
Stefan Wager Stanford University |
Abstract
The time at which renewable (e.g., solar or wind) energy resources produce electricity cannot generally be controlled. In many settings, consumers have some flexibility in their energy consumption needs, and there is growing interest in demand-response programs that leverage this flexibility to shift energy consumption to better match renewable production -- thus enabling more efficient utilization of these resources. We study optimal demand response in a model where consumers operate home energy management systems (HEMS) that can compute the "indifference set" of energy-consumption profiles that meet pre-specified consumer objectives, receive demand-response signals from the grid, and control consumer devices within the indifference set. For example, if a consumer asks for the indoor temperature to remain between certain upper and lower bounds, a HEMS could time use of air conditioning or heating to align with high renewable production when possible. Here, we show that while price-based mechanisms do not in general achieve optimal demand response, i.e., dynamic pricing cannot induce HEMS to choose optimal demand consumption profiles within the available indifference sets, pricing is asymptotically optimal in a mean-field limit with a growing number of consumers. Furthermore, we show that large-sample optimal dynamic prices can be efficiently derived via an algorithm that only requires querying HEMS about their planned consumption schedules given different prices. We demonstrate our approach in a grid simulation powered by OpenDSS, and show that it achieves meaningful demand response without creating grid instability.
Mohammad Mehrabi, Omer Karaduman, Stefan Wager
https://arxiv.org/abs/2409.07655
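The price-response step described in the abstract has a simple shape: given a dynamic price signal, a HEMS schedules the cheapest consumption profile among those the consumer is indifferent between. The sketch below is illustrative only (toy profiles and prices), not the paper's mechanism.

```python
# Toy HEMS price response: the "indifference set" is a list of
# consumption profiles (kWh per hour) that all meet the consumer's
# stated objectives; given hourly prices, the HEMS picks the cheapest.

def cheapest_profile(indifference_set, prices):
    """Each profile is a list of kWh per hour; prices is $/kWh per hour."""
    cost = lambda profile: sum(p * e for p, e in zip(prices, profile))
    return min(indifference_set, key=cost)

# two schedules that both keep the home comfortable: cool early vs. late
indifference_set = [
    [3.0, 1.0, 1.0],   # shift consumption into hour 0
    [1.0, 1.0, 3.0],   # shift consumption into hour 2
]
prices = [0.10, 0.20, 0.30]   # cheap hour 0, e.g. high renewable supply
chosen = cheapest_profile(indifference_set, prices)
```

The talk's question is whether prices alone can steer many such cost-minimizing HEMS to the grid-optimal combination of profiles; the answer is no in general, but yes asymptotically in the mean-field limit.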
ETH-FDS seminar: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Optimal Mechanisms for Demand Response: An Indifference Set Approach" |
HG E 3 |
Thu 20.03.2025 16:15-17:15 |
Stefan Wager Stanford University |
Abstract
The time at which renewable (e.g., solar or wind) energy resources produce electricity cannot generally be controlled. In many settings, consumers have some flexibility in their energy consumption needs, and there is growing interest in demand-response programs that leverage this flexibility to shift energy consumption to better match renewable production -- thus enabling more efficient utilization of these resources. We study optimal demand response in a model where consumers operate home energy management systems (HEMS) that can compute the "indifference set" of energy-consumption profiles that meet pre-specified consumer objectives, receive demand-response signals from the grid, and control consumer devices within the indifference set. For example, if a consumer asks for the indoor temperature to remain between certain upper and lower bounds, a HEMS could time use of air conditioning or heating to align with high renewable production when possible. Here, we show that while price-based mechanisms do not in general achieve optimal demand response, i.e., dynamic pricing cannot induce HEMS to choose optimal demand consumption profiles within the available indifference sets, pricing is asymptotically optimal in a mean-field limit with a growing number of consumers. Furthermore, we show that large-sample optimal dynamic prices can be efficiently derived via an algorithm that only requires querying HEMS about their planned consumption schedules given different prices. We demonstrate our approach in a grid simulation powered by OpenDSS, and show that it achieves meaningful demand response without creating grid instability.
Mohammad Mehrabi, Omer Karaduman, Stefan Wager
https://arxiv.org/abs/2409.07655
Research Seminar in Statistics: Joint talk ETH-FDS Seminar - Research Seminar on Statistics: "Optimal Mechanisms for Demand Response: An Indifference Set Approach" |
HG E 3 |
Tue 01.04.2025 17:15-18:30 |
Hyunju Kwon Yuansi Chen ETH Zurich |
HG F 30 |
|
Wed 02.04.2025 15:15-16:00 |
Linbo Wang University of Toronto |
Abstract
In many observational studies, researchers are often interested in studying the effects of multiple exposures on a single outcome. Standard approaches for high-dimensional data such as the lasso assume the associations between the exposures and the outcome are sparse. These methods, however, do not estimate the causal effects in the presence of unmeasured confounding. In this paper, we consider an alternative approach that assumes the causal effects in view are sparse. We show that with sparse causation, the causal effects are identifiable even with unmeasured confounding. At the core of our proposal is a novel device, called the synthetic instrument, that in contrast to standard instrumental variables, can be constructed using the observed exposures directly. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an ℓ0-penalization problem, and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset.
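For context, the standard instrumental-variable estimator that the abstract contrasts with looks like this in the simplest linear case. This is the classic textbook IV construction with a simulated external instrument, not the paper's synthetic instrument; all effect sizes are made up.

```python
import random

random.seed(3)

# Classic IV illustration: the true causal effect of x on y is 2, but an
# unmeasured confounder u biases the naive regression. A valid external
# instrument z (affects x, but not y except through x) restores
# consistency via the ratio cov(z, y) / cov(z, x).

n = 20000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [z[i] + u[i] + random.gauss(0, 1) for i in range(n)]
y = [2.0 * x[i] + 3.0 * u[i] + random.gauss(0, 1) for i in range(n)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((a[i] - ma) * (b[i] - mb) for i in range(len(a))) / len(a)

ols = cov(x, y) / cov(x, x)   # biased upward by the confounder u
iv = cov(z, y) / cov(z, x)    # consistent for the causal effect 2
```

The paper's point of departure is that such a z usually has to come from outside the data; under sparse causation, an instrument-like object can instead be synthesized from the observed exposures themselves.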
Research Seminar in Statistics: "The synthetic instrument: From sparse association to sparse causation" |
HG G 19.1 |
Fri 11.04.2025 15:15-16:15 |
Victoria Stodden University of Southern California |
HG G 19.1 |
|
Thu 08.05.2025 15:15-16:15 |
Toby Hocking |
Abstract
data.table is an R package with C code that is one of the most efficient open-source in-memory database packages available today. First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity, and now over 1500 other CRAN packages depend on data.table. This talk will discuss basic and advanced data manipulation topics, and end with a discussion about how you can contribute to data.table.
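The core data manipulation idiom the talk covers is data.table's `DT[i, j, by]`: filter rows (`i`), compute expressions (`j`), grouped by columns (`by`). Since this page carries no R code, here is the same idiom sketched in plain Python; this is an illustration of the semantics only, not of data.table's C-backed implementation, which gains its speed from keyed indexing, radix sorting, and update-by-reference. The toy table is made up.

```python
from collections import defaultdict

# Rough Python equivalent of the data.table query
#   DT[year == 2025, .(total = sum(sales)), by = city]

rows = [
    {"city": "Zurich", "year": 2024, "sales": 10},
    {"city": "Zurich", "year": 2025, "sales": 12},
    {"city": "Basel",  "year": 2025, "sales": 7},
]

totals = defaultdict(int)
for r in rows:
    if r["year"] == 2025:                 # i: row filter
        totals[r["city"]] += r["sales"]   # j: sum(sales), by = city
```

In data.table the whole pipeline is a single indexed expression, evaluated without materializing intermediate copies.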
ZueKoSt: Seminar on Applied Statistics: "Using and contributing to the data.table package for efficient big data analysis" |
HG |
Mon 12.05.2025 17:15-18:15 |
Andrew Stuart Caltech |
HG F 30 |
|
Wed 25.06.2025 |
Abstract
More information: "High-dimensional statistics, applications, and distributional shifts -- A workshop in celebration of Peter Bühlmann's 60th birthday", June 2025 (https://math.ethz.ch/fim/activities/conferences/High-dimensional-statistics-applications-and-distributional-shifts.html) |
HG F 3 |
|
Thu 26.06.2025 |
Abstract
More information: "High-dimensional statistics, applications, and distributional shifts -- A workshop in celebration of Peter Bühlmann's 60th birthday", June 2025 (https://math.ethz.ch/fim/activities/conferences/High-dimensional-statistics-applications-and-distributional-shifts.html) |
HG F 3 |
|
Fri 27.06.2025 |
Abstract
More information: "High-dimensional statistics, applications, and distributional shifts -- A workshop in celebration of Peter Bühlmann's 60th birthday", June 2025 (https://math.ethz.ch/fim/activities/conferences/High-dimensional-statistics-applications-and-distributional-shifts.html) |
HG F 3 |