Young Data Science Researcher Seminar Zurich


Please subscribe here if you would like to be notified about these events via e-mail. You can also subscribe to the iCal/ICS calendar.

Spring Semester 2021

12 February 2021, 15:00-16:00
Tengyao Wang, UCL
Title: Two-sample testing of high-dimensional linear regression coefficients via complementary sketching
Abstract: We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first projecting the matrices of covariates and response vectors along directions that are complementary in sign in a subset of the coordinates, a process which we call "complementary sketching". The resulting projected covariates and responses are aggregated to form two test statistics, which are shown to have essentially optimal asymptotic power under a Gaussian design when the difference between the two regression coefficients is sparse and dense respectively. Simulations confirm that our methods perform well in a broad class of settings.

19 February 2021, 15:00-16:00
Evgenii Chzhen, Université Paris-Saclay
Title: A minimax framework for quantifying risk-fairness trade-off in regression
Abstract: A theoretical framework is proposed for the problem of learning a real-valued function which meets fairness requirements. This framework is built upon the notion of alpha-relative (fairness) improvement of the regression function, which we introduce using the theory of optimal transport. Setting alpha = 0 corresponds to the regression problem under the Demographic Parity constraint, while alpha = 1 corresponds to the classical regression problem without any constraints. For alpha between 0 and 1, the proposed framework allows us to interpolate continuously between these two extreme cases and to study partially fair predictors. Within this framework we precisely quantify the cost in risk induced by the introduction of the fairness constraint. We put forward a statistical minimax setup and derive a general problem-dependent lower bound on the risk of any estimator satisfying the alpha-relative improvement constraint. We illustrate our framework on a model of linear regression with Gaussian design and systematic group-dependent bias, deriving matching (up to absolute constants) upper and lower bounds on the minimax risk under the introduced constraint. This talk is based on joint work with Nicolas Schreuder, see [arXiv:2007.14265].

26 February 2021, 16:00-17:00
Feng Ruan, UC Berkeley
Title: Searching for Interactions in Linear Time
Abstract: We tackle the problem of variable selection with a focus on discovering interactions between variables. With p variables, there are O(p^k) possible interactions of order k, making exhaustive search infeasible. We show that it is nonetheless possible to identify the variables involved in interactions (of any order) with only linear computation cost, O(p), and in a nonparametric fashion. Our algorithm is based on minimizing a non-convex objective, carefully designed to have a favorable landscape. We provide finite sample guarantees on both false positives (we show all stationary points of the objective exclude noise variables) and false negatives (we characterize the sample sizes needed for gradient descent to converge to a "good" stationary point).

5 March 2021, 14:30-15:30
Aaditya Ramdas, Carnegie Mellon University
Title: Estimating means of bounded random variables by betting
Abstract: We derive confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization (and improvement) of the celebrated Chernoff method. At its heart, it is based on deriving a new class of composite nonnegative martingales, with strong connections to betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform competing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, with and without replacement. This work is joint with Ian Waudby-Smith; a preprint is here: https://arxiv.org/abs/2010.09686.

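For context, the Hoeffding baseline that the betting approach above is compared against can be sketched in a few lines. This is only an illustrative comparison point (with made-up simulated data), not the method from the talk:

```python
import math
import random

def hoeffding_ci(xs, alpha=0.05):
    """Two-sided (1 - alpha) Hoeffding confidence interval for the mean
    of [0, 1]-valued observations: mean +/- sqrt(log(2/alpha) / (2n))."""
    n = len(xs)
    mean = sum(xs) / n
    half_width = math.sqrt(math.log(2 / alpha) / (2 * n))
    return max(0.0, mean - half_width), min(1.0, mean + half_width)

# Simulated bounded data; a Beta(2, 5) sample has true mean 2/7.
random.seed(0)
data = [random.betavariate(2, 5) for _ in range(1000)]
lo, hi = hoeffding_ci(data)
```

The width of this interval is fixed by n and alpha alone; the variance-adaptive betting intervals from the talk are typically much tighter on low-variance data.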
12 March 2021, 16:30-17:30
Christos Thrampoulidis, University of British Columbia
Title: Blessings and Curses of Overparameterization: A Precise High-dimensional Approach
Abstract: State-of-the-art deep neural networks generalize well despite being exceedingly overparameterized and trained without explicit regularization. Understanding the principles behind this phenomenon, termed benign overfitting or double descent, poses a new challenge to modern learning theory as it contradicts classical statistical wisdom. Key questions include: What are the fundamental mechanics behind double descent? How do its features, such as the transition threshold and global minima, depend on the training data and on the algorithms used for training? While increasing overparameterization can improve classification accuracy, it also comes with larger, thus slower and computationally more expensive, architectures that can be prohibitive in resource-constrained applications. Is overparameterization then only relevant in training large networks, or can it also benefit training smaller models when combined with appropriate model pruning techniques? What are the generalization dynamics of pruning overparameterized models? Finally, while overparameterization leads to lower misclassification error, what is its effect on fairness performance metrics, such as balanced error and equal opportunity? Can we design better loss functions, compared to standard losses such as cross entropy, which provably improve fairness performance of large models in the presence of label-imbalanced and/or group-sensitive datasets? This talk will shed light on the questions raised above. At the heart of the results presented lies a powerful analysis framework for precise high-dimensional statistical analysis. This so-called convex Gaussian min-max theorem framework builds on Gordon's Gaussian comparison inequality and is rooted in the study of sharp phase transitions in compressed sensing.

19 March 2021, 16:00-17:00
Stephen Bates, UC Berkeley
Title: Distribution-Free, Risk-Controlling Prediction Sets
Abstract: To enable valid statistical inference in prediction tasks, we show how to generate set-valued predictions with black-box models that control various notions of statistical error. Our approach guarantees that the expected loss on future test points falls below a user-specified level, for any predictive model and underlying distribution. Building on conformal prediction, we use a holdout set to calibrate the size of the prediction sets, generalizing the approach to control error notions such as the false rejection rate. We demonstrate our procedure in four large-scale problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.

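A minimal special case of the holdout calibration described above is split conformal prediction, where the prediction set's threshold is a corrected quantile of holdout nonconformity scores. The sketch below shows only that calibration step, not the paper's risk-controlling bounds; the function name and toy scores are illustrative:

```python
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    holdout nonconformity score. Including every label whose score is
    <= this threshold yields >= 1-alpha marginal coverage."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

# Example: 100 holdout scores 0.01, 0.02, ..., 1.00 at alpha = 0.1,
# so the threshold is the 91st smallest score.
cal = [i / 100 for i in range(1, 101)]
t = conformal_threshold(cal, alpha=0.1)
```

The risk-controlling sets in the talk replace this simple quantile rule with an upper confidence bound on the expected loss, which is what extends the guarantee beyond miscoverage to other error notions.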
26 March 2021, 15:00-16:00
Dana Yang, Duke University
Title: The planted matching problem: Sharp threshold and infinite-order phase transition
Abstract: Motivated by the application of tracking moving particles from snapshots, we study the problem of reconstructing a perfect matching hidden in a randomly weighted n×n bipartite graph. The edge set includes every node pair in the hidden matching and each of the other n(n-1) node pairs independently with probability d/n. The weight of each edge is independently drawn from distributions P or Q, depending on whether the edge is in the hidden matching. In this talk, we establish the information-theoretic threshold for recovering almost all the edges of the hidden matching. We show that the sharp threshold occurs at \sqrt{d}B(P,Q)=1, where B(P,Q) stands for the Bhattacharyya coefficient, resolving the conjecture in [Moharrami et al. 2019, Semerjian et al. 2020]. Furthermore, in the special case of complete exponentially weighted graphs with d=n, P=\exp(\lambda), and Q=\exp(1/n), for which the sharp threshold simplifies to \lambda=4, we prove that when \lambda <= 4-\epsilon, the optimal reconstruction error (the average number of misclassified edges) is \exp(\Theta(1/\sqrt{\epsilon})), confirming the conjectured infinite-order phase transition in [Semerjian et al. 2020]. This is joint work with Jian Ding, Yihong Wu and Jiaming Xu. The preprint of this work is available at https://arxiv.org/abs/2103.09383.

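For concreteness, the Bhattacharyya coefficient appearing in the threshold \sqrt{d}B(P,Q)=1 is straightforward to compute for discrete distributions; this small illustration (with made-up distributions) is not part of the paper:

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete distributions on a
    common support: B(P, Q) = sum_i sqrt(p_i * q_i), a value in [0, 1]
    that equals 1 iff P = Q."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
b = bhattacharyya(p, q)
# Average degree at which sqrt(d) * B(P, Q) = 1 for this pair:
d_critical = 1 / b**2
```

The closer Q is to P, the larger B(P, Q), and hence the smaller the graph density d at which recovery becomes impossible.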
16 April 2021, 16:30-17:30
Spencer Frei, UCLA
Title: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Abstract: Can overparameterized neural networks trained by SGD provably generalize when the labels are corrupted with substantial random noise? We answer this question in the affirmative by showing that for a broad class of distributions, one-hidden-layer networks trained by SGD generalize when the distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. Equivalently, such networks have classification accuracy competitive with that of the best halfspace over the distribution. Our results hold for networks of arbitrary width and for arbitrary initializations of SGD. In particular, we do not rely upon the approximations to infinite-width networks that are typically used in theoretical analyses of SGD-trained neural networks.

23 April 2021, 15:00-16:00
Yuqi Gu, Duke University
Title: Bayesian pyramids: Identifying Interpretable Discrete Latent Structures from Discrete Data
Abstract: High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. We propose a class of interpretable discrete latent structure models for discrete data and develop a general identifiability theory. Our theory is applicable to various types of latent structures, ranging from a single latent variable to deep layers of latent variables organized in a sparse graph (termed a Bayesian pyramid). The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate identifiability and estimability of the model parameters. Applications of the methodology to DNA nucleotide sequence data uncover discrete latent features that are both interpretable and highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data, and can be a useful alternative to popular machine learning methods.

30 April 2021, 16:00-17:00
Dominik Rothenhäusler, Stanford
Title: Conditional inference: Towards a hierarchy of statistical evidence
Abstract: Statistical uncertainty has many sources. P-values and confidence intervals usually quantify the overall uncertainty, which may include variation due to sampling and uncertainty due to measurement error, among others. Practitioners might be interested in quantifying only one source of uncertainty. For example, one might be interested in the uncertainty of a regression coefficient of a fixed set of subjects, which corresponds to quantifying the uncertainty due to measurement error and ignoring the variation induced by sampling the covariates. In causal inference, it is common to infer treatment effects for a certain set of subjects, only accounting for uncertainty due to random treatment assignment. Motivated by these examples, we consider conditional inference for conditional parameters in parametric and semi-parametric models, where we condition on observed characteristics of a population. We discuss methods to obtain conditionally valid p-values and confidence intervals. Conditional p-values can be used to construct a hierarchy of statistical evidence that may help clarify the generalizability of a statistical finding. In addition, we will discuss preliminary results on how to conduct transfer learning of conditional parameters, with rigorous conditional guarantees. This is ongoing work with Ying Jin.

7 May 2021, 15:00-16:00
Cheng Mao, Georgia Tech
Title: Random Graph Matching: Efficient Algorithms for Recovery and Detection
Abstract: Graph matching, also known as network alignment, refers to the problem of matching the vertices of two unlabeled, edge-correlated graphs. The problem finds applications in many areas, such as computational biology, network privacy, and computer vision. In this talk, I will discuss several recent advances in efficient recovery and detection of the latent matching between the two graphs. In particular, under a correlated Erdős–Rényi model, we can recover the exact matching with high probability if the correlation between the two graphs on n vertices is at least 1-1/polyloglog(n), while the associated detection problem can be efficiently solved if the correlation is at least a constant. On the other hand, there is evidence for computational hardness when the correlation is small. Moreover, I will discuss the theoretical significance of the graph matching problem, its connection to other problems with a planted signal, and many future directions. The talk is based on joint works with Zhou Fan, Yihong Wu, Jiaming Xu, Mark Rudelson, Konstantin Tikhomirov, and Sophie H. Yu.

14 May 2021, 15:00-16:00
Boris Muzellec, ENS Paris-Saclay
Title: Breaking the curse of dimensionality in smooth optimal transport
Abstract: It is well known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. While recent works were able to leverage smoothness to improve the rate of estimation, the computational complexity of the resulting methods still degrades exponentially with the dimension. In this talk, we show how to leverage smoothness using a kernel sum-of-squares representation of the dense set of inequalities satisfied by optimal transport. Using this technique, we propose a polynomial-time algorithm that results in estimation rates that do not depend on the dimension, at the price of constants that may still depend exponentially on the dimension in the worst case.

21 May 2021, 15:00-16:00
Johannes Wiesel, Columbia University
Title: Estimating processes in adapted Wasserstein distance
Abstract: A number of researchers have independently introduced topologies on the set of laws of stochastic processes that extend the usual weak topology. Depending on the respective scientific background, this was motivated by applications and connections to various areas (e.g. Pflug-Pichler - stochastic programming, Hellwig - game theory, Aldous - stability of optimal stopping, Hoover-Keisler - model theory). Remarkably, all these seemingly independent approaches define the same adapted weak topology in finite discrete time. Our first main result is to construct an adapted variant of the empirical measure that consistently estimates the laws of stochastic processes in full generality. A natural compatible metric for the adapted weak topology is given by an adapted refinement of the Wasserstein distance, as established in the seminal works of Pflug-Pichler. Specifically, the adapted Wasserstein distance allows one to control the error in stochastic optimization problems, pricing and hedging problems, optimal stopping problems, etc. in a Lipschitz fashion. The second main result of this talk yields quantitative bounds for the convergence of the adapted empirical measure with respect to the adapted Wasserstein distance. Surprisingly, we obtain virtually the same optimal rates and concentration results that are known for the classical empirical measure with respect to the Wasserstein distance. Lastly, we construct a coefficient of association as an application of the above theory. This talk is based on joint work with Julio Backhoff, Daniel Bartl and Mathias Beiglböck.

28 May 2021, 15:00-16:00
Morgane Austern, Microsoft Research New England
Title: Asymptotics of learning on dependent and structured random objects
Abstract: Classical statistical inference relies on numerous tools from probability theory to study the properties of estimators. However, these same tools are often inadequate to study modern machine learning problems that frequently involve structured data (e.g. networks) or complicated dependence structures (e.g. dependent random matrices). In this talk, we extend universal limit theorems beyond the classical setting. Firstly, we consider distributionally "structured" and dependent random objects, i.e. random objects whose distributions are invariant under the action of an amenable group. We show, under mild moment and mixing conditions, a series of universal second- and third-order limit theorems: central limit theorems, concentration inequalities, Wigner semi-circular laws and Berry-Esseen bounds. The utility of these will be illustrated by a series of examples in machine learning, network and information theory. Secondly, building on these results, we establish the asymptotic distribution of the cross-validated risk with the number of folds allowed to grow at an arbitrary rate. Using this, we study the statistical speed-up of cross-validation compared to a train-test split procedure, which reveals surprising results even when used on simple estimators.

4 June 2021, 16:30-17:30
Lihua Lei, Stanford University
Title: Conformal Inference of Counterfactuals and Time-to-event Outcomes
Abstract: Recent progress in machine learning provides us with myriad powerful tools for prediction. When they are deployed for high-stakes decision-making, it is also crucial to have valid uncertainty quantification, which is challenging for complex predictive algorithms. The challenge is more pronounced in situations where the predicted targets are not fully observed in the data. This talk introduces conformal inference-based approaches to generate calibrated prediction intervals for two types of partially observed outcomes: (1) counterfactuals characterized by potential outcomes, observable only for those in a particular treatment arm, and (2) time-to-event outcomes, observable only for those whose event has occurred. When the missing data mechanism is known, as in randomized experiments, both approaches achieve desired coverage in finite samples without any assumption on the distribution of the outcome conditional on the covariates or the accuracy of the predictive algorithm. When the missing data mechanism is unknown, both approaches satisfy a doubly robust guarantee of coverage. We demonstrate on both simulated and real datasets that our prediction intervals are calibrated and relatively tight.

11 June 2021, 15:00-16:00
Yuhao Wang, Tsinghua University
Title: Debiased Inverse Propensity Score Weighting for Estimation of Average Treatment Effects with High-Dimensional Confounders
Abstract: We consider estimation of average treatment effects given observational data with high-dimensional pretreatment variables. Existing methods for this problem typically assume some form of sparsity for the regression functions. In this work, we introduce a debiased inverse propensity score weighting (DIPW) scheme for average treatment effect estimation that delivers \sqrt{n}-consistent estimates of the average treatment effect when the propensity score follows a sparse logistic regression model; the regression functions are permitted to be arbitrarily complex. Our theoretical results quantify the price to pay for permitting the regression functions to be unestimable, which shows up as an inflation of the variance of the estimator compared to the semiparametric efficient variance by at most O(1) under mild conditions. Given the lack of assumptions on the regression functions, averages of transformed responses under each treatment may also be estimated at the \sqrt{n} rate, and so for example, the variances of the potential outcomes may be estimated. We show how confidence intervals centred on our estimates may be constructed, and also discuss an extension of the method to estimating projections of the heterogeneous treatment effect function.

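The plain (undebiased) IPW estimator that this scheme refines can be sketched in a few lines; this is only an illustrative baseline with a toy example, not the DIPW estimator from the talk:

```python
def ipw_ate(y, t, e):
    """Plain inverse propensity score weighting (IPW) estimate of the
    average treatment effect E[Y(1) - Y(0)], from outcomes y, binary
    treatments t, and propensity scores e = P(T=1 | X).
    Baseline sketch only: the DIPW estimator in the talk adds a
    debiasing step for high-dimensional propensity models."""
    n = len(y)
    return sum(ti * yi / ei - (1 - ti) * yi / (1 - ei)
               for yi, ti, ei in zip(y, t, e)) / n

# Toy check: with propensity 0.5 everywhere, this reduces to twice the
# difference of the treated and control outcome sums, divided by n.
ate = ipw_ate([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

The variance inflation discussed in the abstract is the price DIPW pays for making no assumptions at all on the outcome regression functions.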