Young Data Science Researcher Seminar Zurich


Please subscribe here if you would like to be notified about these presentations via e-mail. Moreover, you can subscribe to the iCal/ics calendar.

Spring Semester 2020

Title The conditional permutation test for independence while controlling for confounders
Speaker, Affiliation Thomas Berrett, ENSAE, Paris
Date, Time 22 May 2020, 15:00-16:00
Location zoom
Abstract In this talk I will discuss the problem of testing conditional independence. In standard independence testing problems, given a test statistic measuring the strength of dependence, a simple and practical approach to find critical values and calibrate our tests is to permute the data uniformly at random and recalculate the test statistic in order to simulate its behaviour under the null hypothesis of independence. Unfortunately, this is no longer effective when testing conditional independence, and may result in misleading conclusions. We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high-dimensional random vector Z that may contain confounding factors. The proposed test permutes entries of X non-uniformly, so as to respect the existing dependence between X and Z and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al., our test is useful in the 'Model-X' framework, in which an approximation to the distribution of X|Z is available: while Candès et al.'s test uses this estimate to draw new X values, for our test we use this approximation to design an appropriate non-uniform distribution on permutations of the X values already seen in the true data. We provide an efficient Markov Chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of X|Z, finding that, for the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set. This talk is based on joint work with Yi Wang, Rina Foygel Barber and Richard Samworth.
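
For readers who want a concrete picture of the non-uniform permutation scheme, the following is a minimal illustrative sketch, not the paper's implementation: it assumes a Gaussian linear working model for X|Z fitted by least squares, uses the absolute correlation between X and Y as the test statistic, and samples permutations with a simple Metropolis swap sampler. All function names and parameter choices below are hypothetical.

# Minimal, illustrative sketch of a conditional permutation test.
# Assumptions (not from the talk): X | Z ~ N(Z beta, sigma^2) fitted by least
# squares, |corr(X, Y)| as test statistic, Metropolis swap sampler over permutations.
import numpy as np

def fit_x_given_z(X, Z):
    """Fit the working model X | Z ~ N(Z @ beta, sigma^2) by least squares."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])          # add an intercept
    beta, *_ = np.linalg.lstsq(Z1, X, rcond=None)
    mu = Z1 @ beta
    sigma2 = np.mean((X - mu) ** 2)
    return mu, sigma2

def sample_conditional_permutation(X, mu, sigma2, n_steps=1000, rng=None):
    """Metropolis swap sampler over permutations of X respecting the fitted X|Z law."""
    rng = np.random.default_rng(rng)
    x = X.copy()
    n = len(x)
    for _ in range(n_steps):
        i, j = rng.integers(0, n, size=2)
        # log-density change from swapping x[i] and x[j] under N(mu, sigma2)
        log_ratio = (x[i] - x[j]) * (mu[j] - mu[i]) / sigma2
        if np.log(rng.uniform()) < log_ratio:
            x[i], x[j] = x[j], x[i]
    return x

def conditional_permutation_test(X, Y, Z, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma2 = fit_x_given_z(X, Z)
    stat = lambda x: abs(np.corrcoef(x, Y)[0, 1])        # test statistic
    t_obs = stat(X)
    t_perm = np.array([stat(sample_conditional_permutation(X, mu, sigma2, rng=rng))
                       for _ in range(n_perm)])
    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)  # permutation p-value

# toy data: X depends on Z but is conditionally independent of Y given Z
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
X = Z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=300)
Y = Z @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=300)
print("p-value:", conditional_permutation_test(X, Y, Z))

Because the fitted conditional means enter the acceptance ratio, observations with similar values of Z are swapped more readily than dissimilar ones, which is the sense in which the permutation distribution respects the dependence between X and Z.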

Title An equivalence between critical points for rank constraints versus low-rank factorizations
Speaker, Affiliation Wooseok Ha, University of California, Berkeley
Date, Time 29 May 2020, 15:00-16:00
Location
Abstract Two common approaches in low-rank optimization problems are either working directly with a rank constraint on the matrix variable, or optimizing over a low-rank factorization so that the rank constraint is implicitly ensured. In this talk, we show the natural connection between the rank-constrained and factorized approaches. In particular, we show that all second-order stationary points of the factorized objective function correspond to fixed points of projected gradient descent run on the original problem (where the projection step enforces the rank constraint). This result allows us to unify many existing optimization guarantees that have been proved specifically in either the rank-constrained or the factorized setting, and leads to new results for certain settings of the problem. A major tool for handling the low-rank constraint is the local concavity coefficient, which aims to measure the concavity of a rank-constrained space. We demonstrate the application of our results to several concrete low-rank optimization problems arising in matrix inverse problems.
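
As a toy illustration of the two formulations compared in the abstract, the sketch below runs projected gradient descent with an explicit rank constraint (projection by truncated SVD) and plain gradient descent on a factorized objective, on the same simple low-rank approximation problem; both end up at essentially the same solution. The problem, step sizes and iteration counts are illustrative assumptions, not taken from the talk.

# Rank-constrained vs. factorized formulations on f(M) = 0.5 * ||M - A||_F^2.
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 30, 20, 3
U0, _ = np.linalg.qr(rng.normal(size=(n, r)))
V0, _ = np.linalg.qr(rng.normal(size=(m, r)))
A = U0 @ np.diag([3.0, 2.0, 1.0]) @ V0.T + 0.05 * rng.normal(size=(n, m))

def project_rank(M, r):
    """Euclidean projection onto {rank <= r} via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# (a) projected gradient descent on min_{rank(M) <= r} 0.5 * ||M - A||_F^2
M = np.zeros((n, m))
for _ in range(100):
    M = project_rank(M - 0.5 * (M - A), r)

# (b) gradient descent on the factorized objective 0.5 * ||U V^T - A||_F^2
U = 0.1 * rng.normal(size=(n, r))
V = 0.1 * rng.normal(size=(m, r))
for _ in range(3000):
    R = U @ V.T - A                          # residual
    U, V = U - 0.1 * (R @ V), V - 0.1 * (R.T @ U)

# both formulations reach (essentially) the same rank-r approximation of A
print("difference between the two solutions:", np.linalg.norm(M - U @ V.T))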

Title Bridging convex and nonconvex optimization in noisy matrix completion: Stability and uncertainty quantification
Speaker, Affiliation Cong Ma, Princeton University
Date, Time 5 June 2020, 15:00-16:00
Location
Abstract This talk is concerned with noisy matrix completion: given partial and corrupted entries of a large low-rank matrix, how to estimate and infer the underlying matrix? Arguably one of the most popular paradigms to tackle this problem is convex relaxation, which achieves remarkable efficacy in practice. However, the statistical stability guarantees of this approach are still far from optimal in the noisy setting, falling short of explaining its empirical success. Moreover, it is generally very challenging to pin down the distributions of the convex estimator, which presents a major roadblock in assessing the uncertainty, or “confidence”, of the obtained estimates. Our recent work makes progress towards understanding stability and uncertainty quantification for noisy matrix completion. When the rank of the unknown matrix is a constant: (1) we demonstrate that the convex estimator achieves near-optimal estimation errors vis-à-vis random noise; (2) we develop a de-biased estimator that admits accurate distributional characterizations, thus enabling asymptotically optimal inference of the low-rank factors and the entries of the matrix. All of this is enabled by bridging convex relaxation with the nonconvex Burer-Monteiro approach, a seemingly distinct algorithmic paradigm that is provably stable against noise.
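
The nonconvex Burer-Monteiro side of this bridge can be sketched in a few lines: factor the matrix as L R^T, initialize spectrally from the zero-filled observations, and run gradient descent on the observed entries. This is only an illustrative toy version; the estimator, initialization and step-size choices analyzed in the talk may differ.

# Toy sketch of the Burer-Monteiro approach to noisy matrix completion.
import numpy as np

rng = np.random.default_rng(2)
n, r, p, sigma = 60, 2, 0.4, 0.1
Mstar = rng.normal(size=(n, r)) @ rng.normal(size=(r, n)) / np.sqrt(r)
mask = rng.uniform(size=(n, n)) < p                       # observed entries
Y = mask * (Mstar + sigma * rng.normal(size=(n, n)))      # noisy partial observations

# spectral initialization from the zero-filled, rescaled observation matrix
U, s, Vt = np.linalg.svd(Y / p, full_matrices=False)
L = U[:, :r] * np.sqrt(s[:r])
R = Vt[:r].T * np.sqrt(s[:r])

eta = 0.5 / s[0]                                          # conservative step size
for _ in range(500):
    G = mask * (L @ R.T - Y) / p                          # gradient on observed entries
    L, R = L - eta * (G @ R), R - eta * (G.T @ L)

err = np.linalg.norm(L @ R.T - Mstar) / np.linalg.norm(Mstar)
print("relative estimation error:", err)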

Title Fast and adaptive iterative hard thresholding in high dimensional linear regression: A non-convex algorithmic regularization approach
Speaker, Affiliation Mohamed Ndaoud, University of Southern California
Date, Time 12 June 2020, 16:00-17:00
Location
Abstract Datasets with a large number of features are becoming increasingly available and important in every field of research and innovation, urging practitioners to develop scalable algorithms with fast convergence. The question of fast convergence in the classical problem of high dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on the knowledge of the true sparsity parameter $s$. In this talk, we present a novel fast procedure for estimation in the high dimensional linear regression model. Taking advantage of the interplay between estimation, support recovery and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. As a consequence, we present a new iterative hard thresholding algorithm for high dimensional linear regression that is minimax optimal, fast and adaptive.
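
For reference, a sketch of the classical (non-adaptive) IHT baseline mentioned in the abstract is given below; it requires the sparsity level s as an input, which is exactly the limitation the talk's adaptive procedure removes. The design, step size and iteration count are illustrative assumptions.

# Classical iterative hard thresholding (IHT) for sparse linear regression.
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest entries of v in absolute value, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def iht(X, y, s, n_iter=300):
    n, p = X.shape
    eta = 1.0 / np.linalg.norm(X, 2) ** 2        # step size from the spectral norm
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = hard_threshold(beta + eta * X.T @ (y - X @ beta), s)
    return beta

# toy sparse regression problem
rng = np.random.default_rng(3)
n, p, s = 200, 500, 10
X = rng.normal(size=(n, p)) / np.sqrt(n)
beta_star = np.zeros(p)
beta_star[:s] = rng.choice([-1.0, 1.0], size=s)
y = X @ beta_star + 0.1 * rng.normal(size=n)
beta_hat = iht(X, y, s)
print("estimation error:", np.linalg.norm(beta_hat - beta_star))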

Title Statistical Inference for Multi-View Clustering
Speaker, Affiliation Lucy Gao, University of Washington
Date, Time 19 June 2020, 16:30-17:30
Location
Abstract In the multi-view data setting, multiple data sets are collected on a single, common set of observations. For example, we might perform genomic and proteomic assays on a single set of tumour samples, or we might collect relationship data from two online social networks for a single set of users. It is tempting to cluster the observations using all of the data views, in order to fully exploit the available information. However, clustering the observations using all of the data views implicitly assumes that a single underlying clustering of the observations is shared across all data views. If this assumption does not hold, then clustering the observations using all data views may lead to spurious results. We seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop new tests for answering this question based on multivariate and/or network data views. This is joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).
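
The question can be made concrete with a simple (and deliberately naive) permutation-based illustration: cluster each view, measure the agreement of the two label vectors, and compare it to its distribution when the observation order in one view is permuted. This is not the test developed in the paper, only a sketch of the question it addresses; it assumes scikit-learn and k-means with a known number of clusters.

# Naive permutation check for dependence between per-view clusterings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def view_dependence_pvalue(view1, view2, k=3, n_perm=999, seed=0):
    """Permutation p-value for dependence between clusterings of two views."""
    rng = np.random.default_rng(seed)
    labels1 = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(view1)
    labels2 = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(view2)
    t_obs = adjusted_rand_score(labels1, labels2)
    # permuting the observation order in one view destroys cross-view dependence
    t_perm = np.array([adjusted_rand_score(labels1, rng.permutation(labels2))
                       for _ in range(n_perm)])
    return (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)

# toy example: the two views share one underlying grouping of the observations
rng = np.random.default_rng(1)
groups = rng.integers(0, 3, size=150)
view1 = 4 * rng.normal(size=(3, 5))[groups] + rng.normal(size=(150, 5))
view2 = 4 * rng.normal(size=(3, 8))[groups] + rng.normal(size=(150, 8))
print("p-value for dependence between the view-specific clusterings:",
      view_dependence_pvalue(view1, view2))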

Title A precise high-dimensional theory for boosting
Speaker, Affiliation Pragya Sur, Harvard University
Date, Time 26 June 2020, 15:00-16:00
Location
Abstract This talk will introduce a precise high-dimensional asymptotic theory for Boosting on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is separable in an asymptotic sense. On the statistical front, we will provide an exact analysis of the generalization error of Boosting, when the algorithm interpolates the training data and maximizes an empirical L1 margin. The angle between the Boosting solution and the ground truth can be explicitly characterized. On the computational front, we will provide a sharp analysis of the stopping time when Boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting, for instance, we discover that the larger the margin, the smaller the proportion of active features (with zero initialization). At the heart of our theory lies a detailed study of the maximum L1 margin, using tools from stochastic convex geometry. This is based on joint work with Tengyuan Liang.
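
The maximum empirical L1 margin that the theory revolves around can be computed exactly as a small linear program, as in the sketch below. This computes the quantity being characterized; it is not an algorithm from the talk. The toy data are separable by construction, with p larger than n as in the regime described above.

# Maximum empirical L1 margin: max_{||w||_1 <= 1} min_i y_i <x_i, w>, as an LP.
import numpy as np
from scipy.optimize import linprog

def max_l1_margin(X, y):
    """Solve the L1-margin LP with the split w = u - v, u, v >= 0."""
    n, p = X.shape
    S = y[:, None] * X                          # rows y_i x_i
    # variables: u (p), v (p), t (1); maximise t  <=>  minimise -t
    c = np.concatenate([np.zeros(2 * p), [-1.0]])
    # margin constraints: t - y_i x_i^T (u - v) <= 0
    A_margin = np.hstack([-S, S, np.ones((n, 1))])
    # L1 constraint: sum(u) + sum(v) <= 1
    A_l1 = np.concatenate([np.ones(2 * p), [0.0]])[None, :]
    res = linprog(c,
                  A_ub=np.vstack([A_margin, A_l1]),
                  b_ub=np.concatenate([np.zeros(n), [1.0]]),
                  bounds=[(0, None)] * (2 * p) + [(None, None)])
    return -res.fun

# separable toy data with p > n, so the maximum L1 margin is positive
rng = np.random.default_rng(4)
n, p = 100, 300
w_star = np.zeros(p)
w_star[:10] = 1.0
X = rng.normal(size=(n, p))
y = np.sign(X @ w_star)
print("maximum L1 margin:", max_l1_margin(X, y))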

Title Near-Optimal Reinforcement Learning with Self-Play
Speaker, Affiliation Chi Jin, Princeton University
Date, Time 3 July 2020, 15:00-16:00
Location
Abstract Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where a single agent plays against a fixed environment. It remains largely open how to design efficient self-play algorithms in two-player sequential games, especially when it is necessary to manage the exploration/exploitation tradeoff. In this talk, we present the first line of provably efficient self-play algorithms in a basic setting of tabular episodic Markov games. Our algorithms further feature near-optimal sample complexity: the number of samples required by our algorithms matches the information-theoretic lower bound up to a polynomial factor of the length of each episode.

Title On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms
Speaker, Affiliation Sven Wang, University of Cambridge
Date, Time 10 July 2020, 15:00-16:00
Location
Abstract The problem of generating random samples of high-dimensional posterior distributions arising from Gaussian process priors is considered. The main results consist of non-asymptotic computational guarantees for Langevin-type MCMC algorithms which scale polynomially in key quantities such as the dimension of the model, the desired precision level, and the number of available statistical measurements. As a direct consequence, it is shown that posterior mean vectors as well as maximum a posteriori (MAP) estimates are computable in polynomial time, with high probability under the distribution of the data. These results are derived in a general high-dimensional non-linear regression setting where posterior measures are not necessarily log-concave, employing a set of local 'geometric' assumptions on the parameter space. The theory is illustrated in a representative example from PDEs involving a non-linear inverse problem for the steady-state Schrödinger equation.
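
The basic Langevin-type iteration behind such guarantees is the unadjusted Langevin algorithm, theta_{k+1} = theta_k + gamma * grad log pi(theta_k) + sqrt(2 * gamma) * xi_k with standard Gaussian noise xi_k. The sketch below runs it on a simple Bayesian linear-regression posterior with a Gaussian prior; this toy target is log-concave, whereas the talk's results concern far more general non-linear settings, so the example only illustrates the algorithm, not the theory.

# Unadjusted Langevin algorithm on a toy Gaussian-prior linear-regression posterior.
import numpy as np

rng = np.random.default_rng(5)
n, d, sigma, tau = 200, 10, 0.5, 1.0
A = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = A @ theta_star + sigma * rng.normal(size=n)

def grad_log_posterior(theta):
    """Gradient of the log posterior: Gaussian likelihood plus N(0, tau^2 I) prior."""
    return A.T @ (y - A @ theta) / sigma**2 - theta / tau**2

gamma = 1e-4                                   # step size (assumed small enough)
theta = np.zeros(d)
samples = []
for k in range(20000):
    theta = (theta + gamma * grad_log_posterior(theta)
             + np.sqrt(2 * gamma) * rng.normal(size=d))
    if k > 5000:                               # discard burn-in
        samples.append(theta.copy())

post_mean = np.mean(samples, axis=0)
print("distance of posterior mean from truth:", np.linalg.norm(post_mean - theta_star))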

Title Statistical and computational perspectives on latent variable models
Speaker, Affiliation Nhat Ho, University of California, Berkeley
Date, Time 17 July 2020, 17:00-18:00
Location
Abstract The growth in scope and complexity of modern data sets presents the field of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Latent variable models provide a principled approach to modeling heterogeneous collections of data. However, due to the over-parameterization, it has been observed that parameter estimation and latent structures of these models have non-standard statistical and computational behaviors. In this talk, we provide new insights into these behaviors under mixture models, a building block of latent variable models. From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models based on Wasserstein distance. Our study makes explicit the links between model singularities, parameter estimation convergence rates, and the algebraic geometry of the parameter space for mixtures of continuous distributions. Based on the convergence rates of parameter estimation under the Wasserstein distance, we propose a novel Merge-Truncate-Merge procedure to consistently estimate the true number of components in mixture models. From the computational side, we study the non-asymptotic behavior of the Expectation-Maximization (EM) algorithm, which is a stable method, and some high-order and unstable algorithms, including Newton’s method, under the over-specified settings of mixture models in which the likelihood need not be strongly concave, or, equivalently, the Fisher information matrix might be singular. Focusing on the simple setting of a two-component mixture fit with equal mixture weights to a multivariate Gaussian distribution, we demonstrate that EM updates converge to a fixed point at Euclidean distance O((d/n)^{1/4}) from the true parameter after O((n/d)^{1/2}) steps where d is the dimension. On the other hand, Newton’s method can achieve the same statistical accuracy as the EM algorithm in exponentially fewer steps - namely, with the number of iterations being reduced from O((n/d)^{1/2}) to O(log(n/d)). Our result shows that unstable algorithms are preferred to their stable counterparts under this simple setting of mixture models. As a consequence, it rules out the belief that the stability of the algorithm is a necessity for obtaining efficient statistical estimators.
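
The EM iteration for the symmetric two-component Gaussian mixture discussed above has a closed form, theta_{t+1} = (1/n) * sum_i tanh(<x_i, theta_t>) x_i, which the sketch below runs in the over-specified case where the data actually come from a single Gaussian (true parameter zero). Sample size, dimension and iteration count are illustrative choices.

# EM for the over-specified symmetric mixture 0.5 N(theta, I) + 0.5 N(-theta, I).
import numpy as np

rng = np.random.default_rng(6)
n, d = 10000, 2
X = rng.normal(size=(n, d))                    # over-specified: data are N(0, I)

def em_step(theta, X):
    """One EM update for the symmetric two-component mixture with equal weights."""
    w = np.tanh(X @ theta)                     # E-step: signed responsibilities in [-1, 1]
    return (w[:, None] * X).mean(axis=0)       # M-step: weighted mean

theta = np.ones(d)                             # arbitrary nonzero initialisation
for t in range(500):
    theta = em_step(theta, X)

print("||theta|| after EM:", np.linalg.norm(theta))
print("statistical floor (d/n)^{1/4}:", (d / n) ** 0.25)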

Title Algorithmic optimality for iterative thresholding algorithms
Speaker, Affiliation Haoyang Liu, University of Chicago
Date, Time 24 July 2020, 15:00-16:00
Location
Abstract One challenge with high-dimensional data is to analyze the behavior of the different algorithms that promote sparsity, since the statistical task is usually formulated as an optimization problem with a non-convex sparsity constraint. In this talk, we will address the question of algorithmic optimality for the class of iterative thresholding algorithms under the framework of restricted optimality.