Young Data Science Researcher Seminar Zurich

Please subscribe here if you would like to be notified about these events via e-mail. Moreover, you can also subscribe to the iCal/ics calendar.

Spring Semester 2022

Title Global Testing under dependent Bernoullis
Speaker, Affiliation Sohom Bhattacharya, Stanford University
Date, Time 10 February 2022, 16:00-17:00
Location
Abstract We consider the problem of detecting whether or not, in a given network, there is a cluster of nodes which exhibit unusual behavior. When the nodes correspond to independent Bernoulli random variables, such detection problems are well studied in the literature. However, a fundamental question in this field is how dependence characterized by a network modulates the behavior of such problems. Formally, we will address the detection question when the nodes of a network correspond to Bernoulli variables with dependence modeled by graphical models (Ising models). Our results not only provide sharp constants of detection in these cases, and thereby pinpoint the precise relationship of the detection problem with the underlying dependence, but also demonstrate how to be agnostic over the strength of dependence present in the respective models. This is based on joint work with Rajarshi Mukherjee and Gourab Ray.
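As a rough illustration of the setting (dependent Bernoullis generated by an Ising model on a network), the Python sketch below draws spins from a nearest-neighbour Ising model on a grid by Gibbs sampling and evaluates a simple contrast statistic on a candidate cluster. It is purely illustrative: the graph, coupling strength and statistic are our own toy choices, not the test studied in the talk.

import numpy as np

rng = np.random.default_rng(0)

def gibbs_ising(adj, beta, h, n_sweeps=200):
    # Sample +/-1 spins from an Ising model with coupling beta and external field h.
    n = adj.shape[0]
    sigma = rng.choice([-1, 1], size=n)
    for _ in range(n_sweeps):
        for i in range(n):
            field = beta * adj[i] @ sigma + h[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(sigma_i = +1 | rest)
            sigma[i] = 1 if rng.random() < p_plus else -1
    return sigma

side = 20                                  # 20 x 20 grid graph
n = side * side
adj = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        if c + 1 < side:
            adj[i, i + 1] = adj[i + 1, i] = 1.0
        if r + 1 < side:
            adj[i, i + side] = adj[i + side, i] = 1.0

cluster = np.arange(50)                    # candidate anomalous set of nodes
h = np.zeros(n)
h[cluster] = 0.5                           # elevated signal on the cluster (alternative)

spins = gibbs_ising(adj, beta=0.2, h=h)
bern = (spins + 1) // 2                    # map spins to Bernoulli {0, 1} variables
print("cluster mean minus global mean:", bern[cluster].mean() - bern.mean())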

Title Variable elimination, graph reduction and efficient g-formula
Speaker, Affiliation Richard Guo, University of Cambridge
Date, Time 17 February 2022, 16:00-17:00
Location
Abstract We study efficient estimation of an intervention mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that a subset of the variables is uninformative, in that failure to measure them neither precludes identification of the intervention mean nor changes the semiparametric variance bound for regular estimators of it. Identification of such uninformative variables is particularly useful at the stage of designing a planned observational or randomized study, in that measurements of such variables can be avoided without sacrificing efficiency. We develop a set of graphical criteria that are sound and complete for eliminating all uninformative variables. In addition, we construct a reduced directed acyclic graph that exactly represents the induced marginal model over the informative variables. We show that the interventional mean is identified by the g-formula (Robins, 1986) according to this graph. This g-formula is the irreducible, efficient identifying formula: the nonparametric plug-in of this formula achieves the semiparametric efficiency bound of the original graphical model.
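For readers unfamiliar with the g-formula mentioned above, here is a minimal plug-in computation on simulated data with one binary treatment A, one discrete confounder L and outcome Y; the data-generating process and estimator below are a toy sketch, not the paper's reduced-graph construction.

import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulate a simple DAG without hidden variables: L -> A, L -> Y, A -> Y.
L = rng.integers(0, 3, size=n)                  # confounder with three levels
A = rng.binomial(1, 0.2 + 0.2 * L)              # treatment probability depends on L
Y = 1.0 * A + 0.5 * L + rng.normal(size=n)      # outcome; the true effect of A is 1.0

def g_formula(a):
    # Plug-in g-formula: E[Y(a)] = sum_l P(L = l) * E[Y | A = a, L = l].
    est = 0.0
    for l in np.unique(L):
        p_l = np.mean(L == l)
        est += p_l * Y[(A == a) & (L == l)].mean()
    return est

print("estimated E[Y(1)] - E[Y(0)]:", g_formula(1) - g_formula(0))   # close to 1.0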

Title Gaussian Spherical Tessellations and Learning Adaptively
Speaker, Affiliation Anna Ma, University of California
Date, Time 24 February 2022, 15:00-16:00
Location
Abstract Signed measurements of the form $y_i = sign(\langle a_i, x \rangle)$ for $i \in [M]$ are ubiquitous in large-scale machine learning problems where the overarching task is to recover the unknown, unit norm signal $x \in \mathbb{R}^d$. Oftentimes, measurements can be queried adaptively, for example based on a current approximation of $x$, leading to only a subset of the $M$ measurements being needed. Geometrically, these measurements emit a spherical hyperplane tessellation in $\mathbb{R}^{d}$ where one of the cells in the tessellation contains the unknown vector $x$. Motivated by this problem, in this talk we will present a geometric property related to spherical hyperplane tessellations in $\mathbb{R}^{d}$. Under the assumption that $a_i$ are Gaussian random vectors, we will show that with high probability there exists a subset of the hyperplanes whose cardinality is on the order of $d\log(d)\log(M)$ such that the radius of the cell containing $x$ induced by these hyperplanes is bounded above by, up to constants, $d\log(d)\log(M)/M$. The work presented is joint work with Rayan Saab and Eric Lybrand.
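The measurement model in this abstract is easy to simulate. The sketch below generates Gaussian measurements y_i = sign(<a_i, x>) and applies the classical non-adaptive averaging estimator (for Gaussian a_i, the expectation of y_i a_i is proportional to x); the adaptive scheme from the talk is not implemented here.

import numpy as np

rng = np.random.default_rng(2)
d, M = 50, 20_000

x = rng.normal(size=d)
x /= np.linalg.norm(x)                  # unknown unit-norm signal

A = rng.normal(size=(M, d))             # Gaussian measurement vectors a_i
y = np.sign(A @ x)                      # one-bit measurements y_i = sign(<a_i, x>)

# Non-adaptive baseline: average y_i * a_i and renormalize.
x_hat = (y[:, None] * A).mean(axis=0)
x_hat /= np.linalg.norm(x_hat)

print("estimation error ||x_hat - x||:", np.linalg.norm(x_hat - x))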

Title Random Tessellation Features and Forests
Speaker, Affiliation Eliza O'Reilly, California Institute of Technology
Date, Time 3 March 2022, 15:00-16:00
Location
Abstract The Mondrian process in machine learning is a recursive partition of space with random axis-aligned cuts used to build random forests and Laplace kernel approximations. The construction allows for efficient online algorithms, but the restriction to axis-aligned cuts does not capture dependencies between features. By viewing the Mondrian as a special case of the stable under iteration (STIT) process in stochastic geometry, we resolve open questions about the generalization of cut directions. We utilize the theory of stationary random tessellations to show that STIT processes approximate a large class of stationary kernels and achieve minimax rates for Lipschitz and C^2 functions. This work opens many new questions at the intersection of stochastic geometry and machine learning. Based on joint work with Ngoc Mai Tran.
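For readers who have not met the Mondrian process, the sketch below generates one on a box: each cell is cut after an exponential waiting time with rate equal to its linear dimension, along an axis chosen proportionally to side length, at a uniform position. This is the standard axis-aligned construction referred to in the abstract, written from memory as an illustration; it is not the STIT generalization discussed in the talk.

import numpy as np

rng = np.random.default_rng(3)

def mondrian(lower, upper, budget):
    # Recursively partition the box [lower, upper]; returns a list of leaf boxes.
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    lengths = upper - lower
    rate = lengths.sum()                                # linear dimension of the box
    cost = rng.exponential(1.0 / rate)                  # waiting time until the next cut
    if cost > budget:                                   # no cut within the remaining lifetime
        return [(lower, upper)]
    dim = rng.choice(len(lengths), p=lengths / rate)    # cut axis, proportional to side length
    loc = rng.uniform(lower[dim], upper[dim])           # cut position, uniform on that side
    left_upper, right_lower = upper.copy(), lower.copy()
    left_upper[dim] = right_lower[dim] = loc
    remaining = budget - cost
    return mondrian(lower, left_upper, remaining) + mondrian(right_lower, upper, remaining)

cells = mondrian([0, 0], [1, 1], budget=5.0)
print("number of cells in the tessellation:", len(cells))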

Title Exponential-tail excess risk bounds without Bernstein condition
Speaker, Affiliation Tomas Vaškevičius, University of Oxford
Date, Time 10 March 2022, 15:00-16:00
Location
Abstract The local Rademacher complexity framework is one of the most successful toolboxes for establishing sharp excess risk bounds for statistical estimators based on empirical risk minimization. However, the applicability of this toolbox hinges on the so-called Bernstein condition, often limiting direct application domains to proper and convex problem settings. In this talk, we will show how to obtain exponential-tail local Rademacher complexity excess risk bounds under an alternative condition. This alternative condition, leading to a more recent notion of localization via offset Rademacher complexities, is known to hold for some estimators in non-convex and improper settings. We will discuss applications of this theory to model selection aggregation and iterative regularization problems.
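For background, one common formulation of the offset Rademacher complexity from the localization literature is the following (our own summary of standard notation; the talk may use a different variant):

\mathcal{R}_n^{\mathrm{off}}(\mathcal{F}, c) \;=\;
\mathbb{E}_{\varepsilon} \sup_{f \in \mathcal{F}}
\frac{1}{n} \sum_{i=1}^{n} \Big( \varepsilon_i f(x_i) - c\, f(x_i)^2 \Big),
\qquad \varepsilon_1, \dots, \varepsilon_n \ \text{i.i.d. Rademacher}, \quad c > 0.

Roughly speaking, the negative quadratic offset term supplies the localization that the classical framework obtains through fixed-point arguments under the Bernstein condition.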

Title Statistical Theory for Nonparametric Regression with Graph Laplacians
Speaker, Affiliation Alden Green, Carnegie Mellon University
Date, Time 17 March 2022, 15:00-16:00
Location
Abstract Graph-based learning refers to a family of conceptually simple and scalable approaches, which can be applied across many tasks and domains. We study graph-based learning in a relatively classical setting: nonparametric regression with point-cloud data lying on a (possibly) low-dimensional data manifold. In this setting, many graph-based methods can be interpreted as discrete approximations of “continuous-time methods” (methods defined with respect to continuous-time differential operators) that serve as some of the traditional workhorses for nonparametric regression. Motivated by this connection, we develop theoretical guarantees for a pair of graph-based methods, Laplacian eigenmaps and Laplacian smoothing, which show that they achieve optimal rates of convergence over Sobolev smoothness classes. Indeed, perhaps surprisingly, these results imply that graph-based methods have better properties than are suggested by tying them to standard continuous-time tools.
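As a concrete reference point for Laplacian smoothing on a neighbourhood graph (a standard construction, not necessarily the exact variant analyzed in the talk), a minimal Python sketch: build a symmetrized k-nearest-neighbour graph over the point cloud and solve the penalized least-squares problem in closed form.

import numpy as np

rng = np.random.default_rng(4)
n, k, lam = 400, 10, 5.0

# Point cloud on a one-dimensional manifold embedded in R^2, with noisy responses.
t = rng.uniform(0, 2 * np.pi, size=n)
X = np.column_stack([np.cos(t), np.sin(t)])
y = np.sin(3 * t) + 0.3 * rng.normal(size=n)

# Symmetrized k-NN graph and its unnormalized Laplacian L = D - W.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
W = np.zeros((n, n))
for i in range(n):
    W[i, np.argsort(dists[i])[1:k + 1]] = 1.0
W = np.maximum(W, W.T)
Lap = np.diag(W.sum(axis=1)) - W

# Laplacian smoothing: minimize ||y - f||^2 + lam * f' L f, i.e. solve (I + lam L) f = y.
f_hat = np.linalg.solve(np.eye(n) + lam * Lap, y)
print("mean squared error against the noiseless signal:", np.mean((f_hat - np.sin(3 * t)) ** 2))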

Title Selective inference for k-means clustering
Speaker, Affiliation Yiqun Chen, University of Washington
Date, Time 14 April 2022, 16:00-17:00
Location
Abstract We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the clusters were obtained on the same data used for testing. To overcome this problem, we take a selective inference approach. We propose a finite-sample p-value that controls the selective Type I error for testing the difference in means between a pair of clusters obtained using k-means clustering, and show that it can be efficiently computed. We apply our proposal in simulation, and on hand-written digits data and single-cell RNA-sequencing data. This is joint work with Daniela Witten.
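The Type I error inflation described above is easy to reproduce. In the simulation below (illustrative only; the selective p-value from the talk is not implemented), a naive two-sample t-test is applied to clusters estimated by k-means on data with no true group structure, and rejects far more often than the nominal 5% level.

import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
n, d, reps, alpha = 100, 2, 500, 0.05

rejections = 0
for _ in range(reps):
    X = rng.normal(size=(n, d))                       # one homogeneous Gaussian: global null
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    g1, g2 = X[labels == 0, 0], X[labels == 1, 0]     # compare the cluster means on coordinate 0
    _, p = stats.ttest_ind(g1, g2)
    rejections += p < alpha

# The clusters were chosen to be well separated on the same data used for testing,
# so the naive test's rejection rate is far above 0.05.
print("empirical Type I error of the naive test:", rejections / reps)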

Title Causal Inference for social and engineering systems
Speaker, Affiliation Anish Agarwal, UC Berkeley
Date, Time 28 April 2022, 15:00-16:00
Location
Abstract What will happen to Y if we do A? A variety of meaningful social and engineering questions can be formulated this way: What will happen to a patient’s health if they are given a new therapy? What will happen to a country’s economy if policy-makers legislate a new tax? What will happen to a data center’s latency if a new congestion control protocol is used? We explore how to answer such counterfactual questions using observational data---which is increasingly available due to digitization and pervasive sensors---and/or very limited experimental data. The two key challenges are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. The key framework we introduce connects causal inference with tensor completion. In particular, we represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) formal identification results establishing under what missingness patterns, latent confounding, and structure on the tensor the recovery of unobserved potential outcomes is possible; (ii) novel estimators that recover these unobserved potential outcomes, together with proofs that they are finite-sample consistent and asymptotically normal. The efficacy of our framework is shown on high-impact applications. These include working with: (i) TauRx Therapeutics to identify patient sub-populations where their therapy was effective; (ii) Uber Technologies on evaluating the impact of driver engagement policies without running an A/B test; (iii) the Poverty Action Lab at MIT to make personalized policy recommendations to improve childhood immunization rates across villages in Haryana, India. Finally, we discuss connections between causal inference, tensor completion, and offline reinforcement learning.
Bio Anish is currently a postdoctoral fellow at the Simons Institute at UC Berkeley. He did his PhD at MIT in EECS, where he was advised by Alberto Abadie, Munther Dahleh, and Devavrat Shah. His research focuses on designing and analyzing methods for causal machine learning, and applying them to critical problems in social and engineering systems. He currently serves as a technical consultant to TauRx Therapeutics and Uber Technologies on questions related to experiment design and causal inference. Prior to the PhD, he was a management consultant at Boston Consulting Group. He received his BSc and MSc at Caltech.
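One way to build intuition for the "causal inference as completion" viewpoint in the abstract above is the matrix (order-2) special case: treat unit-by-time control outcomes as a low-rank matrix whose treated entries are missing, and impute them by iterating a rank-truncated SVD. The sketch below is a generic hard-impute routine under strong simplifications, not one of the estimators from the talk.

import numpy as np

rng = np.random.default_rng(6)
N, T, r = 60, 40, 3

# Low-rank matrix of untreated potential outcomes Y(0), plus noise.
U, V = rng.normal(size=(N, r)), rng.normal(size=(T, r))
Y0 = U @ V.T + 0.1 * rng.normal(size=(N, T))

# Half of the units adopt treatment in the last 10 periods; those Y(0) entries are unobserved.
observed = np.ones((N, T), dtype=bool)
observed[: N // 2, -10:] = False

def hard_impute(Y, observed, rank, n_iter=100):
    # Fill missing entries by iterating a rank-truncated SVD (hard-impute).
    Z = np.where(observed, Y, 0.0)
    for _ in range(n_iter):
        U_, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U_[:, :rank] * s[:rank]) @ Vt[:rank]
        Z = np.where(observed, Y, low_rank)       # keep observed entries, update missing ones
    return low_rank

Y0_hat = hard_impute(Y0, observed, rank=r)
print("mean abs. error on imputed counterfactual entries:",
      np.abs(Y0_hat[~observed] - Y0[~observed]).mean())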

Title All-In-One Robust Estimator of the Gaussian Mean
Speaker, Affiliation Arshak Minasyan, CREST-ENSAE, Paris
Date, Time 12 May 2022, 15:00-16:00
Location
Abstract We propose a robust-to-outliers estimator of the mean of a multivariate Gaussian distribution that enjoys the following properties: polynomial computational complexity, high breakdown point, minimax rate optimality (up to a logarithmic factor) and asymptotic efficiency. The non-asymptotic risk bound for the expected error of the proposed estimator is dimension-free and involves only the effective rank of the covariance matrix. Moreover, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix. Joint work with Arnak Dalalyan (https://arxiv.org/abs/2002.01432).
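As a familiar point of comparison for robust multivariate mean estimation (a classical baseline, not the all-in-one estimator of the talk), here is the geometric median computed with Weiszfeld's iterations.

import numpy as np

rng = np.random.default_rng(7)
n, d, n_out = 500, 20, 50

X = rng.normal(size=(n, d))                  # clean Gaussian sample with true mean 0
X[:n_out] += 20.0                            # gross outliers contaminating 10% of the points

def geometric_median(X, n_iter=100, eps=1e-8):
    # Weiszfeld's algorithm: iteratively re-weighted average with weights 1 / distance.
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.linalg.norm(X - mu, axis=1), eps)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

print("error of the sample mean:     ", np.linalg.norm(X.mean(axis=0)))
print("error of the geometric median:", np.linalg.norm(geometric_median(X)))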

Title Towards diffusion approximations for stochastic gradient descent without replacement
Speaker, Affiliation Stefan Perko, University of Jena
Date, Time 19 May 2022, 15:00-16:00
Location
Abstract Stochastic gradient descent without replacement, also known as reshuffling (SGDo), is predominantly used to train machine learning models in practice. However, the mathematical theory of this algorithm remains underexplored compared to its "with replacement" and "infinite data" counterparts. We propose a stochastic, continuous-time approximation to SGDo based on a family of stochastic differential equations driven by a stochastic process we call epoched Brownian motion, which encapsulates the behavior of reusing the same data points in subsequent epochs. We investigate this diffusion approximation by considering an application of SGDo to linear regression. Explicit convergence results are derived for constant learning rates and for a sequence of learning rates satisfying the Robbins-Monro conditions. Finally, the validity of the continuous-time dynamics is further substantiated by numerical experiments.
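To fix the algorithm the abstract refers to, here is a minimal version of SGD without replacement (random reshuffling) for least-squares linear regression; the model, step size and epoch count are illustrative choices only.

import numpy as np

rng = np.random.default_rng(8)
n, d, epochs, lr = 1000, 10, 50, 0.01

X = rng.normal(size=(n, d))
theta_star = rng.normal(size=d)
y = X @ theta_star + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
for _ in range(epochs):
    perm = rng.permutation(n)                 # reshuffle the data once per epoch ...
    for i in perm:                            # ... then visit every point exactly once
        grad = (X[i] @ theta - y[i]) * X[i]   # gradient of 0.5 * (x_i' theta - y_i)^2
        theta -= lr * grad

print("parameter error after SGD without replacement:", np.linalg.norm(theta - theta_star))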

Title Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
Speaker, Affiliation Pedro Abdalla, ETH Zurich
Date, Time 2 June 2022, 16:00-17:00
Location HG G 19.2
Abstract In this talk we introduce a new estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: We allow the adversary to corrupt an η-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the Lp-marginal moment with some p≥4 is equivalent to the corresponding L2-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. We also discuss a dimension-free Bai-Yin type theorem in the regime p>4.
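A quick simulation of the contamination model in this abstract, showing how even a small adversarially placed fraction of points distorts the sample covariance in operator norm; this only illustrates the problem, and the talk's estimator is not reproduced here.

import numpy as np

rng = np.random.default_rng(9)
n, d, eta = 2000, 30, 0.05

Sigma = np.diag(np.linspace(1.0, 3.0, d))          # true covariance matrix
X = rng.normal(size=(n, d)) @ np.sqrt(Sigma)

def op_norm_error(sample):
    S = np.cov(sample, rowvar=False)               # sample covariance
    return np.linalg.norm(S - Sigma, ord=2)        # operator-norm distance to the truth

print("operator-norm error, clean data:    ", op_norm_error(X))

X_corrupt = X.copy()
m = int(eta * n)
X_corrupt[:m] = 50.0 * rng.normal(size=(m, d))     # an eta-fraction replaced arbitrarily
print("operator-norm error, corrupted data:", op_norm_error(X_corrupt))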

Title High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Speaker, Affiliation Denny Wu, University of Toronto
Date, Time 16 June 2022, 15:00-16:00
Location HG G 19.1
Abstract We study the first gradient descent step on the first-layer weights W in a two-layer neural network, where the parameters are randomly initialized, and the training objective is the empirical MSE loss. In the proportional asymptotic limit (where the training set size n, the number of input features d, and the width of the neural network N all diverge at the same rate), and under an idealized student-teacher setting, we show that the first gradient update contains a rank-1 "spike", which results in an alignment between the first-layer weights and the linear component of the teacher model f^*. To characterize the impact of this alignment, we compute the prediction risk of ridge regression on the conjugate kernel after one gradient step on W with learning rate η. We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map and prove that the learned kernel improves upon the initial random feature model, but cannot defeat the best linear model on the input. For sufficiently large η, we prove that for certain f^*, the same ridge estimator on trained features can go beyond this "linear regime" and outperform a wide range of (fixed) kernels. Our results demonstrate that even one gradient step can lead to a considerable advantage over random features, and highlight the role of learning rate scaling in the initial phase of training.
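To make the two-stage procedure in this abstract concrete, the sketch below takes one full-batch gradient step on the first-layer weights of a two-layer network and then fits ridge regression on the resulting features; the dimensions, learning rate, activation and teacher function are toy choices of ours, not those of the paper.

import numpy as np

rng = np.random.default_rng(10)
n, d, N, eta, ridge = 2000, 50, 100, 5.0, 1e-2

beta = rng.normal(size=d) / np.sqrt(d)

def make_data(m):
    X = rng.normal(size=(m, d))
    z = X @ beta
    return X, z + 0.5 * np.tanh(z)                    # toy single-index teacher

X, y = make_data(n)
X_test, y_test = make_data(2000)

W = rng.normal(size=(N, d)) / np.sqrt(d)              # random first-layer weights
a = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)      # fixed second-layer weights

def features(X_, W_):
    return np.tanh(X_ @ W_.T)                         # n x N feature map

# One full-batch gradient step on W (gradient of the half-MSE loss of f(x) = a' tanh(W x)).
Phi = features(X, W)
resid = Phi @ a - y
grad_W = ((resid[:, None] * (1.0 - Phi ** 2)) * a).T @ X / n
W_one_step = W - eta * grad_W

def ridge_test_mse(W_):
    F, F_test = features(X, W_), features(X_test, W_)
    coef = np.linalg.solve(F.T @ F + ridge * np.eye(N), F.T @ y)
    return np.mean((F_test @ coef - y_test) ** 2)

print("test MSE, ridge on random features:        ", ridge_test_mse(W))
print("test MSE, ridge on features after one step:", ridge_test_mse(W_one_step))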