Master's theses

2024

Student Title Advisor(s) Date
Nicolas Kolly Techniques for the Analysis of Nonindependent Dyadic Data Dr. Lukas Meier Oct-2024
Abstract: Dyadic data analysis is concerned with the study of dyads, consisting of two nonindependent dyad members. Classical statistical techniques often assume independence of the individuals, an assumption that is violated for dyadic data and needs to be addressed appropriately. Different statistical techniques can be applied not only to get around the problem of nonindependence, but also to estimate specific effects of the unique dyad interactions.

In this thesis, multilevel modeling (MLM) and structural equation modeling (SEM) are introduced as the main techniques to analyse dyadic data. Both MLM and SEM are used to illustrate the estimation of dyad effects and correlations in various examples of dyadic data. Further, the actor-partner interdependence model (APIM) is introduced as a model for the interaction and influence of dyad variables. Then social relations models (SRM) are discussed for individuals paired in multiple dyads. Lastly, the combination of dyadic nonindependence with over-time data is discussed to estimate trends and influences in over-time dyadic data.
Gaspar Dugac Inference In Instrumental Variable Models Under Model Violations Prof. Dr. Jonas Peters Oct-2024
Abstract: Instrumental variable (IV) estimation is a widely used method for estimating causal effects in the presence of unobserved confounders between an endogenous treatment variable and some outcome. The method relies on several assumptions. A valid IV should have an effect on the endogenous treatment variable (relevance), it should not have a direct effect on the outcome (exclusion restriction), and it should not be related to the unobserved confounder (unconfoundedness). Provided the instrument satisfies these assumptions, one can estimate the causal effect through Two-Stage Least Squares. Given the importance of these assumptions, we require tests for verifying instrument validity. Tests of overidentifying restrictions, such as the Sargan test, are widely used to verify instrument validity. However, the validity of the overidentifying restrictions is not sufficient to ensure the identification of the parameter of interest. The contribution of this thesis is twofold. Firstly, we present a novel method for detecting violations of the IV assumptions. The method does not directly rely on the overidentifying restrictions, and hence does not suffer from the same pitfalls. We introduce an environment variable to the classical IV setting, and, conditional on the value of the environment variable, we construct Anderson-Rubin confidence intervals. We show that by observing the intersection of these confidence intervals, one can gain insight into whether model assumptions are violated. Secondly, we develop a novel procedure for estimating causal effects in the setting where we have a candidate set of instruments with an unknown subset of invalid instruments. The procedure generates estimated sets of valid instruments by observing the intersections of the per-instrument confidence intervals. The resulting confidence interval is obtained by taking the union over the confidence intervals generated by treating the estimated sets of instruments as valid. We show that the procedure provides coverage guarantees, even in weak instrument settings. Unlike existing methods, which rely on either the majority or the plurality rule holding, we show our method generates intervals at an honest coverage level, irrespective of the number of valid instruments or instrument strength. Furthermore, we show that, asymptotically, under the majority rule, the procedure estimates only one set of valid instruments, which is asymptotically equal to the true set of valid instruments.
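As a point of reference for the Two-Stage Least Squares estimator mentioned in this abstract, here is a minimal numpy sketch (not code from the thesis); the data-generating process with instrument Z, hidden confounder H and true effect 2 is invented for illustration:

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: regress X on the instruments Z (first stage), then regress y on the
    fitted values X_hat (second stage), i.e. beta_hat = (X_hat' X)^{-1} X_hat' y."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 1))                  # instrument (relevant, excluded, unconfounded)
H = rng.normal(size=n)                       # unobserved confounder
x = Z[:, 0] + H + rng.normal(size=n)         # endogenous treatment
y = 2.0 * x + H + rng.normal(size=n)         # true causal effect of x on y is 2

print(two_stage_least_squares(y, x[:, None], Z))   # close to 2; plain OLS is biased upward
```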
Antonia Schumacher Benchmarking Deep Generative Causal Inference Algorithms on Real Physical Systems Prof. Dr. Peter Bühlmann
Juan L. Gamella
Oct-2024
Abstract: Recent advances in machine learning have given way to novel deep generative approaches for modelling structural causal models. In addition, the causal chambers introduced by Gamella et al. (2024) provide real, experimental datasets with known causal structures. Using these datasets, we compare causal normalizing flows (Causal-NF; Javaloy, Sánchez-Martín, and Valera, 2024), the diffusion causal model (DCM; Chao, Blöbaum, and Kasiviswanathan, 2023) and additive noise models to obtain valuable insights into model behaviour on real data. We give an overview of current approaches to modeling structural causal models and benchmarking. The theory behind the models, as well as the sampling and interventional approaches, is detailed. The methods are evaluated based on the maximum mean discrepancy (Gretton, Borgwardt, Rasch, Schölkopf, and Smola, 2012) and the two-sample test (Smirnov, 1948), as well as visually and numerically using quantile-based intervals, highest probability density intervals and Q-Q plots. Finally, we present results on 109 interventions and examine noteworthy case studies further. Overall, the DCM holds the most promise while still leaving room for improvement. We end by highlighting limitations of this analysis and presenting ideas for future work.
Valentin Roth Sampling from the Posterior of Sparse PCA Prof. Dr. Yuansi Chen Oct-2024
Abstract: Principal Component Analysis (PCA) identifies the main direction of variation in data from its covariance matrix. Assuming a covariance matrix consisting of a Gaussian Orthogonal Ensemble spiked with a signal from the boolean hypercube $\{\pm1\}^n$, signal recovery is possible for signal-to-noise ratios $\lambda > \lambda_{stat} := 1$. Going beyond this statistical threshold requires structural assumptions on the signal. Imposing sparsity of the signal in the form of only $k$ non-zero entries lowers $\lambda_{stat}$ for recovery but introduces a computational threshold $\lambda_{comp} > \lambda_{stat}$. Specialized algorithms can recover the signal in sub-exponential time in the hard regime where $\sqrt{k/n} \ll \lambda \ll \min\{1, k/\sqrt n\}$ and in polynomial time in the easy regime where $\lambda \gg \min\{1,k/\sqrt n\}$. Taking a Bayesian perspective instead, the more general Glauber dynamics, also known as the single-site Gibbs sampler, can be used to sample solutions from the posterior distribution of sparse PCA. Here, we investigate the algorithm's mixing time. For this, we relate the posterior distribution to well-studied Ising models from statistical physics. We show that despite the similar structure of the models, existing mixing time results for Ising models with bounded interaction matrices are not meaningful, as they fail to exploit the structure introduced by the signal. To investigate this failure, we restrict the sparse PCA posterior to the true support (the subspace of size $k$ that contains the true signal), employ the measure decomposition of Bauerschmidt and Bodineau (2019) to decompose the restricted posterior, and show that the resulting mixing measure is a mixture of $2^k$ Gaussians. We then provide lower bounds on the weight ratios of the Gaussians. These suggest that a Gaussian corresponding to the signal dominates the others for $\lambda \gg \sqrt{k/n}$. Based on this, we give a sketch for a covariance bound, in which an approximation argument based on the similarity of the mixing measure to the dominating Gaussian remains to be completed in future work. Successful execution of such a covariance bound and extension to arbitrary supports would conjecturally result in polynomial time mixing of Glauber dynamics on the sparse PCA posterior in the easy regime via the framework of spectral independence.
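The single-site Gibbs sampler (Glauber dynamics) studied in this thesis can be illustrated for a generic Ising-type measure on the hypercube; in the sketch below (not from the thesis) the coupling matrix J is a placeholder that, for sparse PCA, would be built from the spiked observation:

```python
import numpy as np

def glauber_dynamics(J, n_steps, rng):
    """Single-site Gibbs sampler for pi(x) proportional to exp(0.5 * x' J x) on {-1,+1}^n.
    At each step one coordinate is resampled from its conditional distribution."""
    n = J.shape[0]
    J = J - np.diag(np.diag(J))                      # remove self-couplings
    x = rng.choice([-1.0, 1.0], size=n)              # random initialisation on the hypercube
    for _ in range(n_steps):
        i = rng.integers(n)
        field = J[i] @ x                             # local field at coordinate i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(x_i = +1 | all other coordinates)
        x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x
```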
Manuel Rytz Towards Reproducibility and Transparency of QSARs: Comparison of Applicability Domain Approaches Dr. Markus Kalisch
Dr. Christoph Schür
Lilian Gasser
Oct-2024
Abstract: Quantitative Structure-Activity Relationships (QSAR) are commonly used in the field of cheminformatics to predict a wide variety of endpoints. They use a set of descriptors of chemical compounds to predict a target of interest. As with any predictive model, the quality of predictions on new data hinges on the available training data. The applicability domain (AD) restricts the set of allowed test data points based on the available training data. Ideally, this results in more reliable and trustworthy predictions. In this thesis, we discuss the TARDIS (Transparency, Applicability, Reliability, Decidability) principle, an approach to categorizing AD methods and guiding their implementation. We review and discuss a set of methods and apply them to a recently curated dataset (ADORE) which is used to model acute mortality in aquatic species. We monitor the predictive performance on the full test data, and on the restricted test data after the AD method is applied, aiming to see increased model performance after restriction. Overall, applying AD methods had mixed results on the predictive accuracy. Results often showed inconsistency across different descriptor sets, making it hard to recommend a specific approach. Nonetheless, we deem it crucial for investigators to explore the AD of their model and, ideally, experiment with multiple methods. As an additional finding, we detected a group of stereoisomers that might not be ideally distributed into training and test splits in a part of the ADORE dataset. This issue should be further discussed by experts.
Winnie Chan Robust Feedback Optimization via Regularization Prof. Dr. Yuansi Chen
Prof. Dr. Florian Dörfler
Sep-2024
Abstract: The emerging control paradigm of feedback optimization (FO) drives a plant to its optimal steady state without detailed model information or offline numerical computations. However, since FO typically uses input-output sensitivities for its gradient-based updates, incorrect values can destabilize input and state evolutions, leading to suboptimality and potential system failure. To address the challenges posed by inaccurate sensitivities, we present robust FO, which takes into account sensitivity uncertainties during the iterative adjustment of control inputs. Consequently, we establish a general framework for robust FO in linear systems and introduce computationally tractable reformulations of the resulting min-max problem for two uncertainty sets. Functionally, these proposed robust objectives are equivalent to ridge and Lasso-regularized least squares problems. We then propose controller designs that employ projected (proximal) gradient descent with regularized update steps and penalty functions, providing a computationally efficient way to conduct robust FO under input and output constraints in time-varying environments.

Our characterization of robust FO's performance provides coupled optimality and stability guarantees dependent on the size of the regularization parameter and the optimal solution's gradient and variation, i.e., path length. For time-varying objectives caused by changing disturbances, we also show our robust algorithms can achieve dynamic regret on the order of the path lengths. Additionally, we perform a comparative empirical analysis between robust FO and its standard version as well as a sensitivity analysis of the regularization parameter. Our numerical simulations illustrate the superior optimality of robust FO under model mismatch, especially in high-dimensional settings.

Finally, we implement our scheme on a balanced three-phase backbone of the IEEE 123-bus standard for up to 50 control inputs. For various power system operating regimes, including topology change, we demonstrate that robust FO can successfully provide reasonable curtailment and reactive power control, even when standard FO fails.
Michael Etienne Van Huffel The Shape of Words: Exploring Diachronic Semantic Change Detection via High Dimensional Topological Data Analysis Dr. Markus Kalisch
Prof. Dr. O. Bobrowski
Dr. H. Dubossarsky
Dr. A. Monod
Sep-2024
Abstract: Diachronic semantic change, i.e., the phenomenon by which words in human language shift meaning over time, has become an area of significant interest in recent years for researchers across a broad range of fields, from linguistics to natural language processing (NLP). With the rise of modern NLP architectures, contextualized word embeddings have emerged as the most promising approach for tackling the task of semantic change detection (identifying whether a word's meaning evolves over time) due to their ability to capture different word meanings based on surrounding context, in contrast to traditional static embeddings, which produce global and fixed word representations that ignore context. By capturing meaning at specific time points and comparing high-dimensional contextualized representations, we can track changes in word meanings over time. In this work, we introduce a novel framework for detecting and quantifying semantic change by leveraging Topological Data Analysis (TDA), a powerful methodology rooted in algebraic topology that has recently been successfully applied across a wide range of fields. Specifically, we propose three different topology-based algorithms, as well as an ensemble algorithm, to detect semantic change. We conduct a large-scale analysis using three types of contextualized neural architectures: BERT, ELMo, and XLM-R, and utilize four diachronic corpora spanning English, German, Swedish, and Latin. We evaluate our algorithms on a competitive semantic change quantification task and demonstrate that our approach achieves performance comparable to, and in the case of the English corpora even superior to, state-of-the-art methods using only pre-trained, non-fine-tuned embeddings. The validity of our findings is further discussed through multiple hypothesis testing. Using our topology-based semantic change analysis framework, we also provide strong statistical evidence in favor of the conjecture of the universal null hypothesis for persistence diagrams, a groundbreaking conjecture in the literature which posits that persistence diagrams arising from both random point-cloud data and real-world data follow a universal probability law. Finally, we discuss the broader applicability of our algorithms, showcasing an example of how they can be extended to the more general task of detecting semantic similarity in meaning representations in language models, laying the foundation for future analysis.
Nicola Stella Using anchor regression for out-of-distribution generalization of widely used risk prediction models Prof. Dr. Peter Bühlmann
Dr. Olga Demler
Dr. Lucas Kook
Sep-2024
Abstract: Current cardiovascular treatment guidelines across the world rely on risk-based models that utilize patients’ age, biomarkers, and lifestyle information to estimate the likelihood of developing cardiovascular diseases (CVD) in the future. These risk estimates significantly influence clinical therapy decisions, underscoring the importance of models that perform robustly across diverse populations.

A critical challenge in this context is ensuring that these models generalize effectively to data whose distribution is shifted, such as external patient populations.

Distributional anchor regression (DARe) is a novel technique explicitly designed to improve out-of-distribution (OOD) generalization, with its empirical properties well-established in recent literature. This thesis investigates the application of distributional anchor regression as a more robust tool to enhance the OOD generalization of SCORE2, a widely used CVD risk prediction model in Europe.
Shajivan Satkurunathan Classical and Bayesian Approaches for Population Estimation in Capture-Recapture: A Simulation Study Dr. Markus Kalisch Sep-2024
Abstract: This thesis investigates the performance of classical and Bayesian capture-recapture estimators in estimating population sizes. An extensive simulation study evaluates how these estimators handle heterogeneity arising from categorical covariates, focusing particularly on scenarios where such heterogeneity is completely or partially unobserved. Comparative analysis includes traditional methods like the Chapman estimator alongside Bayesian models. The results reveal that Bayesian approaches are superior in managing unobserved heterogeneity, whereas stratified classical estimators perform robustly when heterogeneity is accounted for. These findings provide crucial guidance for selecting appropriate estimation techniques based on the presence and nature of covariate-induced heterogeneity.
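For readers unfamiliar with the Chapman estimator compared in this abstract, it has a simple closed form; the counts in the sketch below are made up for illustration and are not from the thesis:

```python
def chapman_estimate(n1, n2, m):
    """Chapman estimator for a two-sample capture-recapture study:
    n1 animals marked in the first sample, n2 captured in the second sample,
    m of the second sample already marked (recaptures)."""
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var_hat = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var_hat

# e.g. 200 marked, 150 captured later, 30 recaptures -> population estimate of about 978
print(chapman_estimate(200, 150, 30))
```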
Yves Görgen Comparing fusion techniques for Sentinel-2 and Sentinel-1 in deforestation change detection Prof. Dr. Nicolai Meinshausen Sep-2024
Abstract: Deforestation significantly threatens biodiversity and contributes heavily to global CO2 emissions. Efforts to combat deforestation are increasing, with the EU taking a pivotal step by introducing the "EU Deforestation Regulation" (EUDR). The EUDR prohibits the sale of seven commodities in the EU market if they originate from recently deforested land. A primary challenge in enforcing this regulation is accurately monitoring deforestation changes. This thesis addresses this challenge by exploring deep learning architectures to improve deforestation change detection through multimodal data fusion of Sentinel-2 (optical) and Sentinel-1 (radar) imagery. First, we contribute a novel, publicly available dataset incorporating Sentinel-1 and Sentinel-2 imagery for deforestation change detection. Our experiments show that fusing these data sources improves performance. Surprisingly, performance improves over solely Sentinel-2-based models not only under cloudy conditions but also on cloud-free images. This finding suggests that Sentinel-1 provides valuable complementary information even in cloud-free scenarios. Further, we compare the effectiveness of different change detection architectures. We show that multi-stream models with four encoders, which separate weight sharing between optical and radar data, outperform simpler architectures.
Marco Froelich Investigating year-to-year variability of hot extremes and contributions from heat-generating mechanisms Prof. Dr. Nicolai Meinshausen
Dr. Matthias Rothlisberger
Sep-2024
Abstract: Heat-related extremes are important meteorological phenomena that can have strong consequences for human health and the environment. Climate change is expected to exacerbate these impacts through an increase in the frequency and intensity of hot extremes. Although there exists abundant literature on the typical physical functioning of these events and their association with the variability of the climate system on different temporal scales, a global assessment of the influence of the major physical processes - heat advection, adiabatic compression and diabatic heating - on the yearly variation of hot extreme magnitudes is lacking. To remedy this knowledge gap, we first propose a data-driven, systematic analysis of second-moment characteristics of yearly-maximum near-surface hot extreme events and the contributing heat-generating processes. Second, we apply deep-learning methods to model hot extreme Lagrangian trajectories to gain insights into important dynamical features. No physical process is found to globally dominate variability in these events, and significant variance contributions exist from at least two processes, suggesting that a mean-state understanding of hot extreme development may not be sufficient to explain large year-to-year differences in their magnitudes. Furthermore, this analysis reaffirms the presence of strong dependencies between the three physical mechanisms, leading to a characterization of their variability by only one or two degrees of freedom in most of the world. Finally, the approach for the analysis of parcel trajectories was limited by generally poor predictive performance, but showed that advective, adiabatic and diabatic temperature anomaly generation follows patterns that may be predicted from their history, which is encouraging for future work. In addition, over oceans and many land regions we observe that adiabatic heating is minimal during the final 24 h, suggesting that hot extremes primarily descend to the surface more than a day before the event, making contributions from advective and diabatic processes more likely.
Ioana Iacobici Hyperparameter Tuning for Gradient Boosting: Insights into Performance and Parameter Impact Prof. Dr. Fabio Sigrist Sep-2024
Abstract: Gradient boosting algorithms, such as XGBoost and LightGBM, continue to be the superior choice for tabular data in industrial applications, often outperforming deep learning in terms of efficiency and accuracy. Despite their popularity, hyperparameter tuning for gradient boosting models remains mostly underexplored, with no clear consensus on which parameters most significantly impact performance. This work addresses this gap by comparing the performance of four popular hyperparameter tuning methods on a curated selection of tabular datasets, and by analyzing the effects of different parameter values on model performance across both regression and classification tasks. Our contributions are threefold: (1) a comprehensive comparison of tuning strategies, highlighting their effectiveness in various settings; (2) the creation of a new dataset documenting hyperparameter combinations and performance scores; and (3) a novel analysis of hyperparameter importance using random effects modeling. Our experiments reveal that the Tree-structured Parzen Estimator consistently outperforms other hyperparameter optimization methods, particularly in regression tasks. Moreover, we uncover the most impactful hyperparameters for both regression and classification tasks. These findings provide valuable insights into hyperparameter optimization for gradient boosting, guiding future applications toward more precise and effective models.
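The Tree-structured Parzen Estimator highlighted in this abstract is most easily tried via the Optuna library; the sketch below tunes a LightGBM regressor on synthetic data and is only an illustration of the idea (the thesis' search spaces, datasets and tooling are not specified here):

```python
import optuna
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=0)

def objective(trial):
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 8, 256, log=True),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "n_estimators": 300,
    }
    model = lgb.LGBMRegressor(**params, random_state=0)
    # 5-fold cross-validated negative RMSE, which the study maximizes
    return cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)
```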
Herwig Höhenberger Distributional conformal prediction Prof. Dr. Johanna Ziegel
Georgios Gavrilopoulos
Sep-2024
Abstract: How certain are you in your prediction?
Conformal prediction is a general method for producing, from exchangeable data, set-valued predictions that are valid:
A 90 % prediction region contains the true label 90 % of the time.

Based on Chernozhukov, Wüthrich et al. (2021), in this thesis we present distributional conformal prediction (DCP), a special kind of conformal prediction.
DCP uses the probability integral transform, bases its conformity measure on the data ranks, and only needs a method for estimating conditional cumulative distribution functions (CDFs) to be implemented.
Compared to classical conformal prediction, DCP produces prediction regions that are asymptotically valid even for nonexchangeable data and under model misspecification, and that are asymptotically conditionally valid under correct model specification.

The main contributions of this thesis are a simulation study of the aforementioned asymptotics and a detailed description of the design and implementation of the study.
We applied DCP in conjunction with multiple CDF-estimating methods to synthetic data and evaluated the coverage and the size of the prediction regions of the procedures.
We also used DCP for the statistical post-processing of a numerical-weather-prediction ensemble for predicting overall precipitation within a 24 h period.
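To make the PIT/rank-based conformity measure concrete, here is a minimal split-conformal sketch of DCP with a deliberately crude Gaussian linear model as the conditional CDF estimator; the data and the CDF model are illustrative assumptions, not the procedures evaluated in the thesis:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-2, 2, size=(n, 1))
Y = X[:, 0] + (0.5 + 0.5 * np.abs(X[:, 0])) * rng.normal(size=n)   # heteroscedastic data

# split: first half estimates the conditional CDF, second half calibrates
X_tr, Y_tr, X_cal, Y_cal = X[:1000], Y[:1000], X[1000:], Y[1000:]
reg = LinearRegression().fit(X_tr, Y_tr)
sigma = np.std(Y_tr - reg.predict(X_tr))
cond_cdf = lambda y, x: stats.norm.cdf(y, loc=reg.predict(x), scale=sigma)

# DCP conformity score: distance of the probability integral transform from 1/2
alpha = 0.1
scores = np.abs(cond_cdf(Y_cal, X_cal) - 0.5)
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

# prediction interval for a new x: invert the estimated CDF at 0.5 - q and 0.5 + q
x_new = np.array([[1.0]])
lo = stats.norm.ppf(0.5 - q, loc=reg.predict(x_new), scale=sigma)
hi = stats.norm.ppf(0.5 + q, loc=reg.predict(x_new), scale=sigma)
print(lo[0], hi[0])
```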
Filippo Rambelli An accuracy-runtime trade-off comparison of large-data Gaussian process approximations for spatial data Dr. Fabio Sigrist Sep-2024
Abstract: Gaussian process regression (GPR) is a flexible, closed-form, probabilistic model widely employed across various fields, such as machine learning, time series analysis, and spatial statistics. The attractive properties of Gaussian processes are tempered by their computational cost, requiring $O(N^3)$ operations and $O(N^2)$ memory for evaluating the joint density, making them prohibitive for large spatial datasets. Numerous approximation techniques have been proposed over the past decades to address this limitation. In this work, we systematically compare the performance of eight Gaussian process approximations, evaluating their trade-offs between accuracy and runtime. The comparison encompasses multiple simulated datasets and four large real-world datasets, with assessments based on parameter estimation, likelihood evaluation, predictive diagnostics, and distance to exact calculations. No single approximation emerged as universally superior, but Vecchia's approximation (Vecchia, 1988) proved to be the most versatile, delivering stable results that were consistently ranked among the best. Nevertheless, other approximations demonstrated greater accuracy for specific applications.
Christophe Muller Multi-agent RL for optimal maintenance of graph-based railway network Prof. Dr. Nicolai Meinshausen
Prof. Dr. Eleni Chatzi
Gregory Duthé and Giacomo Arcieri
Sep-2024
Abstract: All engineering infrastructures are subject to deterioration over time, necessitating planned maintenance actions to ensure their safety and efficiency. Railway tracks are prime examples of such infrastructures. In this work, we focus on the most critical maintenance actions for railway tracks: tamping, track renewal and re-tamping, along with the option of taking no maintenance action. Using real-world data provided by the Swiss Federal Railways, we first infer a hierarchical Bayesian model that captures both the deterioration process and the effects of various maintenance actions. Then, we implement methods to optimize decision-making within this learned environment.

This thesis builds upon the work of Arcieri et al. (2023, 2024), which followed the same two-step process on related data. Our contribution extends this research by considering the railway track network as a connected graph rather than treating segments in isolation. For the Bayesian inference, this implies modelling the covariance between neighbouring track segments, reflecting the spatial dependencies in their deterioration and maintenance processes. In the optimization phase, this approach has multiple implications. First, our cost function accounts for the economies of scale when maintaining adjacent segments. Second, we frame the optimization problem as a Cooperative Markov Game, where each track segment is treated as an agent tasked with selecting the appropriate maintenance action at each time step. To address this complex problem, we implemented and tested several Multi-Agent Reinforcement Learning techniques, with particular emphasis on methods that integrate the graph structure of the track network using Graph Neural Networks and Graph Transformers.

The results show that advanced MARL algorithms did not consistently outperform their non-graph-based counterparts. The Centralized Training with Centralized Execution paradigm did not offer significant advantages over simpler decentralized methods or naive techniques. Furthermore, the use of graph-aware deep neural network modules did not demonstrate clear added value. These findings may be attributed to high variance in the data, which led to significant variability in environment transition dynamics, complicating agents' ability to effectively coordinate and anticipate costs associated with track conditions.
Maria Eugenia Gil Pallares Modeling Network Dynamics from Survey Data Prof. Dr. Jonas Peters
Prof. Dr. Christoph Stadtfeld
Sep-2024
Abstract: The Dynamic Network Actor Model (DyNAM) is a statistical model for the analysis of the evolution of social networks. Given a sequence of events occurring in a network (e.g., emails between ETH Zürich members), DyNAMs can be estimated and explain the social mechanisms that drive the dynamics of the network. However, sequences of relational event data are often incomplete or unknown, preventing the use of DyNAMs. Data collected in surveys or aggregated data from social media are examples of cases where the order of events cannot be recovered and missing data techniques are required. We present an application of the Expectation Maximization (EM) algorithm to the estimation of parameters of DyNAMs with an unknown sequence of events. The algorithm is rooted in the Stochastic Approximation Expectation Maximization literature, where EM is combined with MCMC to help explore the sampling space and approximate the Expectation step with random samples of plausible event sequences. The model partly replicates the Stochastic Actor-Oriented Model (SAOM) for longitudinal panel data. The introduction of Parallel Tempering as a means of accelerating the sampling phase creates a distinction between the sampling mechanisms used in our model and SAOMs. In this Master Thesis, we present the overall structure of the novel model, followed by a study of the sampling schemes proposed. Preliminary results are discussed, and future developments are suggested.
William Xu Modern Regression Techniques Dr. Markus Kalisch Sep-2024
Abstract: This thesis revisits the framework of anchor regression, a novel approach that addresses the challenges of heterogeneity in causal inference. Originally proposed to mitigate bias in predictive modeling under confounded environments, anchor regression introduces anchor variables—exogenous covariates that can serve as a reference for measuring heterogeneity in causal effects. This study critically evaluates the methodological underpinnings of anchor regression, exploring its theoretical implications and practical applications in diverse fields such as economics, social sciences, and medicine.

The core of this thesis lies in reevaluating the assumptions behind the use of anchor variables and their impact on the robustness of causal estimates. By conducting simulations and applying anchor regression to real-world datasets, the thesis assesses the method's performance relative to traditional regression techniques, particularly in scenarios characterized by significant heterogeneity and confounding.
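Since anchor regression is the centerpiece of this abstract, a compact sketch of the linear anchor regression estimator of Rothenhäusler et al. (2021) may help fix ideas; the code is an illustration and not part of the thesis:

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Linear anchor regression: minimize ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A (Y - Xb)||^2,
    where P_A projects onto the column space of the anchor variables A.
    gamma = 1 recovers OLS; larger gamma trades prediction for robustness to shifts."""
    P_A = A @ np.linalg.pinv(A.T @ A) @ A.T              # projection onto the anchors
    W = np.eye(len(Y)) + (gamma - 1.0) * P_A             # weight matrix I + (gamma - 1) P_A
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)     # weighted least squares solution
```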
Hendrik Plett Estimating Edge Retrieval Probabilities to Benchmark Causal Discovery Algorithms on Data from Real Physical Systems Prof. Dr. Peter Bühlmann
Juan L. Gamella
Sep-2024
Abstract: Due to a lack of real-world datasets with a known ground truth DAG, benchmarking causal discovery algorithms usually involves simulating datasets from simulated DAGs and then averaging a univariate score (such as the SHD or SID) over the causal structures an algorithm fits to these datasets. This approach has two drawbacks: First, it is questionable how well such results transfer to real-world datasets (Reisach et al., 2021). Second, reducing the estimated DAG to a univariate score overlooks important details about algorithmic performance.

Therefore, in this thesis, we use newly published real data from the Light Tunnel, a real physical system with a known causal ground truth (Gamella et al., 2024). In addition, we do not deploy univariate scores but analyze algorithmic performance on the edge level. To avoid overinterpreting single edges that might vanish when resampling the data, we introduce the causalbenchmark package. It treats the causal structure an algorithm fits to a dataset as a random binary matrix and estimates its first moment. This corresponds to estimating the probability that an algorithm returns a particular edge, for all possible edges. Moreover, causalbenchmark provides functionality to visualize the resulting Edge Probability Matrix (EPM), and to visualize how this EPM changes when we alter an algorithm's data or hyperparameter input.

Using this methodology, we analyze and compare different causal discovery algorithms, namely PC, GES, GIES, GnIES, NoTEARS, Golem and ICP. Overall, we find GES to have the best and most consistent performance by a considerable margin. It reliably retrieves the edges it should retrieve given its theoretical guarantees. When GES detects false edges, this can be attributed to assumption violations by the Light Tunnel data.

All related code is publicly available at https://github.com/HendrikPlett/MA.
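One natural way to estimate such an Edge Probability Matrix is to resample the data and average the binary adjacency matrices an algorithm returns; the sketch below illustrates that idea with a generic, user-supplied fit_algorithm placeholder and is not the causalbenchmark package itself:

```python
import numpy as np

def edge_probability_matrix(data, fit_algorithm, n_resamples=100, seed=0):
    """Estimate the probability of each edge being returned by a causal discovery
    algorithm: bootstrap-resample the rows of `data`, run `fit_algorithm` (any function
    mapping an (n, p) data matrix to a {0, 1} adjacency matrix of shape (p, p)) on each
    resample, and average the resulting binary matrices (their empirical first moment)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    epm = np.zeros((p, p))
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)      # bootstrap resample of the observations
        epm += fit_algorithm(data[idx])
    return epm / n_resamples
```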
Mihaela Lachezarova Demireva Machine Learning Model for predicting race using traffic stops data. Role of vehicle characteristics in decision-making. Evidence from New Orleans, Louisiana. Dr. Lukas Meier
Prof. Dr. Elliott Ash
Aug-2024
Abstract: This paper examines racial profiling in traffic stops following the approach of Grogger and Ridgeway (2006) of defining darkness and daylight, while also taking into consideration the effect of vehicle characteristics. Using data from New Orleans, Louisiana and Wisconsin, I trained two separate models and predicted the probability of a driver being Black using the vehicle characteristics. I analyzed stops in a 2-hour interval in New Orleans, Louisiana and was able to show that in darkness police officers use the vehicle characteristics as a source of information to decide whom to stop. The higher the probability of the driver's race being Black based on the vehicle characteristics, the more likely it is that police officers will stop the driver. Furthermore, I could show that police officers have a consistent decision-making process, as they adjust their initial assessment with their decision whom to search. Police officers let off White drivers during darkness and do not search them. For Black drivers this is not the case.
Ignacio Gonzalez Perez Causality for infinite time series graphs Prof. Dr. Jonas Peters Aug-2024
Abstract: Causal reasoning has gained great attention over the last half century, as it allows (or at least intends) to answer questions which go beyond those within the capabilities of classical inferential statistics using just observational data. So far, causal research has focused mostly on the i.i.d. setting. However, there are many situations where a non-trivial dependence structure exists between sequential observations. Motivated by this fact, the main purpose of this work is to study causal properties of time series, mainly under the structural assumption of a VARMA model with instantaneous effects. First, the global Markov property is studied, building on existing work for VAR processes without instantaneous effects. Infinite graphs which represent the dependencies of the process are defined so that separation statements translate to conditional independencies in the stationary distribution of the process. Second, faithfulness is examined as a counterpart of this Markov property. Conditions are given so that the stationary distribution of the process is almost surely faithful to said infinite graphs. In addition, an instrumental variable regression framework is developed for VARMA models with instantaneous effects. This allows one to identify and consistently estimate total causal effects. Entering the realm of causal discovery, invariant causal prediction is adapted to VAR processes with instantaneous effects to estimate the contemporaneous parents of the target component of the process. Furthermore, abandoning the linearity and additive noise assumption, the global Markov property is also studied for CCC-VARCH processes. Lastly, a real-life data example with known causal ground truth is presented.
Roberto Desponds High-dimensional protein analysis for breast cancer Prof. Dr. Peter Bühlmann
Markus Ulmer
Aug-2024
Abstract: Breast cancer remains a significant health challenge, necessitating the development of more precise and effective treatment strategies. This study explores the use of protein expression profiles as predictors of drug response in breast cancer treatments. By leveraging mathematical models, including k-means clustering and neural networks, we aim to enhance the accuracy of treatment predictions. Our approach involves analyzing protein expression data to identify patterns and correlations that can predict patient responses to various therapeutic drugs. Neural networks provide a powerful framework for modeling complex relationships within the data. We discover clear relationships between protein expressions and their cancer type, suggesting the need to personalize cancer treatments. Additionally, we find encouraging results for our neural network in predicting treatment effectiveness. This study underscores the potential of integrating protein expression data with mathematical models to advance precision medicine in oncology.
Michèle Wieland Aligniverse: Building a Platform to Create Multilingual Alignment Datasets for Gender Discrimination Prof. Dr. Orestis Papakyriakopoulos
Prof. Dr. Nicolai Meinshausen
Aug-2024
Abstract: Large Language Models (LLMs) can generate texts similar to human writing. The datasets used for training are immense but can contain harmful biases, with gender bias being prevalent. This thesis presents the Aligniverse project, which aims to create multilingual alignment datasets specifically designed to mitigate gender discrimination in LLMs. Prior efforts often focus on broad improvements of LLMs that are not specifically tailored to address gender discrimination, or rely on automated approaches. In contrast, Aligniverse collects human feedback to integrate diverse and comprehensive perspectives into the dataset. The project focuses on five key alignment dimensions: "Stereotypical Gender Bias", "Toxicity", "Emotional Awareness", "Sensitivity and Openness", and "Helpfulness". In order to collect human feedback, we build an online platform to host our survey. This survey tasks participants with rating, editing, and crafting answers to gender-focused prompts. Those answers are generated by unaligned LLMs in both English and German. We perform a pilot study to verify the practicality of the survey and the quality of the collected feedback. In the end, the collected data is transformed into an alignment dataset, which allows it to be used by commonly used methods like Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Unified Language Model Alignment (ULMA). This work serves as a proof-of-concept for the broader Aligniverse initiative, which plans to gather extensive data and extend the methodology to cover other forms of discrimination.
Content warning.

Please be advised that this document contains content that might be disturbing or offensive to some individuals, including discriminatory, hateful, or violent elements.
Emanuel Nussli Identification of Causal Effects using Nuisance Instrumental Variables Prof. Dr. Jonas Peters Aug-2024
Abstract: Understanding causal relationships in complex systems is a fundamental challenge across scientific disciplines. Instrumental variables are a well-established tool for the identification of causal effects. We build upon the idea of nuisance instrumental variables (NIV) for linear causal effects. We prove that direct causal effects are unidentifiable through improper NIV in non-trivial settings. We propose a remedy by showing that direct causal effects are identifiable by proper NIV. Additionally, we introduce a relaxation of NIV, which enables the identification of causal effects that are unidentifiable by non-relaxed NIV. Furthermore, relaxed NIV enables the identification of non-causal components of interest, such as the strength of confounding between two variables. To improve the applicability of the theoretical advancements, we develop an algorithm that generates all valid NIV identification strategies for a given causal effect. We propose a pragmatic method to rank valid identification strategies based on the estimated asymptotic variance by simulating linear structural causal models compatible with the underlying graph. We further introduce a new identification strategy called iterative NIV. Iterative NIV addresses dilemmatic graphical structures that hinder identification through existing methods. To resolve the problematic structures, we identify the total causal effect of the source-node set of such structures on the response and project the response accordingly. The causal effect of interest is then identified by an NIV strategy relative to the adjusted graph. Iterative NIV integrates covariate adjustment, standard instrumental variables, conditional instrumental variables, and nuisance instrumental variables into a unified framework. We demonstrate that the iterative NIV estimator is consistent and asymptotically normally distributed, allowing for valid statistical inference.
Mathieu Hoff Kernelised Conditional Independence Testing Prof. Dr. Peter Bühlmann
Cyrill Scheidegger
Aug-2024
Abstract: We present a comparative study of three methods for conditional independence testing: the General Covariance Measure (GCM), the Weighted General Covariance Measure (WGCM), and the Kernelised General Covariance Measure (KGCM). To examine the null hypothesis of X and Y being independent given Z, we employ test statistics that vary in form due to different weighting schemes, following the same underlying logic. Specifically, these statistics take the shape of the sample covariance between the residuals obtained from nonlinear regressions of X and Y on Z. Our investigation will rely on both theoretical analysis and experimental evaluation, aimed at discerning the power of each method against alternative tests. Theoretical exploration involves investigating the mathematical foundations of each method, analyzing their theoretical principles and analytical properties. Additionally, we conduct comprehensive empirical evaluations using diverse datasets. By applying the GCM, WGCM, and KGCM to real-world datasets, we compare their effectiveness in identifying conditional independence patterns across various scenarios. Furthermore, we explore the impact of different bandwidth parameters on the KGCM’s performance.
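As a point of reference for the test statistics compared in this abstract, here is a minimal sketch of the univariate Generalised Covariance Measure of Shah and Peters (2020); the choice of random forests as the regression method is made for illustration only:

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

def gcm_test(x, y, Z):
    """GCM test of x independent of y given Z, for 1-d arrays x, y and an (n, d) matrix Z:
    studentised sample covariance of the residuals from regressing x on Z and y on Z."""
    rx = x - RandomForestRegressor(random_state=0).fit(Z, x).predict(Z)
    ry = y - RandomForestRegressor(random_state=1).fit(Z, y).predict(Z)
    R = rx * ry
    T = np.sqrt(len(R)) * R.mean() / np.sqrt((R ** 2).mean() - R.mean() ** 2)
    return T, 2 * stats.norm.sf(abs(T))       # test statistic and two-sided p-value
```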
Ernesto Sanchez Tejedor Diffusion Models for Potential Outcome and Treatment Effect Estimation in the Context of Organ Transplants Dr. Markus Kalisch
Dr. Manuel Schürch
Dr. Michael Krauthammer
Aug-2024
Abstract: We investigate the problem of estimating the effects of a counterfactual organ transplant, which is of great importance in a clinical context. Estimation of these outcomes poses three main challenges: 1) the presence of (strong) treatment assignment bias in the data, making the generalization to the counterfactual distribution difficult for classical ML models; 2) the existence of strong correlations between the different outcomes and their evolution over time for a single patient, which makes estimation of the marginal distribution of these outcomes unsuccessful in capturing the overall well-being of a patient over time; and 3) the amount of missing outcomes in patient-donor datasets, originating from non-compliance and dropout (attrition), which can introduce significant bias and complicate the estimation process. To address these challenges, we propose a novel method for learning these potential outcomes over time. It leverages both the ability of diffusion models to learn complex multivariate distributions with correlation between their components and the advantageous properties of double/debiased ML to estimate treatment effects in the presence of nuisance parameters. In addition, we also propose a modification to a well-known CATE estimator, the T-Learner, to better predict treatment effects specifically in the context of organ transplants.
Sanelma Heinonen Transfer Learning with Graph Neural Networks for Traffic Forecasting Dr. Lukas Meier
Dr. Matteo Felder
Aug-2024
Abstract: The goal of this thesis is to test whether the transfer learning diffusion convolution recurrent neural network (TL-DCRNN), originally developed by Mallick, Balaprakash, Rask, and Macfarlane (2021) to conduct transfer learning for traffic prediction within the California highway network, also performs well in Swiss cities. The TL-DCRNN is a recurrent graph neural network which learns general rather than node-specific traffic patterns within a traffic graph. As such, it can be trained on one region and used to forecast traffic in another region in which we only have access to real-time, but not historic, data. The work was conducted in cooperation with Transcality, an ETH spin-off which specializes in real-time traffic forecasts and large-scale simulation for cities. As part of this thesis, we translate the open source TL-DCRNN model from TensorFlow 1 to PyTorch and train it on four months of traffic flow data from Zürich. Our PyTorch implementation is publicly available on GitHub at TL-DCRNN-pytorch. We test our model on traffic count data from Lucerne. In Lucerne, TL-DCRNN outperforms baselines which also do not utilize historic training data. However, performance in Zurich does not outperform the baselines, indicating that the subcluster approach to transfer learning used by TL-DCRNN may be better suited for forecasting on highways than in cities.
Baptiste Lucas Estimating impacts of new car-sharing stations on induced demand Prof. Dr. Peter Bühlmann
Prof. Dr. Martin Raubal
Dr. Yanan Xin
Aug-2024
Abstract: Car-sharing, when combined with other forms of public transportation, has the potential to significantly improve overall mobility and reduce the number of privately owned cars. This sustainable solution can mitigate traffic congestion in cities by promoting the idea of renting a car only when needed. This thesis explores how car-sharing can be better integrated into daily life by leveraging data from the Mobility Cooperative, the largest car-sharing company in Switzerland. The study aims to answer key questions about optimal station placement, the effect of new stations on existing demand, and user behavior in response to network changes. To address these questions, we analyzed data spanning from January 1, 2016, to December 31, 2023, which includes information about stations, cars, users, and historical booking records. We employed causal inference and machine learning techniques, particularly causal forests, to examine the causal influence of new car-sharing stations on existing network demand. Our findings indicate that adding a new station tends to shift demand rather than stimulate it. The user study revealed that the further a new station is from existing stations, the less likely it is to attract existing customers. Despite some instability of the forests when examining individual behaviors, we obtained a reliable heterogeneity analysis using decile groups and general SHAP trends. The study also provided insights into user loyalty, showing that new users constitute a small fraction of total demand at new stations, even after six months. This result supports the competitive behavior observed at nearby stations. The framework developed in this thesis shows promise but has limitations in precisely quantifying effects, suggesting it is more effective when used on groups of nearby stations. Additionally, our framework allows for predicting their respective behavior when a new station is incorporated, but the prediction is only reliable if no other new station has recently been added nearby. Future work should focus on analyzing how demand evolves at new stations and quantifying the overall impact on demand. Improvements in the causal framework, increased sample sizes, and simulated bookings and user behaviors could improve the reliability of the findings. If successful station clustering can be achieved, the CausalSE approach could be highly adaptable to car-sharing networks.
Johann Wenckstern Virtual Tissues: Self-Supervised Transformers for Multiplexed Images of Cancer Tissues Prof. Dr. Charlotte Bunne
Prof. Dr. Peter Bühlmann
Dr. Gabriele Gut
Jul-2024
Abstract: Multiplexed imaging technologies, such as imaging mass cytometry, enable the spatially resolved measurement of proteins at single-cell resolution within tissue slices. The resulting multiplexed image data offers a promising opportunity to advance our understanding of tissue structure and function in both health and disease. In cancer biology, it enables molecular-level studies of the tumor microenvironment, whose composition and spatial organization, as revealed by these methods, have been shown to directly affect cancer progression, treatment response, and survival. However, the complexity and high dimensionality of the data make its analysis and interpretation by humans or traditional quantitative and computational methods challenging. Therefore, in this work, we propose TissueViT, a new transformer-based architecture designed to learn both locally resolved and globally aggregated representations of highly-multiplexed tissue images. A key component of our architecture is a new sparse attention mechanism, which allows for the efficient processing of high-dimensional multiplexed imaging data. TissueViT is trained in a self-supervised manner using masked auto-encoding and contrastive losses on a collection of four imaging mass cytometry datasets, covering three different cancer tissue types and the distributions of 93 distinct marker proteins. Through a suite of experiments, we demonstrate that the representations produced by TissueViT enable high-quality reconstructions of partially and fully masked channels, strong inference results for cell phenotypes, retrieval of similar tissue niches using an optimal transport based metric, and the prediction of patient-level annotations.
Gustas Mikutavicius Comparison of Forecasting Models and Hierarchical Reconciliation Techniques on Sales Data Dr. Fabio Sigrist Jul-2024
Abstract: Hierarchical time series forecasting involves generating predictions for many time series which form a hierarchical structure, that is, some time series are aggregates of other time series. Hierarchical time series usually appear in fields such as sales, tourism or supply chains. In the case of sales, time series can be grouped by dimensions like geographical location and product category. The focus of this master thesis is to compare various forecasting methods, ranging from classical statistical techniques to deep learning methods, on this type of sales data. In addition, a comparison is carried out among different hierarchical reconciliation techniques, which ensure that the forecasts of different hierarchical levels add up. These include both post-prediction reconciliation techniques, which are separate from the base forecasting models, as well as End-to-End methods, which combine training and reconciliation into one step. Moreover, one of the End-to-End reconciliation approaches is implemented in a novel way, based on reconciling the parameters of the output Gaussian distribution during both training and inference.
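To illustrate what post-prediction reconciliation means in practice, here is a toy sketch of OLS (projection) reconciliation on a two-level hierarchy; the numbers and the choice of OLS reconciliation are assumptions made for the example, not results or methods from the thesis:

```python
import numpy as np

# Toy hierarchy: total = A + B.  The summing matrix S maps the 2 bottom-level
# series (A, B) to all 3 series in the hierarchy (total, A, B).
S = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

base_forecasts = np.array([105.0, 60.0, 40.0])       # incoherent: 60 + 40 != 105

# OLS (projection) reconciliation: y_tilde = S (S'S)^{-1} S' y_hat
P = S @ np.linalg.solve(S.T @ S, S.T)
print(P @ base_forecasts)                            # coherent: the total equals A + B again
```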
Alexandre Carbonell Extrapolation-Aware Nonparametric Inference: Integrating Quantile Forests and Neural Networks in Real-World Data Analysis Prof. Dr. Peter Bühlmann Jul-2024
Abstract: This thesis explores the integration of QRF and QNN, applying it to various domains where real-world data is available, with a particular focus on extrapolation-aware nonparametric inference. As most relationships between response variables and features exhibit complex, nonlinear dynamics that often challenge traditional predictive models, this work seeks to advance the application of nonparametric methods that can more robustly predict and quantify uncertainties outside the observed data range. Explicitly addressing extrapolation significantly enhances our ability to gauge and manage the uncertainties inherent in predictions made from such models. The research employs "Xtrapolation", a novel approach introduced by Pfister and Bühlmann (2024) in "Extrapolation-Aware Nonparametric Statistical Inference", which is rigorously tested on both simulated data and real-world datasets from different areas. The core of the analysis centers on the integration and comparison of QRF and QNN, alongside their extensions through conformal prediction methods, to establish bounds on extrapolation when applying the Xtrapolation method and to highlight their potential in delivering accurate and reliable predictions. Metrics such as coverage probability and prediction interval width are used to evaluate performance, emphasizing the practical significance of using advanced machine learning techniques in nonparametric inference settings. Ultimately, this thesis demonstrates the efficacy of QRF and QNN in managing the challenges posed by extrapolation in real-world scenarios, providing a comprehensive framework that could influence future applications in economics, finance, and healthcare, to mention just a few.
Yiting Nian Optimization methods for Gaussian process hyper-parameter estimation Dr. Fabio Sigrist Jun-2024
Abstract: The Gaussian process (GP) is a flexible statistical and machine learning model that is widely used in applications involving time series or spatial data. To make predictions based on this model, several hyper-parameters in the covariance function have to be estimated. In this thesis, we perform a systematic comparison of several commonly used optimization methods, which are based on gradients, second derivatives, or heuristic search strategies, for the estimation of Gaussian process hyper-parameters, in terms of training speed, convergence performance, and their respective pros and cons. Approximation methods are used for large spatial data to reduce the computational burden. Simulated data samples from different settings and distributions are used to test the performance of the optimization methods, and the theoretical and experimental comparisons are then summarized.
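As a small illustration of the optimization problem described in this abstract (not code from the thesis), the sketch below minimizes the exact GP negative log marginal likelihood for a squared-exponential kernel with a nugget using L-BFGS-B; the kernel, data and parameterization are choices made for the example:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=200)
D2 = cdist(X, X, "sqeuclidean")                      # pairwise squared distances

def neg_log_marginal_lik(log_params):
    """Exact GP negative log marginal likelihood, squared-exponential kernel plus nugget."""
    s2, ell, nugget = np.exp(log_params)             # signal variance, lengthscale, noise variance
    K = s2 * np.exp(-0.5 * D2 / ell ** 2) + nugget * np.eye(len(y))
    L = np.linalg.cholesky(K)                        # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

res = minimize(neg_log_marginal_lik, x0=np.log([1.0, 1.0, 0.1]), method="L-BFGS-B")
print(dict(zip(["variance", "lengthscale", "nugget"], np.exp(res.x))))
```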
Marco Maninetti A comparison of machine learning methods for extrapolation Dr. Fabio Sigrist May-2024
Abstract: The ability to predict outside the training domain is a desirable characteristic for machine learning models. Research has been conducted to develop methods specifically designed for extrapolation. We perform a comprehensive comparison of the extrapolation capabilities of different machine learning models on tabular data. We start with a review of the literature in the field. We propose different ways to define an extrapolation set in multidimensional settings. Then, we compare different methods in terms of point prediction and distributional prediction. Both regression and classification tasks are considered, with either only numerical covariates, or both numerical and categorical covariates. We show that tree-based methods, and their distributional variants, have the best extrapolation performance.
Yinkai Li Ordinal regression: A review of methods Dr. Markus Kalisch May-2024
Abstract: Ordinal responses are an important type of observation, and special tools are needed to analyse such data. The idea of ordinal responses, and why naive methods are not recommended, are discussed. Basic models of ordinal regression are introduced, with simulations and an example. The limitations are visualised, and several solutions are provided. All tools are applied to a real-world dataset. The performance of the models is compared.
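A basic ordinal regression model of the kind introduced in this thesis is the proportional-odds (cumulative logit) model; the sketch below fits one with statsmodels on simulated data, where the data-generating process, category labels and choice of library are assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
latent = 1.5 * x + rng.logistic(size=n)              # latent variable behind the ordered response
y = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 1, np.inf],
                     labels=["low", "medium", "high"]))

# proportional-odds (cumulative logit) model
res = OrderedModel(y, x[:, None], distr="logit").fit(method="bfgs", disp=False)
print(res.params)                  # fitted slope and threshold parameters
print(res.predict(x[:5, None]))    # predicted category probabilities for the first 5 rows
```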
Liule Yang Understanding Abstractive Text Summarizations Using Interactive Causal Explanations Dr. Lukas Meier
Dr. Furui Cheng
Prof. Dr. Mennatallah El-Assady
Apr-2024
Abstract: The main objective of this thesis is to create an interactive system for understanding abstractive text summarization. To this end, I developed a causality-powered explanation model to explain the fact-to-fact relationships between the input document and the corresponding summary. Then, I designed an explanation workflow around the explanation model and built the interactive system based on this workflow. The resulting interactive system allows users to visually explore and understand the decision-making processes of text summarization models.
Oliver Skultety Learning Descendant by Testing for Invariance Prof. Dr. Jonas Peters Apr-2024
Abstract: Causal inference is an increasingly crucial topic in the world of science. With many advances in statistical techniques and easier access to appropriate datasets, there is now a rising number of techniques for discovering causal relationships from data, even without controlled experiments. In this thesis, we develop new methods for discovering causal relationships from datasets with unknown interventions. We provide new theoretical results concerning DAGs, which are then utilized to justify the methods. We demonstrate how invariance testing can be utilized in new ways for causal discovery. We test the performance and guarantees of the methods on simulated and real-world data. We demonstrate that the methods perform well in different settings.
Nicolas Noël Koch Reservoir Computers for Volatility Forecasting: An Online Bayesian Model Comparison Prof. Dr. Peter Bühlmann Mar-2024
Abstract: Reservoir computers constitute universal approximators of an extensive range of non-linear input-output systems, relying on untrained, often randomly generated features of the input signal and only tuning a simple output map. We integrate some popular variants thereof into discrete-time regime-switching jump-diffusion models for the volatility of the price of a financial asset. Thereby, we construct Markovian, finite-memory, as well as fully path-dependent variants of both deterministic and stochastic volatility models with universality guarantees. The models are estimated along with their Bayesian model evidence in an online manner using sequential Monte Carlo methods, allowing for a continuous model comparison. We examine the consistency of model evidence with the accuracy of their sequential point and probabilistic forecasts for several one-day-ahead market variables, namely prices, realized variance, and European call option payouts. Numerical experiments using simulated and real-world data show that even low-dimensional reservoir computers effectively learn classical dynamics and achieve competitive performance comparable to their classical counterparts with inductive biases.
Meiyi Long Linear Regression and Its Robustness under Model Violations Dr. Markus Kalisch Mar-2024
Abstract: This thesis explores the robustness of linear regression models when violating error distribution assumptions. Through comprehensive simulations, we assess how ordinary least squares (OLS) regression compares to alternative strategies, such as two-stage estimation and heteroscedasticity-consistent (HC) estimates, under conditions of non-normal error distributions, heteroscedasticity, and autocorrelation. These simulations evaluate the coverage rate and width of confidence intervals. Furthermore, we examine OLS's performance in the presence of extreme outliers and discuss the advantages of robust regression techniques, including M-estimators and high-efficiency, high-breakdown point estimators like MM estimators. Our findings provide insights into selecting appropriate regression methods under various violations of classical linear regression assumptions, emphasizing the importance of robust techniques in ensuring reliable statistical inference.
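As a brief, hedged illustration of the comparison above (not code from the thesis, and with an arbitrary simulated data-generating process), the following Python sketch contrasts classical OLS standard errors, heteroscedasticity-consistent (HC3) standard errors, and a Huber M-estimator using statsmodels:

    # illustrative sketch only: heteroscedastic, heavy-tailed errors violate the classical assumptions
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(0, 1, n)
    X = sm.add_constant(x)
    y = 1.0 + 2.0 * x + (0.5 + x) * rng.standard_t(df=3, size=n)

    ols = sm.OLS(y, X).fit()                              # classical standard errors
    ols_hc = sm.OLS(y, X).fit(cov_type="HC3")             # heteroscedasticity-consistent (HC3) errors
    rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimator, robust to outliers

    print(ols.conf_int()[1])      # slope confidence interval under classical assumptions
    print(ols_hc.conf_int()[1])   # slope confidence interval with HC3 standard errors
    print(rlm.params[1])          # robust slope estimate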
Vigdis Gunnarsdottir Exploring Maximum Mean Discrepancy Networks for Generative Modelling: Theory and Applications Prof. Dr. Nicolai Meinshausen Mar-2024
Abstract: Modern generative machine learning models have reshaped data generation, outlier detection, augmentation and innovation. The main objective is to generate new data from some underlying distribution given only independent samples. A breakthrough in generative modelling was the invention of the generative adversarial network (GAN), which simultaneously trains a generator network and a discriminator network against each other. The GAN has many training problems, which prompted the invention of various GAN variants. In this work, we explore one GAN variant, the maximum mean discrepancy (MMD) network. MMD is a metric that measures the disparity between two probability distributions. It has the important quality of an easy-to-calculate formula for the unbiased empirical estimate that only requires independent samples from both distributions as well as a characteristic kernel. In an MMD network, the empirical MMD replaces the discriminator from the GAN framework, which simplifies the training process as only one network is trained instead of two. The generator learns to create data such that the generated sample distribution is close (or approximately equal) to the real data distribution with respect to the MMD. MMD networks can successfully learn to generate from low-dimensional underlying data distributions. They can also successfully sample from conditional distributions, i.e., distributions conditioned on some desired characteristic the generated samples should have. For higher-dimensional data, like image data, the MMD network benefits from learning lower-dimensional features constructed by an autoencoder instead of learning directly from the training data.
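For context, a minimal sketch (not the thesis code) of the unbiased empirical MMD^2 estimate with a Gaussian kernel, the quantity that takes the place of the GAN discriminator in an MMD network; the bandwidth and the toy samples are arbitrary assumptions:

    import numpy as np

    def rbf_kernel(a, b, bandwidth=1.0):
        # pairwise Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))

    def mmd2_unbiased(x, y, bandwidth=1.0):
        # unbiased estimate: off-diagonal within-sample terms minus twice the cross term
        m, n = len(x), len(y)
        kxx = rbf_kernel(x, x, bandwidth)
        kyy = rbf_kernel(y, y, bandwidth)
        kxy = rbf_kernel(x, y, bandwidth)
        term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
        term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
        return term_x + term_y - 2 * kxy.mean()

    x = np.random.randn(100, 2)          # stand-in for a "real" sample
    y = np.random.randn(100, 2) + 0.5    # stand-in for a "generated" sample
    print(mmd2_unbiased(x, y))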
Larissa Walser Applications and Extensions of Survival Models Dr. Lukas Meier Mar-2024
Abstract: This paper navigates through theoretical landscapes and real-world applications of survival analysis. Starting with the basics, we discuss concepts such as censoring, Kaplan-Meier estimates and hazard rates. Non-parametric methods - including the log-rank test - allow comparing survival experiences, while the Cox Proportional Hazards model introduces a more sophisticated semi-parametric perspective, which, however, also comes with assumptions and limitations.

After looking at the basics, we delve into parametric models and discuss distributions such as Weibull, exponential and log-logistic. We will also consider how to select the most appropriate model for the data using different strategies and approaches. Special features such as recurrent events and competing risks require adapted evaluation methods, while splines offer insights into capturing non-linear covariate relationships.

The theoretical understanding is applied to the analysis of real Covid-19 data from Ethiopia. Navigating the complexities of the pandemic, we identify patterns and risk factors and gain insights into survival experiences. Challenges and limitations remain, particularly in nuanced model selection, highlighting the need for ongoing research to refine methods such as neural networks and machine learning.
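As a hedged illustration of the basic estimators mentioned above (the Ethiopian Covid-19 data are not reproduced here), a Kaplan-Meier curve and a Cox proportional hazards fit can be obtained in Python with the lifelines package, using one of its bundled example datasets:

    from lifelines import KaplanMeierFitter, CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()   # example time-to-event data shipped with lifelines

    km = KaplanMeierFitter()
    km.fit(df["week"], event_observed=df["arrest"])   # non-parametric survival curve
    print(km.median_survival_time_)

    cox = CoxPHFitter()
    cox.fit(df, duration_col="week", event_col="arrest")   # semi-parametric hazards model
    cox.print_summary()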
Jorge Fernando da Silva Gonçalves Enhancing TreeVAE: Advancements in Modeling Hierarchical Latent Spaces for Improved Variational Autoencoders Dr. Markus Kalisch
Prof. Dr. Julia Vogt
Mar-2024
Abstract: This thesis enhances the generative capabilities of TreeVAE, a VAE-based model that performs hierarchical clustering and learns a flexible tree-based posterior distribution over the latent variables. It introduces two new extensions to the TreeVAE model to mitigate the typical blurriness issue prevalent in VAE architectures. These new models are termed CNN-TreeVAE and Diffuse-TreeVAE. CNN-TreeVAE incorporates enhancements seen in other successful VAE-based generative models. Empirical evaluations are conducted to identify the key modifications yielding significant improvements in generative quality and clustering accuracy. Conversely, Diffuse-TreeVAE equips TreeVAE with a diffusion model through a generator-refiner framework. In this configuration, the diffusion model refines VAE-generated images post-hoc while maintaining the clustering performance. Leveraging the advanced image synthesis capabilities of diffusion models, this refinement process effectively addresses the blurriness artifacts, leading to notably improved generated images. The thesis demonstrates the effectiveness of both approaches in enhancing the quality of image reconstructions and newly generated images, evidenced by improved FID scores compared to the original implementation.
Jonathan Steffani On maximum likelihood estimation and some general identifiability conditions in unlinked regression models Prof. Dr. Fadoua Balabdaoui Mar-2024
Abstract: We consider the regression setting where we assume a linear relation between a response variable $Y \in \mathbb{R}$ and a covariate variable $X \in \mathbb{R}^d$ via a vector of coefficients $\beta \in \mathbb{R}^d$. Unlike the usual linear regression setting, where we have a link between each observation of the response variable and a "corresponding" observation of the random covariate vector, in the unlinked setting we only have access to a set $\{X_i\}_{i=1}^m$ of observed covariates and a set $\{Y_j\}_{j=1}^n$ of observed responses. In this setting, the true regression vector $\beta$ is not necessarily unique, in which case we say the model is not identifiable. In this thesis, we come up with several sufficient conditions on the distribution of the covariate vector $X$ under which the corresponding unlinked linear regression model is identifiable and admits a unique true regression vector. Moreover, we examine the performance of the deconvolution least squares estimator (DLSE) developed in Azadkia and Balabdaoui (2023) and compare it to the maximum likelihood estimator (MLE) in several identifiable scenarios. We also explore the asymptotic behaviour of the MLE in these scenarios, from which we conjecture asymptotic normality of the MLE.
Joel Alder Unsupervised Anomaly Detection in images Dr. Lukas Meier
Prof. Dr. Helmut Grabner
Dr. Thomas Oskar Weinmann
Mar-2024
Abstract: Unsupervised anomaly detection in images deals with the identification of unusual or deviating patterns in images. This field of research is of great importance in various areas, especially in the medical field for the detection of abnormal tissue and other pathological findings. The main advantage of the unsupervised approach is that no manual labelling of the images is required, which is often a challenge in the real world. In addition, this approach enables the detection of new and previously unknown anomalies. In this thesis, unsupervised anomaly detection is investigated in two application areas. The first area focuses on the detection of abnormal images in webcam streams. For this, two reconstruction approaches, namely the Principal Component Analysis (PCA) and the encoder-decoder network, are compared in terms of their temporal evolution and their ability to reliably detect abnormal images in changing environments. In addition, the density of the autoencoder bottleneck representation will be analysed. The findings from this first application area should help to reliably recognise anomalies in the more demanding second application area. With the two reconstruction methods for the webcam images, it is found that the selection of a suitable training data set makes it possible to recognise abnormal images even at later points in time despite temporal developments. The second area focuses on unsupervised anomaly detection in X-ray baggage scans, which is a challenging research area dealing with the identification of dangerous objects at security checkpoints. In contrast to traditional methods based on manually annotated datasets, the unsupervised approach aims to detect anomalies without prior labelling. In this work, the effectiveness of an autoencoder as an unsupervised model for baggage threat detection is investigated. The research builds on the Unsupervised Anomaly Instance Segmentation Framework by Hassan et al. A baseline is created by applying this framework to the confidential X-ray dataset. Subsequently, the findings from the application to webcam images and other possible changes are used to improve the baseline. One focus is on the novel loss function proposed in the literature. This loss function consists of two parts. In the second part, in which a fixed backbone is used, a separate self-supervised backbone is trained. The combination of these elements enables an optimised reconstruction of anomalies. The results show that the developed unsupervised autoencoder achieves a 17.45% improvement over the baseline. The AUC-ROC values increase from 53.58% to 62.93%, which indicates a more reliable detection of hazardous objects. Nevertheless, the result achieved is not yet suitable for practical use and requires further research and optimisation.
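The webcam and X-ray data used in the thesis are not public, so the following toy Python sketch only illustrates the reconstruction-error idea behind one of the two approaches compared above (PCA trained on normal data, with high reconstruction error flagging anomalies); all data here are synthetic stand-ins:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    normal_images = rng.normal(size=(500, 64))        # stand-in for flattened "normal" frames
    test_images = np.vstack([rng.normal(size=(10, 64)),
                             rng.normal(loc=3.0, size=(10, 64))])  # last 10 are "abnormal"

    pca = PCA(n_components=10).fit(normal_images)     # learn the normal appearance subspace
    recon = pca.inverse_transform(pca.transform(test_images))
    score = ((test_images - recon) ** 2).mean(axis=1)  # high reconstruction error => anomaly
    print(score.round(2))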

2023

Student Title Advisor(s) Date
Dominique Crispin Paul Scaling Experiments in Self-Supervised Cross-Table Representation Learning Dr. Markus Kalisch
Dr. Max Schambach
Dr. Johannes Otterbach
Nov-2023
Abstract: This master thesis investigates the scaling potential of deep tabular representation learning models. To this end, we present a new Transformer-based architecture that we trained on a corpus of 76 datasets using a self-supervised masked cell recovery objective for missing value imputation. The architecture uses a shared transformer backbone and table-specific tokenisers. We pretrain models of various sizes, with around 1 × 10^4 to 1 × 10^7 backbone parameters, using 135 million training tokens. We evaluate the scalability of our architecture in both single-table and cross-table pretraining scenarios by testing the learnt representations using linear probing on a selected set of benchmark datasets and comparing the results with standard baselines.
Mario Stepanik Dynamic causal modeling of the effect of psychedelic drugs on functional brain hierarchy Dr. Lukas Meier
Prof. Dr. Robin L. Carhart-Harris
Nov-2023
Abstract: Do psychedelic drugs restructure the functional hierarchy of the human brain? This question is at the center of major theoretical frameworks which attempt to bridge the neural and psychological level of psychedelic effects (e.g., Carhart-Harris and Friston, 2019). It is believed that psychedelics act by liberating information flow from (low-level) brain areas whose activity is usually constrained by (high-level) association networks under regular waking consciousness. One way of operationalizing this hypothesis is to ask how the causal influence of such low-level regions on high-level networks changes under psychedelics. This paper examines this question by applying dynamic causal modeling to resting-state fMRI data of subjects under the influence of LSD, psilocybin, or DMT. The present analyses yield a range of insights: first, it is found that DMT causes a remarkable inundation of the default mode network by the bilateral hippocampus, which may generalize to other psychedelics. Second, the analyses did not provide evidence for a well-known hypothesis stating that the thalamus mediates ("gates") bottom-up information flow to cortical areas (Vollenweider and Geyer, 2001). Third, they provide ambiguous results regarding connectivity changes between low-level sensory areas and high-level association networks. Finally, a structured comparison between frequentist and Bayesian hypothesis testing pipelines is shown to be useful in dealing with the potentially problematic sensitivity of dynamic causal modeling. These analyses expand our understanding of the neural mechanisms of psychedelic action and help delineate the role of different statistical analysis pipelines for future research on alterations of brain hierarchy.
Nicolas Harrington Ruiz Introducing distributional predictions to Long Short-Term Memory models for rain-runoff predictions in hydrology using the energy score Prof. Nicolai Meinshausen
Dr. Lukas Gudmundsson
Oct-2023
Abstract: Modelling in hydrology has been transformed through the introduction of machine learning methods, and particularly the use of Long Short-Term Memory (LSTM) Neural Networks to predict rain-runoff. These LSTM models are able to adapt and train much more quickly and conveniently than those based on deterministic methods whilst still retaining some physical interpretability through the examination of the hidden states of the network (Lees, Reece, Kratzert, Klotz, Gauch, De Bruijn, Kumar Sahu, Greve, Slater, and Dadson, 2022). These predictions, along with many of the top systems-based models such as the PREcipitation–Runoff–EVApotranspiration HRU Model (PREVAH), have been deterministic in nature and thus lack the advantages of a quantification of uncertainty (Viviroli, Zappa, Gurtz, and Weingartner, 2009). Here we seek to introduce a new method which can act as a generative model and provide samples and distributional information for the target variable of rain-runoff. Using the energy score proposed by Gneiting and Raftery (2007) as the loss function in the LSTM, training the model with two "draws" from the joint distribution of precipitation, temperature, and rain-runoff with a fixed number of noise columns will result in an LSTM which can provide an ensemble output which matches the distribution of the rain-runoff, avoiding the need to train on an ensemble of models. A proof-of-concept was applied to a single catchment with data taken from the Rietholzbach Lysimeter, where experimentation is done to explore the relationship between the amount of noise and the ability to adequately learn the distribution. Then the model is trained on multiple catchments from the Swiss-wide data to create a model which learns in space and time and can create predictions at different locations. The relationship between the noise and ability to learn is also explored at the Swiss-wide level. The results indicate that the generative model using the energy loss produces adequate confidence intervals. Increasing the amount of noise allows the confidence intervals produced by the model to expand, thus allowing the user to calibrate the width of the interval through the dimensionality of the noise. The aggregates of the ensemble, in this work the mean and median of the members, are compared to the PREVAH model to show that additional noise can compromise the aggregate's performance on traditional hydrological metrics such as the Kling-Gupta Efficiency (KGE) and the Nash-Sutcliffe Efficiency (NSE).
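As an illustration of the loss mentioned above (a sketch under stated assumptions, not the thesis implementation), the sample version of the energy score of Gneiting and Raftery (2007) for an ensemble forecast can be computed as follows; the ensemble here is a hypothetical stand-in for LSTM output draws:

    import numpy as np

    def energy_score(ensemble, obs):
        # ensemble: (m, d) array of forecast draws, obs: (d,) observed vector
        # ES = mean ||x_i - y|| - 1/(2 m^2) * sum_{i,j} ||x_i - x_j||
        m = ensemble.shape[0]
        term1 = np.linalg.norm(ensemble - obs, axis=1).mean()
        pair_dists = np.linalg.norm(ensemble[:, None, :] - ensemble[None, :, :], axis=-1)
        term2 = pair_dists.sum() / (2 * m ** 2)
        return term1 - term2

    draws = np.random.randn(50, 1) * 2.0 + 1.0   # hypothetical ensemble of runoff forecasts
    print(energy_score(draws, np.array([1.3])))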
Fabio Molo Inferring natural selection from pool-sequencing allele frequency data – an applied statistical genetics project on drought tolerance in Brassica napus at FiBL Prof. Dr. Nicolai Meinshausen Oct-2023
Abstract: Understanding the genetic basis of plant traits is becoming increasingly important in plant breeding programmes. Researchers at the Research Institute of Organic Agriculture (FiBL) in Switzerland are conducting a long-term research project to further the understanding of the genetic basis of drought tolerance in Brassica napus (rapeseed). The research project uses an evolve-and-resequence approach for 16 generations of two B. napus populations, one cultivated in an organic, the other in a conventional agro-environment. This approach allows studying the adaptation of the two populations to their environment across generations.

DNA of two pools of individuals from each generation and population was sequenced using low read depth pooled whole-genome-resequencing, a low cost sequencing method suited for large-genome crops. This thesis is an applied statistical analysis of the resulting data within that project at FiBL. It aims to identify candidate loci on the genome for adaptation to drought stress by detecting associations between changes in allele frequencies across generations and naturally varying water-deficit conditions.

The thesis is structured as follows. First, an introduction to the relevant genetics concepts is provided by means of a review of the existing literature on drought tolerance in B. napus. Second, the research design, the data generating process and the data are presented. The third part discusses the statistical methods used in the thesis and why they are suited to the data at hand. Two methods are pursued. The first is based on zero-inflated negative binomial regression models of allele counts within gene-derived haplotypes. The second uses Beta regression models of allele frequency trajectories in larger haplotype blocks across generations. The results of both methods are presented and their relative advantages and disadvantages are discussed.

No biological assessment of the resulting candidate loci is attempted. Nevertheless, a first conclusion is that both methods provide a small number of candidates suited for further biological investigation. A second conclusion is that the low-cost sequencing approach and missing data pose major challenges for both methods, likely resulting in false negative findings.
Manuel Morales Wyden AI for Venture Capital: Predicting Short-Term Success of Early-Stage Startups Dr. Markus Kalisch Oct-2023
Abstract: In venture capital, a strong demand for solutions that provide efficiency gains or improve investment decisions prevails. This thesis aims to develop a framework for predicting short-term startup success and reduce the time needed for feature collection. It does so by providing an overview of promising features based on expert interviews and an extensive literature review. The selected features span various categories, including textual, team, financial, and product-based characteristics. Further, a theoretical foundation of different state-of-the-art statistical learning methods, namely logistic regression, random forests, and gradient boosting, is built and thereafter applied to the aforementioned features. The data for the analysis is based on pitch decks from 245 startups provided by the industry partner SNGLR Group AG, making this thesis, to the best knowledge of the author, the first study to assess the relevance of pitch deck features. Some features are extracted by using a novel large language model (LLM) method and are then compared to more traditional methods. The LLM method appears to work well for basic tasks; however, for more complex tasks there is a lack of baselines to compare against, since the more traditional methods are not reliable enough. The success of the early-stage startups is defined by survival. The analysis does not yield any significant results for the success prediction. While it remains open whether pitch decks have any useful predictive signal, this thesis suggests two major adjustments for future research: first, manually extracting the features to create a ground truth for both LLM comparison and success prediction; and second, extending the dataset and utilized features by possibly including external data sources.
Nicolas Kubista Methods and Software for Penetrance Estimation in Complex Family-based Studies Dr. Markus Kalisch
Prof. Dr. Giovanni Parmigiani
Dr. Danielle Braun
Oct-2023
Abstract: Reliable methods for penetrance estimation are critical to improve clinical decision making and risk assessment for hereditary cancer syndromes. Penetrance is defined as the proportion of individuals with a certain genetic variant (i.e., genotype) that causes a trait and who show symptoms of that trait, such as cancer (i.e., phenotype). We propose a novel estimation approach, accompanied by a software implementation, to estimate age-specific penetrance using a four-parameter Weibull distribution. We employ maximum likelihood techniques and a Bayesian approach to estimate the parameters of the penetrance function. We build upon existing software implementations of the Peeling and Paring algorithm to efficiently compute likelihoods. To validate the robustness and applicability of these methods, we applied them to simulated family-based data, aiming to estimate the age-specific penetrance for breast cancer for carriers of a BRCA1 mutation. The maximum likelihood estimation provides unsatisfactory results. After a reparameterization and employing the Bayesian approach, we are mostly able to recover the true parameter values from the data-generating distribution within the 95% credible interval. Our software implementation provides the basis for a flexible and user-friendly approach to estimate penetrances from complex family-based studies.
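A minimal sketch of one possible four-parameter Weibull penetrance curve, assuming a parameterisation with an asymptote, shape, scale and age shift; the thesis' exact parameterisation, the Peeling and Paring likelihood computations and the Bayesian fitting are not reproduced here:

    import numpy as np
    from scipy.stats import weibull_min

    def penetrance(age, asymptote, shape, scale, shift):
        # probability of having developed the phenotype (e.g. breast cancer) by a given age,
        # modelled as an asymptote times a shifted Weibull CDF (four parameters in total)
        return asymptote * weibull_min.cdf(age, c=shape, loc=shift, scale=scale)

    ages = np.arange(20, 81, 10)
    print(penetrance(ages, asymptote=0.7, shape=3.0, scale=40.0, shift=20.0).round(3))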
Andrea Thomas Nava Estimating the Distribution of Organic Aerosol in Europe: With Applications of Conformal Prediction Prof. Dr. Nicolai Meinshausen
Dr. Daniel Trejo Banos
Dr. Imad El Haddad
Sep-2023
Abstract: Organic Aerosol, also known as fine dust, is one of the main components of Particulate Matter (PM) and has severe health and climate impacts. Knowledge of its spatial distribution and source composition is crucial for making informed policy decisions. This thesis addresses this need by investigating how to estimate the spatial and temporal distribution of Organic Aerosol across Europe over the past decade. In the present work, we leverage both field measurements and predictions from a physical simulator and ultimately try to combine the strengths of both data sources. A key part of this thesis is the validation of the simulator using the available field measurements. In particular, by framing the problem as a regression task we can study, simultaneously, the effect of many potentially interacting factors on the performance of the simulator. The second contribution of this thesis is the statistical down-scaling of Organic Aerosol, via regression methods, in order to obtain a higher resolution map of Organic Aerosol concentrations over Europe. We enhance the reliability of the down-scaling across seen and unseen locations by providing an uncertainty quantification in the form of distribution free prediction intervals using Conformal Prediction and its extensions to non-exchangeable settings. Finally, we again combine field measurements with predictions from a simulator through a proof-of-concept application of the recently introduced Prediction-Powered Inference framework.
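For context, a hedged Python sketch of the standard split-conformal recipe that underlies the prediction intervals mentioned above (the thesis additionally uses extensions to non-exchangeable settings, which are not shown); the data and model choices here are arbitrary:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 5))
    y = X[:, 0] + rng.normal(scale=0.5, size=1000)

    # split into a proper training set, a calibration set, and new points
    X_tr, y_tr = X[:600], y[:600]
    X_cal, y_cal = X[600:900], y[600:900]
    X_new = X[900:]

    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    resid = np.abs(y_cal - model.predict(X_cal))          # calibration residuals
    alpha = 0.1
    q = np.quantile(resid, np.ceil((1 - alpha) * (len(resid) + 1)) / len(resid))
    pred = model.predict(X_new)
    lower, upper = pred - q, pred + q                     # marginal 90% prediction intervals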
Mar Vázquez Rabuñal Sparse Partially Linear Neural Additive Models Dr. Markus Kalisch
Prof. Dr. Valentina Boeva
David Wissel
Sep-2023
Abstract: The use of complex machine learning models in critical domains, such as healthcare, often implies a trade-off between accuracy and interpretability of the results. In this work, we propose a novel neural network-based model that performs structure selection for each explanatory feature using a proximal operator. For time-to-event data, we introduce the concept of double structure selection to deal with time-dependent feature effects. We show the competitiveness of our structure selection model in comparison to other methods on simulated and real-world datasets, in terms of correct structure finding and predictive performance. The underlying flexibility of neural networks makes the model applicable to a great variety of data types and we present the application to regression, classification and survival data.
Gianna Marano Analysis of Irregularly Spaced Time Series: A Gaussian Process Approach Dr. Markus Kalisch
Dr. David Perruchoud
Sep-2023
Abstract: Conventional time series analysis methods often assume evenly spaced observations, which may not always reflect real-world data collection constraints. Motivated by a real-world example, this study highlights Gaussian processes as a potent tool for analyzing irregularly sampled time series data. Using a simulated blood pressure dataset designed to mimic real-world dynamics, including cyclic, autoregressive, and long-term trend components, we evaluate Gaussian process regression's performance in estimating blood pressure values from one week of irregularly spaced measurements. We assess the accuracy of credible interval estimation for clinically relevant target measures through repeated simulations, comparing it with baseline methods, such as spline and linear regression, accompanied by bootstrapped confidence intervals. Our investigation extends to the impact of varying data density and sampling patterns, specifically comparing uniform and seasonal sampling, where data density fluctuates with the circadian cycle. Results consistently demonstrate Gaussian process regression's superior performance across all target measures, data densities, and sampling patterns. While linear regression, featuring a linear trend and sinusoidal component, serves as a viable baseline under low-data scenarios, it exhibits notable estimation bias due to its inherent constraints, and thus does not improve with more data. In contrast, spline regression offers flexibility but falters with seasonal sampling due to its lack of prior function knowledge. Gaussian process regression strikes a balance between flexibility and encoding prior beliefs about the true blood pressure function, yielding accurate results even with sparse, seasonally sampled data. Notably, it explicitly models the autoregressive component, yielding more precise credible intervals compared to the bootstrapped confidence intervals of the baseline methods. In summary, this study shows the potential of Gaussian processes as a robust tool for the analysis of irregularly sampled time series data, as exemplified by its application to blood pressure estimation.
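As a hedged illustration of the modelling idea (the simulated blood pressure data of the thesis are not used here), Gaussian process regression on irregularly spaced times with a periodic kernel for the circadian cycle plus a smooth trend kernel might look as follows in Python:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    rng = np.random.default_rng(3)
    t = np.sort(rng.uniform(0, 7, 60))[:, None]    # irregular measurement times over one week (days)
    y = 120 + 8 * np.sin(2 * np.pi * t[:, 0]) + 0.5 * t[:, 0] + rng.normal(0, 2, 60)

    kernel = (ExpSineSquared(length_scale=1.0, periodicity=1.0)   # daily (circadian) cycle
              + RBF(length_scale=3.0)                             # slow long-term trend
              + WhiteKernel(noise_level=1.0))                     # measurement noise
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, y)

    t_grid = np.linspace(0, 7, 200)[:, None]
    mean, sd = gp.predict(t_grid, return_std=True)   # posterior mean and pointwise uncertainty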
Jonas Samuel Gohlke Supervised machine learning methods for evaluating demand elasticities and willingness-to-pay in a mode-choice context Dr. Lukas Meier
Prof. Dr. Kay W. Axhausen
Lucas Meyer de Freitas
Sep-2023
Abstract: Mode choice prediction models are fundamental tools in transportation planning and policy analysis, aiding in our understanding of traveler preferences and behavior. As machine learning continues to advance, researchers have increasingly recommended the adoption of machine learning algorithms for predicting transport mode choices. This thesis conducts a comparative study, assessing classical multinomial logit (MNL) and mixed logit (MXL) models against machine learning models in the context of mode choice prediction. The thesis is bifurcated into two primary parts: First, it examines the predictive performance of these models, and second, it explores their interpretability and capacity to extract key economic indicators. The findings underscore that machine learning models outperform classical models in terms of prediction accuracy. Remarkably, machine learning models exhibit their prowess even when dealing with datasets that lack information about non-chosen alternatives. However, it is imperative to acknowledge a distinct advantage possessed by MNL and MXL models: their ability to extract values of travel time savings (VTTS). VTTS stands as a pivotal metric in the evaluation of the economic merits of transportation projects and policies, especially within the context of cost-benefit analyses. In summary, this thesis not only underscores the strengths and limitations of various mode choice prediction models but also emphasizes their capacity to retrieve essential economic indicators. While machine learning models shine in prediction accuracy, classical MNL and MXL models remain indispensable for deriving VTTS.
Chao Zhang Non-Parametric Regression Dr. Markus Kalisch Sep-2023
Abstract: This article is primarily intended to cater to researchers who possess non-statistical backgrounds, aiming to comprehensively examine the properties and applications of classical techniques within the realm of nonparametric statistics.

We commence our exploration with a focus on Kernel Density Estimation (KDE), wherein we asymptotically compute the Mean Integrated Squared Error (MISE). We systematically analyze the impact of varying data sample sizes, kernel types, and bandwidth selection on the quality of estimation outcomes, drawing upon insightful simulations. Furthermore, we engage in a thorough discussion of KDE's limitations when extended into high-dimensional spaces. Proceeding from KDE, we introduce the Nadaraya-Watson kernel estimator, elucidate its inherent relationship with KDE, and substantiate its convergence properties. We delve into the critical aspects of the hat matrix and leave-one-out cross-validation (LOOCV) in the context of this estimator, underscoring the pivotal role of bandwidth selection. Our journey of exploration continues to encompass local polynomials, a natural extension of the Nadaraya-Watson estimator. Here, we compute the asymptotic bias and variance of this estimator. Through simulations, its enhanced performance, particularly in proximity to data boundaries, is unveiled. Lastly, we delve into smoothing splines, tracing their evolutionary path from polynomial regression and interpolation techniques. Our analysis includes a comparative assessment of MISE and coverage when juxtaposed with local polynomials, illuminating their respective strengths and limitations.

In the concluding segment of this article, we provide a pragmatic tutorial to facilitate the practical application of these methods. We offer R templates and guidance for fitting, prediction, bootstrap procedures, and cross-validation techniques, thus equipping researchers with invaluable tools to harness the full potential of these classical nonparametric methods in their research endeavors.
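The thesis provides R templates; purely as an illustration of one estimator discussed above, a Python sketch of the Nadaraya-Watson kernel estimator with a Gaussian kernel (bandwidth chosen arbitrarily) is:

    import numpy as np

    def nadaraya_watson(x_grid, x, y, bandwidth):
        # weighted local average: m_hat(x0) = sum_i K((x0 - x_i)/h) y_i / sum_i K((x0 - x_i)/h)
        u = (x_grid[:, None] - x[None, :]) / bandwidth
        weights = np.exp(-0.5 * u ** 2)          # Gaussian kernel
        return (weights * y).sum(axis=1) / weights.sum(axis=1)

    rng = np.random.default_rng(4)
    x = rng.uniform(0, 1, 200)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
    x_grid = np.linspace(0, 1, 100)
    fit = nadaraya_watson(x_grid, x, y, bandwidth=0.05)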
Max van den Broek Estimation under Restricted Interventions via Wasserstein Distributionally Robust Optimisation Prof. Dr. Nicolai Meinshausen Sep-2023
Abstract: In this thesis we combine causality and distributionally robust optimisation to develop estimators that perform well under restricted interventions. We review the field of distributionally robust optimisation and give a comprehensive updated overview of the framework and the advantages and disadvantages of different methods. The basics of distributionally robust optimisation and the main ingredients, such as the radius of the ambiguity set and how to solve such a problem, are discussed. We bridge the gap between causality and distributionally robust optimisation in different ways. First, we use Wasserstein distributionally robust optimisation to develop strong duality under the Pseudo-Huber loss function and combine Wasserstein distributionally robust optimisation with anchor regression. The robust Wasserstein profile function is used to derive asymptotic stochastic upper bounds that determine the radius of the ambiguity set for the 1-Wasserstein and 2-Wasserstein distances under the Pseudo-Huber loss. Second, we develop two novel estimators via Wasserstein distributionally robust optimisation that use new cost functions incorporating the causal structure of a linear structural causal model. The estimators can be obtained with few computational resources. The work is extended to unknown causal graphs. The construction of the estimators via Wasserstein distributionally robust optimisation allows for a paradigm shift that first determines the type of restricted interventions and then finds an estimator. Restricted interventions that can be encompassed in this paradigm are presented. Simulations show that the anchor transformation Pseudo-Huber loss provides advantages, but also highlight the limitation of the robust Wasserstein profile approach in small sample sizes. The Wasserstein distributionally robust estimators show strong performance against distribution shifts if one of the causal parents influences another causal parent or when latent confounding is present.
Markus Ulmer Spectral Deconfounding for Tree Based Models Prof. Peter Bühlmann
Cyrill Scheidegger
Sep-2023
Abstract: The ability to estimate causal functions has become increasingly important. However, it is only possible to estimate these functions in controlled studies or processes with a known underlying directed acyclic graph. One example is the confounding model, where a confounder affects both the response and the predictor. If the confounding is observed, the causal effect can be estimated using two-stage least squares. This estimation becomes much more difficult when the confounding variable is not observed and its dimension is unknown. The literature uses a spectral deconfounded lasso to estimate linear causal functions in such a setting. In this thesis, we reproduce the corresponding simulation study in the linear case and use the methodology to define a spectral objective where the confounding effect asymptotically goes to zero. We develop spectral deconfounded regression trees to minimize the spectral objective. We build spectral deconfounded random forests with those trees to estimate arbitrary causal functions despite unobserved confounding effects. We conduct simulation studies to compare the deconfounded models against the classical ones. Our models outperform the classical ones in settings with confounding and perform equally well in settings without confounding.
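For orientation, a hedged sketch of the spectral "trim" transform from the deconfounding literature, which caps the singular values of the design matrix at their median before a downstream fit; the deconfounded tree and forest objectives developed in the thesis are not reproduced, and the simulated confounding model below is only an assumption for the example:

    import numpy as np

    def trim_transform(X):
        # F = U diag(min(d_i, tau)/d_i) U^T with tau the median singular value of X
        U, d, _ = np.linalg.svd(X, full_matrices=False)
        tau = np.median(d)
        scaling = np.minimum(d, tau) / d
        return U @ np.diag(scaling) @ U.T

    rng = np.random.default_rng(5)
    H = rng.normal(size=(500, 2))                     # hidden confounder (toy assumption)
    X = rng.normal(size=(500, 10)) + H @ rng.normal(size=(2, 10))
    y = X[:, 0] + H @ np.array([2.0, -1.0]) + rng.normal(size=500)

    F = trim_transform(X)
    X_tilde, y_tilde = F @ X, F @ y                   # transformed data for a downstream fit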
Henry Grosse From Correlation to Causation: A Guide to Fairness in Machine Learning Prof. Nicolai Meinshausen Sep-2023
Abstract: Fairness in Machine Learning is often approached from a correlational perspective, ignoring the underlying causal mechanisms that lead to discrimination. This paper presents a comprehensive study on causal fairness, a concept that extends beyond traditional fairness metrics to consider causal relationships among features. By way of a thorough analysis of the current forefront of algorithmic fairness -- based on various authors and a series of case studies and practical implementations -- we elucidate the importance of causal reasoning in achieving truly fair machine learning models, continuously contrasting causal and non-causal methods in the process. To set the stage, this paper first surveys the myriad of fairness definitions and their corresponding hurdles. We then shift the focus towards causal inference, which provides more robust tools for achieving fairness in algorithmic decisions. We present a range of applications of these methods to both real and synthetic datasets, validating their effectiveness through non-causal and causal measures. The ethical implications of machine learning have garnered significant attention, yet the role of causal inference in this domain remains comparatively under-explored. We argue that understanding the causal relationships between variables is key to mitigating algorithmic bias effectively.
Gellért Perényi Causality Light: Attenuating Causality with Anchor and Action Regression Prof. Dr. Peter Bühlmann
Dr. Xinwei Shen
Sep-2023
Abstract: Statistical algorithms are increasingly used in decision-making, but their reliance on independently and identically distributed (i.i.d.) data poses challenges in practical datasets where variables can shift over time or location, in general, over environments. To address this issue, a causal perspective can be adopted to identify predictors that remain invariant despite environmental shifts. However, when differences between training and test data are moderate, causal models may be subpar compared to methods like Ordinary Least Squares (OLS). A solution tailored to the magnitude of environmental shifts that ensures robust predictions in a min-max sense is anchor regression. The first part of this thesis explores anchor regression, examining its properties and introducing an alternative formulation. Leveraging its interpolation between OLS and instrumental variable regression, we discuss scenarios where this estimator can ensure invariance. In the latter part of the thesis, we focus on a contextual bandit setup and extend the concept of environmental shifts beyond simple shifts to the variables, to perturbations targeting the edges in the data generation process. We establish the framework for data generation in this scenario and introduce a new procedure called "action regression" based on distributionally robust prediction inspired by anchor regression. We derive a formula for the population version of action regression and compare its properties to anchor regression. Finally, we conduct a comparative analysis of the methods discussed in this thesis alongside a recent approach specifically designed to address the environmental shift problem, known as Invariant Policy Learning. This comparison is performed using data collected from intensive care units (ICUs), and their predictive powers are evaluated based on their performance in the most harmful (previously unseen) environments.
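As a rough illustration of the anchor regression estimator discussed in the first part (a sketch under stated assumptions, not the thesis code), one common formulation transforms the data with W = I + (sqrt(gamma) - 1) P_A, where P_A projects onto the anchor variables, and then runs ordinary least squares; gamma = 1 recovers OLS, while larger gamma protects against stronger shifts:

    import numpy as np

    def anchor_regression(X, y, A, gamma):
        # P_A = A (A^T A)^{-1} A^T, the projection onto the column space of the anchors
        P_A = A @ np.linalg.pinv(A)
        W = np.eye(len(y)) + (np.sqrt(gamma) - 1) * P_A
        Xw, yw = W @ X, W @ y
        return np.linalg.lstsq(Xw, yw, rcond=None)[0]

    rng = np.random.default_rng(6)
    A = rng.normal(size=(400, 1))                     # anchor / environment variable
    H = rng.normal(size=400) + 0.8 * A[:, 0]          # hidden confounder shifted by the anchor
    X = np.column_stack([H + rng.normal(size=400), rng.normal(size=400)])
    y = 1.5 * X[:, 0] + H + rng.normal(size=400)

    print(anchor_regression(X, y, A, gamma=1.0))      # gamma = 1: plain OLS
    print(anchor_regression(X, y, A, gamma=10.0))     # larger gamma: more shift robustness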
Angel Garcia Lopez de Haro Bridging the Gap in Respiratory Syncytial Virus Prediction: Class Overlap and Domain Adaptation Dr. Markus Kalisch
Dr. Suwei Wang
Sep-2023
Abstract: This study presents a two-phase statistical modelling approach aimed at enhancing the prediction and understanding of Respiratory Syncytial Virus (RSV) positivity across varying patient populations. In Phase 1, we introduce a robust classification model that incorporates the complexity of class overlaps, offering a granular approach to predicting RSV-positive cases. We demonstrate that this approach outperforms, in terms of predictive accuracy, traditional models, which focus on mitigating class imbalance but fail to address the challenge of class overlap. In Phase 2, we venture into domain adaptation, using techniques specific to tabular data to generalize our model to different medical conditions and populations. While the model shows promise in transferability, it exposes certain limitations in estimating the underdocumented burden of RSV. These findings offer both methodological and clinical insights, and the work concludes by outlining avenues for future research to refine these techniques further. This study sets the stage for more comprehensive, flexible and adaptable models for predicting RSV and potentially other medical conditions.
Andrew Zehr Detecting Saharan Dust Events on the Jungfraujoch: An Application of State-Space Models to Time Series Segmentation Dr. Markus Kalisch
Dr. Benjamín Béjar Haro
Dr. Robin Modini
Sep-2023
Abstract: Saharan dust plays an important role in the atmospheric composition of Switzerland and Europe more broadly. Aerosol dust has numerous impacts, including on human respiratory health, the climate, and weather forecasting. The study of these events is thus of interest in many fields. Current methods to detect Saharan dust events (SDEs) rely mainly on remote sensing or only a small fraction of the available in situ measurements. One source of in situ data is the Jungfraujoch research station in the Swiss Alps, where high-dimensional, atmospheric time series data is recorded. This project provides a survey of various time series models useful for detecting the occurrence of SDEs using this data. Neural network models are used, but special focus is placed on unsupervised state-space models, especially Hidden Markov Models. A detailed review of the theory and implementation of these models is provided, together with a survey of related extensions. These models were shown to perform better than previous methods in detecting SDEs on a number of metrics. Furthermore, these models show promise in being applied to detecting similar events, such as wildfires.
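The Jungfraujoch measurements are not public, so the following toy Python sketch (using the hmmlearn package, an assumption rather than the thesis' implementation) only shows how a Gaussian hidden Markov model can segment a multivariate series into regimes such as "dust event" versus "background":

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(7)
    background = rng.normal(loc=0.0, scale=1.0, size=(300, 3))
    dust_event = rng.normal(loc=3.0, scale=1.5, size=(100, 3))
    series = np.vstack([background, dust_event, background])   # synthetic measurement stream

    model = GaussianHMM(n_components=2, covariance_type="full", n_iter=100, random_state=0)
    model.fit(series)
    states = model.predict(series)   # most likely state sequence (Viterbi decoding)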
Fabian Kahlbacher Continuous Optimization DAG Learning in the Presence of Location Scale Noise: A Systematic Evaluation of Different Frameworks Prof. Dr. Peter Bühlmann
Dr. Alexander Marx
Alexander Immer
Sep-2023
Abstract: Causal discovery concerns the problem of learning the causal structures between variables of a system from observational data. To tackle the computational complexity when multiple variables are involved, continuous optimization is a growing area. However, most approaches rely on additive noise, and no systematic evaluation of non-additive location-scale noise (LSN) has been performed. Modeling LSN or heteroscedasticity is important as it is common in many real-world systems. If heteroscedasticity is not modeled, this can lead to predicting the wrong causal structure. We build upon recent advances in continuous optimization for structure learning and provide extensions to a current method to work in the presence of location-scale noise. Further, we consider an existing method that models LSN and has not been evaluated on synthetic data. We provide extensive synthetic experiments demonstrating the superiority of LSN methods over current continuous methods that do not model LSN. The existing method modeling LSN achieves the best performance on all experiment types except one, where our proposed methods perform better. On the other hand, we do not observe performance improvements by modeling LSN on real-world data sets. Finally, we discuss the limitations and insights of learning the causal structure through continuous optimization-based approaches and propose multiple ideas to improve our methods.
Arberim Bibaj Conditional Average Treatment Effect Estimation via Meta-Learners Prof. Dr. Nicolai Meinshausen Sep-2023
Abstract: In many fields, such as medicine and economics, one is interested not only in knowing the average treatment effect (ATE) of an intervention, but also in how treatment effects differ within a population depending on an individual's characteristics. Estimating the conditional average treatment effects (CATEs) can improve decision-making, but it is a challenging task since it is impossible to observe the causal effects on a single unit. Recently, the literature has proposed so-called meta-learners to estimate the CATE. A meta-learner is an algorithm that aims to estimate the CATE while allowing the use of any machine learning methods (referred to as base-learners). The goal of this thesis is to give an overview of meta-learners and to empirically compare their performances in estimating the CATE while also trying three base-learners, namely random forests (RFs), lasso-based regression, and neural networks (NNs). We test the meta-learners on fully-synthetic data, including various settings, and additionally on semi-synthetic data that consists of real covariates and simulated potential outcomes. Consistent with the literature, we find that the prediction accuracy of a meta-learner can differ strongly across the base-learners. Further, there is no meta-learner that performs best in all settings. Nevertheless, while some conclusions about specific meta-learners can be drawn independently of the base-learners, we also find cases where the comparison of the meta-learners does depend on the base-learner. Further, we find that the X-learner appears to be a very competitive meta-learner in all experiments and with all base-learners. Furthermore, we find two specific learners that perform well in all settings, namely the X-learner with RFs and the S-learner with NNs.
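As a brief, hedged illustration of one of the simplest meta-learners compared above, a T-learner with random forest base-learners fits separate outcome models for treated and control units and takes the difference of their predictions as the CATE estimate; the simulated data below are arbitrary:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(8)
    n = 2000
    X = rng.normal(size=(n, 5))
    T = rng.binomial(1, 0.5, n)                                  # randomised treatment indicator
    tau = 1.0 + X[:, 0]                                          # true heterogeneous effect
    y = X[:, 1] + tau * T + rng.normal(size=n)

    mu1 = RandomForestRegressor(random_state=0).fit(X[T == 1], y[T == 1])   # treated outcome model
    mu0 = RandomForestRegressor(random_state=0).fit(X[T == 0], y[T == 0])   # control outcome model
    cate_hat = mu1.predict(X) - mu0.predict(X)                   # T-learner CATE estimate
    print(np.corrcoef(cate_hat, tau)[0, 1])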
Yahuan Zheng Out-of-distribution Generalization via Invariant Prediction: An Overview and Comparative Study Prof. Dr. Nicolai Meinshausen Sep-2023
Abstract: Machine learning models that achieve good in-distribution performance often fail to generalize out-of-distribution. While classical supervised learning techniques prove to be efficient in reducing overfitting to random noise, the assumption that the training and test sets are independently drawn from the same distribution collapses when the test domain is significantly different from the training ones, causing machine learning models to have catastrophic performance without guarantees for distributional robustness. Modern machine learning now struggles with this “new type of overfitting”, where a model could overfit to a series of training distributions and thus have poor generalization performance.

In this report, we examine a research field that has received increasing attention in recent years, known as out-of-distribution (OOD) generalization, where we wish to minimize the worst-case risks across a set of environments to achieve distributional robustness while maintaining predictive power. Following a quick recap of fundamental machine learning concepts, we introduce the notion of causality and invariance, point out the connections between them and illustrate how using invariant relations helps improve generalization performance under test set perturbations. With a formal formulation of the OOD generalization problem and a precise categorization of various types of distributional shifts, we particularly introduce three novel OOD generalization techniques, namely invariant risk minimization (IRM), risk extrapolation (REx), and anchor regression, using concrete experiments to compare and evaluate these methods. Based on our discussions and findings, we then propose a task-based framework of distributional robustness that outlines the golden standards centered around OOD generalization, where the type of robustness desired, the strength of perturbation across both training and test domains and the type of specific tasks to be performed all play a crucial role and should be considered with caution when designing a solution to OOD generalization problems.
Afroditi Iliadis Deep Learning Techniques for Music Analysis: Leveraging Musical Structure and Human Preference Prof. Dr. F. Balabdaoui
Prof. Dr. A. Bandeira
Aug-2023
Abstract: The field of music generation has seen significant advancements in machine learning, particularly in deep learning techniques. However, despite numerous approaches proposed thus far, a universally accepted model for generating high-quality music remains elusive. This thesis addresses some challenges associated with the quality of generated music and proposes strategies to overcome them. Currently, generated music lacks the ability to capture intricate features present in music produced by humans, such as segmentation and human preference. To bridge this gap, this work focuses on the development of deep learning tools that tackle two key tasks: first, extracting musical structure, including elements such as choruses, verses, bridges, and other segments within a song; second, identifying the musical sequences that are preferred by humans. Through experimental investigations, the proposed tools demonstrate their effectiveness in extracting features that could play a significant role in the generation of music. These findings pave the way for leveraging these features to create music that resonates with audiences and aligns with human preferences. The contributions of this thesis provide valuable insights into the field of AI in music. The developed tools not only offer immediate applicability, but also establish a solid foundation for future research in this exciting area.
Damjan Kostovic Covariance Matrix Estimation with Reinforcement Learning: A Data-Driven Approach to Linear Shrinkage Dr. Markus Kalisch
Prof. Michael Wolf
Dr. Gianluca De Nard
Aug-2023
Abstract: There exists a plethora of covariance matrix shrinkage estimators built on years of statistical and financial theory. Our objective is to leverage this foundation and construct a data-driven shrinkage estimator, drawing upon established linear covariance matrix estimators and ideas of Reinforcement Learning. In our extensive empirical study, we find that our data-driven shrinkage estimator significantly outperforms the linear benchmark. Additionally, we observe that our estimator yields similar results to nonlinear alternatives on small to medium-sized portfolio sizes while having lower portfolio turnover and leverage numbers. In line with the literature, our empirical results show the outperformance of nonlinear shrinkage methods compared to linear methods. We additionally underline the power of data-driven learning for choosing the optimal shrinkage intensity.
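For context, a minimal sketch of the classical Ledoit-Wolf linear shrinkage estimator, the kind of linear benchmark the data-driven estimator above builds upon; the Reinforcement Learning component of the thesis is not shown, and the simulated returns are a stand-in:

    import numpy as np
    from sklearn.covariance import LedoitWolf, empirical_covariance

    rng = np.random.default_rng(9)
    returns = rng.normal(size=(60, 100))          # 60 observations of 100 assets (p > n)

    sample_cov = empirical_covariance(returns)    # ill-conditioned when p exceeds n
    lw = LedoitWolf().fit(returns)                # shrinks the sample covariance towards a target
    print(lw.shrinkage_)                          # estimated shrinkage intensity
    print(np.linalg.cond(sample_cov), np.linalg.cond(lw.covariance_))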
Sabrina Galfetti Convergence analysis of classifiers with a vanishing penalty to margin maximizing estimator Prof. Dr. Sara van de Geer
Felix Kuchelmeister
Aug-2023
Abstract: The maximization of margins plays a crucial role in the analysis of classification models, including support vector machines. This aspect is of great importance due to its significant impact on generalization error analysis and the clear geometric interpretation it provides for the constructed models. In this thesis, we conduct an extensive analysis of the work done by Rosset, Zhu, and Hastie (2003), aiming to shed light on its strengths and identify potential missing conditions. This investigation enables us to establish a sufficient condition for the convergence of solutions obtained from penalized loss functions toward margin-maximizing separators as the tuning parameter approaches zero. The proposed condition encompasses widely used loss functions such as the hinge loss in support vector machines and the logistic regression loss. We then delve into a comprehensive examination of the primal problem associated with the penalized optimization problem. Perhaps surprisingly, we uncover a distinct conclusion when investigating a vanishing loss function. We establish and rigorously prove sufficient conditions for convergence in this primal problem setting. This knowledge opens up avenues for further exploration and potential refinements in the development of classification models, improving their optimization and generalization capabilities in practical applications.
Lukas Looser On the convergence rate in the monotone distributional single index model Prof. Fadoua Balabdaoui
Dr. Alexander Henzi
Aug-2023
Abstract: The monotone distributional single index model is a semiparametric model that estimates the conditional distribution of a real-valued response variable given d covariates. The conditional cumulative distribution function (CDF) depends on the covariates only through a linear projection to $\mathbb{R}$ by a $d$-dimensional index, under a stochastic ordering constraint. The model for the index is parametric, whereas the conditional CDFs are estimated non-parametrically under the stochastic order constraint.

Both the conditional distribution function and the index are unknown and estimated with a weighted least squares approach. We study the convergence rate of the estimators for the bundled conditional CDF, the conditional CDF and the index. Under appropriate conditions the convergence rates for the function and the index are at least $n^{1/3}$ in the $L_2$ distance and the Euclidean norm, respectively. Simulations for various settings support these results. As the least squares estimation of the index is computationally expensive, alternative index estimators from the literature are considered as well. If the convergence is sufficiently fast, we show that the convergence rate of the combined estimator is maintained.
Kai-Christoph Lion Do Ensembles Really Need Multiple Basins? Prof. Dr. T. Hofmann
Prof. Dr. P. Bühlmann
Jun-2023
Abstract: We explore various approaches for building so-called connected ensembles consisting only of models from a single loss basin, i.e., any convex combination of its members suffers no drop in test loss. While there seems to be a fundamental trade-off between the degree of connectivity and performance, we observe that incorporating additional knowledge from other basins significantly boosts performance without reducing connectivity. More specifically, we present a novel distillation procedure, which, given a deep ensemble, allows us to re-discover any of its members in a given basin. We further show that our approach naturally extends to Bayesian neural networks. Our results require us to rethink the characteristics of neural loss landscapes. In particular, we show that a single basin largely suffices to account for the functional diversity required to build strong ensembles. As a consequence, we conclude that encouraging the exploration of multiple basins -- as commonly done in ensembling and Bayesian learning -- might not be a necessary design principle.
Fabian Otto Neural Processes for Multi-task Time Series Forecasting Dr. Lukas Meier
Dr. Simon Dirmeier
Prof. Dr. Fernando Perez Cruz
Jun-2023
Abstract: In demand forecasting, large retailers are tasked with forecasting potentially thousands of products with intricate correlational structures between them. In this setting, traditional forecasting methods which predict one time series (TS) at a time are inefficient to use and cannot account for the dependencies between items. In contrast, Recurrent Neural Network (RNN) based approaches which model all considered TS jointly are a more promising candidate for this task. Another relevant approach is Gaussian Process (GP) based forecasting. GPs have the advantages of quantifying both aleatoric and epistemic uncertainty, and they tend to work well when only few observations are available. On the other hand, they require specifying a covariance function manually and they do not scale well to high-dimensional data sets. Neural Processes (NPs) are encoder-decoder architectures that have been proposed to combine the benefits of neural networks and GPs: They approximate a stochastic process via neural networks in a Bayesian framework. In this way, a covariance function is learned implicitly from the data, and the computational complexity is reduced compared to GPs, while the theoretical advantages of Bayesian modeling are preserved. Moreover, once trained, NPs can be used as meta-learning models to obtain forecasts for previously unseen items based on limited context observations. Here, we experiment with a number of related NP-based models which have the same RNN decoder on one synthetic and two real-world data sets. We find that NPs seem capable of producing good fits on all considered data sets, but we observe that the effects of adding an encoder in the NP framework with attention and / or recurrence depend strongly on the considered data set. Based on our findings, we propose extensions of this work and a replication of our results in a larger-scale benchmark study.
Silvan Vollmer On linear separability Prof. Dr. Sara van de Geer
Felix Kuchelmeister
May-2023
Abstract: The availability of numerous variables for data analysis has become increasingly common in times of big data. This poses challenges for the widely used maximum likelihood estimator for logistic regression due to the possibility of linearly separable data. We examine the event of linear separability in generalized linear models with a binary response. The two aims of this thesis are (i) to unite the existing literature on linear separability and (ii) to contribute to it when dealing with finite sample sizes. In the nonasymptotic regime, we propose a bound on the probability of linear separability for a generalized linear model with Gaussian covariates. Our results build on ideas from stochastic geometry involving the statistical dimension of a convex cone. We introduce convex cones, intrinsic volumes and kinematic formulas. In the high-dimensional asymptotic regime, the number of variables scales with the number of observations. We distinguish between two approaches to analyze linear separability in this setting. While one of them lies in the area of conic geometry and serves as a foundation for our own theory, the other one is based on the convex Gaussian min-max theorem. We provide intuition on the proofs, state the results and show generalizations of both approaches. Overall, this work expands upon the literature on linear separability in the nonasymptotic regime by considering scenarios with signal in the data.
Florian Schwarb Causal Models for Heterogeneous Data within Environments Prof. Peter Bühlmann
Dr. Xinwei Shen
Dr. Michael Law
May-2023
Abstract: Most contemporary data sets violate the commonly made assumption of being independent and identically distributed. Therefore, it is crucial to develop methods that exploit the heterogeneities present in most data sets. In this thesis, we combine two popular approaches to deal with correlated data: mixed-effects models and causality. We extend the two popular methods Invariant Causal Prediction (Peters et al., 2016) and Anchor Regression (Rothenhäusler et al., 2018) by using the modeling framework inherent to linear mixed-effects models. Firstly, our method Invariant Causal Prediction under Random Effects (R-ICP) considers interventions on the causal mechanism and relaxes the strict exact linearity assumption across environments. We propose a hypothesis test based on confidence regions to test for sets that are invariant under random effects. This method for causal discovery also offers Type-I error control guarantees and yields valid confidence regions for causal effects. In an extended setting, we provide sufficient assumptions under which the causal effect becomes identifiable. The effectiveness of our method is shown in various simulations. Secondly, we propose the method Robust Prediction and Causal Identification for Linear Mixed-Effects Models (causalLMM), where we adapt the underlying structural equation model used in Anchor Regression to allow for correlation within and not only between environments. This yields a closely related estimator that comes with the same robustness guarantees against shift interventions and can also be interpreted as a causal regularization of the least-squares estimator for linear mixed-effects models. Moreover, the adapted structural equation allows for identifiability of the causal effect under a moment condition, as we no longer allow feedback cycles between the response and other variables. Again, we demonstrate the properties of this novel estimator using simulations.
Valentina Petrovic Simulation-Based Evaluation of Discrete Choice Models: An Overview and Applications Dr. Markus Kalisch May-2023
Abstract: In the last two decades, Discrete Choice Models have gained significant importance in the choice modelling framework. While they were initially mostly used in the transportation industry, continuous improvements have led to their expansion into a diversified range of econometric fields, including healthcare, environmental and consumer studies, as well as social sciences, such as criminology, moral decision-making and discrimination. This paper aims to give an overview of the models most widely used in practice, highlighting their strengths and drawbacks. In particular, the topics that are addressed are the simplicity of the Logit and Nested Logit models, and the modelling of random taste variation through the Probit and Mixed Logit models. Furthermore, the
mlogit package in R was extensively employed to conduct various simulation studies which highlighted the efficiency and reliability of these models. To provide an understanding of their application, guidance is given on how they can be implemented and interpreted. Finally, despite the proven pragmatic use of traditional choice modelling techniques, there are other models which are increasingly emerging in practice, namely the Hybrid Choice Models which account for latent factors such as attitude, motivation and perception, allowing for a more precise and trustworthy study of the human decision-making behaviour.
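To make the simulation idea concrete, a minimal sketch of such a study, written in Python rather than with the mlogit package and using made-up attributes and coefficients, could simulate choices from multinomial (conditional) logit probabilities and check that maximum likelihood recovers the data-generating parameters:

```python
# Illustrative sketch only: simulate choices from a conditional logit model and
# refit it by maximum likelihood. All names and parameter values are assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, J, p = 2000, 3, 2                      # individuals, alternatives, attributes
beta_true = np.array([1.0, -0.5])         # assumed data-generating coefficients

X = rng.normal(size=(n, J, p))            # alternative-specific attributes
util = X @ beta_true                      # deterministic utilities, shape (n, J)
prob = np.exp(util) / np.exp(util).sum(axis=1, keepdims=True)
choice = np.array([rng.choice(J, p=pi) for pi in prob])

def neg_loglik(beta):
    v = X @ beta
    # log P(chosen alternative) = v_chosen - log(sum_k exp(v_k))
    lse = np.log(np.exp(v).sum(axis=1))
    return -(v[np.arange(n), choice] - lse).sum()

fit = minimize(neg_loglik, x0=np.zeros(p), method="BFGS")
print("estimated coefficients:", fit.x)   # should be close to beta_true
```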
Andrej Ilievski Predicting River Discharge Using Gaussian Processes Defined on River Networks Prof. Dr. Fadoua Balabdaoui
Dr. W. H. Aeberhard
Dr. C. Donner
Apr-2023
Abstract: In this thesis, we explore the problem of constructing valid covariance matrices of Gaussian Processes defined on river networks, with the aim to predict river discharges at different locations and time points along the river stream. Traditional geostatistical applications incorporate Gaussian Processes in order to capture the spatial dependence between the locations of interest throughout the domain of definition, which is typically $\mathbb{R}^2$. However, one limitation of such approaches is their extension to constrained domains, e.g. a river network. The dependence between two observations defined through the covariance matrix of the GP is typically a function of the Euclidean distance of their locations, which first, does not trivially extend to the case of a constrained domain, and second, neglects both the flow-connectivity and the stream distance of the observations along the network, which can significantly differ from the Euclidean distance. This brings up the challenge of constructing non-trivial (spatio-temporal) covariance matrices while preserving their validity (positive definiteness and symmetry). In this thesis, we describe the procedure of defining a spatio-temporal Gaussian Process on a river network and constructing valid covariance matrices for the process, both separable and non-separable, with the aim to predict river discharges at different locations along the network. The thesis focuses specifically on discharge predictions on the Swiss river streams - provided with daily average discharges from 22 river gauge stations, we explore the performance of simple kriging discharge predictions based on different covariance matrix constructions.
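As a rough illustration of the kriging mechanics described in this abstract, the sketch below builds an exponential covariance matrix from a plain Euclidean distance matrix and computes a simple-kriging prediction; the thesis's contribution lies precisely in replacing this Euclidean distance with stream distance and flow-connectivity on the river network, which is not attempted here, and all numbers are made up.

```python
# Simple kriging with an exponential covariance C(d) = sigma^2 * exp(-d / rho).
# Euclidean distance is used purely for illustration of the mechanics.
import numpy as np

rng = np.random.default_rng(1)
sigma2, rho, nugget = 1.0, 2.0, 1e-6

sites = rng.uniform(0, 10, size=(22, 2))                 # e.g. 22 gauge "stations"
y = np.sin(sites[:, 0]) + 0.1 * rng.normal(size=22)      # centred observations

def cov(d):                                              # exponential covariance function
    return sigma2 * np.exp(-d / rho)

D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
K = cov(D) + nugget * np.eye(len(sites))                 # valid (positive definite) matrix

s0 = np.array([5.0, 5.0])                                # prediction location
k0 = cov(np.linalg.norm(sites - s0, axis=1))
pred = k0 @ np.linalg.solve(K, y)                        # simple-kriging predictor
var = sigma2 - k0 @ np.linalg.solve(K, k0)               # kriging variance
print(pred, var)
```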
Qianzhi Zhuang Multidimensional Scaling Methods And Their Applications Dr. Lukas Meier Apr-2023
Abstract: Exploring the patterns and relationships that exist in data is essential in a wide range of areas in mathematics and data science. One family of dimensionality reduction techniques that addresses this task is multidimensional scaling (MDS). In practice, the method produces a low-dimensional map whose inter-point distances approximate the given proximity data, and it is flexible enough to handle not only numerical input but also rankings or preferences.
Illustrated with both simple and more advanced examples, this thesis presents the theory behind multidimensional scaling together with its interpretation and applications in real-life scenarios. It also discusses issues that can arise when using this tool and draws connections to other dimension reduction methods such as Procrustes analysis and factor analysis. Lastly, recent progress, especially in software aimed at improving the goodness of fit of an MDS solution, is also covered.
Joppe de Bruin Topics in Conformal Prediction With applications to covariate shift and a comparison to Doubly Robust Prediction sets Prof. Dr. Nicolai Meinshausen Apr-2023
Abstract: Advancements in machine learning and statistical learning methods have significantly enhanced the quality of machine-based predictions. However, accurately predicting outcomes is often not enough, and quantifying a model's uncertainty about its predictions is also crucial. One way to convey uncertainty is by providing prediction intervals, but many current methods can perform poorly when regression models are misspecified or data is limited. To address this issue, conformal prediction has been introduced as a model-agnostic method for creating distribution-free prediction intervals with finite sample coverage guarantees. This thesis contributes to the ongoing refinement of conformal prediction methods for regression problems by reviewing the most recent developments in the field. We investigate the adaptivity of different conformal prediction intervals to uncertainty in predictions and compare recently introduced methods, such as distributional conformal prediction and conformalized Bayes regressors, to the popular conformalized quantile regression approach. To do this, we operationalize these methods beyond their proof-of-concept implementation. Using a recently introduced nested set representation, we find that the focus on the average length of prediction intervals in the literature can have detrimental effects on adaptivity. Our results suggest that distributional conformal prediction is a preferable method, refining conformalized quantile regression. We extend the analyses to predictions for covariate-shifted test points. Since finite sample guarantees are impossible to provide without knowledge of the covariate shift, we compare weighted conformal prediction to doubly robust prediction sets, which were developed to provide asymptotically valid prediction sets. While the doubly robust prediction approach performs better in larger samples, weighted conformal prediction is the better choice in small sample settings. Our findings demonstrate the great potential of conformal prediction and doubly robust prediction in reliably conveying prediction uncertainty, but emphasize the need for careful implementation and consideration to fully realize the benefits of these methods.
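For readers unfamiliar with the basic construction, a minimal split-conformal sketch is given below; the synthetic data and the arbitrary base regressor are illustrative choices, not taken from the thesis.

```python
# Split conformal prediction: fit any regressor on a proper training set, compute
# absolute residuals on a calibration set, and use their (1 - alpha) quantile
# (with a finite-sample correction) as a symmetric interval half-width.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=1500)

X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_fit, y_fit)

alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))                         # conformity scores
n_cal = len(scores)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)   # corrected quantile

pred = model.predict(X_test)
covered = np.mean((y_test >= pred - q) & (y_test <= pred + q))
print(f"empirical coverage = {covered:.3f} (target {1 - alpha})")
```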
Jovin Simon Koller Discrete Choice Models: A theoretical introduction and validation of the mlogit package Dr. Markus Kalisch Apr-2023
Abstract: Discrete choice models are popular methods in econometrics to analyse discrete dependent variables across fields such as healthcare, transportation, marketing, and more, to understand how individuals make choices among a set of alternatives. Discrete choice models are based on the assumption that individuals choose the alternative that provides them with the highest utility, and that the probability of choosing a particular alternative depends on the attributes of the alternative and the individual’s characteristics. In order to estimate discrete choice models, several assumptions must be satisfied, such as assumptions made about the unobserved components of utility or the specification of coefficients. The thesis introduces the three most common discrete choice models, namely the multinomial, nested and mixed logit models. It provides a detailed explanation of how the choice probabilities are derived and what the underlying assumptions of each model are, and outlines existing estimation techniques. Furthermore, the correct estimation of parameters using the R package mlogit from Croissant (2020) is verified for all mentioned models. When the assumptions are satisfied, it is shown that the multinomial logit, nested logit and mixed logit models achieve a coverage probability of 95% and therefore properly estimate the parameters. Lastly, an extensive practitioner’s guide is provided for a researcher taking initial steps towards estimating discrete choice models from a data set using the mlogit package.
Yongqi Wang Analysis of the Generalization Properties and the Function Spaces Associated with Two-Layer Neural Network Model Prof. Dr. Sara van de Geer Mar-2023
Abstract: The use of neural network-based models for approximation has proven to be highly effective across various domains. In this thesis, we demonstrate the universal approximation property of two-layer neural networks (2NNs), which allows them to approximate any continuous function on a compact subset of $\mathbb{R}^d$ uniformly well. Additionally, we explore the class of functions that meet a smoothness constraint on their Fourier transform and can be effectively approximated by 2NNs. These functions belong to the Fourier-analytic Barron spaces, which we characterize. We then introduce the concept of infinite-width Barron
spaces, where any function can be approximated well by 2NNs through an integral representation. Finally, we draw connections between these spaces and provide a comparative analysis of their properties.
Martin Gasser Fairness in machine learning: A review of approaches applied to the prediction of long-term unemployment Prof. Dr. Nicolai Meinshausen Mar-2023
Abstract: Statistical and machine learning models are increasingly used to inform high-stakes decisions, which has raised concerns that the use of such models may systematically disadvantage certain groups. This problem of “fairness in machine learning” is addressed
by a vast and rapidly growing body of research. However, this literature
remains largely academic, with examples usually taken from a canonical set of publicly available data and with little consideration given to practicability. The goal of this thesis is to review how a statistical fairness evaluation could be applied to a simple but realistic example from the context of the Swiss Unemployment Insurance. The thesis is structured as follows. First, the three standard observational definitions
of group fairness are discussed: independence, separation and sufficiency. Second, a selection of promising measures to quantify deviation from fairness is presented. These include measures based on relative risks, distances between probability distributions and extensions of the partial $R^2$. Third, consistent estimators for these fairness measures are reviewed. Fourth, the fairness measures are applied to a model that
predicts the risk of long-term unemployment using data from the Swiss Unemployment Insurance. A gradient boosting model was trained on data from 2014-2018 and evaluated on data from 2019. A first conclusion is that the model fails the statistical fairness evaluation. This is not surprising given that the model was trained without explicitly enforcing fairness
constraints. A second conclusion is that the most convincing approach to evaluate fairness in this concrete application is by comparing each minority group to the majority group, using relative risks when predictions are categorical or the type-1 Wasserstein distance when predictions are continuous. A third conclusion points to some practical limitations in the current literature: The possibly too strong formalisation of fairness as (conditional) independence, especially in the absence of a causal interpretation; the relative scarcity of research that deals with multiple protected attributes; the little consideration given to fairness measures, especially to their interpretability, to their sensitivity to different kinds of deviation from fairness, and to the availability of inferential tools. Taken together, more practical guidance is needed on how to bridge the gap between academic research and practical applications.
Jack Foxabbott Causal Inductive Biases for Domain Generalization in Computer Vision Prof. Dr. Nicolai Meinshausen Mar-2023
Abstract: In machine learning, the independently and identically distributed (iid) assumption requires that samples in a test set are drawn from the same joint distribution as those in the training set. Domain generalization is the problem of learning a model that will continue to perform well even when this assumption fails to hold, and there is significant perturbation to the joint distribution between datasets. Recent work has framed this problem through the language of causality, explaining distribution shifts as interventions on a structural causal model, and allowing us to identify those relationships that will remain invariant under a set of such interventions. In this work, we focus on the computer vision setting, where features representing intuitive, high-level concepts must first be extracted before a classifier can be built. We explain the vulnerabilities of ERM to distribution shifts, and perform a deep dive on Invariant Risk Minimization, a recent method that uses multiple environments of data to exclude non-invariant relationships from its model. We prove the success and failure cases of IRM in a binary classification setting, noting that in some cases it will actually fail to outperform ERM. We verify our theoretical results using the Coloured MNIST dataset. We provide a thorough analysis of subsequent work that has taken inspiration from IRM, before exploring a range of recent causality-inspired methods for representation learning, some of which provide guarantees of identification of causal latent variables underlying high-dimensional datasets, like those commonly used in computer vision. We conclude that leveraging the inductive biases provided by causality holds great promise for improving sample complexity and few-shot learning in machine learning models, including but not limited to computer vision.
Georgios Gavrilopoulos Comparison of Sequential Quantile Forecasts Prof. Peter Bühlmann
Dr. Alexander Henzi
Mar-2023
Abstract: The prediction of future events has always been a subject of considerable mathematical and social interest. Prediction frequently involves forecasts for future events, either in the
form of point forecasts or in the form of predictive probability distributions. Moreover, it is often the case that we are interested in a series of events, such as the probability of rain for each of the days of the following month, so forecasters make their predictions
sequentially, after each day. For the evaluation and comparison of forecast performance, several methods have been proposed in the literature. The main tools used to compare probability forecasts are proper scoring rules. Proper scoring rules assign numerical scores to different forecasts based on the forecasts themselves and the actual outcomes of the corresponding events. Depending on the nature of the forecast, different scoring rules may be appropriate. Inevitably, the results we obtain from these scoring rules depend on the particular outcomes that materialize and that we observe. To account for this sampling uncertainty, some authors have proposed e-values, which are analogous to p-values, and confidence sequences, a stronger analog of confidence intervals. A significant effort has been made to discover methods that are universally valid, regardless
of the strategy of the forecasters or the probability distribution generating the real outcomes. While most of these methods have been designed to fit in the framework of binary and multi-class classification, we are especially interested in forecasts that are quantiles of probability distributions. Quantile forecasts have also been studied in the literature, but the comparison of unbounded quantile forecasts remains open to the best of our knowledge. In this thesis, we initially review the literature on the assessment and comparison of
probability forecasts. Subsequently, we attempt to extend the existing methods to cover our case of interest, namely quantile forecasts. Finally, we evaluate the methods proposed in the literature in terms of power by applying them to simulated and real-life datasets.
Lea Tamberg Prediction of wellbeing outcomes based on indicators for basic human needs satisfaction Prof. Nicolai Meinshausen
Prof. Julia Steinberger
Prof. Jason Hickel
Mar-2023
Abstract: This thesis explores the relationship between basic human needs and wellbeing outcomes and examines whether this relationship explains the empirical link between GDP per capita and wellbeing. To this end, it analyses an international cross-sectional and panel data set based on a mapping of different basic human needs to indicators reflecting their satisfaction. The data is used to test whether there is still a significant statistical effect of GDP per capita on life expectancy and life satisfaction when controlling for levels of basic human need satisfaction. In addition, the thesis investigates whether the inclusion of need satisfiers significantly improves the prediction of the wellbeing outcomes and whether the predictive performance can be further improved by classical machine learning approaches. Finally, it examines whether there are differences between life expectancy and life satisfaction regarding the importance of different basic human needs.

According to the results, the effect of GDP is smaller for both measures of wellbeing when need satisfaction indicators are added to the linear model, and it becomes insignificant in the case of life expectancy. Moreover, need satisfaction indicators significantly improve the prediction of both life expectancy and life satisfaction, with small additional improvements from machine learning methods compared to the ordinary least squares model. Different basic needs have varying importance for life expectancy and life satisfaction but universal health coverage is crucial for both wellbeing outcomes. The results suggest that basic human need satisfaction is likely to be a relevant determinant of human wellbeing. However, the dimensions of human needs considered in the analysis are also likely to be incomplete. Moreover, in the case of life satisfaction, other factors than need satisfiers might need to be included to explain its strong correlation with GDP per capita.
Antoine Albert Jeanrenaud Flexible Drifts Correction of Metabolomics Data with Change Point Detection Prof. Dr Peter L. Bühlmann Mar-2023
Abstract: With recent technological advancements, large-scale metabolomics experiments have become increasingly popular, resulting in vast datasets that require preprocessing to ensure reliability before analysis. Although several normalization and correction methods have been proposed, this field remains evolving. This thesis introduces flexwinn, a novel solution that employs change point detection to correct distorted metabolomics datasets. The thesis first establishes the background theory of change point detection, followed by an in-depth analysis of flexwinn's performance, including its advantages and drawbacks in comparison to existing methods. The results suggest that flexwinn is a competitive method that offers additional flexibility of use and robustness to bad plate information.
Carlos García Meixide Causality in time-to-event semiparametric inference Prof. Dr. Peter Bühlmann Mar-2023
Abstract: We propose a new non-parametric estimator for counterfactual survival functions under right-censoring using reproducing kernel Hilbert spaces (RKHS). We prove mathematical results concerning its asymptotic behaviour, illustrate its practical performance through a simulation study and apply it to the SPRINT trial, obtaining coherent results with parallel analysis in the literature. Our method provides a novel
tool for estimating counterfactual survival functions in observational studies using incomplete information, with potential applications in biomedicine.
Julien David Laurendeau Quantile Treatment Effects Prof. Nicolai Meinshausen Mar-2023
Abstract: Causal inference permits us to answer crucial questions in many fields. Sometimes the quantities of interest in these applications are "extreme" in contrast to the usual average treatment effect, for example when looking at extreme climate events such as droughts and heatwaves in climate science. Keeping this in mind, we aim here to provide a comprehensive overview of the methods used to estimate quantile treatment effects, in particular extreme ones, and to draw inference from these methods. "Extreme" comes in several degrees and may even go beyond the values that we actually observe, which is of interest when we want to do risk assessment. We go over the different levels of "extreme" that can arise and how to deal with them. Finally, we show how these methods can be relevant by applying them to a real data set of pressure and temperature variables in Europe from the EU's Copernicus program.

2022

Student Title Advisor(s) Date
Simon Niederberger Correction Strategies for Measures Computed from Erroneous Networks Prof. Dr. Peter L. Bühlmann
Meta-Lina Spohn
Dec-2022
Abstract: Measures computed from a network can provide insights into the network's properties or dynamics that depend on them. In practice, one may not observe the true network, but only an erroneous version of it, and the resulting measure may differ from the measure of the true network. This paper presents five simulation-based correction strategies that provide corrected point estimates and error intervals for a measure of interest. The methods assume a known error mechanism that led to the network error in the observed network, for example 10\% of the edges being missing at random. The Error Addition (EA) method and the more general Network Simulation Extrapolation (SIMEX) method add additional error to the observed network by applying the error mechanism to it and using the observed induced error to approximate and correct the true error. Network Correction (NC) requires a known error inverse mechanism which yields networks similar to the true network and computes the measure of interest on the resulting corrected networks. Network Correction - Error Addition (NC-EA) first applies the error inverse mechanism to the erroneous network, then applies the error mechanism to these corrected networks, and uses the thereby induced error as an approximation for the true error. Indirect Inference (II) uses an alternative inference approach in the NC-EA setting which leverages assumptions about the data-generating process. The performance and weaknesses of the methods are illustrated in a setting of random graphs with randomly missing edges, considering three measures of interest, i) the number of edges, ii) the relative size of the largest connected component and iii) a neighbourhood treatment effect.
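The Error Addition idea can be sketched in a few lines; the graph, the measure (relative size of the largest connected component) and the assumed error mechanism (10% of edges missing at random) below are illustrative choices, not the thesis's experiments.

```python
# Error Addition (EA) style correction: re-apply the known error mechanism to the
# observed graph many times, use the average induced change in the measure as a
# bias estimate, and correct the observed value by it.
import numpy as np
import networkx as nx

rng = np.random.default_rng(3)

def drop_edges(G, frac=0.10):
    H = G.copy()
    edges = list(H.edges())
    k = int(round(frac * len(edges)))
    idx = rng.choice(len(edges), size=k, replace=False)
    H.remove_edges_from([edges[i] for i in idx])
    return H

def measure(G):   # relative size of the largest connected component
    return max(len(c) for c in nx.connected_components(G)) / G.number_of_nodes()

G_true = nx.erdos_renyi_graph(300, 0.02, seed=0)
G_obs = drop_edges(G_true)                          # what we actually observe

m_obs = measure(G_obs)
induced = [measure(drop_edges(G_obs)) - m_obs for _ in range(200)]
m_corrected = m_obs - np.mean(induced)              # EA-style bias correction
print(measure(G_true), m_obs, m_corrected)
```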
Keyi Ma On Local Likelihood Estimation Prof. Rita Ghosh
Dr. Lukas Meier
Nov-2022
Abstract: In this paper, the local likelihood estimation technique introduced by Tibshirani and Hastie (1987) is applied to a set of spatial data of the form $\{u_i; x_i, y_i\}$, $i = 1, 2, \ldots, n$. Our objective is to estimate the density $f(u_i; x_i, y_i)$, where the $u_i$ are independently distributed observations at locations $(x_i, y_i)$ from a continuous univariate distribution. Given a pair of center coordinates $(x_i, y_i)$ and assuming $f(u_i; x_i, y_i)$ has a parametric form $\hat{f}(u_i; \theta(x_i, y_i))$, we estimate $\theta(x_i, y_i)$ using our local likelihood estimator (LLE). The LLE is therefore not a single point estimate but rather a function of the center coordinates $(x_i, y_i)$. Regarding the notion of "local", the kernel in our LLE assigns more weight to data points that are close to the selected center coordinates, while the bandwidths control the size of the neighborhood surrounding $x_i$ and $y_i$, respectively. Approaches for selecting optimal bandwidths, such as cross-validation, are discussed. The term "likelihood" indicates that $\theta(x_i, y_i)$ is obtained by maximizing the local kernel-weighted log-likelihood function. In addition, we investigate the asymptotic properties of our LLE by comparing it to the MLE and a simpler LLE, which enables us to construct confidence intervals for our LLE. Lastly, the local likelihood estimation methodology is illustrated using real spatial forest data.
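A small sketch of the general recipe, maximizing a kernel-weighted log-likelihood at a chosen centre for an assumed Gaussian parametric family with illustrative bandwidths and simulated data (not the thesis's implementation or forest data), is given below.

```python
# Local likelihood estimation at a centre (x0, y0): weight each observation by a
# Gaussian kernel in the spatial coordinates and maximize the weighted log-likelihood
# of a Gaussian family with parameters theta = (mu, log sigma).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 500
xy = rng.uniform(0, 10, size=(n, 2))                     # locations (x_i, y_i)
u = rng.normal(loc=xy[:, 0] * 0.3, scale=1.0)            # observations u_i

def lle(x0, y0, hx=1.0, hy=1.0):
    # more weight for points near the centre; hx, hy are the bandwidths
    w = np.exp(-0.5 * (((xy[:, 0] - x0) / hx) ** 2 + ((xy[:, 1] - y0) / hy) ** 2))
    def neg_local_loglik(theta):
        mu, log_sigma = theta
        return -np.sum(w * norm.logpdf(u, loc=mu, scale=np.exp(log_sigma)))
    return minimize(neg_local_loglik, x0=np.array([0.0, 0.0]), method="Nelder-Mead").x

mu_hat, log_sigma_hat = lle(x0=8.0, y0=5.0)
print(mu_hat, np.exp(log_sigma_hat))                     # local estimates near (8, 5)
```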
Rodrigo Gonzalez Laiz Explainable representations in self-supervised contrastive learning Steffen Schneider
Prof. Mackenzie Mathis
Dr. Markus Kalisch
Nov-2022
Abstract: In the last decade, the scientific world has seen an explosion of AI-based tools. Unfortunately, in many cases, it is hard to understand why they make certain decisions, and sometimes they fail unpredictably. However, if we want to make progress in the quest for scientific discovery, we need to understand and trust the tools we use.
We tackle the problem of explaining AI systems. Specifically, we study to what extent identifiable contrastive learning models produce better explanations in the form of attribution maps. First, we provide a careful definition of the ground-truth attribution map. Second, we design a synthetic dataset in such a way that the ground-truth attribution map is tractable. Then, we develop a new method, inspired by the properties of identifiable contrastive learning, which consists of estimating the Jacobian matrix of the inverted feature encoder. We show that our method is able to estimate the ground-truth attribution map with high accuracy and that it outperforms the supervised learning baseline.
Furthermore, we compare different existing attribution methods on a new self-supervised contrastive learning model using a neuroscience dataset evaluated using a decoding task.
Kei Ishikawa Kernel Conditional Moment Constraints for Confounding Robust Inference Prof. Dr. Niao He
Prof. Sara van de Geer
Oct-2022
Abstract: In the policy evaluation of offline contextual bandits, an unconfoundedness assumption is often made. However, observational data often fail to include all the relevant variables, and such an assumption is often violated due to the existence of unobserved confounders. To address this issue, sensitivity analysis methods are often used to estimate the policy value under the worst-case confounding. They find the worst possible confounding by minimizing the policy value over an uncertainty set
containing all possible confounding parameters. Here, in most existing works, some of the constraints in the original uncertainty set are relaxed, in order to obtain a tractable uncertainty set. However, it turns out that the lower bounds obtained by these methods are often too pessimistic because they fail to impose some conditional moment constraints. In this thesis, we propose a new class of estimator that can provide a sharp lower bound of the policy value, by taking into account the conditional moment constraints. In addition to the sharp estimation, our estimator can handle extended classes of problems in sensitivity analysis such as policy learning, sensitivity models characterized by f-divergence, and sensitivity analysis with continuous
action space. Moreover, it can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo as a special case. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to
the conditional moment constraints. Additionally, we devise a generalized class of constraints for uncertainty sets called conditional f-divergence constraints, to enable systematic extensions of our estimator to the various classes of problems. In the
theoretical analysis, we provide guarantees on finite sample properties as well as the reduction in the specification error. We also conduct numerical experiments with synthetic and real-world data to demonstrate the effectiveness of the proposed
method.
Yaxi Hu Differential Private Semi-Supervised Learning for Finite Hypothesis Classes Prof. Dr. Fan Yang
Dr. Amartya Sanyal
Oct-2022
Abstract: We study the conditions for private semi-supervised learning to outperform private supervised learning for finite hypothesis classes. We focus on two different types of disjunctions and provide bounds on their sample complexity in a PAC-style framework.

We study the k-literal disjunctions over a d-dimensional binary hypercube. We demonstrate a separation between the sample complexity for non-private supervised learning and private supervised learning, showing that privacy comes at the cost of sample complexity. We show that labelled sample complexity decreases with the number of unlabelled data under the compatibility condition and mild restrictions on the data distribution. Furthermore, we show that the labelled sample complexity for private learning reduces from O(d) to O(1) for a specific family of distributions.

We also extend our results to a more general version of disjunctions as two-sided disjunctions. We present a private framework for semi-supervised learning of two-sided disjunctions and propose two implementations using the Gaussian mechanism and the Sparse Vector Technique. We show an improvement in the sample complexity compared with private supervised learning of two-sided disjunctions. Finally, we describe the conditions for each implementation to be advantageous with a family of distributions.
Tom Forzy Geo-spatial analysis of immunization coverage and its determinants in Ethiopia. Dr. Lukas Meier
Prof. Stéphane Verguet
Oct-2022
Abstract: Ethiopia has achieved substantial progress in health outcomes over the last decades, but remains among the countries with the highest and most unequally distributed number of children without access to the routine immunizations recommended by the World Health Organization. This work studies the spatial distribution of immunization coverage in Ethiopia. We assembled heterogeneous datasets at a granular level, so as to infer the associations of geographic and socioeconomic determinants with immunization coverage. We also fitted parametric and non-parametric geo-spatial prediction models of immunization coverage. These models were assessed and illustrated the great potential of a multi-sectoral approach and of geo-spatial modeling in the context of public health data analysis.
Ramon Stieger Modeling the Relationship Between Dose and Cytokine Release Syndrome (CRS) Prof. Dr. Peter Bühlmann Sep-2022
Abstract: The occurrence of cytokine release syndrome (CRS) is an unwanted side effect of some new drugs for cancer treatment. Data suggest that patients can be desensitized with a low initial dose. This thesis aims to find a simple model for the relationship between dose and the occurrence of CRS which allows for desensitization. To assess the performance and compare the models, simulations were conducted using differently parameterized Fernandes and K-PD models for data generation. For fitting, two classes of models were compared: the Fernandes model and survival models. Using integrated squared error and coverage probability as evaluation criteria, the survival models performed better than the Fernandes model. Due to its interpretability, the survival model with a dose-dependent coefficient is the final choice and is recommended for further use.
Felix Schur Meta-Learning for Sequential Decision Making in Lifelong and Federated Settings Prof. Dr. Peter Bühlmann
Prof. Dr. Andreas Krause
Parnian Kassraie Jonas Rothfuß
Sep-2022
Abstract: We consider the problem of meta-learning the shared hypothesis space of a collection of bandit tasks. We assume the shared hypothesis space is a Reproducing Kernel Hilbert Space (RKHS) with unknown reproducing kernel that we assume to be a sparse linear combination of known base kernels. In case offline data from the collection of bandit tasks
is available, we propose Meta-KeL+, an algorithm based on the Group Lasso that meta-learns the shared hypothesis space. We prove that the Meta-KeL+ estimator converges in probability to the true hypothesis space when the number of tasks or the number of samples per task increases. For sequentially solving the collection of bandit tasks, we propose AdaBL. AdaBL is a lifelong learning algorithm based on Meta-KeL+ that
can be paired with any common bandit solver. We prove that the cumulative regret of AdaBL is sub-linear and grows with the same rate as the cumulative regret of the oracle bandit solver. Furthermore, we provide federated versions of our algorithms, F-MetaKeL+ and F-AdaBL, where the meta-learner does not have direct access to data from
individual tasks. We prove that both algorithms satisfy theoretical guarantees similar to the full-information counterparts. We empirically validate our theoretical findings with
experiments on synthetic and real-world data.
Hjördís Lára Baldvinsdóttir Quantile Regression: An Overview with Applications in Climatology Prof. Dr. Nicolai Meinshausen Sep-2022
Abstract: Statistical analysis commonly consists of analyzing relationships between variables using linear regression models. Yet, these models are constrained by strict data assumptions and limited information on the conditional distribution of the response variable. To avoid these limitations, quantile regression models can be employed as they require fewer data assumptions and can estimate the whole conditional distribution of a response, instead of just the conditional mean. Here, an overview of quantile regression models and their benefits will be provided, particularly in comparison to linear regression models. Furthermore, an application of quantile regression models in the field of climatology will be demonstrated to highlight the potential of such models and their advantages in the realm of climate studies.
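A compact illustration of the contrast drawn in this abstract, fitting ordinary least squares against linear quantile regression on synthetic heteroscedastic data (models and data chosen purely for illustration), could look like this:

```python
# OLS targets the conditional mean; quantile regression (pinball loss) estimates
# conditional quantiles and thereby reveals how the spread of y changes with x.
import numpy as np
from sklearn.linear_model import LinearRegression, QuantileRegressor

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=(1000, 1))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(scale=0.2 + 0.3 * x[:, 0])   # noise grows with x

ols = LinearRegression().fit(x, y)
fits = {q: QuantileRegressor(quantile=q, alpha=0.0, solver="highs").fit(x, y)
        for q in (0.1, 0.5, 0.9)}

x_new = np.array([[2.0], [8.0]])
print("OLS mean:", ols.predict(x_new))
for q, m in fits.items():
    print(f"quantile {q}:", m.predict(x_new))   # spread widens at larger x
```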
Qinyi Zeng Reconstruction of Compact, Convex Sets via Linear Images of Spectrahedra Prof. Dr. Sara van de Geer
Dr. Julia Hörrmann
Sep-2022
Abstract: The problem of reconstructing an unknown shape from a finite number of noisy measurements of its support function has attracted much attention. This thesis begins with investigating one of the most basic algorithms by looking at a convex polytope whose support function best approximates the given measurements in some given directions (in the least squares sense). We study the (strong) consistency result, which says that almost surely, the approximation polytope tends to the true body in the Hausdorff metric when the number of measurements increases to infinity.


However, traditional approaches like this one typically rely on estimators that minimize the error over all possible compact convex sets, which becomes more difficult as the number of measurements grows. Instead, we study a new approach which minimizes the error over structured families of convex sets that are specified as linear images of concisely described sets (the simplex or the spectraplex which is decided based on our prior information) in a higher dimensional space that is not much larger than the ambient space. From a computational perspective, linear functionals can be optimized over such sets efficiently, and a geometric characterization of the asymptotic behavior of such estimators can be provided. The last part of this thesis includes experiments on synthetic and real data which illustrate the differences of performance between the two approaches.
Neil Bajoria Learning Positional and Structural Embeddings in Graph Neural Networks Prof. Dr. Nicolai Meinshausen
Prof. Dr. Karsten Borgwardt
Sep-2022
Abstract: Graphs offer a powerful way to model data containing relations and interactions between entities. For example, molecules, social networks and friendship networks can all be expressed in graphical forms. Graph nodes, however, lack canonical positional information. This is problematic because nodes’ relative positioning can provide vital information for many graph machine learning tasks. Thus, the inability to capture such information severely limits the performance of Graph Neural Networks. Much research has been conducted to express nodes’ positional information through the use of positional encodings which encode positional information about the nodes to obtain more expressive graph representations.

In this thesis, we investigate one such novel architecture that learns positional encodings and uses the learnt positional encodings to update the node representations. We write our own implementation and compare our results to the original paper by analysing performance on graph datasets in both graph regression and classification tasks. Additionally, we introduce an augmented architecture of the original model that can leverage information from multiple positional encodings, yielding better performance. In further experiments, we show how subsets of Random Walk Positional Encodings can be chosen that achieve state of the art performance. Inspired by these results, we attempt to implement an architecture that implicitly learns the most informative positional encodings.
Pengxi Liu Domain Adaptation from Causality Point of View: ICU Mortality Prediction as an Example Prof. Dr. Nicolai Meinshausen Sep-2022
Abstract: The increasing demand for model generalization motivates transfer learning, in which knowledge extracted from one task is transferred to another. Domain adaptation is a form of transfer learning where models obtained from a source domain are expected to perform well in a target domain. We explicate the connection between domain adaptation and causal inference, investigating the methodology of invariant causal prediction so that domain adaptation problems can be viewed from the perspective of causal discovery. Based on this, we further review anchor regression, which relaxes invariant causal prediction by considering a modified least-squares criterion for causal minimax problems to avoid overly conservative causal parameters under domain shift or confounding in the causal structure, in both linear and nonlinear cases. We then propose a method to make anchor regression applicable to binary classification problems by introducing a pseudo-probability generated from a logistic function and a tuning threshold parameter that determines the classification mechanism. Practically, using ICU data from MIMIC-IV, HiRID and AUMCdb, we find that the NA-pattern alone can predict mortality and that this prediction generalizes across datasets through adjusted Random Forests, suggesting a causal relationship between the NA-pattern and mortality. Furthermore, we use anchor regression to predict mortality with observed clinical features that are, to some extent, independent of the NA-pattern.
Yves Hartmann Feature Learning and Evaluation for Railway Wheel Defect Detection Prof. Olga Fink
Prof. Fadoua Balabdaoui
Katharina Rombach
Sep-2022
Abstract: Wheels are a critical component of the railway rolling stock. Efficient maintenance of train wheels contributes to a safe, reliable and cost-effective railway operation. Rising investment in Condition Monitoring (CM) devices opens up new possibilities for data-driven
Prognostics and Health Management (PHM) applications (e.g. fault detection).
Although these data-driven solutions show great potential, data-related challenges arise. First, wheel defects are rare in operational trains. Second, sensor data from Wayside Monitoring Systems (WSM) shows a lot of variation, but only some of that variation is
related to the health condition of the wheel. Other sources of variation are operational or environmental conditions like train speed, precipitation, train load etc. For effective fault detection, a model has to be able to distinguish between irrelevant and relevant
variation of the data. One possibility to achieve this is feature learning. We define two main objectives for the learned features: (1) invariance to operational conditions and (2)
sensitivity to faults. Such a feature representation ensures robustness to changing operational conditions, as well as sensitivity for the downstream fault detection. We compare different learning paradigms of self-supervised feature learning. First, we propose the use of VICReg with data augmentation to learn invariance to non-informative variation.
Second, we use contrastive learning (triplet loss) to learn the natural degradation of the wheels. As a comparison method, we use an auto-encoder. The learned feature spaces are then assessed with respect to objectives (1) and (2). To this end, three quantitative evaluation metrics are proposed and adapted to this specific task: first, Total Correlation (TC) from information theory; second, the Fréchet Inception Distance (FID), a metric on probability distributions; and third, a method based on the cross-correlation matrix. Based on the learned feature representation, we propose fault classifiers which can be used in real-world applications. We show that VICReg-based feature representations achieve very high invariance to changing operational conditions. Further, we show that feature representations from contrastive learning are highly sensitive to faults, but lack invariance to changing operational conditions.
Marco Känzig Unsupervised anomaly detection for condition monitoring of technical equipment Dr. Markus Kalisch Sep-2022
Abstract: Manufacturing is one of the major areas where sophisticated machine learning models are deployed to improve process efficiency and product quality or to predict mechanical failures. This thesis focuses on the field of condition monitoring of technical equipment, where complex statistical models help to determine the optimal point for repairs. High-dimensional and unstructured sensor data is used to model a health score which indicates the condition of a specific machine with regard to degradation. Together with the industrial partner Geberit Produktions AG, sensor measurements from automated multi-station assembly stations are collected to develop, test and compare different condition-monitoring models. In addition, an extensive statistical exploration of the sensor data is presented in this thesis to provide a better understanding of the characteristics of the assembly stations and to guide the model selection. Linear and nonlinear dimensionality reduction models, various anomaly detection
algorithms and signal processing techniques are explored, implemented and compared. Since the problem requires unsupervised learning with no data of fully degraded machines available, special model selection metrics are implemented and discussed. The results
are accompanied by a simulation study to explore the behavior of the different models under distribution changes in the raw sensor data. It is found that deep-learning based approaches such as a degradation-trend controlled variational autoencoder do not always lead to a significant improvement over other, more basic models. However, a variational
autoencoder as well as Kernel-PCA with an elliptic envelope and classic PCA combined with an isolation forest provide useful health indicators to monitor the conditions of the assembly stations. Finally, a full end-to-end data and machine learning pipeline is implemented and deployed which allows Geberit Produktions AG to further enhance and monitor the model performances.
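One of the model combinations mentioned above, classic PCA followed by an isolation forest whose anomaly score serves as a health indicator, can be sketched as follows; the sensor data here is simulated, not the Geberit data.

```python
# Unsupervised health indicator: standardize, reduce dimension with PCA, then score
# new observations with an Isolation Forest trained on healthy data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
healthy = rng.normal(size=(500, 40))                       # normal operating data
degraded = rng.normal(loc=0.8, size=(50, 40))              # drifted "degraded" data

scaler = StandardScaler().fit(healthy)
pca = PCA(n_components=5).fit(scaler.transform(healthy))
iso = IsolationForest(random_state=0).fit(pca.transform(scaler.transform(healthy)))

def health_score(X):
    # score_samples is higher for more normal points; negate so that higher = worse
    return -iso.score_samples(pca.transform(scaler.transform(X)))

print(health_score(healthy).mean(), health_score(degraded).mean())
```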
Lukas Graz Interpolation and Correction of Multispectral Satellite Image Time Series Prof. Dr. Nicolai Meinshausen
Gregor Perich
Sep-2022
Abstract: Multispectral satellite imagery is used to model vegetation characteristics and development on a large scale in agriculture. As an example, satellite-derived Time Series (TS) of spectral indices like the Normalized Difference Vegetation Index (NDVI) are used to classify crops and to predict crop yield. Sometimes satellite measurements do not match the ground signal due to contamination by clouds and other atmospheric effects. Therefore,
traditional approaches aim to filter out contaminated observations before extracting and subsequently interpolating the NDVI. After filtering, remaining contaminated observations
and resulting data gaps are the two challenges for interpolation that we address in this thesis. For this purpose, cereal crop yield maps from 2017-2021 of a farm in Switzerland
with the corresponding Sentinel 2 satellite image TS published by the European Space Agency were examined. Contaminated observations were filtered with the provided Scene Classification Layer (SCL). We give a benchmark-supported review of different interpolation methods. Based on this benchmark, we find that Smoothing Splines, a flexible non-parametric method, and the Double Logistic approximation, a parametric method with implicit shape assumptions, perform most favorably given the aforementioned challenges. In addition,
we generalize an iterative technique which robustifies interpolation methods against outliers by reducing their weights. In most cases, this robustification successfully decreased
the 50% and 75% quantiles of the absolute out-of-bag residuals. Moreover, we present a general interpolation procedure that utilizes additional information to correct the target
variable with an uncertainty estimate and then performs a weighted interpolation. In our setting, the target variable is the NDVI and as additional information we use the SCL, the observed NDVI and the spectral bands. Consequently, we no longer filter using the SCL, but weight observations according to their reliability. Applying this procedure, the unexplained
variance in crop yield estimations via the resulting NDVI TS decreased by 10.5%. Considering the success of the presented procedure with respect to NDVI TS, it appears promising for applications to other satellite-based TS given its cloud-correcting properties.
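The two ingredients described above, smoothing-spline interpolation and iterative downweighting of outlying observations, can be sketched as follows on a synthetic NDVI-like series; the weighting rule and tuning constants are illustrative, not those of the thesis.

```python
# Robustified smoothing-spline interpolation: fit, compute residuals, downweight
# large residuals (e.g. cloud-contaminated observations), refit a few times.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(7)
t = np.linspace(0, 365, 60)                                  # acquisition days
ndvi = 0.2 + 0.5 * np.exp(-((t - 200) / 80) ** 2)            # smooth seasonal signal
obs = ndvi + rng.normal(scale=0.02, size=t.size)
obs[rng.choice(t.size, 6, replace=False)] -= 0.3             # cloud-like negative outliers

w = np.ones_like(t)
for _ in range(3):                                           # robustifying iterations
    spline = UnivariateSpline(t, obs, w=w, s=0.05)
    resid = obs - spline(t)
    scale = 1.4826 * np.median(np.abs(resid)) + 1e-8          # robust residual scale (MAD)
    w = 1.0 / (1.0 + (resid / (3 * scale)) ** 2)              # downweight large residuals

print(np.abs(spline(t) - ndvi).mean())                        # fit error vs. true signal
```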
Andri Schnider Generation of Synthetic Financial Time Series: Stabilizing the Quant GAN and Comparison Prof. Dr. Martin Mächler Sep-2022
Abstract: The Quant GAN established by Wiese et al. (Quantitative Finance, 20(9), 1419--1440 (2020)) is a generative adversarial network architecture that uses temporal convolutional networks to generate synthetic time series that exhibit features which are characteristic for real financial data. In the literature, these are usually referred to as stylized facts. This thesis proposes the Quant Wasserstein GAN, which is an extension of the Quant GAN for which the objective function is replaced in order to improve the stability of the network. Specifically, it builds on the idea of the original network and introduces a loss that is associated with the $1$-Wasserstein distance with the goal of making the training process more stable. This model is then applied to real data and its performance in being able to generate artificial financial time series that exhibit similar properties as real ones is compared to the one of methods from classical statistics, namely ARIMA, GARCH, mixture autoregressive models and the maximum entropy bootstrap. The findings suggest that while the Quant Wasserstein GAN struggles to generate time series whose squares exhibit significant non-zero autocorrelation, it seems to be more successful at creating data whose distribution's tail heaviness is more similar to the actual return distribution. Additionally, the results indicate that the network's performance is better when the assets under consideration do not exhibit exorbitant volatility. While the maximum entropy bootstrap is the method that is by far the most successful at creating time series that have the desired features, it lacks diversity in its output as it is very strongly correlated to the original series, which makes it unsuitable for some tasks in practice.
Philipp Niggli Causal Inference Methods for Analyzing Subgroup Treatment Effects with Application in Digital Health Interventions Dr. Lukas Meier
Prof. Dr. Florian von Wangenheim
Joel Persson
Sep-2022
Abstract: We evaluate methods for estimating causal treatment effects and apply them to real-world data on digital health interventions aimed at reducing alcohol or smoking addiction among adolescents. The methods we evaluate are regression adjustment, inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW) estimators. We evaluate the performance of the estimators in a simulation study. We find that the AIPW estimator outperforms regression adjustment and IPW in terms of bias and variance, especially when the estimated models are misspecified. In the applications, we study the extent to which personalized chatbot messaging interventions reduce drinking and smoking among adolescents, and for which subgroups the interventions are most effective. The results indicate that the interventions are more effective for men, people below the age of 18, and people who have completed at least upper secondary school. This suggests that future digital health interventions may be targeted at those groups.
Alexander Timans Uncertainty Quantification for Image-based Traffic Prediction Dr. Lukas Meier
Prof. Dr. Martin Raubal
Sep-2022
Abstract: Despite the strong predictive performance of deep learning models for traffic prediction, their widespread deployment in real-world intelligent transportation systems has been
restricted by a lack of interpretability and perceived trustworthiness. Uncertainty quantification (UQ) methods provide an approach to induce probabilistic reasoning, improve
decision-making and enhance model deployment potential. This study investigates the application of different UQ methods for short-term traffic prediction on the image-based
Traffic4cast dataset. We compare two epistemic and two aleatoric UQ methods on both temporal and spatio-temporal transfer tasks, and find that meaningful uncertainty estimates can be recovered. Methods are compared in terms of uncertainty calibration and sharpness, and our experiments suggest that modelling both epistemic and aleatoric uncertainty jointly produces the most accurate uncertainty estimates. Obtained uncertainty estimates are spatially related to the city-wide road network, and subsequently employed for unsupervised outlier detection on changes in city traffic dynamics. We find that our approach can capture both temporal and spatial effects on traffic behaviour, and that their interaction is complex. Our work presents a further step towards boosting uncertainty
awareness in traffic prediction tasks, and aims to showcase the potential value contribution of UQ methods to the development of intelligent transportation systems, and to a
better understanding of city traffic dynamics.
Maximilian Baum Spectral Deconfounding for High-Dimensional Additive Models Prof. Dr. Peter Bühlmann Sep-2022
Abstract: As high-dimensional data becomes increasingly common, research into techniques to address common statistical problems such as confounding in the high-dimensional setting becomes increasingly relevant. A promising recent development to address the dense confounding problem in the case of high-dimensional linear models comes in the form of spectral deconfounding (Ćevid et al., 2020). In this thesis, we propose an estimator to generalize this technique to the more complex case of non-linear high-dimensional models. We provide an implementation algorithm and demonstrate the effectiveness of our proposal for a number of different research questions using simulated data. As a secondary research question, we also explore the practical problem of tuning parameter selection and find that two-layer cross-validation can be used to attain near-optimal performance.
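For orientation, the linear-model version of spectral deconfounding that the thesis generalizes can be sketched as follows: cap the singular values of X (the trim transform of Ćevid et al., 2020), apply the same transformation to X and y, and run a sparse regression on the transformed data. The simulation setup and tuning values below are illustrative only.

```python
# Trim transform F = U diag(min(d, tau)/d) U^T caps the singular values of X at tau,
# dampening directions inflated by a dense hidden confounder, then Lasso is run
# on the transformed data (F X, F y).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p, q = 200, 300, 3
H = rng.normal(size=(n, q))                        # hidden dense confounder
X = H @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                                     # sparse causal effect
y = X @ beta + H @ rng.normal(size=q) + rng.normal(size=n)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
tau = np.median(d)
F = U @ np.diag(np.minimum(d, tau) / d) @ U.T      # trim transform

fit = Lasso(alpha=0.1, max_iter=5000).fit(F @ X, F @ y)
print(np.round(fit.coef_[:8], 2))                  # first 5 coefficients should stand out
```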
Adnana Maria Tudose Out-of-Distribution Generalisation: Using Intensive Care Unit Data to Predict Prof. Dr. Nicolai Meinshausen
Drago Plečko
Sep-2022
Abstract: A common assumption in classical statistical and machine learning methods is that the training and testing data are drawn from the same underlying distribution. When this assumption is violated, models that obtain a good performance on the training data can perform poorly on the testing data.

This thesis addresses an out-of-distribution (OOD) generalisation problem in the medical domain. Specifically, using a time-series of observations about lab tests and vitals recorded in the Intensive Care Unit (ICU), our task is to predict patient mortality on ICU datasets other than the one used to train the model. We assume the distribution shift present in the data arises because the way patient variables are monitored is not random: if the physicians think that the patient is at risk, they tend to monitor more. However, this process differs from one hospital to another.

We develop new techniques for OOD generalisation and build on existing ones to address the domain shift problem. The experiments proposed consist of (i) using a domain adaptation approach to introduce importance weighting such that the source dataset observations contribute more or less to learning the relationship between the patient characteristics and the outcome, (ii) proposing a weighting technique that involves separating the effect of the patient health status and that of the hospital on the measurement pattern, (iii) assuming that the influence of the hospital can be represented through a measure of tendency in the missingness pattern which is subsequently eliminated, (iv) building on an existing approach that proposes to use Inverse Propensity Weighting to eliminate the spurious association between the measurement pattern and the patient outcome, (v) separating patient cohorts according to their health risk when eliminating the effect of the hospital policy.

Not every experiment consistently achieves statistically significant gains in OOD performance with respect to our baselines. Specifically, the first three experiments do not provide considerable generalisation results. However, the performance of the fourth and fifth experiments are more promising. The fourth experiment provides, in the best cases, gains in the mean AUROC ranging from 1% to 2.5% versus the results we obtain if we evaluate the model without OOD intervention. At the same time, when our model underperforms, the negative change from the baseline is considerable, which implies that future strategies to mitigate for the losses are required. Similarly, the fifth experiment provides gains of up to 1.5%, while minimising the number of cases when it underperforms.
Anna Reichart Guidelines for using covariate adjustment when analyzing randomized controlled trials Lukas Meier Sep-2022
Abstract: This thesis discusses the impact of covariate adjustment on linear, logistic and Cox regression on simulated data and formulates guidelines for its use in randomized controlled trials (RCTs). The unadjusted and covariate-adjusted analyses are compared with regard to bias, precision and power of the coefficient of interest. In linear regression, the coefficient of interest is the average treatment effect (ATE), while in logistic and Cox regression, it is the coefficient of the factor variable of the treatment effect, $\alpha_0$.

For linear regression, the ATE is unbiased, regardless of whether or not the analysis includes predictive or non-predictive covariates. Inclusion of predictive covariates increases precision and power. The effect is more pronounced for higher values of the regression coefficients $\bm{\beta}$. In contrast, power and precision decrease if we include non-predictive covariates. Precision decreases linearly with subjects per variable (SPV), while power stays almost constant down to 1.5 SPV. Covariate imbalance can lead to bias of the ATE. We can also use the difference between the response and its baseline as the new response variable, but then we should include the baseline variable as a covariate in the analysis. An alternative to covariate adjustment is inverse probability of treatment weighting (IPTW), but IPTW leads, contrary to what is found in the literature, to less powerful tests than regular covariate adjustment.

The model assumptions for the adjusted analysis are: error terms are normally distributed, have homoscedastic variance and are independent, and the covariate and response have a linear relationship. Analyzing the data as if there were no violation leads to an unbiased ATE. The adjusted analysis leads to higher precision and power than the unadjusted.
If the treatment affects the covariate, the adjusted analysis leads to a biased estimate of the ATE. If there is confounding, the unadjusted analysis will give us a biased estimate of the ATE.

Binary response variables are routinely analyzed using logistic regression. Not including predictive covariates yields a log-odds ratio (LnOR) $=\alpha_0$ that is biased towards zero. Power and standard deviation of the adjusted analysis are higher than those of the unadjusted one. The standardized precision, however, will be higher for the adjusted analysis. Including non-predictive covariates will lead to a LnOR whose value is overestimated and to decreasing precision, while power stays constant.

If the response is a time-to-event variable, Cox regression is a popular model to analyze the data. Similar to logistic regression, not including predictive covariates will lead to a bias of $\alpha_0$ towards zero and a loss of power and precision. Including non-predictive covariates will lead to an overestimation of $\alpha_0$ if the true survival time follows an exponential or Gompertz distribution. If the true survival time follows a Weibull distribution, the coefficient is unbiased.

For all three models, the covariates have to be prespecified in the trial protocol to avoid bias regarding the selection process.

In general it is important to include prognostic variables to gain power and obtain unbiased estimates of the treatment effect. Further, we should include a small number of covariates that we suspect to be predictive, as the potential gain in power and precision is much higher than the potential loss if they are not. In addition, we protect ourselves against covariate imbalances and obtain conditionally unbiased estimates of the treatment effect.
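The adjusted-versus-unadjusted comparison described above can be reproduced in miniature with base R; the simulation below is only a hypothetical illustration of covariate adjustment and IPTW, not the thesis' simulation design.

    ## Toy RCT simulation: unadjusted vs. covariate-adjusted analysis and IPTW.
    set.seed(2)
    n  <- 200
    x  <- rnorm(n)                                   # predictive baseline covariate
    tr <- rbinom(n, 1, 0.5)                          # randomized treatment
    y  <- 0.5 * tr + 1.0 * x + rnorm(n)              # true ATE = 0.5

    unadj <- lm(y ~ tr)                              # unbiased, but wider confidence interval
    adj   <- lm(y ~ tr + x)                          # unbiased and more precise
    confint(unadj)["tr", ]
    confint(adj)["tr", ]

    ## IPTW alternative: weight by the inverse of the estimated treatment probability
    ps   <- fitted(glm(tr ~ x, family = binomial))
    w    <- ifelse(tr == 1, 1 / ps, 1 / (1 - ps))
    iptw <- lm(y ~ tr, weights = w)
    coef(iptw)["tr"]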
Dejan Pendic The Role of Imputation in Empirical Asset Pricing via Machine Learning Prof. Dr. Markus Leippold
Dr. Markus Kalisch
Simon Hediger
Meta Lina Spohn
Aug-2022
Abstract: Machine learning models have been used widely for return prediction in asset pricing lately. However, it is common practice to impute missing values with a constant like the median. This thesis investigates the impact of different imputation methods on the predictive performance of regularized linear models on financial data. Investors may miss out on returns by imputing missing values with ad hoc methods such as the median, as these may fail to capture the joint distribution of the imputed variables. We first apply different single and multiple imputation methods to replace the missing values in our data. Second, we fit the elastic net and ridge regression on the imputed data to obtain return predictions. Finally, we compare our performance to return predictions based on median imputation and find indications for improved predictive performance after imputing with the method MissForest. Moreover, the Sharpe ratio of portfolios created on MissForest-imputed data increases significantly compared to those on median-imputed data. Thus, our results suggest that investigating the imputation process of financial data is potentially beneficial to prevent investors from missing out on performance.
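A minimal sketch of the imputation-then-predict pipeline, assuming the missForest and glmnet packages; the simulated "returns" data set is a hypothetical placeholder, not the thesis' financial data.

    library(missForest)
    library(glmnet)

    set.seed(3)
    X <- matrix(rnorm(200 * 10), 200, 10)            # firm characteristics (toy)
    y <- X[, 1] - 0.5 * X[, 2] + rnorm(200)          # next-period returns (toy)
    X[sample(length(X), 300)] <- NA                  # introduce missingness

    # median imputation vs. random-forest based imputation
    X_median <- apply(X, 2, function(v) { v[is.na(v)] <- median(v, na.rm = TRUE); v })
    X_mf     <- missForest(as.data.frame(X))$ximp

    # elastic net (alpha = 0.5) on both imputed versions
    fit_median <- cv.glmnet(as.matrix(X_median), y, alpha = 0.5)
    fit_mf     <- cv.glmnet(as.matrix(X_mf),     y, alpha = 0.5)
    c(median = min(fit_median$cvm), missforest = min(fit_mf$cvm))   # CV error comparison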
Francesco Brivio Nonparametric Maximum Likelihood Estimator of multivariate mixture models: theory and algorithms Prof. Dr. Fadoua Balabdaoui Aug-2022
Abstract: Mixture models are extremely useful since they are able to model quite complex distributions, which cannot be modelled faithfully by a single parametric family. This thesis studies the Nonparametric Maximum Likelihood Estimator (NPMLE) of a mixture of multivariate distributions, showing the existence of such an estimator and some important properties, such as consistency. Some remarkable features of the nonparametric MLE are extended here to its multivariate version: for example, under some very mild assumptions on the kernel, the MLE has to be discrete with at most n jump points. We studied and presented three algorithms aiming at a mathematically consistent method for estimating the multivariate mixing distribution without any assumption about the shape of the distribution. The first algorithm is the Nonparametric Adaptive Grid (NPAG) algorithm. This method is able to handle high-dimensional and complex multivariate mixture models; it was developed for population pharmacokinetics, but we implemented it in a way that is suitable for more generic data sets. The second algorithm we analyzed is a different way to apply the Expectation-Maximization (EM) algorithm: we worked with an example of unsupervised learning in which minimum message length (MML) techniques are applied to finite mixture models. Then, changing the approach from UNSUP, we applied the EM algorithm for increasing numbers of components until the likelihood value plateaus, i.e. stays constant at some point. We finally propose some simulations to compare how these three algorithms work.
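For orientation, the toy base-R sketch below spells out the E- and M-steps of a plain two-component Gaussian-mixture EM, the building block that the algorithms above refine; it is only an illustration and not the nonparametric MLE itself.

    ## Toy EM for a two-component univariate Gaussian mixture.
    set.seed(4)
    x <- c(rnorm(150, -2), rnorm(100, 2))
    p <- 0.5; mu <- c(-1, 1); s <- c(1, 1)

    for (it in 1:100) {
      # E-step: posterior probability of component 1 for each observation
      d1 <- p * dnorm(x, mu[1], s[1]); d2 <- (1 - p) * dnorm(x, mu[2], s[2])
      g  <- d1 / (d1 + d2)
      # M-step: update mixing weight, means and standard deviations
      p  <- mean(g)
      mu <- c(sum(g * x) / sum(g), sum((1 - g) * x) / sum(1 - g))
      s  <- c(sqrt(sum(g * (x - mu[1])^2) / sum(g)),
              sqrt(sum((1 - g) * (x - mu[2])^2) / sum(1 - g)))
    }
    round(c(p = p, mu = mu, sigma = s), 2)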
Jonas Roth Reinforcement Learning for Minimum Variance Portfolios Prof. Dr. Markus Leippold
Dr. Markus Kalisch
Dr. Gianluca De Nard
Aug-2022
Abstract: Many portfolio managers need new ideas to create long-term portfolios which are also robust during volatile times. Especially in current times, a portfolio with minimal variance helps investors keep their heads when the markets go crazy. While supervised learning has already been applied in various setups in finance, direct optimisation through reinforcement learning, letting an agent interact directly with the market, is a relatively new approach for financial applications. I analyse a deep reinforcement learning model to create a minimum variance portfolio. The central part of this thesis is to test the performance, in terms of standard deviation, of a global minimum variance portfolio created by covariance shrinkage methods, which yield state-of-the-art performance, against the portfolio generated by a deep reinforcement learning algorithm.
Emil Majdandzic Quantifying the Effects of Congruence Classes on Phylodynamic Epidemiology Dr. Lukas Meier
Dr. Timothy Vaughan
Aug-2022
Abstract: Over the past few years the identifiability issues caused by "congruence classes" in inferences using birth-death-sampling models have become a known problem. However, very little is known about the exact magnitude of this phenomenon. This thesis examines the influence of congruence classes on phylodynamic inference in epidemiology.
Using a more analytical approach as well as a somewhat classical simulation study, this project shows how careful researchers have to be when using birth-death-sampling models. In theory, congruence classes can be present in any analysis and confound every inference in some way or another. In practice, however, the picture shifts and the strong interaction between the confounding factors and the analysis weakens. We conclude that the probability of encountering congruence classes in practice is vanishingly small.
Aaron Renggli Generalized Linear Mixed Models: A theoretical overview and testing the glmmTMB package Dr. Markus Kalisch Aug-2022
Abstract: In fields such as econometrics or ecology, observational data often come in the form of counts. This means that the data cannot be modelled with a continuous distribution. Generalized linear models (GLMs) are widely used and powerful, and they allow for flexible extensions to model discrete data. The R package glmmTMB (Brooks, Kristensen, van Benthem, Magnusson, Berg, Nielsen, Skaug, Mächler, and Bolker, 2017) is designed to fit GLMs and extensions of GLMs, such as generalized linear mixed models and zero-altered models. Having one package to fit all different types of models makes it easier to compare models and perform model selection and inference, since all the metrics to do so are consistent across different models (Brooks et al., 2017). This work aims at giving an overview of GLM theory and some extensions, namely overdispersion, mixed models, and zero-inflated models. Furthermore, the goal is to test how well glmmTMB actually performs in fitting different types of models for count data. Moreover, the last chapter contains a tutorial on how to fit GLMs and their extensions, and how to select an appropriate model, using the glmmTMB package in practice.
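A minimal sketch of the kind of glmmTMB calls discussed in the tutorial chapter; the simulated data frame and its variables (count, treatment, site) are hypothetical placeholders.

    library(glmmTMB)

    set.seed(5)
    d <- data.frame(site      = factor(rep(1:10, each = 20)),
                    treatment = rep(c("A", "B"), 100))
    d$count <- rnbinom(200, mu = exp(0.3 + (d$treatment == "B") * 0.5), size = 1.5)

    fit_pois <- glmmTMB(count ~ treatment + (1 | site), family = poisson, data = d)
    fit_nb   <- glmmTMB(count ~ treatment + (1 | site), family = nbinom2, data = d)  # handles overdispersion
    fit_zinb <- glmmTMB(count ~ treatment + (1 | site), family = nbinom2,
                        ziformula = ~ 1, data = d)                                   # adds zero inflation
    AIC(fit_pois, fit_nb, fit_zinb)   # consistent metrics ease model comparison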
Michael Schwab Overdispersion in (G)LM(M)s: Theory and Practical Insights Dr. Markus Kalisch Jul-2022
Abstract: Generalized linear mixed models are a combination of the model classes of generalized linear models and linear mixed models. This combination allows one to fit discrete data as well as random effects. Poisson regression is an example of a generalized linear model. The underlying Poisson distribution has the characteristic of equidispersion, such that the mean is equal to the variance. This makes the Poisson distribution prone to the risk of overdispersion, where the variance of the data exceeds the variance implied theoretically by the underlying distribution. The aim of this thesis is to build a solid theoretical foundation of the model classes involved in the construction of a generalized linear mixed model and to investigate how overdispersion with respect to the Poisson distribution affects the model output in Poisson regression and its generalized linear mixed model extension. To do so, we run simulations to investigate the coverage of the confidence intervals corresponding to the model parameters. We observed that overdispersion as well as small sample sizes can lead to confidence intervals that fail to attain their nominal coverage. When using models such as negative binomial regression and its generalized linear mixed model extension, which take overdispersion with respect to the Poisson setup into account, we can restore the coverage of the model parameter confidence intervals. The thesis provides a supplementary tutorial that guides the usage of generalized linear mixed models and the involved model classes in practice.
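A quick base-R illustration of the Pearson dispersion check commonly used to flag overdispersion relative to the Poisson assumption; the simulated toy data are not the thesis' simulation setup.

    ## Overdispersed counts fitted with a Poisson GLM, then a dispersion check.
    set.seed(6)
    x   <- rnorm(200)
    y   <- rnbinom(200, mu = exp(0.5 + x), size = 1)   # overdispersed counts
    fit <- glm(y ~ x, family = poisson)

    disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
    disp   # values clearly above 1 indicate overdispersion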
Jsmea Hug Computational modelling of ovarian follicle 3D growth dynamics for personalized health applications Dr. Markus Kalisch
Prof. Dr. Dagmar Iber
Leopold Franz
Jul-2022
Abstract: Personalization of ovarian stimulation protocols is difficult to date due to incomplete knowledge on the mechanisms steering follicle dynamics and inter-patient differences. This work uses a computational modelling approach to shed light on both aspects. We present an ordinary differential equation (ODE) model for personalized modelling of ovarian follicle growth during the follicular phase of the menstrual cycle (up to ovulation). Novelties of the model are that (1) clinical hormone and follicle measurements can be inserted directly into the model, and (2) the adjustment of four parameters is sufficient for personalization. We show that acquired sensitivity to luteinizing hormone (LH) of the dominant follicle during the late follicular phase can ensure dominance over its competitors and potentially plays a role in follicle selection. Follicles are modelled as individual entities in terms of diameters, allowing direct comparison to clinical ultrasound data. Further, we introduce a new data set consisting of blood hormone measurements and ovarian ultrasound images of 51 healthy women with regular natural cycles, measured every second day over two non-consecutive cycles. We present an equation describing follicle-shape transitions in terms of the volume-surface ratio. Additionally, we apply multivariate time series analysis to detect time-delayed effects between hormones and follicle growth and demonstrate that the method is unsuitable for this application.
Olga Kolotuhina Learning of interventional Markov equivalent classes under unknown interventions Prof. Dr. Peter Bühlmann
Juan L. Gamella
Jul-2022
Abstract: In this thesis, we propose a greedy approach using the GIES algorithm to determine the interventional targets and the underlying interventional equivalence class. We assume the data to follow a family of Gaussian distributions which arise from an unknown family of do-interventions, i.e. we are given only the data without any information on the intervention targets. We define a new I-Markov equivalence superclass, which we call O-equivalence, derive some interesting properties for this class, and connect it to the greedy approach and the question of identifiability.
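For orientation, the snippet below shows a standard GIES fit with known intervention targets, following what we believe is the documented pcalg interface (the gmInt example data ship with the package); the greedy search over unknown targets proposed in the thesis wraps around this building block.

    library(pcalg)
    data(gmInt)                         # example interventional Gaussian data in pcalg
    score <- new("GaussL0penIntScore", gmInt$x,
                 targets = gmInt$targets, target.index = gmInt$target.index)
    gies.fit <- gies(score)             # greedy interventional equivalence search
    gies.fit$essgraph                   # estimated interventional essential graph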
Taru Singhal Statistical methods for wastewater-based epidemiology of SARS-CoV-2 Dr. Markus Kalisch
Prof. Dr. Tanja Stadler
Dr. Jana Huisman
May-2022
Abstract: Since the emergence of SARS-CoV-2, great focus has been placed on tracking the spread of the virus and its transmissibility. The effective reproductive number, Re, is a key epidemiological indicator that quantifies how many new infections are caused on average by an infected individual. It aids in informing public health decisions, assessing the current status of the epidemic, and evaluating the effectiveness of interventions. There have been many efforts to estimate the Re using case report data, and recent efforts have leveraged the viral shedding in wastewater as an independent and unbiased source to track the epidemic. As new variants emerge with differing infectivity and severity levels, tracking individual variants' transmissibility and spread in a population is imperative to support control strategies and general preparedness.
In this project, we first explored inherent differences between clinical and wastewater data pertinent to the Re estimation and inference. We assessed the validity of assumptions made to estimate Re for wastewater data and evaluated effects of changes in wastewater sampling methodology. We also applied existing wastewater Re methodology to different sources of variant data to obtain variant-specific Re estimates from wastewater. Some of these wastewater variant data have their own estimates of uncertainty; to account for these uncertainty estimates when available, we modified the construction of the Re confidence intervals. Finally, we performed simulations to better understand the mechanics of estimating the Re using wastewater data and to look into disease and shedding conditions under which wastewater-based epidemiology may be better suited than other approaches.
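As a point of reference, a generic Re estimation from an incidence series can be sketched with the EpiEstim package as below; the incidence counts and serial-interval values are placeholders, and the wastewater pipeline additionally replaces case counts by deconvolved viral-load signals.

    library(EpiEstim)

    set.seed(8)
    incid <- rpois(60, lambda = 20)                    # placeholder incidence series
    res <- estimate_R(incid, method = "parametric_si",
                      config = make_config(list(mean_si = 4.8, std_si = 2.3)))
    head(res$R)                                        # sliding-window Re estimates with uncertainty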
Olanrewaju Labulo Comparison of different statistical methods to develop a predictive model for small aneurysm patients: Swiss SOS cohort study Dr. Markus Kalisch May-2022
Abstract: This study compares the predictive performance of machine learning models with that of conventional variations of logistic regression on a dataset containing various outcomes of patients with small brain aneurysms.
Raphaël Andrew Fua Algorithms and methodology for fast and memory efficient online change point detection Prof. Peter Bühlmann
Dr. Solt Kovács
Apr-2022
Abstract: Online change point detection aims at monitoring data streams and raising alarm as soon as possible after an abrupt change of the underlying stochastic process, while giving some guarantees for false discoveries. Ever larger data sets in general require both computationally and memory efficient algorithms. This is particularly true for online change point problems where thousands of new observations might arrive every second, and it is critical that the algorithms can process these data fast enough.

This thesis focuses on the univariate Gaussian setup with unknown changing mean values and independent observations. We introduce new algorithmic approaches achieving logarithmic memory and time complexity throughout each iteration. Our new approach has some distinct advantages in terms of general applicability. Contrary to similarly fast and/or memory efficient competitors, our algorithm is very simple and also easy to generalize for several other distributions beyond the Gaussian case, as well as multivariate data streams. In simulations we find that our algorithm has competitive statistical performance compared to state-of-the-art algorithms.
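For intuition only, a textbook one-sided CUSUM recursion for an online mean shift is sketched below in base R; this is not the thesis' logarithmic-memory algorithm, and the tuning constants are arbitrary.

    ## Simplified online monitoring for an upward mean shift in a Gaussian stream.
    set.seed(9)
    stream <- c(rnorm(300, 0), rnorm(100, 1))   # change at t = 301
    h <- 8; k <- 0.25                           # threshold and reference value (tuning constants)
    S <- 0
    for (t in seq_along(stream)) {
      S <- max(0, S + stream[t] - k)            # one-sided CUSUM recursion
      if (S > h) { cat("alarm at t =", t, "\n"); break }
    }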
Maria Ísbjörg Füllemann Maximum likelihood estimation of the log-concave component in a semi-parametric mixture with a standard normal density Prof. Dr. Fadoua Balabdaoui Apr-2022
Abstract: The two-component mixture model with known background density, unknown signal density, and unknown mixing proportion has been studied in many contexts. One such context is multiple testing, where the mixing density describes the distribution of p-values, with the background and signal densities describing the distribution of the p-values under the null and alternative hypotheses, respectively. In this setting, the mixing proportion describes the proportion of p-values coming from the null, i.e. the proportion of true null hypotheses. The goal is to estimate the mixing proportion and the signal density. In general, this model is not identifiable; that is, the representation of the model as a two-component mixture is not necessarily unique. We therefore study the identifiability of the model under the assumption that the background density is standard normal and the signal belongs to a family of log-concave densities. We look at the maximum likelihood estimator of the signal density and propose an EM algorithm to compute it.
Rawda Bouricha Copy number calling from single-cell RNA sequencing data Prof. P. Bühlmann Apr-2022
Abstract: Single-Cell RNA-Seq provides transcriptional profiling of thousands of individual cells and yields useful information, at the single-cell level, on how cells differ across the thousands of cells within a sample. In recent years, several methods have been developed for the detection of copy numbers from scRNA-seq data instead of scDNA data. One approach considers smoothing as a preprocessing step for breakpoint detection. This research focuses on working with SCICoNE, a statistical model for single-cell copy number calling and event history reconstruction. Several smoothing methods, such as rolling-mean smoothing and the sparse fused lasso, have been tested to improve the detection of mutations from the transcriptome, taking into account its specific characteristic of overdispersion. Another aim of this study is to investigate the runtime. Many clustering methods have been developed lately to improve the computational efficiency of working with such huge datasets. We want to check whether we can improve both the copy number detection and the runtime.
Tobias Winguth Parametric and Nonparametric Models for the Analysis of Longitudinal Data using R Dr. Lukas Meier Mar-2022
Abstract: The analysis of longitudinal data poses some challenges, mostly due to its special grouping structure. Measurements that are made repeatedly over time on the same subject are generally correlated. Additionally, data from longitudinal studies is often unbalanced, with irregular observation times and missing data. For correct estimation and inference these special features need to be taken into account. Hence, methods that were originally developed for independent data need to be adjusted. The focus is set exclusively on continuous outcome variables which are assumed to follow a normal distribution. In addition to linear fixed- and mixed-effects models, we consider nonparametric methods that allow for a more flexible modeling of the mean response. As an alternative approach, the broken stick model can be used to transform unbalanced data into a balanced set of repeated measures. Traditional methods like repeated measures ANOVA or summary measure analysis which often require balanced data can then be applied. Complementary examples are given and implemented in R.
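A minimal sketch of the linear mixed-effects modelling mentioned above, assuming the lme4 package; the simulated long-format data frame and its variables (y, time, treatment, subject) are hypothetical placeholders.

    library(lme4)

    set.seed(10)
    dat_long <- expand.grid(subject = factor(1:30), time = 0:4)
    dat_long$treatment <- ifelse(as.integer(dat_long$subject) <= 15, "control", "active")
    b0 <- rnorm(30, sd = 1.0)[as.integer(dat_long$subject)]     # subject-specific intercepts
    b1 <- rnorm(30, sd = 0.3)[as.integer(dat_long$subject)]     # subject-specific slopes
    dat_long$y <- 2 + b0 + (0.5 + b1) * dat_long$time +
                  0.3 * dat_long$time * (dat_long$treatment == "active") +
                  rnorm(nrow(dat_long), sd = 0.5)

    # random intercept and slope per subject; handles unbalanced, correlated data
    fit_lmm <- lmer(y ~ time * treatment + (time | subject), data = dat_long)
    summary(fit_lmm)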
Elias Rapuano A Review on the K-means Diffusion Map Clustering Algorithm Prof. Dr. Sara van de Geer
Dr. Matthias Löffler
Mar-2022
Abstract: Clustering high-dimensional data can be complicated when the clusters cannot be separated in a linear manner. We review the literature on an algorithm that combines diffusion maps and $k$-means, which yields a non-linear clustering algorithm for this setting. Diffusion maps are based on a random walk on the similarity graph of the data. This random walk is used to construct a distance measure which captures the local geometry of the data. After the diffusion map separates the clusters, it is possible to use linear approaches. The $k$-means algorithm then minimizes the within-cluster variance, which retrieves the exact clusters under the right conditions. We introduce the diffusion map and $k$-means algorithms and the semidefinite relaxation of $k$-means. We establish the connection between diffusion maps and the Laplace-Beltrami operator. We show exact recovery of the underlying clusters under certain conditions, even when using the relaxation. We then compare the performance of the algorithm on various examples and give practical implementation details. This master's thesis is based on the paper \cite{chen2021diffusion}.
Stephan Skander A Forward Selection Algorithm for the Monotone Single Index Model Prof. Dr. Fadoua Balabdaoui Mar-2022
Abstract: We propose a new estimation method for the Monotone Single Index Model which combines variable selection, specifically forward selection, with estimation of both the link function and the coefficient vector in a sparse setting. We compare it to a classical estimation procedure and to another forward selection algorithm by Shikai Luo and Subhashis Ghosal, while recalling important definitions and results on convergence.

Emanuel Zwyssig PERMANOVA Models: Advantages and Drawbacks in Practical Applications Dr. Lukas Meier Mar-2022
Abstract: Permutational Multivariate Analysis of Variance (PERMANOVA) is a statistical model originally developed to test differences of means between groups for ecological count data. It can be used in situations where the data does not follow a multivariate normal distribution and the test involves an arbitrary dissimilarity measure. A description of the mathematical background as well as an intuitive explanation of the test with arbitrary dissimilarity measures as an implicit transformation of the data into Euclidean space is offered. In a series of simulations, PERMANOVA is compared to its parametric counterpart MANOVA, and advantages and drawbacks of both models are established. Further simulations investigate the runtime and identify the most powerful strategies to conduct tests of crossed and nested factors in the framework of the \verb|R| \verb|vegan| package. These strategies are then applied to test hypotheses concerning a data set of microbial compositions in a bioreactor. Finally, a summary of implications for the practitioner is offered.
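A minimal sketch of a PERMANOVA call with the vegan package; the community matrix and design factors below are simulated placeholders, not the bioreactor data analysed in the thesis.

    library(vegan)

    set.seed(11)
    meta <- expand.grid(reactor = factor(1:2), timepoint = factor(1:5), rep = 1:3)
    comm <- matrix(rpois(nrow(meta) * 20, lambda = 5), nrow = nrow(meta))   # site-by-species counts

    adonis2(comm ~ reactor * timepoint, data = meta,
            method = "bray", permutations = 999)    # crossed factors, Bray-Curtis dissimilarity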
Amine Chaker Random Matrix Theory for Machine Learning: Kernel Methods as an Example Prof. Sara van de Geer Feb-2022
Abstract: Machine Learning (ML) applications nowadays rely more and more on high-dimensional data as training input. However, standard ML algorithms were initially conceived (and are guaranteed to work) for low-dimensional data. Hence, their behaviour when trained on large data remains poorly understood, and they can even produce counter-intuitive, if not erratic, results. Thus, we need a thorough investigation of fundamental ML models when the dimension of the data is large (i.e. comparable to the sample size).
Random Matrix Theory (RMT) offers an original and fruitful angle to tackle this problem. Indeed, numerous ML algorithms use the spectral properties of matrices constructed from the training data in order to derive insights. By harnessing the powerful machinery offered by RMT, we can study the asymptotic spectral behaviour of those matrices to be able to understand how ML algorithms behave in a large sample large dimensional setting.

In this thesis, we first offer a self-contained introduction to RMT. We introduce prerequisites and elementary tools, before proving fundamental results in RMT dealing with the limiting spectral densities of some random matrix models. Next, we build on those fundamental results to present some of the recent ML-oriented state-of-the-art RMT results. In particular, we study the spectral behaviour of kernel matrices that are at the core of the ML models known as kernel methods. Understanding the (sometimes surprising) asymptotic behaviour of these matrices for data arising from various statistical models enables us to understand the performance of kernel methods on high dimensional data.
Special attention is given to kernel spectral clustering, and we show how our seemingly theoretical results can readily be used in real-world applications by tailoring kernel spectral clustering to the high-dimensional setting.
Pit Rieger Measurement Invariance in Confirmatory Factor Analysis: Methods for Detecting Non-invariant Items Dr. Markus Kalisch
Prof. Dr. Marco Steenbergen
Feb-2022
Abstract: Violations of measurement invariance (MI) of a given confirmatory factor analysis (CFA) model can arise as a result of non-invariant items and pose a significant threat to the validity of latent variable comparisons across subgroups of a study population. While methods for detecting such items under partial MI exist, there has not been a systematic study comparing their performance. This thesis makes three contributions. First, two versions of a novel detection approach are introduced. The advantage of the novel approach is that it is arguably much easier to interpret than existing methods. Instead of relying on likelihood inference, it builds on residuals and only requires a basic understanding of linear regression, thus being much more accessible to a broad audience of applied researchers. Second, the performance of these two methods and four existing methods is assessed in a simulation study. This enables a comparison of detection methods and offers guidance for choosing a method in applied research. Finally, the detection methods are applied to different CFA models for measuring populist attitudes using survey data, demonstrating that they can easily be generalized to fairly complex measurement models.
In terms of performance, the findings indicate that the existing methods have difficulty detecting non-invariant items reliably. Of the four existing detection methods, only one can be recommended conditionally. At the same time, one version of the novel approach performs very well across all settings and can thus be recommended more generally. For the exemplary application, the results corroborate findings of significant issues with respect to cross-cultural validity at the model level, but also provide a starting point for model improvement to be taken up by further research.

2021

Student Title Advisor(s) Date
Johannes Bernstorff Estimating Individualized Dose Rules via Outcome-Weighted Learning Dr. Christina Heinze-Deml
Prof. Dr. Stefan Feuerriegel
Dec-2021
Abstract: Individualizing drug doses to patients promises to improve the effectiveness of medicine with better health outcomes. Patients' response to medical treatment is determined by patient-level characteristics as well as prescribed drug dose levels. Optimal Individualized Dose Rules (IDR) map patients, based on their individual-level characteristics, to their optimal dose level, i.e., the drug dose with the best clinical outcome for that patient. Estimation thereof is a challenging statistical problem and receives increasing attention. An important line of work, Outcome-Weighted Learning (OWL), establishes the direct estimation of treatment rules and extends to continuous treatments, i.e., dose levels, with a local approximation of the OWL objective for discrete treatments. We propose to estimate optimal IDRs via the OWL objective with neural networks. Furthermore, we are interested in OWL's potential in the continuous treatment setting when faced with selection bias in observational data. To account for selection bias, we adapt inverse conditional density weighting to the training of neural networks and propose an alternative deconfounding strategy based on balanced representations of the covariates in a latent space. We evaluate our proposed methods in comprehensive simulation studies and compare their performance to existing baseline methods.
Damian Durrer The Dialects of Econospeak - Exploring the Difference between Orthodox and Heterodox Language in Economics with Machine Learning Dr. Markus Kalisch
Prof. Elliott Ash
Dr. Malka Guillot
Dec-2021
Abstract: This thesis explores the differences in the language used in heterodox and orthodox economics journals with machine learning. We train and evaluate various document classification models with a set of articles which were published in labeled journals between 1990 and 2020. The differences between the heterodox and orthodox dialects of economics are then explored by analyzing the feature weights in the models. We apply the models to the articles published in the top 5 economics journals over the same period, evaluate the proportion of heterodox articles by journal, and identify the most heterodox articles and authors. Various models from different categories of classification systems are fitted and compared: a benchmark model using a logistic regression classifier on a tf-idf weighted feature matrix, and two competitors using word2vec embeddings, BiLSTMs and DistilBert transformers. The benchmark model outperforms both competitors on the classification task. We achieve a reasonable performance in discriminating articles published in heterodox economics journals from articles published in orthodox economics journals using automated text classification systems. Our analysis indicates that in the top 5 economics journals the use of heterodox language has been declining over the past three decades.
Clemens Blab A stochastic framework for technical change with environmental impact Dr. Markus Kalisch
Prof. Dr. Lucas Bretschger
Dec-2021
Abstract: Technical change is regarded as a crucial remedy to mitigate the environmental risk of our economic activities while allowing us to make further economic progress. In this thesis, an economy with two intermediate sectors is studied, in which polluting machines of a dirty and a clean sector are used to produce final output. The pollution causes stochastic environmental shocks that damage the intermediate sectors of the economy. Research spent on a sector expands the variety of machines that are used to produce the intermediate goods. Through technological progress induced by research, the intermediate output of the two sectors can be increased. A benevolent social planner optimises production, then allocates consumption to the household in the economy and research expenditure to the two sectors. The social planner's stochastic optimal control problem shows that in this economy consumption is significantly influenced by the research allocation. Moreover, an implication of the consumption growth path in the social planner equilibrium is that the research expenditure is allocated to just one sector under certain specifications of the model parameters.
Edward Alexander Günther Style Adaptive Semantic Image Editing with Transformers Dr. Lukas Meier
Prof. Dr. Luc van Gool
Dr. Yuhua Chen
Oct-2021
Abstract: Semantic image editing aims to edit images according to provided semantic labels. Existing works always assume there is a deterministic one-to-one mapping between the synthesized image and the original image, given the semantic label. This assumption renders the existing approaches inflexible and ineffective when more requirements come into play, e.g., customizing the color of the edited objects. In order to address these limitations, we study the novel style adaptive semantic image editing (SASIE) problem, where the semantic labels of the edited areas and an additional style reference image are given, and the corresponding pixels are synthesized and stylized adaptively. We propose a new Transformer-based framework for the SASIE problem, in which intra-/inter-image multi-head self-attention blocks are developed for intra-/inter-image knowledge transfer. The content of the edited areas is synthesized according to the given semantic label, while the style of the edited areas is inherited from the reference image. Extensive experiments on different datasets demonstrate the effectiveness of our proposed framework for semantically editing images and adaptively stylizing the edited areas.
Zipei Geng Nonparametric Variable Selection under Latent Confounding Prof. Dr. Peter Bühlmann
Dr. Mona Azadkia
Dr. Armeen Taeb
Oct-2021
Abstract: Variable selection under latent confounding is a classic problem in causal inference. Recently, Chatterjee (2020) proposed a rank correlation, and Azadkia and Chatterjee (2021) laid out a simple but effective approach based on Chatterjee's rank correlation to define a new measure of conditional dependence and a new algorithm for variable selection. This fully non-parametric approach is based on rankings and the method of nearest neighbors. In this thesis, we used the measure, the Conditional Dependence Coefficient, and the algorithm, Feature Ordering by Conditional Independence (FOCI), to conduct variable selection on generated data with latent confounding. Based on several exploratory simulation experiments, we propose to use a suitable confounder estimation method together with FOCI, which not only provides a view of the empirical distribution of the latent confounders and computationally controls the false discovery proportion when selecting the signal variables, but also theoretically removes the spurious dependence between the non-signal predictors and the response. In light of this finding, we developed a new FOCI function that always includes the estimated confounders in the conditioning set, which differs from the original function, as well as a resampling scheme with a heuristic threshold.

As for the confounder estimation, we propose two methods: principal component analysis (PCA) and the variational autoencoder (VAE). We then theoretically justify the use of PCA in terms of relative information loss. Our simulation experiments show that FOCI with VAE handles nonlinear relationships between the predictors and the confounders better and more efficiently than FOCI with PCA, given a sufficient sample size. While PCA is efficient when the hidden confounders have linear relationships with the predictors, it also performs well when only a relatively small data set is available, regardless of whether the relationships between predictors and confounders are linear. Regarding the magnitudes of the latent confounders and the signals, we found that if the signals are dominated by the confounders, FOCI fails to select the Markov blanket. We also compared the FOCI family with classical methods on a real dataset, where the FOCI family showed competitive MSPE and generalization ability. Finally, several future research directions are given at the end of the thesis.
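A hedged sketch of the pipeline described above: estimate the latent confounder by PCA and force it into the conditioning set. It assumes the FOCI package's codec() and foci() functions (exact arguments may differ by package version) and uses simulated toy data.

    library(FOCI)

    set.seed(12)
    n <- 500
    H <- rnorm(n)                                        # latent confounder
    X <- cbind(X1 = H + rnorm(n), X2 = H + rnorm(n), X3 = rnorm(n))
    Y <- X[, "X1"] + H + rnorm(n)

    H_hat <- prcomp(X, scale. = TRUE)$x[, 1]             # PCA estimate of the confounder
    codec(Y, X[, "X2"])                                  # spurious dependence on X2 via H
    codec(Y, X[, "X2"], H_hat)                           # typically much smaller given H_hat
    foci(Y, X)                                           # plain FOCI variable selection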
Marco Hassan Parameter Learning in Bayesian Networks under Uncertain Evidence − An Exploratory Research Prof. Radu Marinescu
Dr. Markus Kalisch
Sep-2021
Abstract: This study deals primarily with the topic of parameter learning in Bayesian networks. Its first sections give a general literature overview of parameter learning in the case of maximum likelihood estimation under complete evidence. A general introduction to the concept of likelihood decomposability is provided before rigorously presenting the EM algorithm as a viable choice for parameter estimation when the decomposability property does not hold and the maximization problem is a nasty multimodal function. Given the properties of the EM algorithm, we argue that it is possible to generalize the algorithm to the case of parameter learning in a fully Bayesian learning setting by adjusting the maximization step. The second part of the study leverages the classical theory exposed in the previous sections to deal with parameter learning in the case of uncertain evidence. Augmenting the arguments proposed in Wasserkrug et al. (2021), the study argues that, upon finding a constrained joint distribution for the probabilistic graphical model that satisfies the constraints imposed by the uncertain evidence, the EM algorithm can be used to obtain a sensible network parameterization with slight adjustments, such as the virtual evidence method of Pearl (1987). The study then shows by example that such methods are easily integrable into classical statistical software, such as in the extension of the merlin engine.

Keywords: Graphical Models · Iterative Methods · Bayesian Networks · Bayesian Statistics · Parameter Learning · Bayesian Learning · Uncertain Evidence · EM-algorithm · I-projection · Inference · Clique Algorithms
Josefa Arán Paredes Diagnostic Tool for Interference Features in GATE Estimation Prof. Dr. Marloes H. Maathuis
Meta-Lina Spohn
Sep-2021
Abstract: Interference in experimental studies happens when a unit's outcome depends on the treatment received by other units. The regression adjustment estimator of the global average treatment effect (GATE) can be biased or have too much variance if one uses the wrong features to account for network interference. This thesis proposes a diagnostic procedure to validate the choice of interference features for GATE estimation. We study combinations of experimental designs, underlying network structures and interference feature definitions that can be used for estimation, as well as the consequences that the misspecification of features has on the GATE estimate. In order to test for stable unit treatment value assumption (SUTVA) violations and hypotheses about the extent of network interference, we adapt two existing tests to our problem: a spatial autocorrelation permutation test from the spatial statistics literature and a conditional randomization test from the field of causal inference with interference. We discuss how to use these tests to guide the selection of interference features, based on the residuals from the adjusted regressions, supported by examination of the evolution of the fitted regression coefficients. In simulation experiments, we see that the power for testing higher-order interference may be insufficient. However, we show that our interference tests are powerful enough for detecting SUTVA violations on sufficiently large networks, enabling the practitioner to diagnose their adjustment model.
Carina Schnuck Provably Learning Interpretable Latent Features Prof. Dr. Fanny Yang
Dr. Armeen Taeb
Sep-2021
Abstract: Deep neural networks regularly break records when it comes to achieving high prediction accuracy on complex classification tasks. However, in high-risk applications, users who are ultimately responsible for the decisions they make may be reluctant to follow predictions given by a machine learning model unless they are provided with interpretable explanations.
Motivated by such settings, we identify and visualize core latent features in image data that our proposed model uses to predict a target label. To this end, we posit an underlying anti-causal data generation mechanism and introduce PD-VAE, a variational autoencoder (VAE) that jointly optimizes the full likelihood of the image data and target label as well as the conditional likelihood of the image data conditioned on the target label.
Unlike previous methods, without explicit prior knowledge of the underlying latent factors, our framework can provably learn disentangled core latent features and attain high prediction accuracy in a standard supervised learning setting. We demonstrate the effectiveness of our framework on MPI3D, CelebA, and chest X-ray datasets.
Chrysovalantis Karypidis The (multiple) knockoff filter: A powerful variable selection with FDR control Dr. Lukas Meier Sep-2021
Abstract: Today's data sets often contain a large number of variables, and the researcher is interested in finding those few explaining the response of interest without too many false discoveries. The control of the false discovery rate (FDR) ensures that most findings are true effects and can be reproduced by follow-up research. This thesis deals with the knockoff filter, a modern variable selection technique with FDR control. We start by elaborating on fixed-X knockoffs that achieve FDR control in low-dimensional Gaussian linear models. We show in simulations that the knockoff filter is more powerful than existing popular selection rules while controlling the FDR. In a novel follow-up simulation, we conclude that the choice of a proper score function, a key ingredient of the knockoff filter to compare the importance of an original variable and its knockoff, does not affect the FDR control but has a great influence on the power. We suggest that the researcher selects one that embeds the most information. Continuing with the extension to model-X knockoffs, which can be applied to almost any model regardless of the dimensionality, we develop open research questions that should be examined by future research, and we aim to answer two of those. First, we show that model-X knockoffs are superior to fixed-X knockoffs in a linear model, even when the model is misspecified. Second, we study whether the aggregation of multiple knockoff runs leads to power improvements. We compare the three aggregation schemes union knockoffs, p-value knockoffs and ADAGES. While ADAGES and union knockoffs have more power than model-X knockoffs with one run, this is not always the case for p-value knockoffs. ADAGES leads to the largest power but suffers from empirical FDR values above the pre-specified nominal level in some settings. Union knockoffs provide empirical FDR values that are very close to model-X knockoffs but improve their power. Thus, multiple knockoffs can indeed be used to improve the power, but the researcher has to accept (slightly) larger FDR values. We also provide a user-friendly implementation of all three aggregation techniques with our package multiknockoffs in the statistical software R.
Keywords: Variable selection, false discovery rate, knockoff filter, Lasso
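A minimal sketch of a single model-X knockoff run with the knockoff package on a toy Gaussian design (the aggregation schemes and the multiknockoffs package itself are not reproduced here).

    library(knockoff)

    set.seed(13)
    n <- 300; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- as.numeric(X[, 1:5] %*% rep(1.5, 5)) + rnorm(n)   # first five variables are signals

    res <- knockoff.filter(X, y,
                           knockoffs = create.second_order,      # second-order model-X knockoffs
                           statistic = stat.glmnet_coefdiff,     # lasso-based importance statistic
                           fdr = 0.1)
    res$selected    # variables selected while (approximately) controlling the FDR at 10%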
Marcel Müller Uncertainty in remaining useful life prediction of aircraft engines using Monte Carlo dropout and Gaussian likelihood in deep neural networks Dr. Lukas Meier
Prof. Dr. Konrad Wegener
Sep-2021
Abstract: Nowadays, highly complex machines are used extensively in industry and also in everyday life. Reliable functioning of these machines is essential for company success and to protect human lives. For this reason, these machines need to be maintained. The optimal timing for this maintenance depends on economic and risk factors. In the field of Prognostics and Health Management, methods for the detection of incipient faults, fault diagnosis and prediction of the fault progression of machines are being researched. This thesis ties in with this and investigates the implementation of uncertainty estimates in the prediction of the remaining useful life of aircraft turbines using a deep neural network (DNN). The prediction uncertainty is estimated based on Monte Carlo dropout as a variational Bayesian inference method and a Gaussian likelihood assumption. A particular focus of this work is to investigate whether DNN reported in the research literature can be augmented with an estimate of predictive uncertainty without changing the model specification. We further investigate the feasibility of determining the optimal model specification of DNN, where an estimate of predictive uncertainty has been implemented, on the basis of a controlled (quasi-)experiment. Moreover, we present tools for comparing different models in terms of predictive accuracy and uncertainty. In this respect, we put emphasis on the use of an adequate performance metric that considers the actual practical usage of the DNN. A special focus is also put on the decomposition of the estimated prediction uncertainty into aleatoric (data uncertainty) and epistemic uncertainty (model uncertainty). Our work shows how the additional estimation of predictive uncertainty estimates can contribute to verifying the usability and reliability of DNN in safety-critical predictive maintenance application areas.
Zeno Benetti Fraud Detection in Ethereum Using Web-scraping and Natural Language Processing Techniques Dr. Markus Kalisch
Victor Obolonkin
Sep-2021
Abstract: The objective of this thesis is to discern Ethereum fraudulent smart contracts (defined as smart contracts related to Ponzi schemes) from non-fraudulent ones on the Ethereum blockchain. For this purpose, we employ web scraping techniques in order to retrieve data on the transactions of each smart contract. More importantly, we retrieve the opcode sequence of each smart contract, which is to say the set of instructions that determine the contract's behaviour on the network. The sequence of opcodes of each smart contract is then embedded using natural language processing (NLP) techniques in order to feed a classifier ensemble. As is typical for most problems concerned with fraud detection, the dataset we work on is characterized by a vast class imbalance. The model we propose effectively addresses this issue through (i) resampling of the training set, (ii) setting a filter for 'obvious negative' (i.e. non-fraudulent) instances, and (iii) weighting each classifier's predictions in the ensemble based on the estimated balanced accuracy of the classifier. The class imbalance also has important implications for the metric with which we assess the classifier. In this regard, whereas the most common metric used in the literature is arguably the F1-score, we found the balanced accuracy to be better suited for our setting. Chapter 1 provides the necessary background for the reader to familiarize themselves with the topic. It covers the conceptual difference between a traditional fiat currency such as the Swiss franc and a cryptocurrency. It goes on to explain concepts central to cryptocurrencies, foremost the Proof of Work (PoW) protocol and the smart contract feature. It then provides a literature review on fraud detection, focusing on common challenges such as that of imbalanced datasets. Chapter 2 illustrates the retrieval process for transactional and opcode data. It also discusses the NLP techniques we use to embed sequences of opcodes into numeric vectors. Chapter 3 presents the proposed model ensemble, explaining the value added by each different stage of the ensemble. In particular, it focuses on how the ensemble addresses the challenge posed by the vast class imbalance. Chapter 4 examines the adequacy and usefulness of the model and, more broadly, of the thesis. In this regard, it discusses not only technical considerations of a statistical nature, but also economic and legal ones.
Leonardo Kokot Halfspace learning typical interpolation and 1-layer neural network guarantees Prof. Dr. Sara van de Geer
Felix Kuchelmeister
Sep-2021
Abstract: When giving guarantees on the test set performance of interpolating neural networks, machine learning researchers have traditionally used uniform convergence type bounds. Since these bounds describe the worst-performing neural network one could obtain when fitting to given training data, they have been shown to fail (see Nagarajan and Kolter (2019)) when applied to complex hypothesis spaces such as neural networks. In practice, neural networks tend to perform much better than these guarantees would suggest. Therefore, Theisen, Klusowski, and Mahoney (2021) try to overcome this issue and instead look into the distribution of behaviours of random (typical) interpolating neural networks. They prove results on the convergence of the behaviour of typical interpolators as the data dimension grows, but under assumptions that seem very unrealistic. We therefore build on their work and analyze a noiseless version of the halfspace learning model (a 1-layer neural network) under much more realistic assumptions. For training sample size n = 1, we prove convergence of the behaviour of typical and worst-case interpolators as the dimension grows to infinity. Typical interpolators converge to the behaviour of a classifier with an error rate of 0.5, while the worst-case interpolator converges to an error rate of 1.0. We then support these theoretical findings with simulations in the scenario where the number of training samples n is greater than 1. Finally, we compare the results obtained by the simulations and the theoretical analysis to the results that would be given by existing neural network guarantees under the same conditions.
Theresa Blümlein Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods with Application to Medicine Dr. Markus Kalisch
Prof. Dr. Stefan Feuerriegel
Joel Persson
Sep-2021
Abstract: Dynamic treatment regimes (DTRs) have emerged as a method to tailor treatment decisions to patients over time by considering heterogeneous patient information. Common approaches for learning optimal DTRs are typically either based on algorithms for outcome prediction rather than treatment effect estimation or assume a linear dependence between patient histories and treatment responses. However, patient information (e.g., electronic health records) is often high-dimensional and exhibits unknown non-linear dependencies. To address these shortcomings, we develop novel approaches for learning optimal, explainable DTRs that effectively handle complex patient data. Our approaches are based on data-driven estimation of heterogeneous treatment effects using non-parametric causal tree methods, specifically causal trees and causal forests. Our proposed DTR approaches are doubly robust, explainable, and control for time-varying confounding. To the best of our knowledge, this thesis is pioneering by adapting causal tree methods from heterogeneous treatment effect estimation (i.e., causal tree and causal forest) in the static setting to learning optimal DTRs in the sequential setting. We evaluate the effectiveness of our proposed approaches, DTR-CT and DTR-CF, in several simulation studies and provide an application using real-world data from intensive care units. Our approaches outperform state-of-the-art baselines in terms of cumulative regret and the accuracy of proposed versus optimal decisions by a considerable margin. By addressing current challenges in practice and research, our work is of direct relevance to personalized medicine.
Srividhya Padmanabhan Analysis of Multistate Models by Solving Differential Equations Under the Bayesian Paradigm Dr. Lukas Meier
Dr. Ulrich Beyer
Sep-2021
Abstract: In the field of pharmaceuticals, during drug trials, decisions on the efficiency of an experimental drug are based on evidence of significant hazard reduction induced by the treatment. For example, in patients with cancer, drugs that can slow down disease progression and delay the occurrence of death are beneficial. In such cases, in addition to an initial state of Stable disease and final state of Death, clinicians and drug developers are also interested in multiple intermediate states, like, Response to treatment, Progression of disease, etc. Data from such drug trials document the history of the disease in the patient, marking the time of transition to the different states. We can model such data as multistate processes. These models help analyse the instantaneous rate of transition, probability of transition to the different states and the overall survival (OS) probabilities given the current state and event history of the patient. We can now compare the transition rates, OS probabilities between an experimental and a control treatment to quantify any hazard reduction induced by the experimental drug. Popularly, these models are analysed under the Frequentist paradigm using maximum likelihood estimators on semi-parametric stratified regression models.

However, for the estimates defining hazard reduction, an interpretation that quantifies the uncertainty in terms of probability is more intuitive for clinicians than the classical Frequentist interpretation that gives confidence intervals. For this purpose, we are interested in analysing the multistate models under the Bayesian paradigm. The Kolmogorov forward equations help to represent the multistate process as a system of differential equations. Using numerical integration for a product integral to approximate the results from these differential equations is computationally expensive and inefficient. In this study, assuming suitable priors and models for the distributions of transition rates, these differential equations are solved under the Bayesian framework using Stan software. This provides accurate results, which are also computationally quick. The Frequentist and Bayesian approaches are discussed and a comparison of the OS probabilities from both the frameworks is presented in this thesis work.
Paul Maxence Maunoir Fast solution path methods for change point detection Prof. Peter Buehlmann
Dr. Solt Kovács
Sep-2021
Abstract: In this thesis, we consider algorithms for the computation of solution paths in change point detection problems. In particular, we propose a new efficient way of computing the narrowest-over-threshold solution path (Baranowski et al., 2019) when using seeded intervals (Kovacs et al., 2020). We demonstrate an empirical near-linear running time and we provide a proof that our approach delivers the correct solution. Furthermore, we propose four new and flexible methodologies for change point detection applicable beyond piecewise constant mean setups, and more generally to piecewise linear mean and piecewise linear continuous mean setups. All of these methods also run in near-linear time, and we investigate their empirical statistical properties. Lastly, we propose two new ways of selecting a solution from a solution path, based on the intuition that the best solution must display some stability properties.
Kaye Iseli Independent weak learners: Non-asymptotic bounds for ensemble learning algorithms Prof. Dr. Sara van de Geer
Felix Kuchelmeister
Sep-2021
Abstract: Over the last years, research has been done to try to elucidate the mystery behind the success of boosting procedures. This thesis analyses boosting from a non-asymptotic viewpoint. This is done by discussing the boosting algorithm and studying its efficiency by presenting an upper bound, which also yields a consistency result, for the generalization error of a majority vote classifier under the assumption of the existence of independent weak learners on the training set. In particular, the existence of weak classifiers is investigated, as well as how reasonable the independence assumption is. A simulation presents an example of this result. Eventually, some relaxations of the assumptions needed for the consistency result are presented, as well as an extension of the upper bound for the generalization error on the test set for the one-bit compressed sensing model.
Stefan P. Thoma Estimating Relevance within Replication Studies Prof. Dr. Martin Mächler
Prof. em. Dr. Werner Stahel
Aug-2021
Abstract: This study investigates the application of Relevance as defined by Stahel (2021b) to three (Many Labs) replication studies.
It starts with an introduction to the replication crisis in psychology and discusses how replacing p-values with Relevance could facilitate a more nuanced and reliable scientific process.
The re-analysis using Standardized Mean Difference (SMD), log Odds Ratios (LOR), and the adjusted coefficient of determination R2adj is then presented.

The relevant effect (SMD) of experiment three by Albarracín et al. (2008) failed to replicate, with none of the eight attempts by Ebersole et al. (2020) producing relevant results.
The replications further lacked the precision either to establish equivalence between the groups or to show the original effect to be an anomaly.
Re-analysis of the replication of Schwarz et al. (1985) by Klein et al. (2014) successfully produced relevant LOR in 15 out of 36 attempts.
Although precision of the estimates varied, almost all point estimates were relatively close to the original effect.
The four replication attempts of LoBue and DeLoache (2008) by Lazarevic et al. (2019) conclusively showed the effect of the experimental group on R2adj to be negligible.

Relevance proved to be a widely applicable and intuitive measure which, unlike p-values, can embed evidence against as well as for the null hypothesis and does not confuse precision with magnitude. Still, further studies are needed to develop concrete procedures for establishing the negligibility of effects.
Alexander Roebben Password Guessing with Integer Discrete Flows Prof. Dr. Fadoua Balabdaoui
Prof. Dr. Fernando Perez Cruz
Aug-2021
Abstract: Recent password leaks provide knowledge about how individuals set their credentials. Attackers try to guess weaker passwords by leveraging this sensitive information. The goal for the security community is to reduce vulnerabilities by exposing ever more weaknesses. Creating standard password guessing tools requires expert knowledge of password semantics, which is difficult to acquire. New approaches based on
generative deep learning have emerged, giving some insight into the
distribution of passwords. In this paper, we build upon another unsupervised deep learning tool, called Integer Discrete Flow (IDF), which enables exact likelihood computation and the transformation of passwords into latent vectors. It is the first application of this method to this task. It combines probability estimation with sampling capacity. Inserting dependencies in the prior distribution of the latent variables increases the number of password guesses, making the model an efficient competitor to other deep learning techniques, while moderately improving the probability estimates.
Marco Hassan Parameter Learning in Bayesian Networks under Uncertain Evidence − An Exploratory Research Prof. Radu Marinescu
Dr. Markus Kalisch
Aug-2021
Abstract: This study deals primarily with the topic of parameter learning in Bayesian networks. Its first sections provide a general literature overview of parameter learning via maximum likelihood estimation under complete evidence. A general introduction to the concept of likelihood decomposability is given before rigorously presenting the EM-algorithm as a viable choice for parameter estimation when the decomposability property does not hold and the maximization problem is a difficult multimodal function. Given the properties of the EM-algorithm, we argue that it is possible to generalize the algorithm to the case of parameter learning in a fully Bayesian
learning setting by adjusting the maximization step. The second part of the study leverages the classical theory presented in the previous sections to deal with parameter learning in the case of uncertain evidence. Augmenting the arguments proposed in Wasserkrug et al. (2021), the study
argues that, upon finding a constrained joint distribution for the probabilistic graphical model that satisfies the constraints imposed by the uncertain evidence, the EM-algorithm can be used to obtain a sensible network parameterization with slight adjustments, such as the virtual evidence method
of Pearl (1987). The study then shows by example that such methods are easily integrated into classical statistical software, as in the extension of the merlin engine.
Johannes Hruza Conditional and Unconditional Independence Testing based on Distributional Random Forests Prof. Dr. Peter Buehlmann
Domagoj Ćevid
Aug-2021
Abstract: We introduce a new coefficient ϡ (denoted by the Greek letter sampi) which can be used to measure unconditional independence (X ⊥ Y) as well as conditional independence (X ⊥ Y | Z), both for univariate and multivariate X,Y,Z. It is model-free and has the property that ϡ(X,Y,Z)=0 if and only if X ⊥ Y | Z. Our proposed consistent estimator is based on Distributional Random Forests (DRF) and inherits its good estimation properties in high dimensions. Simulations have shown that it can outperform state-of-the-art (conditional) independence coefficients for a variety of settings in high dimensions. In addition, we provide a new algorithm based on DRF which can take any conditional independence coefficient and simulate a null distribution for any given data. Lastly, we include some applications such as variable selection and causal graph estimation.
Pascal Kündig The R Implementation of the Critical Line Algorithm: Platform Stabilization through Weight Thresholding and Result Analysis Prof. Dr. Martin Mächler Aug-2021
Abstract: The Critical Line Algorithm invented by Markowitz calculates the efficient frontier using the concept of turning points. An implementation of this algorithm is provided in the
R package CLA. Since many mathematical calculations on a computer are subject to numerical inaccuracies, the results of this implementation are platform-dependent. This can mean that the resulting asset composition at the turning points differs between
the platforms. In order to increase the platform stability of the Critical Line Algorithm, we propose three different weight thresholding approaches. In a result analysis, we examine the effects of these three approaches and show that weight thresholding succeeds in avoiding different asset compositions between the platforms. We demonstrate that the
choice of the optimal weight threshold is subject to a tradeoff. The larger the threshold is, the more stable and concentrated the asset weighting is. However, a larger threshold also means that the deviation from the original weighting increases, and a subsequent
quadratic optimization to find the optimal portfolio can become unsolvable.
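The abstract does not spell out the three thresholding variants; the following base-R sketch only illustrates the basic idea they share, under the assumption of a long-only weight vector: set numerically tiny weights to zero and renormalize.
```r
## Minimal, generic weight-thresholding sketch (not one of the thesis' three variants).
threshold_weights <- function(w, thresh = 1e-4) {
  w[abs(w) < thresh] <- 0          # drop numerically tiny positions
  if (sum(w) > 0) w <- w / sum(w)  # renormalize so the weights sum to one again
  w
}

w <- c(0.40, 0.35, 0.2499, 2e-5, 8e-5, 1e-6)   # illustrative turning-point weights
round(threshold_weights(w, thresh = 1e-4), 6)
```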
Alexander Thomas Arzt Meta-Labeling - A Novel Machine Learning Approach to Improve Algorithmic Trading Strategies Dr. Markus Kalisch Aug-2021
Abstract: In 2018, Marcos Lopez de Prado proposed a novel machine-learning concept to improve algorithmic trading strategies,
called meta-labeling. Meta-labeling essentially refers to (1) assigning a binary label to past trades of a
trading system based on their outcome (win or loss), (2) constructing a set of time-series features that temporally
align with the labels, and (3) fitting a machine learning classification model to the features and the labels.
(4) Subsequently, the trained classifier is used to estimate the probability of profitability for every new, unopened
trade. (5) The position size for each new trade is then determined based on its corresponding probability estimate
before the trade is opened. The goal is to assign larger position sizes to trades with a high estimated probability
of being profitable. The purpose of this thesis is to investigate if meta-labeling does indeed improve trading
performance when applied to real trading systems and to derive essential conclusions regarding the concept's practical implementation. Four profitable trading systems are meta-labeled using an all-or-nothing scheme for position sizing.
As a result, the performances of three systems improve. Further, it is found that (1) combining several classifiers
into an ensemble model improves meta-labeling results, that (2) meta-labeling's effect does not differ at the portfolio level, and that (3) incorporating feature selection into the procedure does not provide a meaningful benefit. The results demonstrate that meta-labeling does indeed improve trading system performance, and thus it should be of relevance to any trader.
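A hedged base-R sketch of steps (1) to (5) on simulated data, with a plain logistic regression standing in for the secondary classifier (the thesis uses real trading systems and richer ensemble models):
```r
## Illustrative meta-labeling pipeline on simulated trades; all names and numbers are toy choices.
set.seed(42)
n     <- 500
feat  <- data.frame(vol = runif(n), trend = rnorm(n), rsi = runif(n, 0, 100))
# (1) binary label: did the primary system's past trade win?
win   <- rbinom(n, 1, plogis(-1 + 2 * feat$trend - 1.5 * feat$vol))
train <- 1:400; test <- 401:500

# (2)-(3) fit the meta-model on features aligned with the labels
meta  <- glm(win ~ vol + trend + rsi, family = binomial,
             data = cbind(feat, win = win), subset = train)

# (4) estimate the probability of profitability for new, unopened trades
p_hat <- predict(meta, newdata = feat[test, ], type = "response")

# (5) all-or-nothing position sizing: trade only if the estimated probability exceeds 0.5
size  <- ifelse(p_hat > 0.5, 1, 0)
table(taken = size, actually_won = win[test])
```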
Jakob Heimer Statistical Methods in Radiomics Prof. Dr. Marloes Maathuis Aug-2021
Abstract: Radiomics is the systematic analysis of imaging data in its raw, numerical form. Typically, this analysis deals with a high-dimensional dataframe of computed statistics, which are called radiomic features. This thesis aims to understand and experimentally derive which
statistical problems occur in radiomic research. To achieve this, four different experiments are conducted based on two real radiomic datasets: a lymph node and a myocarditis dataset. In the first experiment, the importance of the placement of the feature selection
within the cross validation loop is demonstrated. The inference experiment replaces the binary response with a synthetic response based on the features, and compares the sensitivity and specificity of feature selectors to detect these true signals. The prediction experiment combines each feature selector with each classifier and compares the predictive performance of the 144 combinations. A feature stability experiment estimates the
variability of the selected feature subsets. It is shown that the multicollinearity of radiomic dataframes decreases the selection bias introduced by incorrect cross-validation. Nonlinear feature selectors and classifiers perform better. The feature selection is not sensitive,
specific, or stable. It is concluded that reproducible radiomic signatures are difficult to achieve, especially considering other sources of instability, such as scanning, segmentation, and feature generation. Radiomics can supply additional value by adding to clinical models with a focus on prediction more than inference.
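The point about placing feature selection inside the cross-validation loop can be illustrated with a small base-R sketch on pure-noise features, where selecting on the full data first inflates the estimated accuracy while fold-wise selection does not (illustrative code, not the thesis' experiment):
```r
## Selection bias from feature selection outside the CV loop, on pure noise.
set.seed(7)
n <- 60; p <- 1000; k <- 10                       # samples, noise features, features kept
X <- matrix(rnorm(n * p), n, p); y <- rbinom(n, 1, 0.5)

screen <- function(X, y, k) {                     # univariate two-sample t-test screening
  tstat <- apply(X, 2, function(x) abs(t.test(x[y == 1], x[y == 0])$statistic))
  order(tstat, decreasing = TRUE)[1:k]
}

folds <- sample(rep(1:5, length.out = n))
cv_acc <- function(select_inside) mean(sapply(1:5, function(f) {
  tr   <- folds != f
  keep <- if (select_inside) screen(X[tr, ], y[tr], k) else screen(X, y, k)
  fit  <- lm(y[tr] ~ ., data = data.frame(X[tr, keep]))
  pred <- predict(fit, newdata = data.frame(X[!tr, keep])) > 0.5
  mean(pred == (y[!tr] == 1))
}))
# True accuracy is 0.5; only the fold-wise selection estimates it honestly.
c(selection_outside_cv = cv_acc(FALSE), selection_inside_cv = cv_acc(TRUE))
```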
Yuxin Wang Back-Testing on the Low-Volatility Trading Strategy in the Chinese Stock Market Dr. Markus Kalisch
Prof. Dr. Didier Sornette
Dongshuai Zhao, CFA
Aug-2021
Abstract: The goal of this thesis is to back-test the effectiveness of trading strategies that combine the Fama and French (1993) three-factor model, the Carhart (1997) four-factor model and the class of low volatility factors in the Chinese stock market. The three-factor model
captures the common factors for the stock return by including the market factor, size factor and the value factor. The four-factor model considers additionally the momentum factor.
The class of low volatility factors aims to capture the idiosyncratic risks of each asset after accounting for the common factors. In this paper, we use the three-factor model as the building block, and treat the market factor, size factor and value factor (and the momentum
factor from the four-factor model) as the only common factors in the Chinese stock market, and we first test the effectiveness of these models. Next, we extend them with different low
volatility factors: idiosyncratic volatility, idiosyncratic momentum, idiosyncratic skewness and idiosyncratic kurtosis factor respectively. We back-test these strategies on the Chinese
stock market from 2011 (2009 for the three-factor model) until 2019. Our test results indicate that the three- and four-factor models, as well as models that combine these two with one of the low-volatility factors, are all significant, and the IS factor has the best performance. The low-volatility factors are all important in explaining the stock returns. In addition, the variation explained is rather low (around 45%), suggesting
there are more factors to be explored.
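An illustrative time-series factor regression of the kind such back-tests rely on, using simulated factor returns rather than Chinese market data:
```r
## Toy factor regression: excess returns on MKT, SMB, HML and MOM factors.
set.seed(3)
n   <- 120                                         # months
fac <- data.frame(MKT = rnorm(n, 0.006, 0.04), SMB = rnorm(n, 0.002, 0.02),
                  HML = rnorm(n, 0.001, 0.02), MOM = rnorm(n, 0.003, 0.03))
ret <- with(fac, 0.001 + 0.9 * MKT + 0.4 * SMB - 0.2 * HML + rnorm(n, 0, 0.02))

fit3 <- lm(ret ~ MKT + SMB + HML, data = fac)        # three-factor model
fit4 <- lm(ret ~ MKT + SMB + HML + MOM, data = fac)  # Carhart four-factor model
c(R2_three = summary(fit3)$r.squared, R2_four = summary(fit4)$r.squared)
coef(summary(fit4))["(Intercept)", ]                 # alpha and its t-statistic
```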
Massimo Fabio Höhn Explainable Deep Learning Models in Natural Language Processing Dr. Lukas Meier Aug-2021
Abstract: The objective of this thesis is to compare two different groups of interpretability frameworks with regard to the robustness of their interpretations. For our experiments, we use a neural network for an NLP-related binary classification task and generate interpretations using Integrated Gradients (a self-explaining interpretability framework) and LIME (a post-hoc interpretability method). We examine their robustness properties with and without adversarial training. While the robustness of the interpretation scores of IG and LIME is not fundamentally different when using a non-robust model, we can show that adversarial training has a significant impact on the robustness of Integrated Gradients, but less so for LIME.
Dongyang Fan Robust Small Leak Detection on Multivariate Time Series Sensor Data Using Machine Learning Methods Prof. Dr. Fadoua Balabdaoui
Dr. Simon Round
Sarala M.
Aug-2021
Abstract: High-voltage direct current (HVDC) allows for the efficient long-distance transmission of bulk electricity and will help enable a carbon-neutral future. During operation there are power losses in the HVDC station equipment, especially the valves, which control the power flow with semiconductors. A liquid cooling system extracts the heat from the valves and dissipates it to the outside environment. Coolant leakage is one of the major problems appearing in the system, and can cause the station to stop the electricity transmission. Today, multivariate time series signals from a sensor network are collected to monitor and detect the appearance of leaks.

Currently, in the studied station, a measurement-based method is implemented for the purpose of leak detection, which fails to detect leaks with a rate smaller than 9%/day. This thesis develops data-driven methods using machine learning tools to give robust and precise detection of the presence of small leaks, in an unsupervised manner.

As raw signals are provided, data preparation includes outlier removal, offset compensation, resampling and variable selection. The method development is based on two important properties of a coolant leak: 1) leaks can last over a long time range; 2) leaks are a rare event. Thus, models are built up representing dependencies under normal conditions and leaks can be detected from the evolution of differences between model outputs and observations. Dependencies over contemporaneous signals are explored via regression-based methods (XGBoost and Feedforward Neural Network) and autoregressive properties are explored using forecast- (Gated Recurrent Unit and Graph Neural Network with attention mechanism) and reconstruction-based methods (Transformer-VAE).

Two sets of mechanisms are introduced for leak detection on outputs of each model: one to distinguish tiny leaks from fluctuations in signals and model predictions, and one to respond shortly after the appearance of big leaks. All the methods are able to detect leaks which are 30 times smaller than the existing method. For big leaks with a rate of 10%/day, some of the methods require less than half the time of the current method to detect them. The two regression-based methods are the quickest for all leak rates.

From the results, it is recommended to implement the XGBoost method for leak detection, which responds quickly and can provide insights on feature importance. The model requires at least 2 to 3 months of stable data for training. During the early operation stage of an HVDC station, an improved version of the current measurement-based leak detection method that requires no model training can be used, with a fixed reference level. Even with no labels in the training data, the developed methods are shown to be robust against unusual situations caused by high summer temperatures, and succeed in detecting all the leaks without reporting any false alarms.
Yunrong Zeng Linear Mixed-Effects Models: Parameter Estimation, Covariance Structure and Hasse Diagrams Dr. Markus Kalisch Aug-2021
Abstract: Linear mixed-effects models (LMMs) are powerful modeling tools that allow for the analysis of datasets with complex, hierarchical structures. Intensive research during the past decade has led to a better understanding of their properties. This thesis is applied research on LMMs, aiming to provide a profound understanding of these models and to guide proper model selection when dealing with real-life problems. Starting with a tutorial on estimating linear mixed-effects models, we conduct simulation studies to examine the difference between maximum likelihood (ML) and restricted maximum likelihood (ReML) methods, and to compare certain methods for computing confidence intervals. The next two chapters focus on collecting concrete examples, classifying LMMs into several categories, and finding suitable methods to interpret them.
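A minimal lme4 sketch of the ML versus REML comparison and of two confidence-interval methods, using the sleepstudy data shipped with lme4 rather than the thesis' data sets:
```r
## ML vs REML variance components and two kinds of confidence intervals with lme4.
library(lme4)

fit_reml <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)                # REML (default)
fit_ml   <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)  # ML

VarCorr(fit_reml)   # REML variance components (less downward bias)
VarCorr(fit_ml)     # ML variance components

confint(fit_reml, method = "profile")          # profile-likelihood intervals
confint(fit_reml, method = "boot", nsim = 200) # parametric-bootstrap intervals
```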
David Dreifuss Monitoring of SARS-CoV-2 variants by genomic analysis of wastewater samples Dr. Markus Kalisch
Prof. Niko Beerenwinkel
Jun-2021
Abstract: Since the onset of the COVID-19 pandemic, testing and detection efforts have been at the forefront of public health strategies. Starting in late 2020, focus has greatly turned to genomic surveillance, due to the emergence and spread of SARS-CoV-2 variants linked to higher transmissibility, disease severity and mortality. Surveying viral loads in municipal wastewater has, since relatively early in the pandemic, proven to be a cost-efficient, unbiased and rapid approach to monitoring the spread of the virus in a community, greatly complementing clinical data. As the field of wastewater-based epidemiology is emerging, many important questions arise about the type of knowledge that can be derived from sewage samples and about the best methodology. In this work, we demonstrate how multiplexed reverse transcription digital PCR (RT-dPCR) and next generation sequencing (NGS) of wastewater extracts can be used for early detection, quantification and even epidemiological characterization of newly emerged or introduced SARS-CoV-2 variants. We show that, despite the inherently challenging nature of the data, estimates can be made remarkably precise. The methodology developed is then implemented in a national surveillance program of SARS-CoV-2 genomic variants in wastewater treatment plants around Switzerland.
Giulia Cornali Metacognition of Control: A Bayesian Approach to Experienced Control within Interoception Prof. Dr. Peter Buehlmann
Alexander Hess
May-2021
Abstract: The feeling of being in control over external situations or processes internal to our body is a constant companion in everyday life and has a significant influence on the behaviour and well-being of a person. This feeling of being in control refers to a metacognitive evaluation of events, where "metacognitive" stands for the self-reflective nature of perception about control. Recent theories in the field of computational psychiatry propose a critical role of metacognition in fatigue and depression. Therefore, it is of major interest to understand how a feeling of control arises and how similar events can lead to sometimes diametrically opposed control experiences. Here, we focus on the domain of interoception, which is defined as the perception of the body state. We analyze and describe a possible mechanism for experienced control over one's own bodily states. Moreover, we propose a generative model of control that incorporates our assumptions about the brain's mechanisms underlying the concept of metacognition of control according to Bayesian brain theories. We use existing models for interoceptive learning (i.e., the brain's processing of interoceptive signals) and link them to a set of novel models for a metacognitive evaluation of control. In parallel, we outline the design of an experimental study compatible with the structure and assumptions of our models. In the study, participants' control experience is manipulated by induced perturbations in the interoceptive domain of breathing in the form of inspiratory resistances. Participants can both exert control on the probability of experiencing breathing under an inspiratory load and learn about the underlying probabilistic structure of the task to better predict future bodily states. By requiring participants to predict respiratory perturbations and report their control experience on each trial of the study, we are able to infer subject-specific beliefs as well as the values of the parameters of our models characterizing their behaviour. We assess the models in our proposed model space using a set of simulation analyses. In a first step, we find suitable prior configurations for our models by analyzing the possible range of behaviour they produce. In a second step, we focus on the individual effect of the different model parameters and successfully demonstrate their recoverability. This serves as a demonstration of the functionality of our model space and its utility for future use in the analysis of the outlined study.
Samuel Koovely A mathematical framework for COMIC-Tree: an undirected graphical model for T-cell receptors specificity Prof. Marloes Maathuis
Dr. M. Rodríguez Martínez
May-2021
Abstract: T-cells are a core component of the adaptive immune system: they play a major role in mounting an effective and tailored response to foreign pathogens, and they are also relevant in the context of cancer and certain autoimmune diseases.
T-cell receptors are protein complexes present on the surface of T-cells that are responsible for identifying foreign and self antigens. Given the complexity of protein-protein interactions, this identification process exhibits a quasi-stochastic behaviour that can be modeled with probabilistic and statistical models.
Graphical models can represent a multivariate distribution in a convenient and transparent way as a graph. In this paper we introduce COMIC-Tree, an undirected graphical model for protein-protein interactions, and DrawCOMIC-Tree, a greedy algorithm based on conditional mutual information for learning COMIC-Tree structures. We provide a solid mathematical foundation for them, highlight some theoretical aspects, and test them empirically on a dataset of T-cell receptors.
Foong Wen Hao Confidence Intervals for the Effective Reproductive Number of SARS-CoV-2 Prof. Dr. Marloes Maathuis Apr-2021
Abstract: The monitoring of the effective reproductive number R_e of SARS-CoV-2 has been a crucial step in controlling the recent COVID-19 epidemic. It is a major indicator of epidemic growth, used to assess whether the epidemic is growing, declining, or remaining at a constant rate. Since the beginning of the epidemic, many methods have been developed and deployed to accurately estimate R_e, allowing for near real-time monitoring of the spread of the disease. However, it is not enough to obtain an accurate estimate, as its uncertainty is also a key component that needs to be taken into consideration. To better account for the uncertainty of these estimates, we derive confidence intervals from several adaptive bootstrap methods that take into account the temporal covariance structure of the observation data. The confidence intervals are evaluated and validated on synthetic data obtained through simulation from an assumed model. By comparing the validation results of chosen metrics for each bootstrap procedure, we identify the procedure that shows the most promising results and discuss its shortcomings.
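A generic moving-block bootstrap percentile interval in base R, shown only as a simplified stand-in for the adaptive bootstrap schemes studied in the thesis; the block length and the toy AR(1) series are arbitrary choices:
```r
## Moving-block bootstrap percentile CI for a statistic of a dependent time series.
block_boot_ci <- function(x, stat, block_len = 10, B = 2000, level = 0.95) {
  n <- length(x)
  starts <- 1:(n - block_len + 1)
  boot_stats <- replicate(B, {
    idx <- as.vector(sapply(sample(starts, ceiling(n / block_len), replace = TRUE),
                            function(s) s:(s + block_len - 1)))[1:n]
    stat(x[idx])                                   # statistic on the resampled series
  })
  quantile(boot_stats, c((1 - level) / 2, 1 - (1 - level) / 2))
}

set.seed(1)
x <- arima.sim(list(ar = 0.6), n = 200) + 1        # autocorrelated toy series
block_boot_ci(x, mean)                             # interval that respects the dependence
```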
Jialin Li Mixed Copula: Analysis of Parameter Estimation in R Implementation and Application to Stock Returns Prof. Dr. Martin Mächler Apr-2021
Abstract: The mixed copula model allows different tail dependence structures for the upper and lower tails, which leads to more flexibility in applications. A mixed copula consists of several component copulas and a vector of weights. The estimation of the weights involves a parameter transformation, the centered log-ratio transformation. The R package copula can estimate the parameters of each component copula, the lambdas (the transformed weights), and the corresponding standard errors. In this thesis, through a variance transformation technique, the delta method, standard errors for the original weight parameters become available, allowing statistical inference such as confidence intervals. Asymptotic normality of the maximum likelihood estimates of mixed copula parameters becomes invalid due to the existence of zero weights. Examples of the estimation process of mixed copulas for convergent and non-convergent cases are presented. The coverage rates of confidence intervals for parameter estimates of mixed copula models with zero weights are estimated through simulations. The influence of starting values on parameter estimation is investigated via two simulation tests: testing starting values for different sample sizes and testing multiple starting values on the same sample. A model selection method is proposed and applied to finding an appropriate mixed copula model for a health care industry stock combination from the NASDAQ Global Select Exchange market.
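A small base-R sketch of a centered log-ratio transform of the mixture weights and of the delta-method step that maps a covariance of the transformed parameters back to the weights; the numbers are toy values, not output of the copula package:
```r
## Centered log-ratio (clr) transform of simplex weights and a delta-method step.
clr     <- function(w) log(w) - mean(log(w))     # simplex weights -> unconstrained lambdas
clr_inv <- function(l) exp(l) / sum(exp(l))      # lambdas -> simplex weights (softmax)

w <- c(0.5, 0.3, 0.2)                            # weights of three component copulas (toy)
l <- clr(w)
all.equal(clr_inv(l), w)                         # TRUE: the transform is invertible

## Delta method: Var(w_hat) ~ J %*% Var(lambda_hat) %*% t(J), where J is the
## Jacobian of clr_inv (the softmax) evaluated at the estimated lambdas.
p <- clr_inv(l)
J <- diag(p) - tcrossprod(p)
V_lambda <- diag(c(0.04, 0.05, 0.06))            # illustrative covariance of the lambdas
sqrt(diag(J %*% V_lambda %*% t(J)))              # approximate standard errors of the weights
```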
Jiawei Ji Generalized Linear Models and their extension to Dr. Markus Kalisch Apr-2021
Abstract: Generalized Linear Models (GLMs) are generalizations of linear models that allow the data to have an error distribution other than the normal. This property allows more choices for the response variable, such as categorical or binary data. One important assumption of the
model is that the data need to be independent, but in practice we often see correlation among observations. Generalized Estimating Equations (GEE) and Generalized Linear Mixed Models (GLMM) are proposed as two distinct approaches to deal with correlated
data. In this thesis, we mainly focus on the GEE model with respect to its properties and performance. Simulations are also implemented to check some important results.
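A hedged sketch of the two modelling routes on simulated clustered binary data, assuming the geepack and lme4 packages are installed; the simulation design is illustrative only:
```r
## GEE (marginal model) vs GLMM (conditional model) on simulated clustered binary data.
library(geepack)
library(lme4)

set.seed(10)
n_grp <- 50; m <- 8
dat <- data.frame(id = rep(1:n_grp, each = m),
                  x  = rnorm(n_grp * m),
                  b  = rep(rnorm(n_grp, sd = 1), each = m))      # cluster random intercepts
dat$y <- rbinom(nrow(dat), 1, plogis(-0.5 + 1 * dat$x + dat$b))

gee_fit  <- geeglm(y ~ x, id = id, family = binomial,
                   corstr = "exchangeable", data = dat)           # marginal (population-averaged)
glmm_fit <- glmer(y ~ x + (1 | id), family = binomial, data = dat)  # conditional (subject-specific)

summary(gee_fit)$coefficients
summary(glmm_fit)$coefficients
```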
Anna Maria Maddux Behaviour Estimation in Dynamic Games Nicolò Pagan
Giuseppe Belgioioso
Fadoua Balabdaoui
Apr-2021
Abstract: A main concern of game theory involves predicting the behaviour of players from their underlying utilities. While the utilities of the players are often unknown, their behaviour is generally
observable. This motivates the formulation of the static inverse problem which aims to infer the underlying utilities from an observed Nash equilibrium. By means of inverse optimization we recast the static inference problem as an optimization problem, which we solve via linear
programming under mild assumptions on the utility function. We extend the static inference problem to address dynamic games by leveraging the concept of a better-response dynamics. The
dynamical inference problem aims to identify the underlying utilities from an observed sequence of actions between players that follow a better-response dynamics. Under mild assumptions on
the utility function it can be solved efficiently via linear programming. The solution to the static inference problem, respectively the dynamical inference problem, is a polyhedron which contains
all utility function parameters that best rationalize the observed behaviour. We introduce two measures based on the Löwner-John ellipsoid and the maximum volume inscribed ellipsoid to capture the coarseness of the solution set in relation to the parameter space.
To illustrate our approach we cast the classic example of demand estimation under Bertrand-Nash competition as a static inference problem and a dynamical inference problem, where the observed prices constitute a Nash equilibrium and a better-response dynamics, respectively. In numerical simulations we show that if the observed prices are an exact Nash equilibrium, the static inference method recovers the true underlying parameters of the demand function, and observing only a few price pairs is sufficient to achieve a very refined solution set. Furthermore, our results
are consistent with other papers on demand estimation. Equivalent results are obtained by the dynamical inference method if the observed prices follow an exact better-response dynamics. We
further validate the dynamic inference method by estimating the demand of Coke and Pepsi from their observed prices from 1968-1986. A distinguishing feature of our dynamic inference method is that it applies to dynamic games which have not yet necessarily converged to an equilibrium; it is merely based on the assumption that players aim to improve their utilities with respect to previous actions.
Zhufeng Li Model-X Knockoff Framework for Gaussian Graphical Models Prof. Dr. Marloes Maathuis Apr-2021
Abstract: In many applications of variable selection problems, we need to identify important variables influencing a response variable from a large number of potential variables. Meanwhile, we wish to keep the false discovery rate (FDR) under control. In this thesis, we start by studying some classical multiple testing control criteria and procedures. Then, we review some classical knockoff filters for variable selection problems, with both the fixed design by Barber and Candès and the random design by Candès et al. Since structural learning of the Gaussian graphical model (GGM) can be seen as many variable selection tasks (i.e. node-wise variable selection), our main focus is to develop a new model-X knockoff filter achieving graph-wise FDR control on Gaussian graphical models. Based on the fixed-X GGM knockoff framework of Li and Maathuis, we construct knockoff copies and feature statistics node-wisely. The threshold vector serving as the decision rule is computed globally through a combinatorial optimization. A theoretical upper bound on the FDR is provided for our newly proposed procedure. Our new method is more applicable in the sense that it does not require the hyperparameter $(a, c_{a})$ as in the fixed-X GGM knockoff framework. We conducted some simulations to give an intuitive insight into how tight our bound is and to compare our method with the existing method. Some future research directions are given at the end of this thesis.
Xuanyou Pan GLMMs in practice Dr. Markus Kalisch Apr-2021
Abstract: This thesis focuses on practical aspects of Generalized Linear Mixed Models (GLMMs). A simulation study shows that GLMMs have better performance (in terms of coverage rates of confidence intervals) than Generalized Linear Models (GLMs) in the presence of random effects. The thesis also takes a look at the Zero-Inflated Poisson (ZIP) model for count data with excessive zeros. A simulation study shows that the ZIP model performs well on data with excessive zeros sampled from a Bernoulli distribution with a fixed rate. Besides, normal GLMMs also have relatively good coverage rates of the fitted confidence intervals for the fixed treatment effect parameter. A real case study shows that normal GLMMs predict the frequencies of the non-zero response variable well, while the ZIP model has better predictions of the overall frequencies.
Zheng Chen Man Detection of disease progression in patients with Multiple Sclerosis using smartphone based digital biomarker data Prof. Dr. Peter Bühlmann
Dr. F. Model
Dr. F. Dondelinger
Mar-2021
Abstract: Disease progression detection in multiple sclerosis (MS) is often based
on the Expanded Disability Status Scale (EDSS), which is burdensome to perform and suffers from reliability problems. The advancement of smartphone technology provides new opportunities for measuring
patient performance frequently in daily life. We implemented an algorithm
for disease progression event detection taking into account the heterogeneity of assessment frequency and the noise of smartphone-collected data using confidence bounds. The performance and agreement with clinical progression events were investigated on a cohort of
approximately 450 MS patients using the Floodlight app. The algorithm works as intended when inspecting patient plots. Most (68%) of the progression events found among 3 features in Floodlight are sustained until the end of observation. We did not find significant agreement
with progression events detected in their corresponding clinical anchor (AUC < 0.52). However, we found that sustained improvement events between the Floodlight Pinching Test and the in-clinic 9-Hole Peg Test were concordant (AUC = 0.64, 95% CI: [0.53, 0.75]). Longer follow-up data are required to ascertain these findings.
Peshal Agarwal Unsupervised Robust Domain Adaptation without Source Data Prof. Dr. Luc Van Gool
Prof. Dr. Peter Lukas Bühlmann
Mar-2021
Abstract: Unsupervised domain adaptation refers to the setting where labeled data on the source domain is available for training, and the goal is to perform well on the unlabeled target data. The presence of a domain shift between source and target makes it a non-trivial problem. We study the problem of robust domain adaptation in the context of unavailable target labels and source data. The considered robustness is against adversarial perturbations. This work aims to answer the question of finding the right strategy to make the target model robust and accurate in unsupervised domain adaptation without source data.
The major findings of this work are: (i) robust source models can be transferred robustly to the target; (ii) robust domain adaptation can greatly benefit from non-robust pseudo-labels and the pair-wise contrastive loss. The proposed method of using non-robust pseudo-labels performs surprisingly well on both
clean and adversarial samples for the task of image classification. We show a consistent performance improvement of over 10% in accuracy against the tested baselines on four benchmark datasets.
Afambo Nitya Causal Fairness for Predictive Models Prof. Marloes Maathuis Mar-2021
Abstract: In this thesis we investigate from a theoretical and practical point of view, how tools from causal inference can be used to address fairness related problems that arise when using statistical models for prediction tasks. One can naturally define unfairness as the causal effect of a sensitive attribute (such as race or gender) on an outcome of interest along certain disallowed causal pathways. Under the assumption that observations are generated from a structural equation model, we show how one can remove these unfair effects in a natural way, as well as how to obtain so-called fair predictions.
Daria Izzo Semantic Role Labeling and Coreference Resolution for Knowledge Graph Enrichment Prof. Dr. Peter L. Bühlmann
Dr. Luis Salamanca
Dr. Fernando Perez-Cruz
Mar-2021
Abstract: In this thesis, we apply an existing semantic role labeling (SRL) model to texts from the Swiss Federal Archives. This data contains speeches from the National Council and the Council of States from the years 1891 to 1980. The model annotates all sentences with the predicate-argument structure and thereby finds the keywords in the sentences. Furthermore, those tags are used to extract triplets of the form subject, predicate and object, where each entity can be composed of several words. We use this extracted information to capture in a lucid knowledge graph how the parliament speeches are built, and to understand the rhetoric of the parliament members. The subjects, objects and predicates and the metadata of the speeches are represented as nodes and related to additional metadata such as the year of the speech, the speaker, etc. In addition, attributes of the nodes and different relation types between the nodes enrich the graph database and allow for more enhanced queries.

On top of that, we implement a coreference resolution (CR) model in the triplet generation step, which improves our results. This method is used to replace pronouns by the entities or parts of the sentence that they refer to. In this way, uninformative words are removed and important relations between persons or topics are added. All this results in more concrete and informative triplets and more understandable graph content.

We conclude by presenting a more specific application of the implemented methodology: the identification of populist speeches and the assessment of populism's evolution throughout time.
Maic Rakitta Generalized Linear Anchor Regression Prof. P. L. Bühlmann
Lucas Kook
Mar-2021
Abstract: When test data differ from training data, predictions derived from conventional learning algorithms often fail. However, using only the direct causes of a response leads to reliable predictions, provided that the causal effects are estimable from the training data under the typically strong assumptions of causal models. Rothenhäusler et al. (2021) relax these causal assumptions and construct an objective for robust predictions for shifted distributions by proposing a new regression technique, which they call anchor regression (AR). AR models heterogeneity generated by exogenous random variables called anchors. This method does not focus on identifying the causal parameters, because, for shifted distributions, the true causal parameters can be outperformed in terms of worst-case prediction performance. This is achieved by decorrelating the exogenous anchors from the residuals through a causal regularizer using a least-square loss. Depending on the extent of regularization, the anchor regression estimator interpolates between ordinary least-squares and two-stage least-squares. While this interpolation has been shown to be useful, the use of a least-squares loss does not apply to all types of responses. If we allow the response to be generated by any distribution of the exponential family, the squared error loss is inappropriate and motivates the use of a more general, likelihood-based loss function.

Hence, in this work we propose the Generalized Linear Anchor REgression (GLARE) estimator, which constitutes a generalisation of AR to Generalised Linear Models (GLM). The anchor objective is based on minimizing the negative log-likelihood under a suitable causal regularizer, which becomes compatible with the GLM framework by replacing the least-squares residuals with deviance or Pearson residuals. In this thesis, we implement the GLARE estimator in an R package, and empirically investigate it by means of a simulation study. The simulation study shows that we can improve worst-case prediction performance using GLARE compared to Maximum Likelihood Estimation (MLE) for Gaussian, Binomial, and Poisson distributions under valid and invalid instrumental variable assumptions.

Theoretical results for the GLARE estimator, for instance the identifiability of the causal parameter under valid instrumental variable assumptions, and applications to real-world data are still lacking. However, the successful implementation in an R package and the basic theory for GLARE in this thesis pave the way for future theoretical work and applications.

Keywords: anchor regression, diluted causality, heterogeneous data, worst-case predictions
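For orientation, the least-squares special case that GLARE generalizes, linear anchor regression, can be written in a few lines of base R: OLS on data transformed with W = I - (1 - sqrt(gamma)) * P_A, where P_A projects onto the anchor space. The toy data below are illustrative, not from the thesis.
```r
## Linear anchor regression (Rothenhäusler et al., 2021), base-R sketch on toy data:
## minimize ||(I - P_A)(Y - Xb)||^2 + gamma ||P_A(Y - Xb)||^2 via transformed OLS.
anchor_regression <- function(X, Y, A, gamma = 5) {
  A  <- cbind(1, A)                              # include an intercept in the anchor projection
  PA <- A %*% solve(crossprod(A), t(A))          # projection onto the anchor space
  W  <- diag(nrow(X)) - (1 - sqrt(gamma)) * PA
  Xt <- W %*% cbind(1, X)
  Yt <- W %*% Y
  drop(solve(crossprod(Xt), crossprod(Xt, Yt)))  # OLS on the transformed data
}

set.seed(2)
n <- 300
A <- rnorm(n); H <- rnorm(n)                     # anchor and hidden confounder
X <- 2 * A + H + rnorm(n)
Y <- 1.5 * X + H + 0.5 * A + rnorm(n)
rbind(ols    = coef(lm(Y ~ X)),
      anchor = anchor_regression(cbind(X), Y, cbind(A), gamma = 10))
```
With gamma = 1 the transformation is the identity and the estimator reduces to OLS; larger gamma enforces stronger decorrelation between the anchor and the residuals, interpolating towards two-stage least squares.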
Jeanne Fernandez Topological Comparison of Generative Adversarial Networks Prof. Marloes Maathuis
Prof. Karsten Borgwardt
Mar-2021
Abstract: A well-known assumption in Data Analysis is the Manifold Hypothesis, which assumes that high dimensional data lies on a low dimensional manifold, embedded in the high dimensional space. This assumption justifies the study of datasets through the lens of algebraic topology, assessing the structure of the hypothetical underlying manifold. Using the framework of persistent homology, one can approximate the topological structure of such a manifold, even if only a finite set of samples is available.
In this work, we are interested in data emerging from Generative Adversarial Networks (GANs). We aim at evaluating how close the topological structure of the generated dataset is to that of the original dataset. We use this to critique models and compare the topological structure of datasets emerging from different GANs. Our study is based on the concept of the Geometry Score, introduced by Khrulkov and Oseledets in 2018, which is a probabilistic adaptation of Betti numbers. We present some experiments and propose some extensions of the Geometry Score.
Tanja Finger Random Intercept and Random Intercept Random Slope Models: Correlation structures and assessing the quality of the lmer function in R. Dr. Markus Kalisch Feb-2021
Abstract: We review the fitting of linear mixed models in R and the maximum likelihood estimation of the parameters to
provide the reader with a solid basis. To achieve this, two methods to compute confidence intervals
are also explained. This thesis derives the correlation structures of Random Intercept and Random Intercept Random Slope models. It uses the R package lme4 and shows the respective matrices and vectors relevant for the fitting process. Furthermore, the thesis aims to establish the quality of the function lmer for fitting Random Intercept and
Random Intercept Random Slope models. In pursuance of this, we set up a simulation study with three different
datasets which evaluates the influence of the number of grouping effect levels and the influence of some model violations. We use coverage probabilities and the histograms of our estimates to ascertain how well each
model behaves.

Our results suggest that a linear mixed-effects model should have at least ten levels of the random effects for the
coverage probability to be accurate. Moreover, they also demonstrate that a non-normal error distribution is not a problem; the results in R remain approximately correct.
Emilie Epiney Residual analysis of Linear Mixed-Effects Models Dr. Lukas Meier Feb-2021
Abstract: Grouped data structures are common in various fields such as health or social sciences, for example when doing multiple measurements on subjects in clinical trials or when selecting at random a few schools to monitor in educational studies. Observations between groups are independent, but those within the same clusters are not. Linear mixed-effects (LME) models provide the necessary statistical framework to fit this kind of data properly by introducing a random term in a classical statistical regression. This new source of variation captures the grouping effect adequately, but also adds a layer of complexity which needs to be taken into account when developing model diagnostic tools.

This master thesis presents the mathematical framework for LME models and derives the different quantities which may be considered as residuals: conditional and marginal residuals, and best linear unbiased predictors. It then reviews and implements various transformations and plots that can be used to detect violations of the model assumptions.

To determine whether a particular trend is due to randomness or a model misspecification, we included a confidence band on the plots. It is created by simulating new datasets from the fitted model, refitting the model, and then adding the respective smoother to the plot. When 20 or more simulations were made, the results were conclusive and helped with model validation.

Simulated datasets were used to determine the effectiveness of the different visualisation techniques. We found that least confounded residuals, as proposed by Nobre and Singer, did not help in diagnosing non-normality of the error terms, since they tend to follow a normal distribution regardless of the true distribution of the error terms. They are moreover not well defined, as they rely on a singular value decomposition which is not always unique.

Finally, we provided an introduction to time series along with plots to identify the need for more complex correlation structures, although it is not possible to implement those in the package lme4. We implemented plots of the residuals for the covariance matrices and of the autocorrelation and partial autocorrelation parameters.
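A sketch of the simulate-and-refit reference band described above, using lme4's simulate() and refit() on the sleepstudy example data rather than the thesis' simulated datasets:
```r
## Reference band for a residual plot via simulation from the fitted model.
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
plot(fitted(fit), resid(fit), xlab = "fitted", ylab = "conditional residuals")
lines(lowess(fitted(fit), resid(fit)), lwd = 2)        # observed residual smoother

set.seed(1)
for (i in 1:20) {                            # 20 datasets simulated from the fitted model
  y_sim   <- simulate(fit)[[1]]              # new responses under the fitted model
  fit_sim <- refit(fit, y_sim)               # refit with the simulated responses
  lines(lowess(fitted(fit_sim), resid(fit_sim)), col = "grey")  # reference smoother
}
```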

2020

Student Title Advisor(s) Date
Basil Maag Modeling Comorbidities using Hidden Markov Models Dr. Markus Kalisch
Prof. Dr. Stefan Feuerriegel
Nov-2020
Abstract: In medicine, comorbidities refer to the presence of multiple, co-occurring diseases. Due to their co-occurring nature, the course of one comorbidity can often depend on the courses of other diseases and, hence, treatments can have significant spill-over effects. Despite the prevalence of comorbidities among patients, a comprehensive statistical framework for modeling the longitudinal dynamics of comorbidities is missing. In this thesis, we propose a probabilistic longitudinal panel model for analyzing comorbidity dynamics in patients. Specifically, we develop a coupled hidden Markov model with a personalized transition mechanism, named Comorbidity-HMM. The specification of our Comorbidity-HMM is informed by clinical research: (1) It accounts for different regimens (i.e., acute, stable) in the disease progression by introducing latent states that are of clinical meaning. (2) It models a coupling among the trajectories from comorbidities to capture co-evolution dynamics. (3) It considers between-patient heterogeneity (e.g., risk factors, treatments) in the transition mechanism. Based on our model, we estimate a spill-over effect that measures the indirect effect of treatments on patient trajectories through coupling (i.e., through comorbidity co-evolution). We evaluated our proposed Comorbidity-HMM on 675 health trajectories where we investigate the joint progression of diabetes mellitus and chronic liver disease. Compared to alternative models without coupling, we find that our Comorbidity-HMM achieves a superior fit. Further, we find that treatments targeting diabetes introduce a positive spill-over effect: here, a diabetes treatment decreases the risk of acute liver disease. To this end, our model is of direct relevance for both treatment planning and clinical research in the context of comorbidities.
Skofiar Mustafa Equality constraints in DAG models with latent variables Prof. Dr. Marloes Maathuis Nov-2020
Abstract: Directed acyclic graph (DAG) models with latent variables provide a formal approach to the study of causality in a setting where not all components of a study or experiment are known, and are therefore widely used in economic studies, machine learning and statistics. One considers constraints on the joint probability distribution in order to restrict the
set of suitable joint probability distributions for a DAG model with latent variables. In the absence of latent variables in a Directed Acyclic Graph (DAG), the corresponding joint probability distribution has only conditional independence constraints. As soon as
latent variables occur in a DAG, the set of constraints on the joint probability distribution expands to equality constraints, of which conditional independence constraints are a part,
and inequality constraints. This thesis deals with equality constraints in DAG models with latent variables. In a first
step we introduce basic definitions and present the latent DAG model, the ordinary Markov model and the nested Markov model. Next we describe Tian's algorithm, which uses the latent DAG model as a basis, and explain the necessary building blocks for the algorithm and the algorithm itself by means of some examples. Then, we introduce nested Markov models, which are motivated by Tian's algorithm, introduce the associated theory and create the connection between the two theories. Finally, we reformulate Tian's algorithm
in the context of nested Markov models.
Lucas Kohler Bayesian Network Structure Learning and Bayesian Network Clustering with an Application to mRNA Expression Data Dr. Markus Kalisch
Prof. Dr. Niko Beerenwinkel
Oct-2020
Abstract: Directed acyclic graphs (DAGs) deliver a versatile tool to describe and understand interrelated random variables emerging naturally in many applications. Learning the structure of such models quickly becomes prohibitive due to the DAG space's enormous size. Many efficient methods have been introduced in the last two decades to address this task. A Bayesian approach to structure learning is to approximate the graph posterior, for example, by employing Markov Chain Monte Carlo (MCMC) sampling. While being computationally very demanding, using appropriate search space restrictions still allows for handling many variables. We investigate how to extend such a search space restriction to improve the performance in terms of accuracy and speed. The result combines greedy approaches for DAG learning with methods for neighborhood selection. It has the ability to enhance the learning process of the considered MCMC method, especially for highly connected graphs.
The methods are then tested in a clustering framework involving Bayesian Networks. In a real-world application using gene expression data from breast cancer tissues, the Bayesian Network clustering algorithm shows the ability to separate the samples according to the known breast cancer subtypes and reveals interesting behavior for specific genes.
Wu Yue Estimation of Static Parameters in State Space Models Prof. Dr. Hans Rudolf Kuensch Sep-2020
Abstract: State space models (SSMs), which consist of both observed and unobserved variables through time, are popular in many fields such as finance or ecology. With the observed data, the basic tasks are the estimation of the unobserved state process and of the unknown parameters of the model. In this work, we consider a state space model from fishery science, and simulation studies are conducted. The performance of different particle filtering variants in estimating
the state process is compared, assuming all model parameters are known. If
there are unknown parameters, the task is much harder. We implement a particle Markov chain Monte Carlo algorithm which embeds a particle filter to estimate the parameters.
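A minimal bootstrap particle filter in base R for a toy linear-Gaussian state space model with known parameters (not the fishery model considered in the thesis):
```r
## Bootstrap particle filter on a simulated AR(1)-plus-noise state space model.
set.seed(1)
T_len <- 100; phi <- 0.8; sig_x <- 1; sig_y <- 0.5; N <- 1000

x <- numeric(T_len)                                # simulate the latent state
x[1] <- rnorm(1, 0, sig_x / sqrt(1 - phi^2))
for (t in 2:T_len) x[t] <- phi * x[t - 1] + rnorm(1, 0, sig_x)
y <- x + rnorm(T_len, 0, sig_y)                    # noisy observations

particles <- rnorm(N, 0, sig_x / sqrt(1 - phi^2))  # initial particle cloud
filt_mean <- numeric(T_len)
for (t in 1:T_len) {
  if (t > 1) particles <- phi * particles + rnorm(N, 0, sig_x)   # propagate
  w <- dnorm(y[t], mean = particles, sd = sig_y)                 # weight by likelihood
  w <- w / sum(w)
  filt_mean[t] <- sum(w * particles)                             # filtering mean
  particles <- sample(particles, N, replace = TRUE, prob = w)    # resample
}
cor(filt_mean, x)                                  # filtered means track the true states
```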
Orhun Oezbek Automated Information Extraction from German Financial Statements Prof. Dr. Marloes H. Maathuis
Benjamin von Deschwanden
Sep-2020
Abstract: For investors, traders, and researchers, financial statements contain valuable information about a company's performance and future. The digitalization of financial statements is becoming easier every day with new technologies. Nevertheless, manual data entry is still the primary approach for extracting financial information from financial statements. Compared to other document types, such as invoices, information extraction from financial statements bears additional challenges. The purpose of this project is to automatically
extract information from German financial statement PDF documents using machine
learning while also addressing additional challenges specific to financial statements.
Xiang Ge Luo Learning Bayesian Networks from Ordinal Data Dr. Markus Kalisch
Dr. Jack Kuipers
Sep-2020
Abstract: Bayesian networks are powerful frameworks for studying the dependency structure of variables in a complex system. The problem of learning Bayesian networks is tightly associated with the given data type. Ordinal data, such as stages of cancer, rating scale survey questions, and letter grades for exams, is ubiquitous in applied research. However, existing solutions are mainly for continuous and categorical data. In this thesis, we propose an iterative score-and-search method - called the Ordinal Structural EM (OSEM) algorithm - for learning Bayesian networks from ordinal data. Unlike traditional approaches with the multinomial distribution, we explicitly respect the ordering amongst the categories. More precisely, we assume that the ordinal variables originate from marginally discretizing a set of Gaussian variables, which follow in the latent space a directed acyclic graph. Then, we adopt the Structural EM algorithm and derive closed-form scoring functions for efficient graph searching. Through simulation studies, we demonstrate the superior performance of our method compared to the alternatives and analyze various factors that may influence the learning accuracy.
David Deuber A Quantile Extrapolation Approach for Extreme Quantile Treatment Effect Estimation Prof. Dr. Marloes Maathuis
Prof. Dr. Sebastian Engelke
Jinzhou Li
Sep-2020
Abstract: Quantile treatment effects are used to quantify causal effects on extreme events like heat-waves and floods. However, extreme quantiles are located in a part of the distribution where data is sparse or even unavailable, which makes estimation difficult. Although existing methods are able to estimate extreme quantile treatment effects to some extent,
they cannot extrapolate outside the range of the data. In this paper, we combine a quantile extrapolation method from extreme value theory with estimators of counterfactual quantiles in order to construct estimators of the extreme quantile treatment effect. For
quantile extrapolation, we consider different estimators for the extreme value index of the counterfactual distribution. In particular, we propose a Hill type estimator for heavy-tailed distributions. We show asymptotic normality for our extreme quantile treatment effect estimators and present conservative variance estimation procedures. The finite sample behaviour of the estimators is analysed in different simulation settings. In addition, we apply the Hill based estimator to a real data set to estimate the extreme quantile treatment effect of a job training program. In contrast to existing methods, inference for the extreme quantile treatment effect based on quantile extrapolation is asymptotically
valid even for very extreme cases. In simulations, the extreme quantile treatment effect estimator using the Hill type estimator leads to conservative confidence intervals and can outperform existing approaches in terms of mean squared error. To the best of our
knowledge, this is the first result about extrapolation-based estimation of the extreme quantile treatment effect.
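A base-R sketch of the classical Hill estimator and the Weissman-type quantile extrapolation that such extreme-quantile estimation builds on; the thesis adapts this machinery to counterfactual distributions, which is not reproduced here.
```r
## Hill estimator of the extreme value index and Weissman quantile extrapolation.
hill <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  mean(log(xs[1:k])) - log(xs[k + 1])            # Hill estimate from the k largest observations
}

extreme_quantile <- function(x, k, p) {          # estimate the (1 - p) quantile, p < k/n
  n  <- length(x)
  xs <- sort(x, decreasing = TRUE)
  xs[k + 1] * (k / (n * p))^hill(x, k)           # Weissman extrapolation beyond the data range
}

set.seed(1)
x <- 1 / runif(5000)^(1/2)                       # Pareto tail with extreme value index 0.5
c(hill_index = hill(x, k = 200),
  q_999_hat  = extreme_quantile(x, k = 200, p = 0.001),
  q_999_emp  = quantile(x, 0.999))               # empirical quantile for comparison
```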
Mark McMahon Benchmarking Non-linear Granger Causality Discovery Methods Prof. Dr. Marloes Maathuis
Prof. Dr. Julia Vogt
Ričards Marcinkevičs
Sep-2020
Abstract: Granger causality is a commonly used method for inferring relationships between variables in time series data. While many classical techniques for inferring Granger causality assume linear dynamics, most real-world interactions can be considered non-linear. Therefore, there is great interest in providing methods for extending the Granger causality concept to the non-linear setting. For this, there is an ever-growing list of methods that have been proposed, and this thesis aims to provide a structured study of a selection of these methods. In this study, these methods are compared in terms of their theoretical composition, as well as their empirical performance on a selection of datasets. We first provide a description of each method and their features, then we move on to analyse their performance under different settings while attempting to draw conclusions based on this. While conducting this testing stage, the issue of selecting hyper-parameters for these models became of great interest. Our testing set-up was thus altered to include methods for addressing this problem, starting with analysing one particular commonly used technique. Based on the results obtained from this analysis, we propose a new method of selecting hyper-parameters which we hope can offer improvements in this area.
Milan Kuzmanovic Total Effect Estimate--based Test Statistic for Causal Graphs Prof. Dr. M. H. Maathuis
PhD Leonhard Henckel
Sep-2020
Abstract: One of the primary goals of causal analysis is the estimation of causal effects from observational data, which is generally an impossible task without strong modeling assumptions. Under the assumption that the data were generated by a causal linear model with known causal DAG, unknown structural coefficients and jointly independent errors, total causal effects of single interventions in the model can be identified and estimated via covariate adjustment. In fact, for any valid adjustment set, the total causal effect is identified by the population regression coefficient when conditioning on that set, and the corresponding OLS estimator is a consistent estimator of that effect. This means that the existence of multiple valid adjustment sets implies an overidentifying constraint on the joint distribution of the variables in a causal linear model, since the total causal effect can be identified in distinct ways. We propose a test statistic that can be used for testing this constraint by comparing the corresponding OLS estimators of the total causal effect for different valid adjustment sets, under the null hypothesis that the assumed linear causal model setting with given causal DAG is the true data generating mechanism. The test primarily relies on the asymptotic joint normality of the random vector of OLS estimators for different conditioning sets, and a standard Wald-type testing procedure with linear constraint matrix for multivariate Gaussian vectors that results in a Chi-squared distributed test statistic. While the literature on testable implications of causal linear models is mainly centered on causal structure itself, we shift the focus of our analysis from detecting misspecified causal structures, to discovering inconsistency in the estimation of the total causal effect of interest. We investigate the behavior of the test in repeated sampling under the null hypothesis and present simulation results that support our findings. Finally, power simulations show that there are both instances where this test can be highly useful in practice, and those in which the power is too low for meaningful practical applications.
Yinghao Dai Post-processing cloud cover forecasts using Generative Adversarial Networks Prof. Dr. Marloes Maathuis
Dr. Stephan Hemri
Dr. Jonas Bhend
Sep-2020
Abstract: Numerical weather prediction models for cloud cover typically exhibit biases and underdispersion. In this thesis, we focus on post-processing ensemble forecasts of cloud cover from COSMO-E and IFS-ENS using deep learning techniques. In our first approach, we use a dense neural network that outputs an ensemble of 21 predictions. This model is able to significantly outperform both COSMO-E and a state-of-the-art classical statistical post-processing method called global ensemble model output statistics (gEMOS), in terms of continuous ranked probability score (CRPS). However, the produced forecasts do not look realistic, and ensemble copula coupling (ECC) is needed to inherit the spatial structure from the COSMO-E forecasts. In our second approach, we interpret the post-processing problem as an image processing task, and use a conditional generative adversarial network (cGAN) to generate cloud cover forecasts; something that -- to the best of our knowledge -- has not been done before. This model also outperformed both COSMO-E and gEMOS in terms of CRPS, but was not able to outperform the dense neural network. In return, the cGAN produces forecasts that are much more realistic and better calibrated than those from the dense neural network.
Yessenalina Akmaral Table Structure Recognition Prof. Dr. Marloes Maathuis
MSc Sven Beck
Sep-2020
Abstract: In this thesis, we present a weakly supervised approach to building a dataset of table images annotated with column, row and cell positions. Using this approach, we have built a large annotated table dataset from Word documents downloaded from the Internet.
We further train a state-of-the-art instance segmentation model, Mask R-CNN, for the detection of rows, columns, cells and cell content positions. We fine-tune the model parameters to adapt it to our problem. We propose a rule-based post-processing algorithm to resolve any overlapping predictions made by Mask R-CNN, and have experimented with different approaches to building table structure from predictions of column, row, cell and cell content positions.
Throughout our experiments, we used different evaluation metrics and benchmark datasets to choose parameter values, select the best-performing approach and compare our approach with other methods. We demonstrate that our column detector achieves state-of-the-art results on the UNLV dataset. The combination of column, row and cell predictions has performance comparable to state-of-the-art models on the ICDAR and cTDaR datasets.
Kristin Blesch Challenging Single-Parameter Models of Income Inequality to Represent the Distribution of Income: A Data-Driven Approach of U.S. County-Level Income Distributions Dr. Markus Kalisch
Jon Jachimowicz
Sep-2020
Abstract: Economic inequality is predominantly measured by single-value indices such as the Gini coefficient. Doing so, however, may fail to capture crucial differences in the shape of different income distributions. Drawing on recent theoretical and empirical advances in the measurement of inequality, we focus on the measurement of income distributions—operationalized as Lorenz curves—and assess the number of parameters necessary to adequately represent these. We employ a data-driven approach using fine-grained data on income distributions at the U.S. county level (N=3,056), and use the maximum likelihood framework for the parameter estimation of Lorenz curves. This enables us to apply the Akaike Information Criterion (AIC_c) for model selection, and we additionally verify the reliability of AIC_c in model selection for our given setting in a simulation study. Our analysis reveals that all considered single-parameter models are outperformed by higher-order models. In particular, the two-parameter Ortega Lorenz curve model fares best across specifications. Taken together, our findings question the widespread use of single-parameter inequality measures such as the Gini coefficient, and highlight the necessity of using alternative measures that more aptly capture the distribution of income. Instead, we propose that future research should center on alternative, multi-parameter measures of income inequality, such as the parameters of the Ortega Lorenz curve model, for which we provide estimates at the U.S. county- and state-level.
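For context, the single-value index that the abstract critiques can be computed directly from an income sample; a short base-R sketch with simulated (not county-level) incomes:

    ## Sample Gini coefficient via the sorted-values formula
    gini <- function(x) {
      n <- length(x); x <- sort(x)
      2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
    }

    set.seed(1)
    income <- rlnorm(10000, meanlog = 10, sdlog = 0.8)   # hypothetical log-normal incomes
    gini(income)                                         # roughly 0.43 for this sdlog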
Reto Zihlmann Generalized Linear Mixed Effects Models in Genetic Evaluations Dr. Lukas Meier
Dr. Peter von Rohr
Aug-2020
Abstract: Health and reproductive traits are increasingly important in cattle breeding programs all around the world. In contrast to productivity traits, health and reproductive traits are often measured on a nominal or ordinal scale, which makes classical breeding value estimation via linear mixed effects models (LMMs) inappropriate. Despite extensive literature, the application of generalized linear mixed effects models (GLMMs) and threshold models in practical breeding value estimation remains challenging due to the limited availability of software implementations for this specific purpose. In this study we present available software packages, show their weaknesses and implement improvements. The implementations were tested on simulated data sets and compared with respect to computation time and accuracy of the estimated breeding values. The best implementations were applied to real-world data sets of some major Swiss cattle populations. Traits of interest were multiple birth, early-life calf survival and carcass conformation scores. GLMMs and threshold models clearly improved the prediction of breeding values compared to LMMs when applied to simulated binary and ordinal traits. Bayesian implementations were relatively slow for small data sets but returned trustworthy standard errors of the estimated breeding values by accounting for the uncertainty of variance component estimation. The improvements also came at a higher computational cost; however, this cost was largely reduced by assuming known variance components. A similar strategy was successfully applied to the much larger real-world data sets by separately estimating variance components and animal breeding values. This study shows that GLMMs and threshold models can and should be applied for non-normal traits in order to improve the properties of estimated breeding values and obtain unbiased heritability estimates, which allow for well-informed constructions of selection indices.
Cyrill Scheidegger Conditional Independence Testing: The Weighted Generalised Covariance Measure Prof. Dr. Peter Bühlmann
Dr. Julia Schulte
Aug-2020
Abstract: In this thesis, we introduce the weighted generalised covariance measure which is a test for conditional independence. Our test is an extension of the recently introduced generalised covariance measure by Shah and Peters (2018). To test the null hypothesis of X and Y being conditionally independent given Z, our test statistic is a weighted form of the sample covariance between the residuals of nonlinearly regressing X and Y on Z. We propose different variants of the test for both univariate and multivariate X and Y. We give conditions under which the tests yield the correct type I error rate. Finally, we compare our tests to the original generalised covariance measure using simulation. Typically, our test has power against a wider class of alternatives compared to the generalised covariance measure. This comes at the cost of having less power against alternatives for which the generalised covariance measure works well.
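A minimal base-R sketch of the underlying test statistic may help fix ideas: it implements the residual-product form of the generalised covariance measure for univariate X, Y and Z with smoothing-spline regressions, and an optional weight vector stands in for the weighting studied in the thesis (the specific weight functions and multivariate variants are not reproduced here).

    ## Test H0: X independent of Y given Z via (weighted) residual products
    gcm_test <- function(x, y, z, w = rep(1, length(z))) {
      rx <- x - predict(smooth.spline(z, x), z)$y   # residuals of X regressed on Z
      ry <- y - predict(smooth.spline(z, y), z)$y   # residuals of Y regressed on Z
      R  <- w * rx * ry
      T  <- sqrt(length(R)) * mean(R) / sd(R)       # approximately N(0,1) under H0
      2 * pnorm(-abs(T))
    }

    ## Example under the null: X and Y both depend on Z but not on each other
    set.seed(1)
    n <- 500; z <- rnorm(n)
    x <- sin(z) + rnorm(n); y <- z^2 + rnorm(n)
    gcm_test(x, y, z)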
Muhammed Ali Tamer Regularized high-dimensional covariance estimation methods and applications in Machine Learning and Portfolio Theory Dr. Markus Kalisch Aug-2020
Abstract: Covariance estimation plays an essential role in modern multivariate statistical analysis, and many applications in practice require accurate estimates of high-dimensional covariance and precision matrices. In recent research, numerous parametric and non-parametric methods have been proposed and developed to overcome the curse of dimensionality in covariance estimation under a small sample size.
In this thesis, we give an overview of the latest developments in estimating regularized covariance matrices and focus mainly on approaches based on componentwise regularization (banding, tapering, thresholding), shrinkage and penalized likelihood methods. First, in order to give the reader a better understanding of the implemented methods, we provide a brief theoretical introduction to the regularization techniques used in high-dimensional covariance construction. Next, we conduct a Monte Carlo simulation study to compare the performance of the selected methods under several different covariance structures. For all simulations we cover the three distinct scenarios p < n, p = n and p > n, and report the Kullback-Leibler divergence as a measure of performance. In addition, we also evaluate the accuracy of the regularized covariance estimators by computing the squared Frobenius distance and the spectral loss.
Lastly, we show a real-world application of high-dimensional covariance estimation to a supervised machine learning task and Markowitz's minimum variance portfolio optimization problem. As for the latter, comparisons are made with U.S. stock market return data.
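As a concrete example of the componentwise regularization mentioned above, a minimal base-R sketch of soft-thresholding the sample covariance matrix (the tuning parameter lambda would in practice be chosen by cross-validation):

    ## Soft-threshold the off-diagonal entries of the sample covariance matrix
    soft_threshold_cov <- function(X, lambda) {
      S <- cov(X)
      S_thr <- sign(S) * pmax(abs(S) - lambda, 0)   # shrink entries towards zero
      diag(S_thr) <- diag(S)                        # leave the variances untouched
      S_thr
    }

    ## p > n example with independent variables (true covariance is the identity)
    set.seed(1)
    n <- 50; p <- 100
    X <- matrix(rnorm(n * p), n, p)
    S_hat <- soft_threshold_cov(X, lambda = 0.3)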
Sofia K. Mettler Diagnostic Serial Interval as an Indicator for Contact Tracing Effectiveness during the COVID-19 Pandemic Prof. Marloes H. Maathuis Aug-2020
Abstract: Background

The clinical onset serial interval is often used as a proxy for the transmission serial interval of an infectious disease. For SARS-CoV-2/COVID-19, data on clinical onset serial intervals is limited, since symptom onset dates are not routinely recorded and do not exist in asymptomatic carriers.
Methods

We define the diagnostic serial interval as the time between the diagnosis dates of the infector and infectee. Based on the DS4C project data on SARS-CoV-2/COVID-19 in South Korea, we estimate the means of the diagnostic serial interval, the clinical onset serial interval and the difference between the two. We use the balanced cluster bootstrap method to construct 95% bootstrap confidence intervals.
Results

The mean of the diagnostic serial interval was estimated to be 3.63 days (95% CI: 3.24, 4.01). The diagnostic serial interval was shown to be significantly shorter than the clinical onset serial interval (estimated mean difference -1.12 days, 95% CI: -1.98, -0.26).
Conclusions

The relatively short diagnostic serial intervals of SARS-CoV-2/COVID-19 in South Korea are likely due to the country’s extensive efforts towards contact tracing. We suggest the mean diagnostic serial interval as a new indicator for the effectiveness of a country’s contact tracing as part of the epidemic surveillance.
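A small base-R sketch of the kind of cluster bootstrap interval used above (ordinary rather than balanced resampling of infector clusters; the data below are simulated, not the DS4C data):

    ## Percentile bootstrap CI for a mean serial interval, resampling whole infector clusters
    cluster_boot_ci <- function(si, cluster, B = 2000, level = 0.95) {
      ids <- unique(cluster)
      stats <- replicate(B, {
        sampled <- sample(ids, length(ids), replace = TRUE)
        mean(unlist(lapply(sampled, function(id) si[cluster == id])))
      })
      quantile(stats, c((1 - level) / 2, 1 - (1 - level) / 2))
    }

    ## Simulated serial intervals grouped by infector
    set.seed(1)
    cluster <- rep(1:100, times = sample(1:5, 100, replace = TRUE))
    si <- rnorm(length(cluster), mean = 3.6, sd = 2)
    cluster_boot_ci(si, cluster)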
Carissa Whitney Reid Bayesian Network Structure Learning from Mixed Data Prof. Dr. Marloes Maathuis
Dr. Jack Kuipers
Aug-2020
Abstract: Bayesian networks and other graphical models are powerful tools for defining and visually representing the joint distributions over complex domains. However, most common methods for learning Bayesian networks are incapable of handling mixed data. This is an important challenge as heterogeneous data is ubiquitous in most disciplines. The main issue with learning Bayesian networks in the mixed case is representing the conditional distributions between nodes of differing data types — as this is necessary to derive the likelihood functions and conditional independence tests used in structure learning.
One method to address this problem is data coercion to a single data type either by discretising continuous variables or by using kernels to represent the data as continuous variables. Another direction employs parametric methods using node-wise regression. This thesis explored the equivalence between score- and constraint-based structure learning methods by deriving a scoring criterion based on preexisting constraint-based methods developed for mixed data. The goal was to derive a scoring criterion using two approaches: the parametric likelihood-ratio test approach; and the non-parametric kernel alignment approach. Generally, the parametric methods were the best performing methods. Specifically, the results showed that the Likelihood-Ratio Test PC was generally better able to reconstruct more accurate graphs than any other considered method.
Colin David Grab A Maximum Entropy Deep Reinforcement Learning Approach to Financial Portfolio Optimization Prof. Dr. Bühlmann Aug-2020
Abstract: An algorithm for financial multi asset portfolio optimization with transaction costs is presented within the framework of deep reinforcement learning. After an introduction into the basic frameworks and definitions of reinforcement learning, the thesis proposes a novel approach of a model-free algorithm that combines multiple ideas from deep reinforcement learning and applies them to the financial portfolio optimization problem. Specifically, based on the concept of auxiliary tasks and predictive knowledge, an approach to construct a self-supervised state estimator for financial market environments is introduced. Taking advantage of this concept, it serves as an input to a novel portfolio optimization algorithm based on maximum entropy reinforcement learning. The underlying mathematical derivation and the findings for the portfolio optimization process are presented. It is found that despite the rather complex and sophisticated approach, the algorithm leads to a result which possesses a clear and intuitive financial interpretation.
Kevin Duncan Grab Power Market Prices and the Order Book A Study in the German Intraday Power Market Prof. Dr. Sornette
Prof. Dr. Bühlmann
Aug-2020
Abstract: To prevent blackouts, the flows in and out of the electricity grid have to be kept in balance. Therefore, every power market participant has to announce his supply or demand for a future point in time to the transmission system operator. If a participant does not abide by his own forecast, i.e., causes an imbalance, a fee is due. In order to meet his obligation, a participant is able to trade electrical energy on the “Intraday” market. The amount of the aforementioned fee is directly linked to an average price, which is calculated from all executed trades at the end of the trading period. It is therefore very valuable to be able to anticipate this average price during the trading period, i.e. before the majority of relevant trades are executed, in order to make informed trading decisions. This thesis explores options to predict this average price by exclusively taking into account “Order Book” information. Research concerning order book information in power markets is still scarce. Furthermore, to the best of the author’s knowledge, no previously published work investigates the predictive power of order book information for short term price predictions in power markets. We propose to model the sign and magnitude of a price change separately. Employing Linear Models, Random Forests and LSTM Recurrent Neural Networks, we can show that the volume distribution in the order book contains information about future prices. The findings of this thesis are based on the complete order book data of the EPEX Spot German continuous intraday power market for the year 2019, which was generously made available by BKW Energie AG.
Ruicong Yao Minimax convergence rate under local differential privacy Prof. Dr. Sara van de Geer Aug-2020
Abstract: As personal data becomes more and more detailed, data privacy has become an important problem in statistical analysis. Many privacy-preserving procedures have been derived in the past decades. Among them, local differential privacy is the most widely used because of its safety and simplicity. In this setting, users' data is first perturbed and then sent to a reliable server. Therefore, there is no risk even if the server is attacked. However, the question is whether the cost in statistical efficiency is acceptable.

In this thesis, we will investigate minimax convergence rates under local differential privacy. We will first present private Le Cam, Fano and Assouad methods for minimax problems and explain the technical parts in detail. Then we will use them to estimate the minimax convergence rate of the location family model, nonparametric regression and convex risk minimization. Simulation results will be provided to illustrate the theory. Finally, we will prove two necessary conditions for a mechanism to be optimal under approximate local differential privacy in the minimax sense.
Harald Besdziek Random Matrix Theory, with Applications to Data Science Prof. Dr. Sara van de Geer Jul-2020
Abstract: Developed in the 1950s in the context of quantum physics, the theory of random matrices (RMT) has become a very active research area in probability theory, with connections to a wide range of mathematical fields. Recently, RMT has been used more and more in data science because it makes it possible to deal with high-dimensional data sets for which many classical statistical procedures fail. This thesis provides an introduction to RMT targeted at readers with a data science or statistics background. Starting from the very basics, it moves to cutting-edge topics of research in RMT and its applications to data science.
The thesis is structured in five chapters. Chapter 1 gives a brief motivation and informs about the prerequisites. Chapter 2 reviews facts from linear algebra and presents two concepts from advanced probability theory which have a deep connection to RMT: concentration of measure and universality. Chapter 3 introduces different types of random matrices and proves the semicircular law and the Marčenko-Pastur law, which are famous results about the spectrum of random matrices. Chapter 4, the main part of the thesis, analyzes the operator norm of a random matrix, derives the joint distribution of the eigenvalues of Gaussian random matrices, and finally uses this joint distribution to give precise results on the spacings between the eigenvalues – a topic that is closely connected to current RMT research. The concluding Chapter 5 introduces linear spectral statistics, which are statistical estimators depending on random matrix eigenvalues, and uses them for covariance estimation in high-dimensional models. The techniques developed in the previous chapters make it possible to prove that classical statistical theory fails in this setting, and provide a solution.
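The semicircular law mentioned above is easy to visualise numerically; a short base-R sketch (illustrative only) compares the eigenvalues of a scaled symmetric Gaussian matrix with the semicircle density on [-2, 2]:

    ## Eigenvalues of a Wigner matrix versus the semicircle density
    set.seed(1)
    n <- 1000
    A <- matrix(rnorm(n * n), n, n)
    W <- (A + t(A)) / sqrt(2 * n)               # symmetric, off-diagonal variance 1/n
    ev <- eigen(W, symmetric = TRUE, only.values = TRUE)$values
    hist(ev, breaks = 50, freq = FALSE, main = "Wigner eigenvalues")
    curve(sqrt(pmax(4 - x^2, 0)) / (2 * pi), add = TRUE)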
Naïm Lucien de Viragh Resampling, Block Resampling and the Conservative Block Length Selection Dr. Markus Kalisch Jun-2020
Abstract: The bootstrap and subsampling are two core resampling techniques. For iid data, we have the n-bootstrap and the m-bootstrap, which draw n and m < n observations with replacement, respectively, from the sample of size n. We further have subsampling, which draws m < n observations without replacement. For dependent data, we have analogous block resampling techniques which build the resamples from data blocks of length m. (The three m's are different entities.) We first introduce the n-bootstrap, the m-bootstrap, (block) subsampling and the block bootstrap and compare their theoretical properties. Next, we present algorithms from the literature for solving the arguably biggest practical obstacle to the m-bootstrap, (block) subsampling and the block bootstrap: how to choose m. We then explore the finite-sample performance of the resampling techniques and these algorithms in the construction of confidence intervals. For iid data, we look at two prime examples of n-bootstrap consistency and inconsistency. The latter should be solved by the m-bootstrap and subsampling. For non-iid data, for practical reasons, we consider linear regression coefficients and only the block bootstrap. Due to highly unsatisfactory results for the task of choosing m for the block bootstrap, we propose a new algorithm for it in the context of confidence interval construction. We show that it performs excellently for linear regression coefficients. Lastly, we include a stand-alone tutorial on block bootstrapping in R (including our algorithm) with detailed code examples.
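A minimal base-R sketch of the moving-block bootstrap for a regression slope (not the thesis' own tutorial code, and without the proposed algorithm for choosing m):

    ## Percentile CI for a regression slope with dependent errors, using blocks of length m
    block_boot_slope <- function(y, x, m, B = 1000, level = 0.95) {
      n <- length(y)
      starts <- 1:(n - m + 1)                   # admissible block starting points
      k <- ceiling(n / m)                       # number of blocks per resample
      stats <- replicate(B, {
        idx <- unlist(lapply(sample(starts, k, replace = TRUE), function(s) s:(s + m - 1)))[1:n]
        coef(lm(y[idx] ~ x[idx]))[2]
      })
      quantile(stats, c((1 - level) / 2, 1 - (1 - level) / 2))
    }

    ## AR(1) errors make the ordinary iid bootstrap unreliable; blocks retain the dependence
    set.seed(1)
    n <- 300
    x <- rnorm(n)
    y <- 2 * x + as.numeric(arima.sim(list(ar = 0.7), n))
    block_boot_slope(y, x, m = 10)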
Luca Mosetti The Effect of Macroeconomic and Financial Variables on Credit Risk Dr. Fadoua Balabdaoui
Mirko Moscatelli
May-2020
Abstract: A number of studies incorporate macroeconomic explanatory factors in modeling rating transition probabilities, in order to establish linkages between underlying economic conditions and credit risk migration. The project aims at constructing a model linking transition matrices between credit quality states with macroeconomic and financial variables for system-wide data on loans. These aggregate transition matrices could be employed to monitor the evolution of credit quality in the economy and to generate conditional forecasts. A novelty with respect to the existing studies is that transitions will be based on supervisory categories of loan quality (performing, unlikely to pay and insolvent) rather than bond ratings, and will refer to both unlisted and listed firms.
Zhen Cui Constructing an Estimator Based on Wasserstein Distance Prof. Sara van de Geer May-2020
Abstract: Parameter estimation is a significant topic in statistical theory. In this work, we build up an estimator based on the Wasserstein distance. The concept of the Wasserstein distance comes from optimal transport, and the estimator belongs to the class of minimum distance estimators. We first introduce the two related topics. Then we present some properties of our estimator and the Wasserstein distance, including the simple form of the Wasserstein distance on the real line, the convergence of the Wasserstein distance and the estimator, and the limiting rate of the Wasserstein distance. To illustrate the theoretical results, we present several examples with simulations and verification. Lastly, there is a discussion of our method from the viewpoint of information geometry.
Daniel Smith An Introduction to Permutation Tests with Applications in ANOVA and Linear Models Dr. Lukas Meier Apr-2020
Abstract: Permutation tests define a family of non-parametric tests which are conditioned on the given data. Whilst many testing procedures are based on the assumption that the underlying distribution is known to a certain extent, permutation tests are less restrictive. By conditioning on the data, permutation tests are distribution-free tests, and can, therefore, be applied under minimal assumptions to a broad class of decision problems. The central assumption made refers to the exchangeability of the permuted units, rendering the testing procedure exact. These two properties - exactness and the distribution-free property - make permutation tests very attractive alternatives to parametric testing procedures, which often rely on assumptions that may be difficult to meet or to verify in real data. This thesis provides a general introduction to exact and approximate permutation tests applied to ANOVA designs and linear models. Chapter one introduces the set-up for general statistical testing problems, which is applied to parametric tests as an illustration. Chapter two provides an introduction and a general set-up for permutation tests, focusing on the one- and two-sample permutation tests for the location parameter as references. Whilst chapter two illustrates the intuition of permutation tests, chapter three is devoted to their mathematical framework and verifies the exactness and the distribution-free property. Chapter four is devoted to ANOVA models in a permutation framework. On the one hand, a general approach is introduced for exact testing procedures in order to test the significance of a factor contained in an ANOVA model whenever an exact permutation test exists. On the other hand, this chapter is equally devoted to approximate test procedures for cases where no exact test exists. In the last chapter, permutation tests for linear models are presented. Firstly, cases in which an exact test exists are examined, followed by a comparison of three approaches for approximate permutation tests in a linear model framework. The inserted R code was created by the author in order to illustrate some of the central aspects of the testing procedures. More sophisticated implementations of many permutation tests in R can be found in the packages lmPerm and coin. To get a clean and nicely rendered output of the self-created R functions, the print methods listed in the appendix should be executed in R before using the functions. However, all functions can be executed without these new print methods, creating a list as output.
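As a small illustration of the reference tests from chapter two, a base-R sketch of a two-sample permutation test for a location difference (the packages lmPerm and coin mentioned above provide full implementations):

    ## Two-sided permutation test for a difference in means
    perm_test <- function(x, y, B = 9999) {
      obs <- mean(x) - mean(y)
      z <- c(x, y); nx <- length(x)
      perm <- replicate(B, {
        idx <- sample(length(z), nx)            # random reassignment of group labels
        mean(z[idx]) - mean(z[-idx])
      })
      mean(c(abs(perm), abs(obs)) >= abs(obs))  # p-value including the observed statistic
    }

    set.seed(1)
    perm_test(rnorm(20, mean = 0.5), rnorm(25))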
Belinda Mueller Real-time Classification of Breathing Phases based on Short-time Stationary Audio Signals Dr. Markus Kalisch
Prof. Dr. Tobias Kowatsch
Yanick Xavier Lukic
Apr-2020
Abstract: The mobile application Breeze developed by Shih, Tomita, Lukic, Reguera, Fleisch, and
Kowatsch (2019) guides slow-paced breathing training to promote mental health and an improved cardiac functioning. The app instructs a specific breathing pattern and simultaneously records and classifies its user’s breathing into inhalation, exhalation and pausing in order to provide visual, gamified biofeedback. Thus far, Breeze achieves a high classification accuracy but relies on a large temporal context for its predictions, leading to a feedback
latency which might constitute a disincentive to the user. This work has re-designed the existing classification algorithm to achieve a real-time classification of breathing phases
without compromising on accuracy. The central ingredients for this new approach are i) audio features that rely on only a short, stationary temporal context, and ii) classification methods that manage a good bias-variance trade-off to accommodate the high variance
that is introduced by different users in different acoustic environments.

We evaluated our approach offline on three-minute breathing recordings of 20 subjects that were collected by Shih et al. (2019). In this study, the participants were instructed to follow a specific temporal pattern of inhalation, exhalation and pausing. To test our
approach, we set up a multi-class classification problem with the classes inhale, exhale and pause, plus an additional noise class that subsumes all background noise that makes the breathing signal indiscernible. We found that with gradient tree boosting we can achieve an accuracy of 86.2% based on a short temporal context of 23ms. These results
represent an important step towards launching Breeze with real-time, valid biofeedback. Further research is needed to identify potential run-time bottlenecks of our approach when implemented in a smartphone, and to assess its robustness to the high variance between different users, acoustic environments and smartphone microphones of different providers.
Yll Haziri Unsupervised Feature Selection by AutoEncoders with Local Structure Preservation Prof. Dr. Marloes H. Maathuis
Jinzhou Li
Mar-2020
Abstract: High-dimensional datasets in the machine learning setting bring difficulties in terms of computational cost, accuracy, and low interpretability of the learned models.
Feature Selection (FS) is a dimensionality reduction technique aiming to choose an easily interpretable subset of features without significant loss of information. This thesis concentrates on unsupervised FS, which is a less studied and more challenging problem due to the lack of class labels that usually steer the search for relevant features.
We start our journey in this area by first reviewing state-of-the-art methods, which make use of spectral clustering tools and score the features based on their importance for preserving the local neighborhood structure of the initial dataset. Afterwards, we study a method that uses AutoEncoders with structural regularizations in order to select a subset of features able to reconstruct every initial feature.
The main contribution of this thesis is the incorporation of these two approaches into one method. We first propose augmenting the existing AutoEncoder setting with a constraint that ensures the preservation of the initial local neighborhood structure in the transition to the lower-dimensional representation provided by the hidden layer of the AutoEncoder.
Secondly, we propose a method that learns a pre-constructed neighborhood-structure preserving embedding with the help of a modified deep neural network, and selects features based on their importance for the construction of this target.
We also present three evaluation measurements to assess the quality of the selected subset of features, and we propose a way of selecting the penalization parameters of the proposed techniques that optimizes the results provided by one of the evaluation indicators.
In the last part of this thesis, we perform experiments on different benchmark datasets in order to compare the performance of the proposed methods with existing techniques. Specifically, we experimentally check whether additionally preserving the local neighborhood structure improves the feature selection process. As expected, the proposed methods show an improvement in terms of measurements that quantify how well the selected subset of features is able to differentiate among the natural clusters in the data.
Alexia Pastré Learning orders of mutations in cancer progression Prof. Dr. Marloes Maathuis
Prof. Dr. J. Quackenbush
Dr. R. Burkholz
Mar-2020
Abstract: Genetic mutations accumulate over time and contribute to the development of cancer. The order in which such mutations occur provides insights into cancer progression and might, in the long run, help to identify key time points for drug intervention via targeted treatments. We model the successive accumulation of mutations by a cascade process evolving in discrete time on an (unobserved) Directed Acyclic Graph (DAG) formed by interacting mutations that are equipped with a binary state. The aim of this work is to infer the DAG and the cascade model parameters based on mutational data summarized at the gene level. These imply likely orders of mutations in time, yet we observe only the end state of the proposed cascade process. In this setting, we utilize the Bayesian network learning framework implemented in the R package BiDAG to infer a posterior distribution of DAGs and cascade model parameters. We apply this methodology to colorectal cancer data and contrast the results with those of two more general parametric propagation models. Additionally, we show the consistency of our method on synthetic data.
Hongkyu Kim Regression Discontinuity Design Prof. Dr. Marloes Maathuis Mar-2020
Abstract: The regression discontinuity (RD) design is a type of observational study which locally resembles a randomized experiment and is used to estimate the local causal effect of a treatment that is assigned fully, or partly, by the value of a certain variable relative to a threshold. As the RD design is a subject of causal inference, important concepts of causal inference are covered first so that the discussion can proceed properly. Based on those concepts, the fundamental idea and structure of the RD design are explained, including its two subtypes: the sharp and the fuzzy RD designs. Furthermore, the assumptions of the RD design are formulated, which have differed slightly across fields. In order to accurately estimate the local causal effect without confounding, we introduce a bandwidth and use only the data that lie within the bandwidth of the threshold. Since there is still no settled way of finding a "good" bandwidth, we propose a novel approach for bandwidth selection along with two existing methods. The performance of these bandwidth selection methods is compared on simulated data, and it can be inferred that the newly proposed method may yield better results. At the end, we intentionally violate the unconfoundedness assumption and analyze three potential confounding models with simulated data.
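A minimal base-R sketch of the sharp RD estimate discussed above, with local linear fits on each side of the cutoff within a fixed bandwidth h (chosen here by hand; bandwidth selection is exactly the issue the thesis addresses), on simulated data:

    ## Sharp RD: the treatment effect is the jump between the two local fits at the cutoff
    set.seed(1)
    n <- 1000
    x <- runif(n, -1, 1)                        # running variable, cutoff at 0
    d <- as.numeric(x >= 0)                     # sharp treatment assignment
    y <- 0.5 * x + 0.4 * d + rnorm(n, sd = 0.3) # true local effect is 0.4
    dat <- data.frame(x = x, y = y)

    h <- 0.25                                   # bandwidth
    left  <- lm(y ~ x, data = dat, subset = x >= -h & x < 0)
    right <- lm(y ~ x, data = dat, subset = x >= 0 & x <= h)
    tau_hat <- predict(right, newdata = data.frame(x = 0)) -
               predict(left, newdata = data.frame(x = 0))
    tau_hat                                     # estimated local causal effect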
Fabian Patronic Predictions in Mixed Models Dr. Lukas Meier Mar-2020
Abstract: This thesis gives an introduction to mixed models and empirically compares different methods of estimating prediction intervals in a simulation study. The difficulty in estimating such intervals lies in the additional source of variation that is induced by the random effects. The point estimate is usually obtained from the best linear unbiased estimator (BLUE), except for the Bayesian method, which uses the mean of the posterior distribution. Marginal and conditional prediction use the prediction error made by the best linear unbiased predictor (BLUP). These errors are estimated via the distribution of the BLUP (Henderson 1950). Prediction intervals estimated by bootstrap methods simulate the error made by the BLUE relative to the bootstrapped samples. The simulation study showed that the Bayesian method, which uses quantiles of the posterior distribution as the prediction interval, has the best coverage rate. Finally, a cheese tasting example is presented and guidance on how to use the different methods in R is given.
Mattias Hemmig Causal Inference and Low-Rank Estimation Prof. Dr. Peter Bühlmann
Dr. Armeen Taeb
Mar-2020
Abstract: In this thesis, we study causal Gaussian models that have a small number of latent confounders. More precisely, our goal is to estimate the Markov equivalence class of the underlying directed acyclic graph structure of the observed variables, given a sample of independent observations. This problem can be formulated as an optimization problem where the latent confounder structure corresponds to a low-rank constraint, which is non-convex and difficult to deal with. We study and implement algorithms to approach this optimization problem and make it more computationally feasible. One of the proposed algorithms is based on nuclear norm regularization and one is based on projected gradient descent.
Daniela Nguyen The Rasch model and its extensions from a statistical point of view Dr. Lukas Meier Mar-2020
Abstract: The Rasch model is the most classical and popular model for psychological and educational testing. In this thesis, we introduce the Rasch model and its properties, along with its correspondence to logistic regression. Due to the inconsistency of the full maximum likelihood estimators, we present two alternative approaches to item parameter estimation that differ in how the person parameters are treated. We also present different extensions of the Rasch model which can handle polytomous responses or relax some of the restrictive assumptions of the Rasch model. Throughout this work, the estimation methods and models are illustrated with implementations in R.
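The correspondence with logistic regression mentioned above can be illustrated in a few lines of base R: simulate Rasch responses with success probability plogis(theta_p - b_i) and fit the joint person-plus-item logistic model, i.e. the full maximum likelihood approach that the abstract notes is inconsistent.

    ## Simulate Rasch responses for 200 persons and 10 items
    set.seed(1)
    n_p <- 200; n_i <- 10
    theta <- rnorm(n_p)                              # person abilities
    b <- seq(-1.5, 1.5, length.out = n_i)            # item difficulties
    dat <- expand.grid(person = 1:n_p, item = 1:n_i)
    dat$resp <- rbinom(nrow(dat), 1, plogis(theta[dat$person] - b[dat$item]))

    ## Drop respondents with perfect or zero scores (their ability is not finitely estimable)
    tot <- tapply(dat$resp, dat$person, sum)
    dat <- dat[!(tot[dat$person] %in% c(0, n_i)), ]

    ## Joint (unconditional) ML fit: person coefficients estimate theta_p - b_1,
    ## item coefficients (items 2 to n_i) estimate b_1 - b_i
    fit <- glm(resp ~ 0 + factor(person) + factor(item), family = binomial, data = dat)
    head(coef(fit))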
Lorraine Electre Bersier Estimating the real demand based on censored data Prof. Dr. Marloes H. Maathuis
Alexandra Stieger-Federer
Mar-2020
Abstract: In order to plan the rail transport of goods across Switzerland accurately, the real demand has to be known in advance. However, during the booking process at SBB Cargo, this demand is not stored. As a result, most predictions are based on the effectively transported quantity of goods and not on the customers' demand. To be able to propose a more attractive offer in the future, this master thesis derives estimates of the percentage of unsatisfied client demand.
The booking system of SBB Cargo is first thoroughly investigated and explained. The most important factors constraining the demand are found to be the number of train drivers and locomotives, the availability of routes and the capacity of the trains. The focus in this thesis is on the last one.
Then, various statistical methods used to estimate the distribution of censored data are explained. Several censored parametric models are proposed and their right-censored log-likelihood functions are derived. Furthermore, it is described how the goodness-of-fit of censored parametric distributions can be tested, and two nonparametric maximum likelihood estimation methods for censored data are explained.
Afterwards, it is observed that the trains' load factor variable appears to be censored at values around 90% and 110%. In order to obtain the unconstrained demand from this variable, censored parametric distributions are fitted. The Weibull distribution is shown to be the most accurate one. Using this, it is estimated that around 8.9% of client demand is unsatisfied due to the limited train capacity. This value is very encouraging: SBB Cargo estimates that around 11% of client demand is unsatisfied due to all the constraining factors combined. This shows that using the censored Weibull distribution to estimate the demand from the bounded load factor variable returns very promising results.
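A minimal base-R sketch of fitting a Weibull distribution to right-censored data by maximum likelihood, in the spirit of the censored parametric fits described above (the data below are simulated, with a hypothetical capacity limit of 1, i.e. a 100% load factor; the survival package's survreg offers an equivalent fit):

    ## Right-censored Weibull MLE: 't' are observed values, 'event' = 1 if uncensored
    weibull_censored_fit <- function(t, event) {
      negloglik <- function(par) {
        shape <- exp(par[1]); scale <- exp(par[2])   # log-parametrisation keeps both positive
        -sum(event * dweibull(t, shape, scale, log = TRUE) +
             (1 - event) * pweibull(t, shape, scale, lower.tail = FALSE, log.p = TRUE))
      }
      fit <- optim(c(0, 0), negloglik)
      exp(fit$par)                                   # c(shape, scale)
    }

    ## Simulated demand censored at a capacity of 1
    set.seed(1)
    demand <- rweibull(500, shape = 5, scale = 0.9)
    event  <- as.numeric(demand <= 1)
    weibull_censored_fit(pmin(demand, 1), event)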
Helgi Halldórsson Causal Inference on Partially Observed Data using Partial Deletion and Multiple Imputation Prof. Dr. Marloes Maathuis
Leonard Henckel
Mar-2020
Abstract: Missing data is a common problem which can significantly affect any analysis. Partial Deletion (PD) and Multiple Imputation (MI) are two methods which allow us to use estimators that require complete data. In this thesis, we provide sufficient graphical conditions which allow us to modify the Adjustment Formula for inferring causal effects from data with missing values while using Available Case Analysis, a special case of Partial Deletion, or MI. The graphical conditions make use of m-graphs, or missingness graphs, a useful extension of causal graphs that allows us to encode causes of missingness into causal graphs. For MI, we focus on Joint Modelling (JM), which imputes all missing values in a row simultaneously. A short discussion of Fully Conditional Specification, an alternative to JM, follows. We also present a modified Adjustment Formula which takes advantage of both PD and MI, by first removing rows with missing values in a subset of the variables, and then imputing the remaining missing values. Finally, we relax the conditions of previous results such that any bias caused by the missing data is avoided, but confounding bias remains.
Ramona Wechsler Detecting predictive biomarkers for non-small cell lung cancer patients undergoing immunotherapy: A retrospective analysis of data from the University Hospital Zurich Prof. Dr. Marloes H. Maathuis
PD Dr. med. Alessandra Curioni
Dr. Stefanie Hiltbrunner
Mar-2020
Abstract: The aim of this thesis is to reveal possible predictive biomarkers for the effect of immuno- therapy on overall survival and tumor response for non-small cell lung cancer patients. The used data was collected from patients treated with immunotherapy from March 2014 to January 2019 at the University Hospital of Zurich. The analyses are performed on two groups of patients. Patients who received chemotherapy before immunotherapy are called further line IT patients in this work and patients who got immunotherapy as first line treatment, sometimes combined with chemotherapy, are referred to as first line IT patients. The applied methods are survival analysis, random forest and logistic regression.
One possible predictive biomarker was detected for each of the patient groups with survival analysis for the effect of immunotherapy on overall survival. Further line IT patients with a higher PD-L1 expression in tumor cells had a longer overall survival than patients with a lower expression. In the first line IT group patients with a higher lymphocyte count had a longer overall survival than patients with a lower lymphocyte count. These results have to be treated with caution, since the risk of false positive findings is very high in this work. The revealed biomarkers should be verified with further studies.
No predictive biomarker was detected from classification with random forest or logistic regression of tumor response after three and after six months of immunotherapy. The patient groups were small and therefore the risk of missing real predictive biomarkers is high for all analyses in this thesis.
As a separate part of this work, the risk of p-value hunting is described and a simulation illustrating the issue is performed. P-value hunting and multiple testing are serious problems in many clinical studies.
Davide Luzzati Self-induced crises in a DSGE model Dr. Fadoua Balabdaoui
Prof. Michael Benzaquen
Feb-2020
Abstract: In recent years, new ways of "Rethinking Macroeconomics" have started to make their way into central banks and think tanks. However, Dynamic Stochastic General Equilibrium (DSGE) models have been and still are the workhorse for monetary policy, despite their poor performance in the face of the Financial Crisis of 2008-2009. In this work we set out a model which takes inspiration from a standard money-in-the-utility DSGE model, but which differs in the presence of a feedback mechanism on the propensity of individuals to hold cash and bonds: people also look at what others do before optimizing their standard utility function, in a fashion similar to the KUWJ (Keeping Up With the Joneses) phenomenon. Our aim is twofold. Firstly, we want to show that the presence of such a mechanism causes the system to go through a phase transition, from an economy with one equilibrium point to one with three. Secondly, we aim to make our model as consistent as possible without running into Keynesian anomalies such as the Zero Lower Bound or relying on Rational Expectations theories for solving the Euler equation for consumption.
Elliot Leeroy Beck Domain generalization and dataset shift Prof. Nicolai Meinshausen
Dr. Christina Heinze-Deml
Feb-2020
Abstract: Machine Learning algorithms and in particular artificial neural networks are currently used in a great variety of computer vision applications. In cases where the training and the test data are generated from the same distribution, these methods achieve good prediction performance on many tasks. However, under a distribution shift between the training and test data, the same methods tend to perform poorly. Domain generalization (DG) methods attempt to address this issue. In this thesis, we implement a selection of state-of-the-art DG methods. Moreover, we use benchmark datasets to compare the results of our implementations with results reported in the original papers. We find that reproducing the original results is a difficult task due to various reasons. We also find that the distribution shifts contained in DG benchmark datasets are probably not challenging enough to provide evidence for the generalization ability of newly proposed DG methods.
Stanimir Ivanov Fixed-Interval Survival Analysis with Random Forests for Default and Prepayment of Individual Residential Mortgages Prof. dr. M.K. (Marc) Francke
A. A. (Alex) de Geus
Feb-2020
Abstract: The Dutch National Mortgage Guarantee (Nationale Hypotheek Garantie, or NHG) is a government backed scheme for residential mortgages that provides insurance in case of default due to an unfortunate event such as illness, unemployment, divorce or the death of a partner. A central problem for the NHG, shared by all other insurance schemes is the determination of risk for an individual guarantee. This thesis develops a random forest model for fixed-interval survival analysis that provides dynamic predictions of year ahead annual mortgage default and prepayment probabilities conditional on all presently available information for an individual mortgage. It takes into account baseline individual mortgage characteristics as well as time-varying economic data. It is fitted as a classification forest with a splitting rule that stratifies events by duration. Prediction can be done with existing random forest software after the application of a post processing step that recovers the estimated interval hazard probabilities at each leaf in the forest. An important consideration is that bootstrapping a balanced number of events by sub-sampling majority classes is necessary in order to mitigate issues caused by class imbalance. Compared to generalized linear Cox models for fixed-interval data, the fixed-interval random forest is able to estimate meaningful event-specific survival curves that separate defaults and prepayments from active mortgages. Although distinguishing between prepayments and defaults is still difficult, the event-specific survival curves can provide insight into the risk profile of an individual mortgage.
Christoph Schultheiss Multicarving for high-dimensional post-selection inference Prof. Peter Bühlmann
Claude Renaux
Feb-2020
Abstract: We consider post-selection inference for a high-dimensional (generalized) linear model. Data carving (Fithian et al., 2014) is a promising technique to perform this task. However, it suffers from the instability of the model selector and hence leads to poor replicability, especially in high-dimensional settings. We propose the multicarve method inspired by multisplitting, to improve upon stability and replicability. Furthermore, we extend existing concepts to group inference and illustrate the applicability of the method for generalized linear models.

2019

Student Title Advisor(s) Date
Eufemiano Fuentes Pérez Review of bootstrap principles and coverage analysis of bootstrap confidence intervals for common estimators Dr. Markus Kalisch Dec-2019
Abstract: The bootstrap is a statistical technique that has been around for 40 years, since it was introduced by Efron (1979). Its use in practice is widespread, but many practitioners do not fully understand its limits and under which circumstances it does or does not work. This thesis tries to address that, first by diving into the theoretical underpinnings of the bootstrap and then by analysing its performance in several practical scenarios. Chapter 1 starts with an introduction to the bootstrap, its fundamental principles and how it works. Chapter 2 discusses when the bootstrap works (consistency) and when it does not, and, if it works, at which rate it does so (accuracy). Chapter 3 explores a number of first- and second-order accurate bootstrap confidence intervals, introduces a general technique to improve bootstrap intervals called the double bootstrap, and presents the two most well-known R packages for implementing the bootstrap. Chapter 4 presents the results of a coverage analysis of bootstrap intervals, computed via simulations, in different scenarios for the sample mean, sample median and sample (Pearson) correlation coefficient. Chapter 5 summarizes the thesis and presents a list of conclusions. Finally, Chapter 6 outlines interesting points and topics we wanted to explore further but did not have time for.
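A small base-R sketch of the kind of coverage experiment reported in Chapter 4 (with illustrative settings, not those of the thesis): the empirical coverage of the percentile bootstrap interval for the median of an exponential sample.

    ## Empirical coverage of the 95% percentile bootstrap interval for the median
    set.seed(1)
    coverage <- mean(replicate(500, {
      x  <- rexp(50)                               # true median is log(2)
      bs <- replicate(999, median(sample(x, replace = TRUE)))
      ci <- quantile(bs, c(0.025, 0.975))
      ci[1] <= log(2) && log(2) <= ci[2]
    }))
    coverage                                       # should be close to the nominal 0.95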
Pietro Cattaneo Practical approaches for fitting additive models Dr. Lukas Meier Oct-2019
Abstract: A common challenge when working with data is to determine the function which relates some predictors to a response variable. Often this relationship is quite complex, and a smoothing approach is required in order to detect its shape.

The aim of this thesis is to provide different methods for the approximation of such unknown smooth functions. In particular, these approximation results will be used for the fitting of additive models. The basic theoretical features are presented and supported by practical simulations.
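One standard way to fit the additive models mentioned above is backfitting with componentwise smoothers; a minimal base-R sketch with smoothing splines (production implementations are available in packages such as mgcv):

    ## Backfitting for an additive model y = alpha + f1(x1) + ... + fp(xp) + error
    backfit <- function(y, X, iters = 20) {
      p <- ncol(X); n <- length(y)
      f <- matrix(0, n, p)                      # current estimates of the component functions
      alpha <- mean(y)
      for (it in 1:iters) {
        for (j in 1:p) {
          partial <- y - alpha - rowSums(f[, -j, drop = FALSE])
          fit <- smooth.spline(X[, j], partial)
          f[, j] <- predict(fit, X[, j])$y
          f[, j] <- f[, j] - mean(f[, j])       # centre each component for identifiability
        }
      }
      list(alpha = alpha, f = f)
    }

    ## Two-predictor example with a smooth additive truth
    set.seed(1)
    n <- 400
    X <- cbind(runif(n, -2, 2), runif(n, -2, 2))
    y <- sin(X[, 1]) + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.3)
    fit <- backfit(y, X)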
Xi Chen Error Correction for Building Reliable Sequence de Bruijn Graphs Dr. Markus Kalisch
Prof. Dr. Gunnar Rätsch
Dr. Andre Kahles, Mikhail Karasikov
Sep-2019
Abstract: Although de Bruijn graphs serve as a fundamental data structure for many applications in bioinformatics, reliable construction of these graphs from high-throughput sequencing data remains a challenge, especially in complex settings such as metagenome sequencing, where genomes are sequenced with uneven coverage and sequencing errors from rare genomes are hard to correct. Here, we present a new error correction method which incorporates the information of error patterns across multiple metagenomic samples, and show that the proposed method outperforms other state-of-the-art methods for sequencing error correction on simulated datasets. The algorithm has two steps. First, it tries to recover the underlying genomes from which sequences are generated, and then it decomposes the graph based on the information of the inferred genomes. We further show that the proposed method generates graphs that are less fragmented, and can yield better results for downstream analysis, such as genome assembly.
Tommaso Portaluri Reproducibility: a quantitative measure for the success of replication studies Dr. Markus Kalisch
Prof. Em. Werner Stahel
Sep-2019
Abstract: With the ongoing crisis of science, reproducibility of research has become critical to the credibility of science. But what makes a replication successful? Despite the rising attention that the topic has gained in a variety of fields, there is no consensus on how to define replication success. Moreover, current assessments of replication either rely on a mix of qualitative judgements on the significance and direction of the effects or provide a standardised quantification. This thesis aims at overcoming both problems by proposing a quantitative measure of replication success, the Effect Size Discrepancy, which allows the comparison of unstandardised effects. A general mathematical framework is provided, alongside operationalisations for the most common statistical models; for demonstration purposes, an example of the computation in a real replication project is also provided.
Jiawen Le A Machine Learning Approach to Systematic Trading Strategies Prof. Dr. Peter Bühlmann
Prof. Dr. Markus Leippold
Sep-2019
Abstract: The predictability of a collection of stock return predictors is comprehensively studied in this research. In particular, moment-based variables such as signed jumps and realized skewness are computed from intraday high-frequency data. Based on each individual variable, stocks are sorted into quintile portfolios and a zero-net-investment long-short portfolio is constructed by buying the stocks in the top quintile and selling the stocks in the bottom quintile. For a number of variables, the long-short portfolio reports economically large weekly and monthly returns, and the returns remain statistically significant after adjusting for the effects of systematic risk factors. This suggests that the relation between the predictor variable and future stock returns is strong and is not captured by the systematic risk factors.
To investigate the benefits of simultaneously incorporating multiple variables in return predictions, a variety of machine learning methods are employed to combine predictors. Portfolios based on the machine learning forecasts are constructed accordingly. For all the machine learning methods, a monotonically increasing pattern in average weekly returns is observed from the bottom quintile to the top quintile, indicating that the actual weekly returns are consistent with the machine learning forecasts. Compared to the single-variable sorted portfolios, the machine learning portfolios deliver relatively stable returns at all times, including periods with significant market turbulence. The results for the monthly machine learning portfolios are similar, except that the predictability of the combination of predictors appears to be weaker than in the weekly return predictions.

Keywords: Cross-sectional return prediction; moment-based variables; machine learning strategies; high-frequency data.
Isidora Durić Trend Filtering on Graphs Prof. Dr. Nicolai Meinshausen Sep-2019
Abstract: Trend filtering is a widely used technique in a variety of disciplines, either in a univariate setting or on graphs. The l1 graph trend filtering estimate is defined as the minimizer of a penalized least squares criterion, in which the penalty term sums the absolute kth order discrete graph differences. Due to the use of the l1 norm, it exhibits a high level of local adaptivity which cannot be matched by the usual l2 norm-based graph smoothers. In addition, a characteristic which l1 graph trend filtering estimates have due to the l1 norm is a piecewise polynomial structure. The aim of this thesis is to demonstrate the value of the l1 graph trend filtering method and its characteristics through application, and to provide suitable code for its implementation. First, in order to apply the method to our dataset, we build a model using the statistical software R. After all the necessary tools are in place, we conduct a data analysis on a real climate dataset. The method is adapted for two different implementations, each corresponding to one of the ways the data is chosen. In the end, we compare the performance of the two settings.
Keywords: trend filtering, graph smoothing, l1 regularization, piecewise polynomial estimates, local adaptivity
Hannah Muelder Aggregating Individual Social Group Perceptions in Social Networks Prof. Dr. Marloes Maathuis
Prof. Dr. Christoph Stadtfeld
Dr. András Vörös
Sep-2019
Abstract: Social groups fulfil important and varied functions in society and in personal lives, setting and shaping norms and behaviours, and providing guidance or support. Their properties, such as organisational set-up, function and structure, are thus diverse. Research on social groups in social network analysis is largely focused on analytical concepts (e.g., cliques, clans) or community detection. Both approaches derive groups from dyadic relations, for example friendship perceptions, and are optimised to find groups with specific structural properties. This is not necessarily in line with members' perceptions, or compatible with the variety of social group structures in society. A method to identify social groups that vary in structure and function is thus required. We approach this issue by using individual social group perception (ISGP) data: study participants' self-reported groups and their members. We first propose a hierarchical clustering to derive groups by aggregating overlapping perceptions. Secondly, we explore the structure and composition of these groups in order to validate the method and explore future research.
Our group perception data is a cross-sectional observation of first-year Bachelor students from the Swiss StudentLife Study, a longitudinal social network survey. The variables include network data (e.g. friendships) as well as individual and group attributes. Central is a new measure of ISGPs: individuals report groups they are members of, and the names of other group members. A complication is that each member's perception of who belongs to the group can vary. In order to find sensible representations of social groups, we need to aggregate the individual perceptions step-wise into Aggregate Social Group Perceptions (ASGPs). An ASGP consists of a list of egos, who report the aggregated ISGPs, and a list of alters, who are nominated as group members.
For the aggregation procedure we develop a measure of similarity based on the overlap between different group perceptions. This measure ranges from 0 to 1, where 1 represents full accuracy: all egos agree on all members, and 0 inaccuracy: no overlap between group members’ perceptions. Importantly, the measure allows for merging ISGPs and ASGPs to perform meaningful clustering. The clustering aims at consistent ASGPs, specifically accurate and complete ones. Completeness is defined as the ratio of egos over members. We run hierarchical clustering on the ISGPs, and choose a suitable level of aggregation based on maximum average ASGP consistency.
The output comprises new representations of social groups. These we validate by comparing them to the dyadic friendship ties within each ASGP, the homogeneity of node attributes and the group attributes of the ISGPs they are composed of. We find that ASGPs are more diverse in structure than detected communities. They include, for example, otherwise isolated nodes. The ASGPs we produce thus add an additional factor to the analysis of network mechanisms and their dynamics - such as information diffusion or adaptation.
Leslie O’Bray Learning Vector Representations of Graphs Using Recurrent Neural Network Autoencoders Prof. Dr. Marloes Maathuis
Prof. Dr. Karsten Borgwardt
Dr. Bastian Rieck
Sep-2019
Abstract: Increasingly, more and more data is stored as graph-structured data, across a wide range of fields such as bioinformatics, social network analysis, telecommunications, and others. Traditional classification methods are often unavailable for graph-structured data, since many operate on vector data, and it isn't immediately clear how to best compress the rich information in the adjacency matrix into a vector, particularly when the ordering of the nodes in the adjacency matrix is random. Nevertheless, finding a way to compare and classify graphs would be useful in many domains, and is an ongoing area of research. This thesis investigates a new method that proposes using recurrent neural network autoencoders to learn a vector representation of graphs, to be used for comparison and classification, which we evaluate on bioinformatics benchmark datasets. We write our own implementation of the proposed method to compare our results with the original authors' findings, as well as compare it to other related state-of-the-art methods. Then, we conduct a few experiments to further understand the method and try to improve its results.
We found the method yielded similar results to what the authors claimed, but also that the method gives largely comparable performance to the existing sequence-based methods that learn node embeddings. Through our experiments we learned that the specific classifier used to classify the learned vector representation is interchangeable, which enabled us to reduce the computational complexity of the method to be linear, rather than quadratic, in the number of graphs. Additionally, we found that performance can be improved by updating the graph embedding function used to summarize all the vectors representing a single graph. Finally, we tested simplifying the approach by performing the classification directly in the neural network. While this yielded lackluster results, it provided useful insight into what network architectures would be better suited for such a problem, providing direction for future work in the area.
Lorenz Herger Multiple Testing Adjustments based on Random Field Theory Dr. Markus Kalisch Sep-2019
Abstract: Researchers in fields such as brain imaging and biomechanics frequently face large-scale multiple hypothesis testing problems where spatial dependencies between the individual hypotheses should be taken into account. To address this type of problem, a set of specialized methods based on the mathematical theory of random fields has been developed and widely adopted. These methods rely on a number of parametric assumptions and the underlying mathematical theory is complex. In this thesis, we aim to provide a thorough and accessible overview of the theory and the assumptions behind these methods. Moreover, we conduct simulation experiments to evaluate one of these methods, known as peak-level inference, under a range of different conditions and benchmark the results against those achieved with an alternative nonparametric approach. In all of our simulation experiments peak-level inference succeeds in controlling empirically measured family-wise error rates at the desired level. However, we also observe that the method can become very conservative if a relatively narrow set of conditions is not met. The alternative nonparametric technique for multiple testing correction, on the other hand, achieves observed family-wise error rates that are close to the desired level in all of our simulation experiments.
Koen Vernooij Neural Architecture Search and Efficient Transfer Learning with Google AI Prof. Dr. Marloes Maathuis
Prof. Dr. Ce Zhang
Dr. Andrea Gesmundo
Aug-2019
Abstract:
Devendra Shintre Modelling Forex Market Reflexivity using Self-Exciting Point Process and Ensemble Learning Prof. Dr. Didier Sornette
Sumit Kumar Ram
Dr. Markus Kalisch
Aug-2019
Abstract:
The efficient market hypothesis (Bachelier, 1900) has continued to dominate the discourse of finance and advocates the absence of arbitrage opportunities based on technical analysis. Its applicability and limitations have been pointed out as stylized facts in the economics literature. This thesis tries to address the inefficiencies, or reflexivity, of the forex market.
We assume that the forex market is reflexive (Soros, 2015) and model the endogenous component of the market as a multivariate self-exciting conditional point process. We use power-law memory kernels for modeling the endogenous correlations (Bouchaud, Kockelkoren, & Potters, 2006). Based on this market model we design a set of features from the past log returns, over a fixed time window, which take advantage of an ensemble of predefined power-law memory kernels to classify the future returns with the help of a random forest classifier (RF).
An algorithmic trader is designed to exploit the predictions from our model; it takes long and short positions, uses a trailing stop loss to cut excess losses and makes trades within a fixed holding period. We train the RF model on 23-10-07 14:17 to 15-11-15 17:18 (∼ 8 years of 1-min sampled forex data for the "AUDCAD" pair) and train the trader on 15-11-15 17:18 to 12-04-16 00:32 (5 months). We test the RF on 15-11-15 17:18 to 05-09-16 04:34 (10 months) and the trader on 12-04-16 00:32 to 05-09-16 04:34 (5 months).
With our trained model we predict the drift of the price time series during the holding period with a 10% improvement in precision compared with random predictions. Our trader achieved a Sharpe ratio of 1.93 on the test set and outperforms the buy-and-hold and noise-trader strategies by a sufficient margin.
Junhyung Park Kernel Measures of Conditional Independence Prof. Dr. Sara van de Geer Aug-2019
Abstract: The Hilbert-Schmidt Independence Criterion (HSIC) and the Finite-Set Independence Criterion (FSIC) are non-parametric kernel tests of (unconditional) independence, based on the cross-covariance operator between two random variables. In this thesis, we explore direct analogues of HSIC and FSIC in the conditional setting (where the conditioning variable is continuous), which we call the Hilbert-Schmidt Conditional Independence Criterion (HSCIC) and the Finite-Set Conditional Independence Criterion (FSCIC). Necessary and sufficient conditions for conditional independence are established using HSCIC and FSCIC; we do not have a strict equivalence in the case of FSCIC, but an almost sure condition, which can nevertheless be used in the practical setting. Empirical estimates for these criteria are obtained via function-valued regression, and unlike previous results, which restrict the output reproducing kernel Hilbert space (RKHS) to be finite-dimensional, we show that the regression is consistent for infinite-dimensional RKHSs too. Finally, we demonstrate how these estimates can be used to test for conditional independence through several simulation studies.
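For orientation (standard definition, not quoted from the thesis), the unconditional HSIC of X and Y with RKHS feature maps \varphi and \psi is the squared Hilbert-Schmidt norm of the cross-covariance operator,

    \operatorname{HSIC}(X,Y) = \lVert \mathcal{C}_{XY} \rVert_{\mathrm{HS}}^2, \qquad \mathcal{C}_{XY} = \mathbb{E}\big[(\varphi(X) - \mu_X) \otimes (\psi(Y) - \mu_Y)\big],

and HSIC(X,Y) = 0 characterizes independence for characteristic kernels; the thesis studies the analogous conditional quantities.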
Yilei Zhang Robust Regression Models: Anchor Regression and Lasso Prof. Dr. Nicolai Meinshausen Aug-2019
Abstract: The heterogeneity in large-scale data can lead to problems in traditional predictive models, which usually assume that the data used in training and prediction come from the same distribution, because this assumption is not likely to hold anymore. We therefore introduce the concept of distributional robustness, which enables us to obtain estimators that perform robustly over a set of distributions. In the first part, we introduce anchor regression, a method leveraging exogenous variables in a structural equation model to estimate the impact of changing outside factors on the training distribution. Anchor regression can thus give us estimators that have robust predictive performance over a set of shift-intervened distributions, where the strength of intervention is adjustable. In the second part we show that the well-known l1 and l2 penalized least squares regressions, i.e., lasso and ridge regression, also have robustness properties, and that we can exploit the connection between the regularizer and robustness to construct convex penalties, which allows us to obtain robust estimators in more general cases. Finally, we show that the three robust regression models, anchor regression, lasso and ridge, can be written in a unified minimax form, which sheds light on the connections and differences between these robust regression models.
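As a point of reference (standard notation, not copied from the thesis), the population version of the linear anchor regression estimator with anchor variables A is commonly written as

    b^{\gamma} = \arg\min_{b} \; \mathbb{E}\big[\big((\mathrm{Id} - P_A)(Y - X^{\top} b)\big)^2\big] + \gamma\, \mathbb{E}\big[\big(P_A (Y - X^{\top} b)\big)^2\big],

where P_A denotes the L2 projection onto the linear span of A and \gamma \ge 0 tunes the strength of the distributional shifts the estimator is protected against (\gamma = 1 recovers ordinary least squares, \gamma \to \infty an IV-type solution).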
Francesco Masiero Analysis of Online Anonymous Quotes for Non-Life Insurance Pricing Processes Optimization Dr. Lukas Meier Aug-2019
Abstract: Today, new data sources are arising in insurance companies, thanks to the ongoing digitalization experienced by this industry. Exploiting this information is key to optimizing the pricing processes for insurance products. In this thesis, we analyse online anonymous quotes produced on the AXA online car insurance premium calculation tool, with the aim of modelling the offer request probability as well as its price elasticity. In the first part of the thesis, we apply different statistical learning techniques, namely L1 regularized logistic regression, generalized additive models and tree ensemble methods such as XGBoost, introducing first their theoretical foundations. We conclude that L1 regularized logistic regression offers an interpretable, yet powerful technique to model our data. A novel approach, based on generalized additive models with L1 regularized logistic regression variable selection, is also a competitive alternative. XGBoost provides comparable prediction performance, at the additional cost of more difficult model interpretation. In the second part of the thesis, we study the price elasticity of the offer probability, concluding that it is larger for online offers than for offline offers.
Yan Yici Analyzing Data With Non-Linear Observations Via The Single-Index Model Prof. Dr. Sara van de Geer Aug-2019
Abstract: Single-index modeling is a statistical approach which can be viewed either as a compromise between parametric and purely nonparametric methods, or as a natural extension of the generalized linear model. It is widely adopted in econometric and biometric studies, as it offers rich flexibility in modeling real-world phenomena. From a theoretical point of view, the single-index model (SIM) enjoys various advantages such as estimation efficiency. In this thesis, we investigate the mathematical properties of the SIM under both classic and high-dimensional settings. We first provide a selective survey on estimating the model. Then a comprehensive review on the equivalence between non-linear and linear measurements under the structured SIM is presented. Finally, we propose a novel idea for sparse signal recovery under the single-index framework.
Luyang Han Robust Sure Independence Screening on High Dimensional Data Prof. Dr. Sara van de Geer Aug-2019
Abstract:
Sure Independence Screening (SIS) is a variable selection method that uses marginal correlation to preselect important predictors for a regression model in which the number of predictors p is much greater than the number of observations n. The procedure reduces the high dimensionality of the predictors to a size that is smaller than n. Although SIS is effective under certain conditions, its performance deteriorates substantially with large correlation between predictors and in the presence of potential outliers. In order to enhance the performance of SIS, two procedures, Robust Rank Correlation Based Screening (RRCS) and Robust Factor Profiled Sure Independence Screening (RFPSIS), are introduced. The performance of the two methods is compared in simulation studies.
RRCS employs the Kendall τ rank correlation instead of the Pearson correlation to conduct the marginal variable screening. The Kendall τ rank correlation is closely related to the Pearson correlation, while it is invariant under monotonic transformations and can be extended to generalized nonlinear regression. RFPSIS is based on the idea of FPSIS, where the original data is projected onto the orthogonal complement space. Ideally, the correlation structure in the original space can be captured by the latent factors. RFPSIS proposes a robust modification of FPSIS by using a least trimmed squares method to estimate the latent factors and the profiled variables; meanwhile, it identifies potential outliers and reduces their influence. The simulation results show that the two methods are robust against heavy-tailed outliers when the predictors are not highly correlated with each other. When the data set is contaminated with a large proportion of outliers, RFPSIS outperforms RRCS.
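A minimal sketch of the rank-correlation screening step, assuming numpy/scipy and placeholder data X (n x p) and y; the thesis's exact procedure and the choice of the retained model size d may differ:

    import numpy as np
    from scipy.stats import kendalltau

    def rank_screen(X, y, d):
        # marginal screening: rank predictors by |Kendall tau| with the response
        scores = np.array([abs(kendalltau(X[:, j], y)[0]) for j in range(X.shape[1])])
        return np.argsort(scores)[::-1][:d]   # indices of the d strongest predictors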
Chiara Gilardi A comparison of statistical approaches for fraud detection Dr. Lukas Meier Aug-2019
Abstract:
Fraud is a constantly increasing problem, which affects many business areas worldwide and results in the loss of billions of dollars per year. Since fraudsters' strategies are always evolving, the development of new fraud detection techniques is essential for both prevention and fraud identification. Statistical methodologies such as outlier detection and classification have been widely used in the literature for tasks like credit card, e-commerce, and telecommunications fraud detection. In this research, supervised and unsupervised machine learning algorithms were studied and applied to simulated bank client data. When using a supervised approach, particular emphasis was placed on sampling and weighting techniques, which mitigate the class imbalance of the data. It was found that classification methods outperform outlier detection algorithms such as Isolation Forest, LOF, and Self-Organizing Maps. The Random Forest algorithm was identified as the best performing classifier. Additionally, the use of undersampling or SMOTE sampling techniques improved its performance even further.
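A sketch of the kind of supervised pipeline described above, assuming scikit-learn and imbalanced-learn and placeholder training data; this illustrates the SMOTE-plus-random-forest idea, not the thesis's exact configuration:

    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier

    def fit_fraud_classifier(X_train, y_train):
        # oversample the minority (fraud) class, then fit a random forest
        X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        return clf.fit(X_res, y_res)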
Vladimir Fomin The Effect of Random Rotation Upscaling on Neural Networks and Gradient Boosted Trees Dr. Markus Kalisch Aug-2019
Abstract: Gradient boosted trees and neural networks are currently two of the most successful and widely used statistical methods. As a result, they consistently occupy the top of the leaderboards in challenges on Kaggle, a platform where machine learning hobbyists and professionals tackle problems posed by industry and the scientific community. The goal of this thesis is to provide an overview of the theory behind these methods as well as to explore their strengths and differences in practical applications.

In an attempt to examine these differences, a simulation study was performed which led to an interesting idea. At the core of this idea are random rotations, which allow low-dimensional data to be upscaled into a higher-dimensional space while preserving some of its properties.

This may seem counterintuitive at first, since upscaling data usually leads to a bigger search space and additional noise, but surprisingly the results suggest that the opposite can hold if certain conditions are met. We delve deeper into this idea in the later chapters of this work.
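One way to realize such a random rotation upscaling is sketched below (an assumption for illustration, not necessarily the construction used in the thesis): pad the p-dimensional observations with zeros up to dimension q > p and apply a random orthogonal matrix obtained from a QR decomposition of a Gaussian matrix:

    import numpy as np

    def random_rotation_upscale(X, q, seed=0):
        n, p = X.shape
        rng = np.random.default_rng(seed)
        Q, _ = np.linalg.qr(rng.standard_normal((q, q)))   # random orthogonal q x q matrix
        X_padded = np.hstack([X, np.zeros((n, q - p))])    # embed into q dimensions
        return X_padded @ Q.T                              # rotate the embedded data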
Karavouzis Eleni Artemis Model Selection Dr. Markus Kalisch Aug-2019
Abstract: There are various model selection methods, and an important question is how to select the best one for each situation. In this thesis we present some of the most common information criteria, their derivations and the respective assumptions. Afterwards we introduce cross-validation, a general procedure that does not make any assumptions about the underlying model and can be used in many different settings. In the end we compare cross-validation with some of the information criteria.
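For reference, two of the most widely used information criteria, typically among those considered in this context, are, for a model with k parameters, maximized likelihood \hat{L} and sample size n,

    \mathrm{AIC} = -2 \log \hat{L} + 2k, \qquad \mathrm{BIC} = -2 \log \hat{L} + k \log n,

and model selection proceeds by minimizing the chosen criterion over the candidate models.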
Davide Dandrea Incorporating Background Knowledge into the GES Algorithm Prof. Dr. Marloes H. Maathuis Aug-2019
Abstract: Structure learning methods can typically be divided into constraint-based, score-based and hybrid methods. Incorporating various types of background knowledge into hybrid methods or constraint-based methods such as the PC algorithm is often, by construction, a natural process. For score-based methods this is less straightforward. We study how to incorporate various types of background knowledge into one of the best-known score-based greedy search algorithms, the greedy equivalence search (GES) algorithm, and its variants. In particular, our focus is on improving the estimation quality and computational efficiency of GES while ensuring that its asymptotic consistency is maintained.
Martin Buttenschoen A Graphical Criterion for the Asymptotic Efficiency of Conditional Instrumental Variables Prof. Dr. Marloes H. Maathuis
Leonard Henckel
Jul-2019
Abstract: Conditional instrumental variables are commonly used for causal effect estimation. For linear causal models with Gaussian error terms, it is known that instruments can be combined to reduce the asymptotic variance of the corresponding estimators. One limitation of this procedure is that the bias of the resulting estimator increases as instruments are added. So combinations of conditional instrumental variables are not always desirable, and in those cases one might resort to the uncombined conditional instrumental variables. In this thesis, we present a graphical criterion for comparing conditional instrumental variables with differing conditioning sets. Given only the graphical structure of a linear Gaussian causal model, our criterion induces a partial ordering on the set of possible conditioning sets in terms of the asymptotic variance of the corresponding conditional instrumental variable estimators. We present simulations to support our results.
Carla Schärer - Gonzalez Lutzenkirchen Strategies for Randomization in Randomized Controlled Trials Dr. Markus Kalisch Jun-2019
Abstract: Much has been written about the design of experiments, and how it is only through a good experiment that causal relationships can be established. Our goal is to put both the sampling and the causal inference theory under one roof surrounding the different random procedures that could be used in both processes. We explore different ways of obtaining a sample out of a population, their efficiency when estimating the population mean and their pros and cons depending on the structure of the population. Once a sample has been obtained, we look into ways of assigning its elements to the different experimental groups, in order to have more accurate and reliable results. With all the above and with the help of examples and simulations we aimed to make the theory more accessible.
Armin Fingerle A predictive model of private market performance Prof. Dr. Marloes H. Maathuis
Dr. Fabien Piotet
Dr. Nerina Fidanza Romani
Jun-2019
Abstract: In this thesis we develop a model to predict a movement in a company's EBITDA (Earnings Before Interest, Taxes, Depreciation and Amortisation), focusing on the private markets environment. We consider the effects of data from the company's income statement and balance sheet as well as micro- and macro-economic variables on the company's EBITDA. The methods lasso, ridge, elastic net and random forest are employed to explore this question. An initial attempt using regression was unsuccessful; it seems that the independent variables do not provide much information for the regression. With the method relying on classification, however, we achieved positive results. With this model we predict the classification of future changes in EBITDA into two categories: EBITDA %changes > −5% and ≤ −5%. The final model is based on random forest and operates on any industrial sector. It relies only on lagged EBITDA variables and misclassifies 13.7% of the EBITDA %changes ≤ −5% observations and 53% of the other category.
Qikun Xiang Estimating Influence on Social Media Networks using Causal Inference Prof. Dr. Marloes Maathuis
Dr. Albert Blarer
Ms. Meta Lina Spohn
Jun-2019
Abstract:
In recent years, online social media platforms such as Twitter.com have emerged as a prominent source of information. This has raised concerns regarding the spread of misinformation and the presence of state-sponsored influencing operations that have the potential to undermine the integrity of elections and referendums. Thus, it is crucial to have the ability to understand the information propagation and peer influencing processes in online social networks. In the existing literature about continuous-time information propagation, most studies focused on prediction and few studies considered the causal aspect, especially in the retrospective and counterfactual sense. While many studies in the causal inference literature proposed powerful techniques to identify causal relations in time-sensitive data, most of the studies focused on discrete-time series and not much attention has been paid to continuous-time models.
In this Master Thesis, I propose a causal inference framework for the continuous-time network diffusion model by Gomez-Rodriguez, Balduzzi, and Schölkopf (2011). The framework allows one to ask counterfactual questions such as "what would have happened if something had been different?". This allows one to gain insight about the model at the most pronounced level of the causal hierarchy. Based on the causal inference framework, I propose a new influence measure suitable for quantifying the influence of individuals in an online social network in retrospect.
Besides the theoretical development, I study the characteristics of a number of real Twitter datasets. Based on the insights from studying the datasets, I empirically demonstrate the proposed framework and influence measure using simulated and real datasets. The simulation experiments consider many realistic challenges including small and unevenly distributed sample sizes, the sub-selection of users, and the case where the true model deviates from the assumed model. The experiments show the practicality of the proposed approaches and also reveal a number of limitations that can be addressed in future work.
Camilla Gerboth A Comparison of Hierarchical Inference Methods with Applications in Genome-Wide Association Studies Prof. Dr. Peter Bühlmann
Claude Renaux
May-2019
Abstract: We compare two methods which hierarchically test groups of predictor variables for significant association with a response variable in a high-dimensional setting while controlling the family-wise error rate. An interesting application for these methods can be found in genome-wide association studies. The objective of these studies is the detection of single or groups of genetic variables that are significantly associated with a disease. Both hierarchical inference methods are data-driven since the signal in the data and the correlation among the predictor variables determine the size of the significant groups. One hierarchical inference method is based on hierarchical clustering, where the predictor variables are ordered such that highly correlated variables are assigned to the same group, and the other one is region-based. In a simulation study, the performance of the hierarchical inference methods is tested on synthetic and semi-synthetic data. In addition, confounded and deconfounded synthetic data are generated in order to study the performance. A spectral transformation procedure is used to obtain deconfounded data. Our results indicate that the non-region-based hierarchical inference method is superior to the region-based one.
Francesco Bigiolli Augmenting Sentiment Analysis via Electroencephalography Recordings Prof. Dr. Nicolai Meinshausen
Prof. Dr. Ce Zhang
Nora Hollenstein
Apr-2019
Abstract: The scope of this thesis is to investigate the added value that Electroencephalography (EEG) and Eye Tracking (ET) data can bring to Natural Language Processing tasks, focusing in particular on Sentiment Analysis.
To carry out the analysis, this work leverages the Zurich Cognitive Language Corpus, an open source dataset of EEG and ET recordings of subjects reading sentences.
From a Neuroscience perspective, this thesis explores questions related to the detection of emotion in EEG, applying different Deep Learning techniques to multiple representations with the aim of extracting the sentiment of read sentences from cognitive data.
From a Natural Language Processing perspective, this work investigates the role that information about human language processing can play in augmenting classical sentiment classifiers. To this end, evidence of the potential of EEG and ET as enhancers of Machine Learning systems is provided by leveraging such data to improve the detection of sentiment with respect to a baseline model.
Furthermore, the study provides an alternative approach to the creation of word embeddings, based not on the word-context relationship but on the human cognitive processes associated with specific words.
Finally, support for the claim that such embeddings are effective is provided. This is done by testing their classification performance beyond the framework in which the cognitive data were recorded, showing consistent improvements in the detection of sentiment on the Stanford Sentiment Treebank corpus with multiple variants of said cognitive embeddings.
Maurice Weber Lossy Image Compression for Classification Prof. Dr. Ce Zhang
Prof. Dr. Nicolai Meinshausen
Cedric Renggli
Mar-2019
Abstract: We study learned image compression at variable bitrates in the context of subsequent classification, where the classifiers are parameterized as convolutional neural networks. Our goal is to train a compression algorithm, based on recurrent neural networks, such that the accuracy of pretrained classifiers that are unknown to the compression system is maintained when they are evaluated on the compressed dataset, also at low bitrates. We investigate loss functions used as objectives for the compression networks and highlight deficiencies of the commonly used pixel-wise distances. As an alternative, we propose to employ perceptual loss functions based on features of convolutional neural networks pretrained for image classification.
We validate our approach experimentally on three datasets and show that by using perceptual loss functions, we can substantially decrease the loss in classification accuracy compared to pixel-wise loss functions and the widely used JPEG and WebP codecs. Our results furthermore indicate that the advantage of learned image compression with perceptual loss functions is especially pronounced below 0.5 bits per pixel.
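A minimal sketch of a perceptual (feature-space) loss of the kind proposed above, assuming PyTorch and a frozen pretrained feature extractor feature_net (e.g. a truncated classification CNN); the thesis's actual loss may combine several feature layers and weightings:

    import torch
    import torch.nn.functional as F

    def perceptual_loss(feature_net, x_reconstructed, x_original):
        # compare reconstruction and original in the feature space of a fixed network
        with torch.no_grad():
            target_feats = feature_net(x_original)     # no gradients through the target
        recon_feats = feature_net(x_reconstructed)     # gradients flow to the compressor
        return F.mse_loss(recon_feats, target_feats)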
Corinne Emmenegger Linear and Non-Linear Anchor Regression Prof. Dr. Nicolai Meinshausen Mar-2019
Abstract: Classic prediction relies on the assumption that the data generating mechanism does not change between training and prediction. We investigate and propose prediction methods for the case where this assumption fails to hold. We assume that the change of distribution from training to testing is due to a change in the distribution of an exogenous random variable, called the anchor. We investigate linear and non-linear data generating mechanisms. In the case of a linear data generating mechanism, anchor regression is recalled and illustrated with examples. In the case of a non-linear data generating mechanism, we fit natural cubic splines by optimising a loss function. This loss function is composed of several terms: one term measures closeness of the predictor to the observed data, one term measures independence of the resulting residuals and the anchor, and an optional term regularises the predictor. Different regularisation terms encouraging independence of the residuals and the anchor are considered. We furthermore investigate a method based on the boosting algorithm in the non-linear case.

Isaia Albisetti Testing for Spherical Symmetry: a Conditional Expectation Approach Fall 2018 Dr. Fadoua Balabdaoui Mar-2019
Abstract:
A distribution is called spherically symmetric if and only if it is invariant under orthogonal transformations. Since many linear models assume this property for the error, testing the null hypothesis of spherical symmetry is essential. Starting from an equivalent definition involving conditional expectation, Kolmogorov-Smirnov and Cramér-von Mises type tests are constructed. An empirical analysis and a comparison with other tests exhibit distinct performances.
Kay Spiess Computing the Dimensionality of Mixture Nested Effect Models to Improve Model Selection Prof. Dr. Niko Beerenwinkel
Dr. Markus Kalisch
Mar-2019
Abstract: In the framework of Mixture Nested Effects Models (M&NEM) by Pirkl & Beerenwinkel [5], an adapted version of the Bayesian information criterion (BIC) is used for model selection. Essential for the calculation of this information criterion is the number of model parameters or in other words, the dimension of the model. The dimension of an M&NEM is not straightforward to determine and different approaches can be followed. In this thesis, an approach already successfully applied to determine the dimension of mixtures of mutagenetic trees (MMT) by Yin et al. [8] is used to determine the dimension of M&NEMs. We will call this approach the Jacobian dimension estimation. Gene knockdowns are inherent in perturbation experiments used to gather the data to infer M&NEMs. The main challenge in this thesis was the incorporation of gene knockdowns in the dimension estimation as they are not part of the graphical representation of M&NEMs. We show that the M&NEMs trained with the Jacobian dimension estimation method have similar accuracy to the M&NEMs trained with the default method.
Sanzio Monti The effect of data preprocessing on SESI-MS breath analysis studies Prof. Dr. Marloes Maathuis Mar-2019
Abstract: Secondary electrospray ionization mass spectrometry (SESI-MS) is a promising technique used recently by researchers interested in linking compounds present in exhaled breath to some disease. SESI-MS has the advantage of being non-invasive and of providing real-time results, meaning that it could become a major way to collect data for the study or even the diagnosis of diseases. This thesis considers a SESI-MS breath analysis study for the diagnosis of a sleeping disorder which researchers failed to reproduce. So far, especially for non-pulmonary conditions, this kind of data analysis is still under development. For this reason, the hypothesis treated in this thesis is that the preprocessing of the data could have a significant effect on the subsequent findings. The analyses performed during the study, involving correlation and classification performance, were repeated after the application of various preprocessings. It turned out that different preprocessings return different values of correlation and classification performance. The main factors affecting these changes are the normalization method and the severity of the prior feature selection. These findings could make researchers aware of the importance of data preprocessing. Furthermore, the classification procedures illustrated in this thesis could be used to repeat the validation study and reach more comparable results.
Cédric Bleuler Automated Data Extraction From PDF Documents Prof. Dr. Marloes H. Maathuis
MSc ETH Aaron Richiger
Mar-2019
Abstract: Today, paper has been replaced by PDF documents in many areas because they provide a cheap means to archive large amounts of documents. The problem is that it is difficult to use the data provided by such archives in an efficient way. Typically, the files must be scanned through manually. Another, more advanced method to extract the data is a rule-based approach, where rules are specifically developed for one type of document. In this thesis, we explore the possibility of a machine learning algorithm that learns the rules from labeled documents and is able to extract data from new documents.
The first part is devoted to the statistical theory, including decision trees, random forests, logistic regressions, cross validation and other means to develop an efficient classification model.
The first problem is to classify the words of a corporate document into two classes: words that are essential to the document and words that contain no essential information. The second problem is to classify the essential words into subclasses so that the essential information can be stored in a structured way. The third problem is grouping certain words together such that the semantics of a group of words can be captured. The last problem is to strip prefixes and suffixes from the classified words, so that only the information contained in the word is kept.
Although many different models were used for each problem, in all cases random-forest-based models proved to be the most effective at classifying the words using the chosen features, compared to the other models used in the experiments.
Seongwon Hwang Sampling-based Expectation Maximization with Knowledge-based Prior for Incorporating Functional Annotations to Detect Causal Genes Dr. Daniel Zerbino
Dr. Peter Bühlmann
Mar-2019
Abstract: Genome-wide association studies (GWAS) have successfully identified thousands of regions associated with complex traits and diseases. However, identifying causal genes in GWAS studies remains challenging because many statistically significant variants are in linkage disequilibrium with causal variants and some loci harbor more than one causal variant. There have been several attempts to prioritize relevant tissues and target genes by computing colocalization posteriors between GWAS and expression Quantitative Trait Loci (eQTL), but biological knowledge has not been fully considered to model the causal status in these cases. On the other hand, functional fine-mapping studies that choose relevant annotations to interpret an association study of complex traits have been modelled statistically to set useful priors. We devised a sampling-based Expectation Maximization (EM) algorithm to combine functional fine-mapping with colocalization using a shotgun stochastic search method. Setting a functional-information-based prior can aid causal Bayesian inference in GWAS and eQTL studies. We use the maximum likelihood estimates as initial values for the EM method, which can help avoid converging to a local extremum. We analyzed Body Mass Index GWAS summary data and found that the ML+EM estimate produced the most promising evidence in fine-mapping for both the GWAS and the eQTL study, and in causal gene mapping. Finally, we introduce a new framework to efficiently compute the posterior probability and the parameters with hidden variables, combined with sampling techniques.

Christopher Salahub Seen to be Done: A Graphical Investigation of Peremptory Challenge Prof. Dr. Marloes Maathuis Mar-2019
Abstract: The legal practice of peremptory challenges is described, outlining its past and present racial controversies as well as the modern defences typically provided in its favour. These arguments are analyzed statistically using novel visual tools, including the mobile plot and the positional boxplot, which were developed to explore the impact of race on the exercise of peremptory challenges in three data sets (Wright, Chavis, and Parks (2018), Grosso and O'Brien (2012), and Baldus, Woodworth, Zuckerman, and Weiner (2001)). Multinomial regression models motivated by these visualizations are fit and used to generate precise parameter estimates, which indicate the dominance of race in peremptory challenge decisions for venire members across all data sets. Trial-level summaries of the data from Wright et al. (2018) are produced and discussed in the context of the results from the venire member models.
Luisa Barbanti Transformation Models: An Introduction with Applications in R Dr. Lukas Meier Feb-2019
Abstract:
This thesis provides an introduction to transformation models by presenting the theoretical background necessary to understand these models and by explaining in detail how to implement them. Many examples on real as well as simulated data are provided to guide the reader into this flexible world, from the set-up of a model, to making predictions based on the most likely transformation with the mlt package, to obtaining transformation trees and forests via the trtf package. Finally, the potential of transformation models as a tool for data analysis is explored by following a top-down approach for model selection similar to the one presented in Hothorn (2018).
Jesse Provost Unsupervised mitral valve segmentation in 2D echocardiography with neural network matrix factorization Luca Corinzia
Prof. Dr. Nicolai Meinshausen
Prof. Dr. Joachim Buhmann
Feb-2019
Abstract: Mitral valve segmentation is a crucial first step to establish a machine learning pipeline that can aid medical practitioners in the diagnosis of mitral valve diseases, surgical planning, and intraoperative procedures. In this thesis, we propose a totally automated and unsupervised mitral valve segmentation algorithm. The method is composed of a low-dimensional neural network matrix factorization of echocardiography videos that separates the mitral valve and noise from the myocardium, and a window detection algorithm that locates the region that contains the mitral valve for segmentation. The method is evaluated on a collection of echocardiography videos of patients with a variety of mitral valve diseases and it outperforms the state-of-the-art method in all the metrics considered.
Yi Liu Super-Resolution, Generalized Error, Wasserstein Distance Prof. Dr. Sara van de Geer Feb-2019
Abstract: In this thesis, the super-resolution problem is studied as an example of high-dimensional statistics.
Two cases, signals off the grid and signals on the grid, are considered. For signals on the grid, the Lasso can be used to recover the signals. For signals off the grid, semi-definite programming is used to recover the signals. The prediction error can be bounded in both cases. We study when the compatibility condition holds to achieve a fast convergence rate.
In fact, the super-resolution problem amounts to using low-frequency data to estimate a regression function on the whole frequency domain. This problem inspires us to think about comparing the risk of estimators given data from two different distributions.
A new approach for comparing the risk of estimators given two different data distributions has been developed.
The last chapter is about aspects of calculating the Wasserstein distance and optimal transport.
Nicola Botti Design of optimal insurance contracts with limited data Prof. Dr. Marloes H. Maathuis
Prof. Dr. Patrick Cheridito
Feb-2019
Abstract: An insurance policy is one of the most intuitive forms of risk management between two agents (namely, the insurance provider and the insurance buyer). Still, understanding what actually drives these two agents to such an agreement and the optimal form of such a contract is not trivial, and has represented a topic of research for more than fifty years. This piece of work analyzes one of the main pillars in the field, the result of Raviv (1979). The attention is initially placed on a theoretical review of his main contribution, to then focus, in particular, on the shape of Pareto-optimal contracts: some examples using various types of utility functions are discussed, including insights regarding the piece-wise linearity of such contracts. Subsequently, the paper lays out some ideas for possible extensions of Raviv (1979)'s theory, and in particular what happens in case the market participants are uncertain about the loss distribution faced by the insurance buyer. Two different scenarios are presented: in the first, the insurance buyer and the insurance provider have different beliefs regarding the distribution of this loss; in the second, the insurance provider is undecided between two possible distributions. Finally, the study highlights the critical issues found in trying to approach these scenarios.

2018

Student Title Advisor(s) Date
Weigutian Ou Spectral Deconfounding on Generalized Linear Models Prof. Peter Bühlmann
Domagoj Cevid
Dec-2018
Abstract: We investigate confounding in causal generalized linear models. We show that a confounded generalized linear model (GLM) is equivalent to a perturbed generalized linear normal model (GLNM) in terms of the distribution of the observed variables (X, Y). To fully understand the behavior, we investigate the properties of GLNMs and their fitting process. Based on these, we devise a method to obtain a correct parameter estimate even in the presence of confounding. Finally, we compare our method to other methods.
Natallie Baikevich High-Dimensional Classification with Correlated Data and Applications in Metagenomics Dr. Lukas Meier
Prof. Dr. Shinichi Sunagawa
Dr. Miguelangel Cuenca Vera
Nov-2018
Abstract: Today high-dimensional classification is a problem of increasing importance, especially in fields like metagenomics. The dimensionality and the presence of many highly correlated predictors make building truly accurate models and their interpretation particularly challenging. We review a number of popular approaches and evaluate their performance in multiple simulation studies as well as in a case study using features originating from DNA sequences for disease classification. The crucial importance of interpretability motivates the choice of variable ranking as the main focus of this research. We highlight the important properties of the methods and develop an improved version of the Hierarchical Inference approach.
Lorenz Walthert Deep learning for real estate price prediction Dr. Markus Kalisch
Dr. Fabio Sigrist
Nov-2018
Abstract: The goal of this thesis is to model real estate prices with deep learning. Using transaction data between 2011 and 2017 for self-owned apartments in Switzerland, we perform an extensive network architecture search, evaluating a large number of different layer compositions and hyper-parameters. We estimate models with pyramid and flat layer compositions as well as networks with embeddings for macro location. In addition, we develop a model that restricts interactions and non-linearities to a subset of variables tailored to the nature of our data and propose a new approach to learning rate scheduling by uniting two existing approaches. A classical hedonic regression model and ridge regression with manual feature engineering as well as gradient boosting serve as benchmarks. We conclude that a linear model with manual feature engineering performs significantly worse than the flexible algorithms deep learning and gradient boosted regression trees. In particular when combined, deep learning and gradient boosting are capable of delivering high-quality predictions, surpassing traditional methods.
Lukas Hofmann Estimating the causal effect of case management on healthcare expenditure Prof. Dr. Marloes H. Maathuis
Leonard Henckel
Oct-2018
Abstract: Case management is an increasingly popular managed care technique, by which persons in complex life situations are supported in a resource- and solution-oriented manner. It is implemented in a standardized and cooperative process, intended to increase the quality of care and to reduce healthcare costs. Especially a cost reduction effect is desirable against a background of constantly rising healthcare expenditure. However, case management is sometimes criticized for not achieving this cost effect. In this thesis, we analyze data from Helsana Versicherungen AG to investigate this issue. First, we describe the underlying selection process. With the background knowledge obtained, we analyze the data and discuss the assumptions which are sufficient for cost effect estimation. Next, we estimate the cost effect, and finally, we suggest changes to the selection process in order to prepare the ground for sound causal inference. Furthermore, we carry out a simulation study on healthcare cost behavior and a sensitivity analysis of causal effect estimation under confounding.

Keywords: causality, observational study, intention-to-treat analysis, design of experiments, linear regression methods, covariate adjustment, matching
Aline Schillig Cyber Risks and Data Breaches Dr. S. Wheatley
Prof. M. Maathuis
Dr. S. Frei
Oct-2018
Abstract: In this master thesis we analyze data breaches, which constitute one of the key cyber risks of today's cyber world. Motivated by previous work, in particular the work of Eling and Loperfido [14, 2017], Wheatley, Maillart and Sornette [37, 2016], and Hofmann, Wheatley and Sornette [18, 2018], we analyze data breaches with at least 70k records lost from an insurance point of view with a new, extended dataset. We use multidimensional scaling to identify severity risk classes based on the economic sector. To model the frequency we employ count GLMs, whereby we detect notably different scenario outcomes for the future development of the frequency of data breaches with at least 70k records lost. The data breach severity is analyzed with respect to various characteristics of the event, such as the size and economic sector of the affected entity as well as the type of breach medium, the mode of failure that led to the breach and whether a third party was involved in the data breach event. We estimate the severity distribution, which is best approximated by a truncated lognormal or upper-truncated Pareto distribution for various thresholds for the complete dataset. In a further step we study the reporting delay. For this, both parametric and non-parametric methods are used to assess the development of the reporting delay over time and its relation to other variables. Furthermore, we analyze whether there have been any changes in the reporting of data breach events due to the introduction of data breach notification laws in the US.
Samuel Kessler Composite Training for Time Series Forecasting Prof. Andreas Krause
Mojmír Mutný
Martin Štefánik
Sep-2018
Abstract:
It has been empirically observed that a discrepancy between the way a model is trained and the manner the trained model is used for forecasting leads to large prediction errors. We investigate a form of time series model training that coincides with the forecasting procedure, and compare it to the standard modes of training. Experimentally we demonstrate that our training method is able to yield better multi-step forecasts for linear autoregressive (AR(p)) models and similarly for recurrent neural networks (RNNs) when compared to the conditional maximum likelihood estimator. Experimentally we show that for the case of AR(p) models our training method affects the learnt model’s Lipschitz constant to produce more stable forecasts. In the case of RNNs, our training method is empirically shown to be more robust to prediction mistakes when performing forecasts in comparison to RNNs trained via the conditional maximum likelihood.
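A toy sketch of the training idea for the AR(p) case, assuming numpy/scipy and a univariate series y: the coefficients are chosen to minimise the error of iterated h-step-ahead forecasts rather than the one-step conditional likelihood (illustration only, not the thesis's exact objective):

    import numpy as np
    from scipy.optimize import minimize

    def multistep_loss(coefs, y, p, horizon):
        # sum of squared errors of iterated `horizon`-step-ahead forecasts
        loss = 0.0
        for t in range(p, len(y) - horizon):
            window = list(y[t - p:t])
            for _ in range(horizon):                  # roll the forecast forward
                window.append(np.dot(coefs, window[-p:]))
            loss += (y[t + horizon - 1] - window[-1]) ** 2
        return loss

    def fit_ar_multistep(y, p=2, horizon=5):
        return minimize(multistep_loss, np.zeros(p), args=(y, p, horizon)).x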
Federico Rogai Linear Mixed-Effects Models in R: Assessing the Quality of Confidence Interval Functions Dr. Markus Kalisch
Dr. Martin Mächler
Sep-2018
Abstract: This thesis is an extension and improvement of previous works by Xia (2014) and Zingg (2014). As they did, we set out to evaluate the quality of different methods to compute confidence intervals for a linear mixed effect model’s fixed effects parameters. We study both models that are balanced and unbalanced with respect to their random structure.
To gain a sound understanding of the problem at hand, we review the theory on parameter estimation for linear mixed effect models. We also go over the mathematics of the different methods' confidence interval computations to gain further insights into the necessary assumptions and the resulting techniques' properties. Given the difficulty of addressing the problem at hand from a theoretical perspective, we approach it empirically.
A simulation study is set up in the statistical software R for this purpose. In contrast to typical simulation designs, we do not arbitrarily generate new data, but rely on existing datasets on which several different models are fitted. The fitted object is used to generate new response variables. Re-fitting the same model to the new data allows us to evaluate the confidence intervals we obtain for the parameter of interest β, since we now know its true value.
For the evaluation of our results, several measures of the quality of our confidence intervals are constructed. Studying descriptive statistics and plotting our results, the method that performs best according to these measures is Kenward-R., closely followed by Satterth. and, depending on the statistic we look at, boot-Para-uF. On the contrary, profile, boot-Para-uT and boot-Semi-uT were shown to deliver a disappointing performance. Finally, we took a more rigorous approach to analyze our results. We focused on one "quality measure" and fitted a linear mixed effect model using, among others, the predictor method. The results we obtained here support and strengthen our previous findings.
Abhimanyu Sahai Deep Learning for Music Source Separation Prof. Nicolai Meinshausen
Dr. Romann Weber
Dr. Brian McWilliams
Sep-2018
Abstract: Separating a music track into its component instruments is an interesting problem with applications in music production as well as in creating symbolic music data for further downstream applications in music informatics. Traditional approaches to solving this problem were typically non-data-driven, using known statistical properties of music signals to perform the decomposition.
In recent years, data-driven approaches, and deep neural networks in particular, have delivered a radical improvement over traditional approaches towards this goal. We study one of the current State-of-the-Art algorithms based on deep convolutional networks and implement a music source separation system based on it. We show that our system performs competitively with other systems on a standardized separation task.
We then explore some enhancements to this algorithm. In particular, we study the use of alternative loss functions to the L2 or L1 loss commonly used in these systems, and demonstrate that we are able to improve the algorithm’s performance by using a ’perceptual’ VGG loss. We also explore improving the capacity utilization of the network by modifying the system input/output. Finally, we also create a new dataset that can be useful in music source separation of woodwind quintet compositions.
Alice Feldmann Assessment Voting in Large Electorates - Analysis and Extensions Prof. Dr. Peter Bühlmann
Prof. Dr. Hans Gersbach
Sep-2018
Abstract: This thesis presents Assessment Voting, an innovative two-round voting concept which is evaluated in costly settings and large electorates. Compared to standard voting schemes, Assessment Voting is cost-effi
Lilian Gasser Topic modeling on Swiss parliamentary records Dr. Markus Kalisch
Dr. Luis Salamanca
Dr. Fernando Perez-Cruz
Sep-2018
Abstract:
In various fields, topic modeling is an established technique for the analysis of large collections of text documents. Recently, it has been applied in political science to determine the interests and concerns of politicians and political groups, most prominently for the U.S. Senate and the European Parliament. This thesis provides a thorough introduction to topic modeling, focusing on latent Dirichlet allocation (LDA). LDA is applied to a collection of Swiss parliamentary records from 1891 until 1995. The qualitative analysis of the 39th and the 44th legislative periods unveiled important socio-economic and political events of these time periods. A first attempt at dynamic topic modeling provided a surprisingly insightful impression of the temporal evolution of the discovered topics. Additionally, the 44th legislative period was quantitatively analyzed to determine the optimal number of topics. As this work is the first analysis of this kind on this dataset, there are manifold extension possibilities, such as applying dynamic topic modeling to retrieve the evolution of topics. Associating politicians with topics and linking this information in a relation graph with their backgrounds could be used to predict their decisions in votes.
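A minimal sketch of the LDA fitting step, assuming scikit-learn and a placeholder list `documents` of speech texts; the thesis's preprocessing (tokenization, stop-word handling for the parliamentary languages, vocabulary pruning) is more involved:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def fit_lda(documents, n_topics=20):
        counts = CountVectorizer(min_df=5).fit_transform(documents)   # bag-of-words counts
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        doc_topic = lda.fit_transform(counts)                         # per-document topic proportions
        return lda, doc_topic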
Rashid Khorrami Early-Warning-Signals for Critical Transitions in the Twitter-Network Prof. Dr. Marloes Maathuis
Dr. Albert Blarer
Prof. Dr. Didier Sornette
Sep-2018
Abstract: In different fields of research, there have been numerous attempts to predict the exact time at which an observed time series will reach a peak. In this thesis, we aim to find methods for predicting peaks in Twitter time series which correspond to almost simultaneous real-world events. For that, we assume that for a topic of interest the Twitter network in some way undergoes a bifurcation whenever or shortly before the corresponding time series of tweet counts reaches a peak. In general, when dynamical systems come close to a tipping point, time series that originate from them are expected to exhibit certain early-warning signals (EWS). In our analysis, we restrict ourselves to the evolution of the time series' autocorrelation, variance and skewness and aim to determine characteristic patterns in these three quantities (so-called EWS time series) that would indicate an approaching peak in the tweet counts. Along the way, we also address further approaches to the peak-prediction problem and put the corresponding topics into the overall context. In the end, we conclude that our EWS methods have overall superior results when we apply them to a detrended version of our time series or only to the remainder parts of their multiplicative season-trend decompositions. However, due to some problems regarding data quality and since some of the methods we used are only suitable for retrospective views, our results should be followed up on with caution.
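A sketch of the three rolling early-warning-signal indicators named above, assuming pandas and a (detrended) tweet-count Series; the window length and detrending choices are placeholders:

    import pandas as pd

    def ews_indicators(series, window=100):
        roll = series.rolling(window)
        return pd.DataFrame({
            "autocorrelation": roll.apply(lambda w: w.autocorr(lag=1), raw=False),
            "variance": roll.var(),
            "skewness": roll.skew(),
        })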
Michael Scherrer A practical approach to predict future device sales and estimate causal structures in an observational design Dr. Lukas Meier
Verica Milenkovic
Aug-2018
Abstract: A good prediction of mobile phone sales can be extremely valuable for telecommunications carriers. Even better is knowing how to intervene such that future sales increase. We start by predicting future sales with different regression models on a real-world data set provided by one of the largest Swiss telecommunication companies. We analyse the possibilities and limitations of the data and come up with various use cases for the models derived. Moreover, we analyse the causal dependencies of the variables used for the prediction models. The simplest way to find causal relations is experimental intervention. However, in this practical case this is not possible. Instead, we use two different algorithms to estimate graphical models. Graphical models can answer the question whether a causal relation exists in an observational design. In fact, this thesis shows some interesting dependencies and causal relations among variables, which could be used to develop new pricing models. We point out the limitations and possible improvements by comparing the PC and RFCI algorithms in specific applications.
Jinzhou Li Nodewise Knockoffs: FDR control for Gaussian graphical model Prof. Dr. Marloes Maathuis Aug-2018
Abstract: This thesis focuses on error control problems in statistics. We start by reviewing some classical methods for controlling the familywise error rate (FWER), the K-familywise error rate (K-FWER), or the false discovery rate (FDR) in the context of multiple testing. Then, we turn our attention to FDR control problems in two more specific contexts: variable selection and structure learning. Although both tasks can be converted into multiple testing problems, there are methods that achieve the same goal without resorting to multiple testing. For variable selection, we study a recently proposed and very interesting idea called knockoffs. The main focus is on the fixed-X knockoff framework proposed by Barber, Candès, et al. (2015), and the model-X knockoff framework introduced by Candès, Fan, Janson, and Lv (2016). After studying all of these, we try to make use of the nodewise and fixed-X knockoff ideas to find a method that guarantees finite sample FDR control when learning the structure of a Gaussian graphical model. We propose many experimental procedures and implement simulations to test their ability to control the FDR. Some of them successfully control the FDR in our simulations, but we did not derive theoretical FDR control guarantees for most of these methods. One exception is the so-called Nodewise Knockoff method. The FDR control property of this method is proved, and we show by simulation that it outperforms the Benjamini & Yekutieli procedure proposed by Benjamini and Yekutieli (2001) in some settings. The Benjamini & Yekutieli procedure is the only procedure we are aware of that guarantees finite sample FDR control without any assumption on the underlying graph. We close this thesis with a summary and some possible future research directions.
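For reference (standard definitions, not quoted from the thesis), the quantity being controlled and the knockoff+ selection threshold for feature statistics W_j at target level q are

    \mathrm{FDR} = \mathbb{E}\!\left[\frac{\#\{j \in \hat{S} : \beta_j = 0\}}{\max(\#\hat{S},\,1)}\right], \qquad
    \tau_{+} = \min\left\{ t > 0 : \frac{1 + \#\{j : W_j \le -t\}}{\max\big(\#\{j : W_j \ge t\},\,1\big)} \le q \right\},

with the selected set \hat{S} = \{j : W_j \ge \tau_{+}\}.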
Jannik Hamper Causal Inference In Time Series And Its Application To The German Electricity Market Prof. Dr. Marloes H. Maathuis
Dr. Jan Abrell
Aug-2018
Abstract: This thesis discusses the notion of causality in general and in the context of time series, and presents two methods to conduct causal inference in time series. One is to fit a vector autoregressive model and draw conclusions using the concept of Granger causality. The other one is a version of the PC algorithm that conducts conditional independence tests and infers properties of the underlying causal mechanism from them. Both approaches are applied to a dataset from the German electricity market that shows several strong seasonal components. The analysis is conducted on different versions of the dataset at hourly, daily, weekly and monthly time resolutions, with the seasonal components either removed or retained. Given previous economic knowledge, some of these results can be considered invalid while others are more plausible. The plausible results suggest that generation from renewable sources is causally influenced by other factors, which is thought not to be the case due to their low marginal costs.
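A minimal sketch of the Granger-causality route described above, using the VAR implementation in statsmodels on synthetic electricity-market-like series. The variable names, the fixed lag order and the simulated data are assumptions for illustration, not the thesis setup.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    # Hypothetical hourly series: price, load and wind generation.
    rng = np.random.default_rng(1)
    n = 2000
    wind = rng.normal(size=n).cumsum()
    load = rng.normal(size=n).cumsum()
    price = 0.3 * np.roll(load, 1) - 0.2 * np.roll(wind, 1) + rng.normal(size=n)
    df = pd.DataFrame({"price": price, "load": load, "wind": wind}).diff().dropna()

    res = VAR(df).fit(4)                                        # fixed lag order for the sketch
    test = res.test_causality("price", ["wind"], kind="f")      # does wind Granger-cause price?
    print(test.summary())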
S. H. Magnússon Leave-out methods for selecting the optimal starting point of financial bubbles Prof. Nicolai Meinshausen
Prof. Didier Sornette
Aug-2018
Abstract: Identifying financial bubbles and predicting their burst is of high theoretical but also practical interest. The Log-Periodic Power Law Singularity (LPPLS) model is an attempt to model unsustainable growth in financial markets, namely super-exponential growth, and predict the inevitable burst of such bubbles. This thesis aims to improve the statistical estimation of the LPPLS model by allowing the residuals of the parametric model to have an autoregressive part and heteroskedasticity in the innovations. Further, new methods for selecting the optimal starting point of a bubble are investigated and compared to existing methods. Finally, the parameters in the LPPLS model need to satisfy specific constraints to describe a bubble. We partly extend these constraints to probabilistic boundaries.
The methods for selecting the optimal initial point of a bubble were tested and compared on synthetic and historical data. The proposed improvements to the residual structure are necessary for estimating the parameters and their confidence intervals. The suggested methods for selecting the optimal initial point of the bubble are conceptually more appealing than existing methods but require refinements. We suggest conducting further experiments to compare and identify the merits and drawbacks of each selection criterion.
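For orientation, the LPPLS model referred to above is commonly written as follows in the literature; the notation and the exact parameter constraints may differ from those used in the thesis:

    \mathbb{E}[\ln p(t)] \;=\; A + B\,(t_c - t)^{m}
        + C\,(t_c - t)^{m}\cos\!\bigl(\omega \ln(t_c - t) - \phi\bigr), \qquad t < t_c,

where t_c is the critical (burst) time. A bubble regime is typically associated with constraints such as B < 0 and 0 < m < 1; the thesis partly relaxes such hard constraints to probabilistic boundaries.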
Emilien Jules Generalized Linear Models: Parameter Estimation, Correlated Responses, and Omitted Covariates Dr. Markus Kalisch Aug-2018
Abstract: Since their introduction by McCullagh and Nelder (1989), generalized linear models have become a canonical method for handling discrete data. In the case of independent observations, the iteratively reweighted least squares algorithm is commonly used for computation of the maximum-likelihood estimator. For the purpose of computational efficiency, we contemplate an alternative, gradient-based fitting procedure – online gradient descent – which we believe to be more appropriate for large-scale parameter estimation. When the collected data come from a panel study, the observed responses may no longer be assumed to be independent. Two modelling approaches compete for capturing the correlation that arises in such designs. While population average models rely on generalized estimating equations, generalized linear mixed models use (restricted) maximum likelihood for computing parameter estimates. We describe how the two approaches differ, and we use their dissimilarity to gauge the consequences of omitting covariates in a GLM framework.
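The online gradient descent idea mentioned above can be sketched for a logistic GLM as a single-pass (or few-pass) stochastic gradient ascent on the log-likelihood. The learning rate, the decay rule and the simulated data below are generic illustrative choices, not the procedure studied in the thesis.

    import numpy as np

    def online_logistic_sgd(X, y, lr=0.05, epochs=3):
        """Stochastic (online) gradient ascent on the logistic log-likelihood."""
        beta = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in np.random.permutation(X.shape[0]):
                p = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
                beta += lr * (y[i] - p) * X[i]   # per-observation score contribution
            lr *= 0.5                            # simple step-size decay between passes
        return beta

    # Toy check against a known coefficient vector.
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(5000), rng.normal(size=(5000, 2))])
    beta_true = np.array([-0.5, 1.0, -2.0])
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
    print(online_logistic_sgd(X, y))             # roughly recovers beta_true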
Jun Wu Learning directed acyclic graph with hidden variables Prof. Dr. Marloes Maathuis Aug-2018
Abstract: A method called LRpS+GES was proposed by Frot, Nandy and Maathuis (2018) to estimate the Markov equivalence class of a directed acyclic graph (DAG) with hidden variables. The consistency of the method requires the DAG of the observed variables to be sparse and allows a small number of hidden variables that have an effect on a large proportion of the observed variables.

In this thesis, we consider some relaxations of the hidden variable conditions in Frot et al. (2018). In particular, we allow the existence of two kinds of hidden variables: arbitrary hidden variables restricted by sparsity conditions, and hidden variables with an effect on a large proportion of the observed variables as in Frot et al. (2018). We propose a new LRpS+(R)FCI method for this problem setup. The idea of this method is analogous to LRpS+GES. The first step removes the hidden variables with an effect on a large proportion of the observed variables. The second step consists of applying (R)FCI. We derive consistency conditions for LRpS+(R)FCI and give a consistency proof. In a simulation study, our method showed superior performance when compared to traditional causal structure learning methods, but it did not perform better than LRpS+GES in general. Under some special simulation settings, however, our method showed improvements over the LRpS+GES approach in estimation accuracy.
Andreas Psimopoulos Identifying the Macroeconomic Conditions Preceding Recessions: A Comparison Among Statistical and Machine Learning Methods Prof. Dr. Nicolai Meinshausen Aug-2018
Abstract:
Forecasting recessions is a classic econometric challenge. After the financial crisis of 2008, it became clear that the ability to predict such events could prevent millions of people's lives from being affected by very serious consequences. The rapid adoption of Machine Learning in scientific applications and the recent advances in Computational Statistics pave the way for a new kind of econometric approach, which has the potential to radically change the evolution of mainstream Econometrics. In order to identify the conditions that precede economic recessions, a new algorithm was developed in the framework of this thesis. The so-called “Average Trees algorithm” aims to provide reliable and easily interpretable results with regard to which macroeconomic conditions prevail during the year before a recession. Two variants of this algorithm and eight additional statistical or machine learning methods are compared against each other in terms of six evaluation metrics on their out-of-sample performance. The analyzed datasets refer to six countries (Australia, Germany, Japan, Mexico, UK, USA) and cover a time span of more than 40 years. The best-performing method is the Support Vector Machine (SVM). Models based on SVM classified correctly at least 75% of the pre-recessionary periods for half of the countries, with a mean overall classification accuracy of around 90% in these cases. Moreover, this study serves as a benchmark of several theories about economic crises. Despite the complex nature of business cycles, it seems that policymakers can take advantage of this thesis' methodology by using its results as early warning signs of potentially upcoming recessions.
Sayed Rzgar Hosseini Quantifying the predictability of cancer progression using Conjunctive Bayesian Networks Prof. Dr. Marloes Maathuis
Prof. Dr. Niko Beerenwinkel
Aug-2018
Abstract: Measuring the predictability of cancer progression provides us with an unprecedented opportunity to gain quantitative insights into the diagnosis and treatment of this widespread death-causing disease. Cancer is a disease with an evolutionary basis of progression, so it can benefit from the accumulated knowledge on the predictability of evolution in general. However, the dominating approach for quantifying the predictability of evolution relies on the concept of fitness landscapes, which is impossible to empirically measure in vivo for cancer and is challenging to infer from cross-sectional mutational data. In this study, we aim to circumvent the need for fitness landscapes towards establishing a stable and scalable statistical framework to quantify the predictability of cancer progression directly from cross-sectional data using Conjunctive Bayesian Networks (CBNs). Leveraging the simulated data of a previous study (Diaz-Uriarte, 2018), which has made a connection between fitness landscapes and CBNs, we show that the predictability estimated directly from mutational data using our approach strongly correlates with that obtained from the corresponding fitness landscapes under the Strong Selection Weak Mutation (SSWM) assumption, and thus our CBN-based approach can accurately capture the underlying evolutionary constraints on the ordering of tumorigenic mutations. Importantly, we identified a simple relationship between the predictability of CBNs with a set S of n genes and the average predictability of smaller CBNs with a given set
Mohamad Hassan Mohamad Rom Statistical Process Control Dr. Markus Kalisch Aug-2018
Abstract: In this thesis, I introduce a novel approach to measuring the performance of control charts, based on the AUC of the ARL_1 plot. Standard approaches to measuring the performance of control charts assume that the practitioner specifies a target shift, which for many use cases is impractical. The alternative approach discussed in this thesis instead relies on the practitioner specifying a target shift range. In this thesis I also compared Shewhart, CUSUM, EWMA and CPD control charts using the new measure and found that, in contrast to the case where the control charts are specifically designed for a predetermined target shift, EWMA control charts performed better than CUSUM charts.
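To make the idea concrete, the following simulation sketch (my own illustration, not the thesis code) estimates ARL_1 of a two-sided EWMA chart over a grid of shifts and integrates the resulting curve to obtain an AUC-type summary. The smoothing constant, control limit, shift range and number of replications are assumed values.

    import numpy as np

    def ewma_run_length(shift, lam=0.2, L=2.86, max_n=10_000, rng=None):
        """Run length of a two-sided EWMA chart monitoring N(shift, 1) data."""
        rng = rng or np.random.default_rng()
        limit = L * np.sqrt(lam / (2 - lam))      # asymptotic control limit, standardized data
        z, t = 0.0, 0
        while t < max_n:
            t += 1
            x = rng.normal(shift, 1.0)
            z = lam * x + (1 - lam) * z
            if abs(z) > limit:
                return t
        return max_n

    shifts = np.linspace(0.25, 2.0, 8)            # a target shift *range* instead of one shift
    arl1 = [np.mean([ewma_run_length(d) for _ in range(200)]) for d in shifts]
    auc = np.trapz(arl1, shifts)                  # AUC of the ARL_1 curve over the shift range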
Nicola Gnecco Causality in Heavy-Tailed Data Prof. Dr. Nicolai Meinshausen Aug-2018
Abstract: We introduce a novel method to estimate a causal order from heavy-tailed data. We start with a bivariate coefficient Γ that detects causal directions between heavy-tailed variables. This coefficient, proposed by Engelke, Meinshausen, and Peters (2018), can also detect the presence of a hidden confounder. We investigate the population properties of the Γ coefficient in a linear SEM with an arbitrary number of variables. Moreover, we prove that it is possible to identify the source node of a linear SEM under some assumptions. Based on this result, we build four competing algorithms to recover the causal order of a graph from observational data. We compare and test the sample properties of the algorithms on simulated data. We show that our algorithms perform equally well in the large-sample limit when the heavy-tail assumption is fulfilled. Finally, we test the algorithms in the presence of hidden confounders.
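One common empirical version of such a tail coefficient averages the ranks of one variable over the observations where the other variable is largest; large values in one direction but not the other suggest a causal direction. The estimator below, its name and the Cauchy toy data are illustrative assumptions and may differ in detail from the Γ used in the thesis.

    import numpy as np

    def tail_coefficient(x, y, k):
        """Average (normalized) rank of y among the k observations where x is largest."""
        n = len(x)
        ranks_y = np.argsort(np.argsort(y)) + 1      # ranks 1..n of y
        idx = np.argsort(x)[-k:]                     # indices of the k largest x
        return np.mean(ranks_y[idx]) / n             # close to 1 is consistent with x -> y

    rng = np.random.default_rng(2)
    x = rng.standard_cauchy(5000)                     # heavy-tailed cause
    y = 0.8 * x + rng.standard_cauchy(5000)           # effect inherits the extremes of x
    k = int(np.sqrt(len(x)))
    print(tail_coefficient(x, y, k), tail_coefficient(y, x, k))   # asymmetry indicates direction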
Lorenz Haubner Optimistic binary segmentation: A scalable approach to changepoint detection in high-dimensional graphical models Prof. Dr. Peter Bühlmann
Solt Kovács
Apr-2018
Abstract: We consider the problem of finding the structural breaks, also referred to as changepoints, in a sequence of non-homogeneous, high-dimensional data. In between the changepoints the data are assumed to be identically distributed, e.g. multivariate normal. This can be interpreted as finding the changepoints in a piecewise constant graphical model. Different estimators based on the ideas of neighborhood selection, and algorithms to compute them, are presented. Specifically, the focus is on devising approaches that scale well in the number of observations. Existing algorithms like dynamic programming and binary segmentation require many model evaluations, which scale at least linearly, and in some cases even quadratically, with the number of observations. Therefore they are practically not suited to deal with large-scale data, especially in the high-dimensional setting where even the estimation of single models is costly. An extension to binary segmentation called optimistic binary segmentation is proposed that, to the best of our knowledge, is the first approach with a logarithmic, and hence sub-linear, number of required model fits. Moreover, this method is possibly applicable to changepoint detection much more generally. Although no theoretical results are shown, a simulation study demonstrates the superior computational performance and strongly indicates that the accuracy is comparable to existing methods in many situations.
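For context, plain binary segmentation, which optimistic binary segmentation extends, can be sketched for the simpler univariate mean-change problem as follows. This is an illustrative stand-in: the thesis works with graphical-model fits rather than CUSUM statistics, and the threshold and minimum segment length below are arbitrary choices.

    import numpy as np

    def best_split(x):
        """Position and value of the maximal CUSUM statistic for a mean change."""
        n = len(x)
        k = np.arange(1, n)
        cs = np.cumsum(x)[:-1]
        stat = np.sqrt(k * (n - k) / n) * np.abs(cs / k - (np.sum(x) - cs) / (n - k))
        i = int(np.argmax(stat))
        return i + 1, stat[i]

    def binary_segmentation(x, threshold, min_len=30, offset=0, found=None):
        """Recursive binary segmentation for changes in mean (univariate sketch)."""
        if found is None:
            found = []
        if len(x) < 2 * min_len:
            return found
        pos, stat = best_split(x)
        if stat > threshold:
            found.append(offset + pos)
            binary_segmentation(x[:pos], threshold, min_len, offset, found)
            binary_segmentation(x[pos:], threshold, min_len, offset + pos, found)
        return sorted(found)

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 300), rng.normal(-1, 1, 300)])
    print(binary_segmentation(x, threshold=4.0))   # roughly [300, 600]

The optimistic variant replaces the exhaustive evaluation inside each segment with far fewer, adaptively chosen model fits, which is what yields the logarithmic fit count.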
Dominik Bettler Predicting Swiss Restaurants' Success from Online Ratings with Mixed-Effect Models Dr. Martin Mächler
Daniel Müller
Apr-2018
Abstract: An increasing amount of user-generated content such as consumer reviews and online ratings is available through review platforms like TripAdvisor. Many studies have investigated the influence of electronic word-of-mouth on hotels' financial performance, but only a limited number of studies have been conducted on restaurants' performance. This study investigates the impact of online reviews in the gastronomy industry and their predictive capacity for restaurants' financial performance as measured by revenue and growth.

Therefore, longitudinal data including all Swiss restaurants on TripAdvisor, as well as time-independent data from Google's review pages, the geographical levels of Switzerland and data from the tourism satellite account of the Swiss Federal Statistical Office, have been collected. Additionally, two confidential, independent microdata sets of individual restaurants have been provided by a Swiss insurance company and the Swiss Economic Institute (KOF) at ETH Zürich. After merging these datasets by record linkage, mixed-effects models have been applied.

The findings of this study reveal that the number of online reviews contributed by online users has a positive impact on revenue and growth. However, the influence of online ratings, the collected restaurant attributes and location on a restaurant's revenue and growth could not be conclusively established.

Therefore, the predictive power is limited. However, on the KOF dataset, a generalised linear mixed-effects model was able to outperform the baseline and to classify shrinking and non-shrinking restaurants. This suggests that the predictors used do not fully capture the drivers of restaurant performance.

Sylvia Schumacher Machine learning meets time series analysis: Forecasting parking occupancy rates in the city of Zurich Dr. Markus Kalisch Apr-2018
Abstract: Traffic congestion is a major issue in urban areas, and up to 30% of it is caused by cars searching for a free parking spot (Shoup, 2011). Digital parking space management and the Internet of Things (IoT) enable the collection of occupancy data for parking lots through sensor technology. Forecasting occupancy rates and integrating these predictions into smart navigation systems can reduce pollution in urban areas and decrease car drivers' stress levels.
This thesis aims to develop a methodological framework for empirically predicting parking lot occupancy rates for the city of Zurich. The investigated predictions comprise horizons from 15 minutes ahead up to 1 day ahead. Seven different predictors are compared. Two of them apply approaches specialized in processing time series data (Recurrent Neural Network (RNN) and SARIMA), and three of them grow an ensemble of trees (Random Forest (RF), Stochastic Gradient Boosting (GBM), and Extreme Gradient Boosting (XGB)). A linear model (LM) and a plug-in approach predicting the most recently observed value serve as baselines for comparison purposes.
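To illustrate two of these predictors, the sketch below fits a seasonal ARIMA with statsmodels and compares it to the plug-in (last observed value) baseline on synthetic hourly occupancy data; the series, the (1,0,1)(1,1,1)_24 order and the 1-day-ahead horizon are assumptions for illustration only, not the thesis configuration.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Hypothetical hourly occupancy rate of one parking garage.
    idx = pd.date_range("2017-01-01", periods=24 * 60, freq="H")
    rng = np.random.default_rng(3)
    hours = idx.hour.to_numpy()
    occ = np.clip(0.5 + 0.3 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.05, len(idx)), 0, 1)
    y = pd.Series(occ, index=idx)

    train, test = y.iloc[:-24], y.iloc[-24:]

    # Plug-in baseline: repeat the most recently observed value.
    naive = np.repeat(train.iloc[-1], len(test))

    # Seasonal ARIMA with a daily season of 24 hours.
    sarima = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, 24)).fit(disp=False)
    forecast = sarima.forecast(steps=24)

    mae_naive = np.mean(np.abs(naive - test.values))
    mae_sarima = np.mean(np.abs(forecast.values - test.values))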
To the best of the author's knowledge, this is the first study proposing a single prediction model for the parking lots of an entire city and the first project exploiting a database comprising several years of data. Moreover, this thesis analyzes how well the task of predicting parking lot occupancy rates can be generalized to other cities. This is implemented by comparing a feature set based only on the time series of the parking lot with an extended set comprising information such as weather, events, and pricing information.
The framework proposed in this thesis can be used by the public sector as well as by owners of private parking lots. Therefore, it can serve as a means to make an important contribution to the evolution of smart cities in the future.
Jeremy Jude Seow Recommender Systems for Mass Customisation of Financial Advice Dr. Martin Mächler
Dr. Daniel Lenz
Apr-2018
Abstract: Traditionally, Relationship Managers are required to invest time and tedious calculations to make personalised recommendations on what new investment instruments their clients might be interested in buying. For the purpose of automating this process in the context of private banking, two existing model-based collaborative filtering recommender algorithms, the Confidence Weighted model and the Adaptive Boosting Personalised Ranking (AdaBPR) model, were adapted for and evaluated on two different financial datasets. These recommender systems aim to maximise the rate at which clients agree to buy the recommended instrument. We propose a third, novel algorithm, the Adaptive Confidence (AdaCF) model, which borrows elements of the boosting framework from AdaBPR and combines these with the additional information obtained from pre-built confidence weights. These models were all benchmarked against the trivial Popular model, which makes recommendations by finding the most globally bought investment instrument across all users. The performance of each recommender system is evaluated using four different ranking metrics: Area under the Receiver Operating Characteristic Curve (AUC), Mean Ranking Percentile (%MR), Normalised Discounted Cumulative Gain (nDCG) and Mean Average Precision (MAP). Results show that our novel AdaCF model performs the best overall amongst the benchmarked models, with only marginally lower metric scores in certain cases. AdaCF also proves to be less sensitive to a non-optimised number of latent features k, as ranking performance suffers comparatively less when using an ensemble of recommender learners than when using a single component.
Manuel Wenig Optimal Mass Transportation with Emphasis on Applications in Machine Learning Prof. Dr. Nicolai Meinshausen Apr-2018
Abstract: Optimal mass transportation is on its way to becoming a major tool in numerous fields of application - in particular in machine learning. The recent emergence of approximate solvers makes it possible to apply this computationally heavy machinery to large-scale data problems. Especially the induced notion of distance and similarity (defined through the optimal transportation cost) is of great interest for analyzing data, since it provides a unique ability to capture the geometry of the underlying information. Moreover, it makes it possible to compare essentially anything that can be represented as a measure, ranging from empirical distributions over images to 3D characters.
On the other hand, optimal transportation problems have a very rich theoretical foundation. The solution to a problem of this kind strongly depends on the structure of the space and the regularity of both the cost function and the involved measures. This naturally leads to rather different motivations of new contributions concerning numerical methods and applications - often making it extremely difficult to figure out proper connections in between.
The aim of this thesis is to present the mass transportation problem in a suitable framework for possible applications in machine learning. This is done by interlinking theory, computational aspects and applications in a consistent and rigorous manner - with special emphasis on discrete and large-scale problems.

Zheng Gong Continuous Double Machine Learning Methods and its Applications Prof. Dr. Nicolai Meinshausen Apr-2018
Abstract: In Chernozhukov et al. (2016), a method called Double/Debiased Machine Learning (DML) is proposed which allows the estimation of the treatment effect of a binary treatment variable to reach the root-$n$ consistency rate under some structural assumptions.
The objective of this thesis is to propose a continuous version of the DML framework and to give similar consistency and asymptotic normality results based on the proposed Continuous Double Machine Learning (CDML) method.
We propose two specific versions of the CDML method, namely CDML with sample splitting and CDML without sample splitting, and give appropriate assumptions under which each of them is valid.
Analogously to Chernozhukov et al. (2016), we show that the proposed CDML framework is suitable for the Continuous Treatment Effect model (CTE) and the Instrumental Variable model (IV).
We also include a simulation section in this thesis, where we run simulations on CTE models: we implement the CDML methods with and without sample splitting and compare their performance in terms of errors and biases with the Simple Regression method (SR) and the Hirano & Imbens method (HI). As the simulation results indicate, our methods, especially the one with sample splitting, perform well in various simulation set-ups. In the end, restrictions of this CDML framework and possible directions of future extension and generalization are mentioned.
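As background, the cross-fitted "partialling-out" step at the heart of DML-type estimators can be sketched as below for a partially linear model with a continuous treatment. This is a generic illustration (random forests, two folds and the simulated data are my assumptions), not the CDML procedure itself.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    def dml_plr(y, d, X, n_folds=2):
        """Cross-fitted estimate of theta in the partially linear model y = theta*d + g(X) + e."""
        res_y, res_d = np.zeros_like(y), np.zeros_like(d)
        for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
            m_y = RandomForestRegressor(n_estimators=200).fit(X[train], y[train])
            m_d = RandomForestRegressor(n_estimators=200).fit(X[train], d[train])
            res_y[test] = y[test] - m_y.predict(X[test])    # partial out X from the outcome
            res_d[test] = d[test] - m_d.predict(X[test])    # partial out X from the treatment
        return np.sum(res_d * res_y) / np.sum(res_d ** 2)   # residual-on-residual regression

    rng = np.random.default_rng(4)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    d = np.sin(X[:, 0]) + rng.normal(size=n)                # continuous treatment
    y = 1.5 * d + np.cos(X[:, 0]) + rng.normal(size=n)
    print(dml_plr(y, d, X))                                  # should be close to 1.5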
Simon Hediger Generative Random Forest Prof. Dr. Nicolai Meinshausen Mar-2018
Abstract: When confronted with the limited number of observations in a data set, it would often be desirable to generate new data points with the same characteristics as the original data. We introduce an iterative sample generating procedure, which uses a cascade of random forests to gradually produce observations that are less and less distinguishable from the real ones. The proposed algorithm is embedded in an easy-to-use function in R, which accepts a data set as input and returns the desired number of new observations as output. We found that on data sets with only a few variables, the procedure performs satisfactorily, which means that the underlying data generation process is well reflected. However, the application to several different data sets shows a rapid decrease in performance with an increasing number of variables. Furthermore, the variance with which new observations are generated seems to be too large.
Meta-Lina Spohn Semantic and Syntactic Meaning in Neural and Count-Based Word Embeddings Prof. Dr. Nicolai Meinshausen Mar-2018
Abstract:
In this thesis we investigate state-of-the-art models for word embeddings and their properties. The Continuous Bag-of-Words Model (CBOW) and Continuous Skip-Gram Model (SG) from the toolbox Word2Vec (Mikolov, Chen, Corrado, and Dean (2013) and Mikolov, Sutskever, Chen, Corrado, and Dean (2013)) build word vectors while constructing a language model with a neural network trained on a large text corpus. The speed-up techniques Hierarchical Softmax (HS) and Negative Sampling (NS) help improve the performance and efficiency of these models. The model GloVe (Pennington, Socher, and Manning (2014)) is a log-bilinear regression model based mainly on the co-occurrence statistics of the words in the text corpus. The models in Word2Vec and GloVe have in common that the resulting embedding vectors show a property called additive compositionality. This means that semantic and syntactic meaning of words is captured by the vectors, and simple vector addition and subtraction reflect this meaning. A prominent example found with Skip-Gram with Negative Sampling (SGNS) is that the embedding vector of the word king minus the vector of man plus the vector of woman results approximately in the vector of queen. This is surprising as the models learn the meaning of words solely by being trained on plain text data.
We analyse the structure of the different models in detail, show relationships between them and further simple models, and prove how additive compositionality arises in certain models under some constraints. In the last, additional chapter we perform an empirical study of the models applied to text data from the internet platform reddit.
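The analogy property described above is usually probed with a nearest-neighbour query in cosine similarity. A minimal sketch follows; the toy vectors are constructed by hand purely so the snippet runs, and in practice `vectors` would hold pre-trained Word2Vec or GloVe embeddings.

    import numpy as np

    def analogy(vectors: dict, a: str, b: str, c: str, topn: int = 3):
        """Return words d maximizing cos(v_b - v_a + v_c, v_d), excluding the query words."""
        target = vectors[b] - vectors[a] + vectors[c]
        target /= np.linalg.norm(target)
        scores = {
            w: float(v @ target / np.linalg.norm(v))
            for w, v in vectors.items() if w not in {a, b, c}
        }
        return sorted(scores, key=scores.get, reverse=True)[:topn]

    # Hand-made 2-dimensional toy vectors (not learned embeddings).
    toy = {"king": np.array([0.8, 0.7]), "man": np.array([0.6, 0.1]),
           "woman": np.array([0.7, 0.9]), "queen": np.array([0.9, 1.4]),
           "apple": np.array([-1.0, 0.2])}
    print(analogy(toy, "man", "king", "woman"))   # "queen" ranks first for these toy vectors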
Stefano Radaelli Event Detection and Characterization on Twitter Streams using Unsupervised Statistical Methods Prof. Marloes Maathuis
Prof. Didier Sornette
Dr. Albert Blarer
Mar-2018
Abstract: Twitter represents an invaluable source of data that can be explored for the analysis of events triggering discussion among users. This social network offers the opportunity to study the dynamics of real-world occurrences over time and how social systems react to different stimuli. The aim of this thesis is to define and implement a framework for event detection and characterization on Twitter streams using unsupervised statistical techniques. This is intended to support monitoring activities of Twitter for either research purposes or security-related reasons for government organizations by providing actionable knowledge. The first step is the identification of events as peaks in time series describing the numbers of tweets published with specified hashtags; hence this problem falls in the area of anomaly detection. The superior performance of decomposition-based detection methods, which directly consider the structure of the time series, is presented, in particular when using STL decomposition and decompositions with a robust trend estimation. The peaks detected are then analyzed based on their temporal shape through time series clustering to identify recurring temporal patterns. A detailed investigation of various clustering methods is performed. This includes both partitional and hierarchical techniques with specific distance measures for time series - such as Shape-Based distance, Dynamic Time Warping and a distance derived from the TOPS (Symmetric Thermal Optimal Path) method. The modification of k-means using Shape-Based distance and a related centroid function returns the best partitions, based on internal evaluation indexes, namely Average Silhouette Width and COP Index. Five relevant clusters are therefore analyzed by looking at the evolution of proportions of tweets and retweets over time, describing the activity triggered in the network by an event. This enables the identification of two common patterns characterized by exogenous spikes, along with three shapes that feature seasonality and varying growth and relaxation signatures around peaks. Changepoint analysis is included as a last step to offer additional insights about the internal dynamics of events: in particular, the BFAST (Breaks for Additive Season and Trend) method allows a clear definition of evolving temporal phases. The relevance of the framework developed is demonstrated by providing significant examples using a defined sample of Twitter data.
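A compact sketch of the decomposition-based peak detection step described above, using the STL implementation in statsmodels and a robust threshold on the remainder. The synthetic hashtag counts, the daily period and the threshold of 4 robust standard deviations are illustrative assumptions rather than the thesis configuration.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import STL

    def detect_peaks(counts: pd.Series, period: int = 24, z: float = 4.0):
        """Flag time points whose STL remainder exceeds z robust standard deviations."""
        resid = STL(counts, period=period, robust=True).fit().resid
        mad = np.median(np.abs(resid - np.median(resid)))    # robust scale estimate
        return resid[resid > z * 1.4826 * mad].index

    # Hypothetical hourly tweet counts for one hashtag with an injected event.
    idx = pd.date_range("2017-06-01", periods=24 * 30, freq="H")
    rng = np.random.default_rng(5)
    base = 100 + 40 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
    counts = pd.Series(rng.poisson(base), index=idx).astype(float)
    counts.iloc[400:406] += 600                               # a burst of activity
    print(detect_peaks(counts))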
Yulia Kulagina Artificial Neural Networks for Solving Differential Equations Dr. Fadoua Balabdaoui Mar-2018
Abstract: The fast development of computer technologies over the past years has made it possible to treat numerous mathematical problems, which were formerly reserved for theoretical analysis, with numerical methods. This possibility opened a great number of new research areas for statisticians and computer scientists. Machine learning algorithms, especially neural networks, have shown outstanding performance in solving problems that need scientific computing for their solutions. These problems might arise from various disciplines such as physics, natural sciences, engineering, economics and financial sciences. Solid mechanics is a branch of continuum mechanics that studies the behavior of solid materials. In the present thesis we analyze the relationship between the amount of force applied to a material, referred to as stress, and the corresponding amount of deformation of the material, referred to as strain, which can be formulated in terms of a system of differential equations subject to certain boundary and initial conditions. We try to solve the stress-strain relationship modelling task by applying neural network models in two different frameworks. The first approach is theory-based and attempts to find a numerical solution to the problem by approximating the true solution of the boundary value problem, containing the constitutive equations, using a simple feedforward neural network. The second approach is based on using a deep neural network with a sophisticated architecture, an RNN Encoder-Decoder model, and treating the original problem as a sequence-to-sequence modelling task. By applying this model to a synthetically generated dataset of stress and strain histories we try to predict the stress history for an arbitrary strain history.

Juan José Leitón-Montero Statistical analysis of multi-model climate projections with a Bayesian hierarchical model over Europe Prof. Dr. Nicolai Meinshausen
Prof. Dr. Hans-Rudolf Kuensch
Feb-2018
Abstract: A hierarchical Bayesian model was used to analyze seasonal temperature and precipitation projections, over the PRUDENCE regions, of the CH2018 multi-model ensemble (RCP8.5). The implementation of this model expands the work done by Kerkhoff (2014), Tay (2016) and Künsch (2017) by evaluating both temperature and precipitation variables for every region-season combination.
Posterior distributions for the parameters associated with bias assumption coefficients, climatological means, inter-annual variability, and additive bias were estimated. Similarly, climate change estimates, with respect to the year 1995, were calculated for five different time horizons. A generalized pattern of variation was found for temperature across all the region-season combinations analyzed, while season-dependent and region-dependent patterns were identified for precipitation.
The reduction of the absolute additive bias due to dynamical downscaling was evaluated by comparing the bias components associated with the RCM-GCM chains and their corresponding drivers. Results were evaluated in terms of the probabilities of having a reduction of at least 20% in said component, and region-season-chain combinations were classified based on this value.
Shanshan Zhu Empirical Study of Units Classification and Clustering in Operations Management Prof. Dr. Marloes Maathuis Feb-2018
Abstract: In this thesis, we discuss a typical problem in express companies' operations management. In operations, the whole service area is divided into many management "units", and there are thousands of units in one company. For better management, these units need to be divided into different groups. Depending on the number of units that have been labeled manually, we can divide the problem into two sub-problems: ordinal classification and clustering.

In ordinal classification, we tried three types of methods: 1. transform the groups into numbers, apply regression on the continuous variable, and transform the predictions back into groups; 2. apply multi-class classification; 3. apply classification with the ordinal methods proposed by E. Frank and Alan Agresti. For the first type, linear regression and support vector machine (regression) are used in the regression step, and four post-processing methods ("nearest class", "ratio", "decide boundary based on Gaussian distribution", "decide boundary based on kernel density estimation") are used. Logistic regression, ordinal logistic regression, support vector machine (classification), classification and regression trees, random forests, gradient boosted trees, Gaussian mixture model based classification, and a super learner are also used for the classification. Since the data are unbalanced, the SMOTE method is used to obtain more balanced data.
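The first type of method, regression followed by "nearest class" post-processing, can be sketched as follows. The scikit-learn linear regression and the class grid are illustrative assumptions; the thesis also uses SVM regression and three further post-processing rules.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fit_predict_ordinal(X_tr, y_tr, X_te, classes=(1, 2, 3, 4)):
        """Treat ordinal labels as numbers, regress, then snap predictions to the nearest class."""
        raw = LinearRegression().fit(X_tr, y_tr).predict(X_te)
        grid = np.asarray(classes, dtype=float)
        return grid[np.argmin(np.abs(raw[:, None] - grid[None, :]), axis=1)]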

For the clustering problem, we tried both unsupervised learning and semi-supervised learning with Gaussian mixture models.

To measure the performances of methods in ordinal classification and semi-supervised clustering, we calculated accuracy, f1 score, precision, sensitivity, MSE and MAE. In addition, considering our response variable is ordinal, we also proposed four "near" measures: accuracy (near), f1 score (near), precision (near) and sensitivity (near).

For the ordinal classification problem, the super learner performs best in terms of accuracy and MAE. Linear regression with "nearest class" post-processing performs best in all four "near" measures. Support vector machine (regression) performs best in precision and MSE. Support vector machine (classification) with the ordinal method and balanced data has the best performance in sensitivity and f1 score. For the clustering problem, the performance of semi-supervised clustering improves as the proportion of labelled data increases.

2017

Student Title Advisor(s) Date
Christoph Conradi Model Selection for Dynamical Systems Prof. Dr. Joachim M. Buhmann
Dr. Martin Mächler
Nov-2017
Abstract: Dynamical systems are mathematical models which are able to describe complicated relations in the natural sciences. The goal is to infer the true underlying model from the data it generates. Selecting the correct model from a set of competing models given noisy observations is extremely difficult. This is due to the high non-linearity of most dynamical systems as well as to the computational resources needed for traditional model selection techniques like Markov Chain Monte Carlo.
By applying the mean-field gradient matching algorithm to model selection, this work proposes a new, cheaper model selection framework. Experiments on synthetic data show the advantages of this new framework. Especially small dynamical systems can be selected with speed and accuracy. Nevertheless, further improvements in gradient matching or Gaussian process regression are needed to select complicated real-world systems.
Patricia Calvo Pérez Text Mining of Electronic Health Records Prof. Dr. Karsten M. Borgwardt
Prof. Dr. Nicolai Meinshausen
Dr. Damián Roqueiro
Nov-2017
Abstract: In this project, we implement a model for mortality prediction of patients hospitalized in the Intensive Care Unit based on their clinical notes. We focus on in-hospital, 30-day and 1-year post-discharge mortality. Our ultimate goal is to improve decision-making by applying an algorithm that provides accurate predictions and individualized recommendations for each patient. The proposed approach consists of a hierarchical convolutional neural network (CNN) that models the inherent structure of documents. It also incorporates a technique that allows us to visualize the most relevant sentences in a patient's clinical note. We present a rigorous comparison with well-known information retrieval approaches such as bag-of-words (BOW), tf-idf and latent Dirichlet allocation (LDA). Our results demonstrate that the model can identify known causes and symptoms of mortality and effectively handle complex language semantics such as phrases and negations automatically. This constitutes a major improvement over the baselines, which do not provide interpretable results and solely rely on word semantics. On the other hand, in terms of performance, the tf-idf baseline substantially outperformed the neural network model. Comparable results were obtained when using an ensemble of CNNs with different hyperparameter configurations. We further present a supervised approach for disambiguating common acronyms in clinical data. In particular, we developed a system for disambiguating 74 acronyms in clinical discourse. Our method achieved extraordinary performance on the validation set across all acronyms and remarkable generalization power on an independent dataset. We also evaluate the impact of this algorithm on the mortality prediction task by incorporating it as a preprocessing step.
Markela Neophytou Using Transformations in Regression Models Dr. Markus Kalisch Nov-2017
Abstract: One of the most common steps in statistical analysis, and especially in regression modelling, is the transformation of the response and/or the predictor variables, since transformations can make the statistical inference more reliable. In this thesis, we investigate the use of transformations in regression models (parametric and non-parametric). Firstly, the reasons for using transformations are presented and then theory for the interpretation of the results after transforming the response variable is developed. The only assumption is that the transformation makes the distribution of the data approximately symmetric. Then, we compare the common transformation for proportional data, namely the arcsine transformation, with the Generalized Linear Model in the case of binomial proportional data, and with the logit transformation for the non-binomial case. This study is extended to binomial data with overdispersion for assessing alternative models (Generalized Linear Mixed Models and quasi-likelihood models). In addition, for small sample sizes, the application of Exact Logistic Regression is investigated. We close the parametric part of this thesis with the application of the Box-Cox method for finding the optimal response variable transformation w.r.t. normality, homoscedasticity and linearity. Then, we move to non-parametric regression, as in many cases a simple transformation cannot fulfil the required assumptions. Additive models can be seen as non-parametric transformations of the predictor variables; still, a ‘traditional’ transformation can be used on the response variable to improve the results. The application of additive and generalized additive models is explained in detail through examples. In the end, the Additivity and Variance Stabilizing transformation is presented and applied to datasets. This method is, in a practical way, an extension of Additive models, as it finds non-parametric transformations for the response and the predictor variables simultaneously that stabilize the variance and make the relationship linear. These non-parametric methods are useful tools for finding transformations that can be used in a parametric way (parametric terms in an additive model or a fully parametric model). For all the aforementioned methods, the very important step of checking the required assumptions is explained in detail.
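As a small illustration of the first comparison above (not taken from the thesis, which works with its own datasets), the sketch below fits an arcsine-transformed least-squares model and a binomial GLM with a logit link to the same simulated proportion data using statsmodels; the sample sizes and coefficients are arbitrary.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n, m = 200, 50                                    # n groups, m binomial trials each
    x = rng.normal(size=n)
    p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
    successes = rng.binomial(m, p)
    prop = successes / m

    X = sm.add_constant(x)

    # (a) classical arcsine transformation followed by ordinary least squares
    ols_arcsine = sm.OLS(np.arcsin(np.sqrt(prop)), X).fit()

    # (b) binomial GLM with a logit link on the raw counts
    glm_binom = sm.GLM(np.column_stack([successes, m - successes]), X,
                       family=sm.families.Binomial()).fit()

    print(ols_arcsine.params, glm_binom.params)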
Srivatsan Yadhunathan Decentralized Combinatorial Optimization over Holarchic Networks Dr. Evangelos Pournaras
Prof. Dr. Nicolai Meinshausen
Oct-2017
Abstract:
Combinatorial Optimization problems are often solved by using hierarchical networks such as trees. Such networks can be modeled using Holarchy, a hierarchical self-organization technique using autonomous agents which also serve as part of the network. This provides greater flexibility to the network. In this thesis we evaluate the performance of holarchical models using I-EPOS as a case study. We use three initialization schemes, Asynchronous Holarchy, Synchronous Holarchy and Full Holarchy, to pre-optimize subtrees of the I-EPOS network acting as the "Holons" before globally optimizing them using I-EPOS. We evaluate their performance by measuring the variance of the aggregated plans, which acts as the global cost of the system. We also study the plan preferences of the agents, which act as the local cost, and the standard deviation among plan selections in agents, which gives the unfairness of the system. We extend the I-EPOS algorithm to perform these initialization schemes in a simulation environment by executing them in parallel. The pre-optimization and simultaneous execution of these models provide a higher degree of freedom in the system in the search for the optimum. We also develop the Hybrid Holarchy algorithm, which combines a recursive local optimization technique with the Full Holarchy and Synchronous Holarchy algorithms to perform decentralized combinatorial optimization. Experimental evaluations of the proposed algorithms for various real-life applications show a potential to improve the optimization performance of the network.
Vaibhav Krishna Deep Non-linear approach to Matrix Factorization for Recommender Systems Prof. Dr. Andreas Krause
Dr. Nino Antulov-Fantulin
Sep-2017
Abstract: Recommender systems have proven to be instrumental in the era of information explosion, where such systems help determine which information to offer to individual consumers, allowing online users to quickly find personalized information. Collaborative filtering approaches have proved to be effective for recommender systems in predicting user preferences from past known user ratings of items. Although these approaches have been extensively investigated in the research community, they are still largely limited to different variants of matrix factorization. However, it is possible that the mapping between the latent factors learned from these and the original features contains rather complex hierarchical information with lower-level hidden attributes that classical one-level matrix factorization cannot interpret. In this thesis, we propose a novel multilayer non-linear approach to a variant of NMF that is able to learn such hidden attributes. Firstly, we construct a user-item matrix with explicit ratings and learn latent factors for representations of users and items with the designed non-linear multi-layer approach. Secondly, the architecture is built with different non-linearities and optimizers to better learn the latent factors in this space. We show that by doing so, our model is able to learn low-dimensional representations that are better suited for recommender systems on several benchmark datasets.

Andrea Mazza Dynamic Linear Models applied on Life Insurance market Prof. Dr. Peter Bühlmann
Dr. Marcel Dettling
Sep-2017
Abstract:
The aim of this thesis is to study the correlation between the overall underwritten Life Premium and the Gross Domestic Product in several countries over the last 30 years. This is not purely theoretical research, but rather an applied work that can potentially be used by the insurance market, in particular by Swiss Re Corporate Solutions, where I have worked since 2015, as guidance for forecasting how the market will evolve in the future. Different techniques are used to fit the data, from statistical models such as linear time series regression and dynamic linear models to more econometrics-oriented approaches such as the fixed effects estimator.
The overall goal is to analyse this correlation over the last decades, controlling for some other key factors in the Life Insurance market, as well as for shocks in the world which could potentially be related to people's willingness to take out coverage. Moreover, we try to identify any hidden evolving pattern over the years, which can be seen only with a dynamic linear model, thanks to its state space representation, and which is finally compared to the outcomes of the other, more classical approaches introduced before.
The main result achieved by this work relates to the positive correlation between Life Premium and Gross Domestic Product: this fact is clearly visible with all the studied models, with the only exception of China, which seems to follow a peculiar pattern. A few differences are also highlighted between the countries in this study, in particular for the other covariates, which correlate with the target variable in different ways.
It is also important to point out that a deep comparison between the models is not completely fair, due to their inner nature: the time-varying approach has much more flexibility than the fixed one, meaning that it will achieve a better fit for this data, which is clearly visible in a better residual analysis.
Finally, these models provide strong hints for Life Insurance underwriters and experts trying to forecast how the market will react in the future to shocks, as well as to economic behaviour, based on historical data. It is key to point out that these models are created on purpose to be as general as possible, so in order to have a better and more precise view of a particular country, specific market shocks and legal information should be incorporated into the desired model.
Since this work is intended for a broad audience, the emphasis is less on technical rigour and more on an applied discussion, in order to understand what is going on and to support meaningful decisions for the future.
Samarth Shukla Mapless Navigation through Deep Reinforcement Learning Prof. Andreas Krause
Prof. Roland Siegwart
Aug-2017
Abstract: Reinforcement Learning, aided by the representation learning power of deep neural networks, has enabled researchers to solve complex decision-making problems, the most notable one being AlphaGo, a computer program which beat the champion of the board game Go. Deep reinforcement learning has also been applied in the field of robotics, enabling robots to learn complex behavior directly from raw sensor inputs.

In this thesis, we present a reinforcement learning based approach to the map-less navigation problem in robotics. We train an end-to-end map-less motion planner in a simulation environment, which takes target data along with laser sensor data as inputs, and outputs robot motion commands. We show that a model trained in a specific environment can be successfully used for navigation in other, unseen environments. We also compare the performance of our planner with a state-of-the-art map-based motion planner.
Andres Camilo Rodriguez Escallon Synthetic Dark Matter Distributions using Generative Adversarial Networks Prof. Dr. Peter Bühlmann
Dr. Aurelien Lucchi
Dr. Tomasz Kacprzak
Aug-2017
Abstract: To understand the properties of Dark Matter distributions, cosmologists usually rely on N-body simulation techniques. These consist of a box with millions of particles that interact with each other through gravitational forces along cosmic time. Computing these interactions accurately is expensive (Teyssier et al., 2009), and new methods could help reproduce these simulations in a faster way. Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) may be used to this end. They do not rely on Maximum Likelihood estimation (MLE) and avoid known issues of intractable probability functions that require approximations, unlike other generative models that rely on MLE (e.g. Variational Auto-encoders (Kingma and Welling, 2013)). GANs have been able to generate realistic natural images, and their usage in Cosmology is just starting. We show how GANs can be trained on N-body simulations to produce high-quality samples that resemble the main characteristics of the training set while remaining statistically independent of it. To test this, we use Power Spectrum, Cross Power Spectrum and Peak statistics. They are theoretically grounded and widely used to evaluate N-body simulations (Kilbinger, 2015; Liu et al., 2015; Kacprzak et al., 2016; Dietrich and Hartlap, 2010). These metrics allow us to go beyond visual inspection and give us a robust way to measure the performance of our generative model.
Kalina Cherneva Churn modelling in the financial sector: A machine-learning approach Markus Kalisch
Georgi Nalbantov
Aug-2017
Abstract: Customer churn analysis aims at predicting the probability that an existing client will dispose of all their products in a company. This master thesis tries to find the best-performing algorithm to model the probability of churn per customer of a bank. We review and compare empirically more than 20 predictive models using the area under the ROC curve as a performance metric. Gradient tree boosting proves to be the best performer. Furthermore, ensembles of different models improve the results. Boosting and ensemble learning are state-of-the-art methods for handling binary classification problems of this sort. Friedman's gradient boosting machine is a stage-wise additive algorithm, where the loss of the model is minimized by adding weak learners in a gradient-descent manner. Ensemble learning has been shown to improve the performance of stand-alone algorithms by identifying parts of the feature space where each model performs relatively well. Additionally, we aim to infer empirically whether the decision boundary separating the churners from the non-churners, although unknown, is additive in the features, or whether interaction terms are present. This is done using smoothing splines to increase the complexity of the base learner. The comparison between a complex additive boundary and a tree with higher interaction depth shows that interaction is needed. Finally, we examine the effect of artificially balancing a data set versus undersampling the majority class. This is compared to the performance of the models trained on an unbalanced data set. We show that balancing the data set improves the performance of all algorithms studied.
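The balancing experiment described above can be sketched generically with scikit-learn; the synthetic 5% churn rate, the default gradient boosting settings and the 1:1 undersampling ratio are assumptions for illustration, not the thesis setup.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Hypothetical unbalanced churn data (roughly 5% churners).
    X, y = make_classification(n_samples=20000, n_features=20, weights=[0.95], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

    def auc_of(X_fit, y_fit):
        model = GradientBoostingClassifier().fit(X_fit, y_fit)
        return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

    # (a) train on the unbalanced data as-is
    auc_raw = auc_of(X_tr, y_tr)

    # (b) undersample the majority class to a 1:1 ratio
    churn = np.where(y_tr == 1)[0]
    keep = np.random.default_rng(0).choice(np.where(y_tr == 0)[0], size=len(churn), replace=False)
    idx = np.concatenate([churn, keep])
    auc_balanced = auc_of(X_tr[idx], y_tr[idx])

    print(auc_raw, auc_balanced)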
Lukas Steffen A Semi-Supervised Learning Approach to Causal Discovery Prof. Dr. Nicolai Meinshausen Aug-2017
Abstract: The objective of causal discovery is detecting cause-effect relationships between variables. The identification of these relations via the correlation-based methods of classical statistics is infeasible by the nature of the task; thus causal discovery requires distinct procedures for classifying causal effects. One prevalently applied solution is performing interventional experiments, where manipulations of the system are used to distinguish causes from effects. Interventional experiments are not without drawbacks, as producing interventional data can be rather expensive and in special cases even unethical. In this thesis we examine the recent proposition of utilizing a machine learning approach for causal discovery. Semi-supervised learners are based on observational data under the prerequisite that partial knowledge of the cause-effect relationships is available. In our experiments we test and compare the proposed classifiers on a large dataset containing interventional data from an experiment in molecular biology as well as on artificial data generated by a structural equation model.
Jeffrey Näf Review of Asymptotic Results in Empirical Process Theory Prof. Dr. Sara van de Geer Aug-2017
Abstract: Empirical process theory has proven invaluable for statistics over the past few decades. Loosely speaking, the theory revolves around stochastic processes depending on a random sample. This is made precise by the notion of the empirical measure, which is a proper probability measure for each realization. A stochastic process is then defined, using any collection of measurable real-valued functions on the sample space, by taking integrals with respect to the empirical measure. This allows the embedding of many statistical problems into a rigorous mathematical framework. For instance, the law of large numbers (LLN) and central limit theorem (CLT) can be formulated in terms of such processes with the index set F consisting of a single function. In this thesis we review three fundamental asymptotic results for empirical processes, as presented in van der Vaart and Wellner (1996). These identify sufficient conditions on the collection of functions F for the LLN and CLT to hold uniformly over all functions. The goal of the thesis is a rigorous treatment of these results while striving for a maximum of clarity. Over the course of four chapters, results and concepts are introduced step by step, allowing us to prove the final results in full detail. The theorems in these chapters themselves present valuable tools used in empirical process and statistical theory: Chapter 2 introduces the fundamental notions and definitions. Chapter 3 generalizes many concepts from probability theory to potentially non-measurable maps, while Chapter 4 introduces important probabilistic inequalities and the method of chaining. Chapter 5 deals with the hugely important concept of symmetrization and, with the ideas of Chapter 3, applies this to the non-measurable map implied by the empirical process. Finally, the main theorems are presented and studied in Chapters 6 and 7. The text is complemented by a wealth of results stated in Appendix A, most of them “standard knowledge” from measure and probability theory. Throughout the thesis, additional results are derived to facilitate the understanding of the involved theorems and definitions. We conclude by discussing some of the more recent developments in the field and provide a small application of the results studied in Appendix B.
Jacobo Salomon Avila Assessing the Impact of Renewable Energy Resources on Fossil Fuel Electricity Generation: A Machine Learning Approach Prof. Dr. Nicolai Meinshausen
Dr. Jan Abrell
Mirjam Kosch
Aug-2017
Abstract: One of the major means to address the challenges related to fossil fuels is the promotion of renewable energy resources. The present thesis intends to measure the impact of renewable resources, namely wind and solar power, on electricity generation with fossil fuels. Due to the absence of a control group in the available data set, the use of hypothetical scenarios predicted with a model constructed from the available data is proposed. Causal inference tools are used to visualize the system under study, for covariate adjustment, and to measure the impact of the interventions, or “treatment effect”. To construct the models for this counterfactual analysis, several machine learning procedures are used and compared, selecting the ones that perform best with respect to out-of-sample predictions. Once the best models are selected, four different scenarios (increase/decrease of generation with renewable resources) are simulated by predicting counterfactual outcomes with such models. The results from these scenarios are then compared with the data without an intervention to assess the impact of these technologies. Additionally, confidence intervals for the results are constructed using the bootstrap, a resampling technique. In this setting, the Random Forest algorithm proves to perform best in terms of both goodness-of-fit and out-of-sample prediction performance. The analysis yields the amount of electricity that would (not) have been produced with fossil fuels if there were a decrease (increase) in generation with renewable resources. This study proposes a way to measure the impacts of hypothetical interventions while using machine learning algorithms instead of the linear regression approach commonly used in econometrics. These methods prove to fit the data at hand better; their use for counterfactual analysis can help policy makers to gain a better understanding of the impact of renewable technologies on the energy supply, and the results can be useful to bolster policies to promote them.
Yanhao Shi Implementation and Applications of Critical Line Algorithm for Portfolio Optimization Dr. Martin Maechler Aug-2017
Abstract: In Markowitz's Modern Portfolio Theory, a portfolio on the Efficient Frontier has the minimal risk at a given level of expected return, or the maximal expected return at a given level of risk. Markowitz also introduced the Critical Line Algorithm, a quadratic programming method for portfolio selection. The Critical Line Algorithm is of great importance due to its fast implementation compared with other optimization approaches, yet to the best of our knowledge, only one open-source implementation, written in Python, has been published (Bailey and Lopez de Prado, 2013). Based on the existing Python code, this thesis aims at solving the portfolio optimization problem by implementing the Critical Line Algorithm in R. Firstly, the mathematical description of the Critical Line Algorithm is introduced. Then the improvements of the code are tested and analyzed using the S&P 500 Index and NASDAQ Index from the R package FRAPO, as well as the asset data provided by the OLZ company. The properties of the CLA results and related extensions are discussed in the subsequent chapters. Moreover, the performance of the implementation is compared with the performance of other optimization approaches.
Keywords: Critical Line Algorithm, Efficient Frontier, weight constraints, portfolio optimization, quadratic programming
Daniela Hertrich Image Inpainting with Sparse Dictionary Learning Methods Prof. Dr. Nicolai Meinshausen Jul-2017
Abstract: Dictionary learning aims at finding a frame (called a dictionary) that allows one to represent some training data as a sparse linear combination of dictionary elements. In recent years dictionary learning has led to state-of-the-art results in various image processing tasks such as, amongst others, image inpainting. In this thesis we perform a detailed analysis of different dictionary learning methods applied to the task of image inpainting. We consider four different methods to obtain dictionaries: principal component analysis, non-negative matrix factorization, alternate minimization and online dictionary learning. The goal is to find a dictionary from a training set of facial images that allows us to represent these images as a sparse linear combination of the dictionary elements. Then we use an l1-minimization algorithm to inpaint missing pixels of a different set of facial images, called the test set. We examine the performance of the different dictionaries for reconstructing images for which the missing pixels are either ordered in the form of a square patch or randomly distributed across the whole image.
Leonard Henckel Graphical Criteria for Efficient Total Effect Estimation via Adjustment in Multivariate Gaussian Distributions Prof. Dr. Marloes Maathuis
Emilija Perkovic
Jul-2017
Abstract:
In this thesis, we consider the estimation of total effects via adjustment in the multivariate Gaussian setting. We introduce a new theorem that can compare many valid adjustment sets in terms of their asymptotic variance using just the graph structure of the underlying causal directed acyclic graph. Further, we use this result to construct a valid adjustment set O that always provides the optimal asymptotic variance. It is also shown that among all asymptotically optimal valid adjustment sets, O yields the strictly best finite sample variance.
Simona Daguati Possible ways to deal with survival bias arising in Cox regression analysis Prof. Dr. Marloes Maathuis Jul-2017
Abstract: Statistical techniques from the field of survival analysis are widely used for the assessment of treatment effects in randomized clinical trials. One of the most common approaches is to fit a Cox proportional hazards model including the explanatory variable for treatment. In a first part of the thesis, we follow the paper from Aalen, Cook, and Røysland (2015), which focuses on the question whether Cox analysis of a randomized clinical trial allows for a causal interpretation of the treatment effect. To illustrate that this is not the case for the Cox hazard ratio in a setting with unmodelled heterogeneity, we reproduce theoretical results as well as simulation studies from Aalen et al. (2015), and we complement the material by additional calculations and simulation studies. The main result from Aalen et al. (2015) is that a causal interpretation of the hazard ratio is lost due to the fact that the risk sets beyond the first event time consist of subsets of individuals who have not previously gotten the event. This implicit conditioning damages the initial balance in the distribution of potential confounders between different treatment groups. In the literature, this phenomenon is referred to as survival bias.
The purpose of the second part of the thesis is to discuss some approaches which enable estimation of the causal hazard ratio. As a first method, we will present a frailty approach suggested in Stensrud (2017), which estimates the causal hazard ratio by adjusting the Cox estimate of the marginal hazard ratio on an interval (t1, t1 + ∆) for survival bias. Moreover, we will introduce a modification of this approach which yields a bias reduction by averaging the adjusted estimates over multiple intervals. In addition, we will consider an accelerated failure time model. Weibull distributed survival times fit into both frameworks, the one of proportional hazards models and the one of accelerated failure time models (Cox and Oakes, 1984, Section 5.3). This allows us to give a detailed comparison of the performances of the various methods on Weibull distributed survival times by simulating different scenarios. An important conclusion is that the methods based on marginal Cox estimates work well if the marginal hazard ratio is estimated on a time interval in the beginning of the study and for a suitable interval length. For the investigation of optimal interval lengths, we have carried out several simulation studies and analysed their bias-variance plots. On the other hand, we find that the main drawback of these methods is that their estimates become unstable if the data sets are too small or if the density of the data points is too low. In some settings, this instability may destroy the analysis. In contrast, the accelerated failure time model performs reasonably well for smaller data sets. Moreover, its performance does not depend on additional tuning parameters. However, a proportional hazards model can only be reformulated as an accelerated failure time model if the underlying distribution of the survival times is Weibull (Cox and Oakes, 1984, Section 5.3).
Nicholas Tan Hierarchical Testing on Genome-Wide Association Studies Data Dr. Markus Kalisch Jul-2017
Abstract:
When testing for significance in high-dimensional datasets (such as when dealing with genome-wide datasets), multiple testing becomes an inherent problem. In this thesis, we will look at how we can boost the power of tests by utilising p-value aggregation via aggregation methods such as Stouffer's method. When dealing with high-dimensional data, we brought in the multi-sample splitting technique by Meinshausen, Meier, and Bühlmann (2009) to calculate p-values for an arbitrarily large number of variables. In addition, we also applied the idea of hierarchical testing by Mandozzi and Bühlmann (2015), exploiting the data's innate hierarchical structure to select statistically significant clusters of arbitrary sizes while, at the same time, reducing the amount of computation needed. Through simulations with both low- and high-dimensional datasets, we found that for inhomogeneous datasets we can achieve a higher power when aggregating separate p-values than when pooling said datasets together. Also, the increase in power does not come at the expense of higher error rates: we were able to achieve similar error rates when aggregating p-values and when pooling the data. Through the simulations, we also found that we can improve upon the results of hierarchical testing by working with more accurate hierarchical structures. Lastly, we demonstrated how the techniques mentioned in the paper could be applied to real-life datasets, such as performing genome-wide association studies using real human genome sequences.
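For concreteness, a small R sketch (not from the thesis) of Stouffer's method for aggregating p-values: convert each one-sided p-value to a z-score, combine with optional weights, and transform back to a single p-value.
    # Stouffer's method: combine k one-sided p-values into a single p-value
    stouffer <- function(p, w = rep(1, length(p))) {
      z      <- qnorm(p, lower.tail = FALSE)       # p-values -> z-scores
      z.comb <- sum(w * z) / sqrt(sum(w^2))        # (weighted) combination
      pnorm(z.comb, lower.tail = FALSE)            # back to a p-value
    }

    stouffer(c(0.04, 0.10, 0.03))   # smaller than each input when the signals agree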
Leo Maag Statistical Significance in Genome Wide Association Studies Dr. Markus Kalisch Jul-2017
Abstract:
Genome wide association studies (GWAS) are exploratory approaches to detect previously unknown associations between common types of variation in the genome and a phenotype like the presence of a disease or the characteristic of a trait. The most common variations are single nucleotide polymorphisms (SNPs), which are variations in a single base pair in the genome. Often SNPs are tested individually for association with the phenotype. This thesis reviews a method for statistical inference in GWAS proposed by Buzdugan, Kalisch, Navarro, Schunk, Fehr, and Bühlmann (2016) that is based on joint modeling of all SNPs in a regression setting. First, the main challenges are introduced and illustrated: quantifying uncertainty in high-dimensional regression, dealing with strongly correlated variables and the multiple testing problem. In the second part it is shown how these challenges can be tackled. The concept of multiple sample splitting allows one to construct valid p-values in a high-dimensional regression, and hierarchical testing exploits the correlation structure of the variables to adapt its resolution level to the strength of the signal in a data-driven manner. Together they form an algorithm that is able to identify significant individual SNPs or groups of SNPs in the high-dimensional setting while controlling the family-wise error rate.
Yarden Raiskin Automated ATC-code Categorization of Medication Prof. Dr. Marloes Maathuis
Prof. Dr. Thomas Hofmann
Dr. Carsten Eickhoff
Jul-2017
Abstract: In recent years, modern hospitals have begun storing increasingly large amounts of clinical data, often in textual and unstructured forms.
These data may contain precious insights that can be discovered and exploited by applying machine learning or information retrieval techniques.

The contribution of this work is in developing a data augmentation process and in developing drug prescription classifiers, using state-of-the-art Recurrent Neural Network architectures.

We develop a data augmentation procedure and apply it to curate four different data-sets, on which we evaluate the classifiers.
We tested regularization techniques (dropout, L2 norm penalty, target replication and noisy activation functions) in order to improve classification performance.

Our experiments show that the developed classifiers outperform the baseline classifiers on all four data-sets.
Our model achieves a Mean Reciprocal Rank of 0.981 on the unaugmented data-set, whereas the baseline achieves a Mean Reciprocal Rank of 0.96.
Lydia Braunack-Mayer Interference Between Common Respiratory Pathogens Professor Peter Bühlmann
Professor Sebastian Bonhoeffer
Professor Roger Kouyos
Jul-2017
Abstract: Rhinovirus, influenza virus, respiratory syncytial virus and other common bacteria and viruses pose a serious burden to both individual and public health. These pathogens are simultaneously present in a population and, yet, epidemiological studies of the complex factors that cause these illnesses tend to focus on a single pathogen. The aim of this
thesis was to understand the shared determinants of infection by common bacteria and respiratory viruses, focusing on interference between pathogens. Statistical inference was
applied to explore the incidence of infection by 16 common pathogens in multiplex PCR tests conducted at the Universitaetspital Basel, between June 2010 and September 2015. With Fisher's exact tests for independence, cross-wavelet analyses and an SIR model
with cross-immunity, patterns in the pathogens detected were found to be consistent with the hypothesis that, for a number of common respiratory viruses, infection by one pathogen interferes with infection by a second.
Francesco Ortelli Statistics meets optimization: random projections and nearest neighbor search Prof. Dr. Sara van de Geer
Benjamin Stucky
May-2017
Abstract: In the era of big data, the number of situations where one has to work with high-dimensional data sets is growing. As a consequence, the application of some statistical techniques to such problems is slowed down considerably, from a computational point of view, by the high dimensionality of the data: this phenomenon is called the curse of dimensionality. Moreover, sometimes it can even be costly to store the data itself. We present some variants of the Johnson-Lindenstrauss Lemma, a data-oblivious dimensionality reduction technique, and show how it can be applied to the (approximate) nearest neighbor search problem in order to break the curse of dimensionality. When presenting the variants of the Johnson-Lindenstrauss Lemma, the focus will lie on the time required by their computational application. For the application to the nearest neighbor search problem, we will see that the Johnson-Lindenstrauss Lemma represents the bottleneck in terms of time required. Finally, we complete the work by performing some simulations aimed at understanding how to implement the theory in the best possible way.
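To make the dimensionality-reduction step concrete, here is a minimal R sketch (mine, not the thesis code) of a Gaussian Johnson-Lindenstrauss projection together with a quick check of how well pairwise distances are preserved.
    set.seed(1)
    n <- 100; d <- 5000                 # n points in a high-dimensional space
    X <- matrix(rnorm(n * d), n, d)

    k <- 200                            # target dimension (grows like log(n)/eps^2)
    R <- matrix(rnorm(d * k), d, k) / sqrt(k)   # Gaussian JL projection matrix
    Y <- X %*% R                        # projected points, n x k

    # distortion of pairwise distances: the ratios should concentrate around 1
    ratio <- dist(Y) / dist(X)
    summary(as.numeric(ratio))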
Samuel Schaffhauser Detection of Hyperreflective Foci in Optical Coherence Tomography Prof. Dr. Nicolai Meinshausen
Dr. Clarisa Sánchez
May-2017
Abstract: Diabetic macular edema (DME) is a retinal disorder characterised by the allocation of cystoidal fluid in the retina. The current treatment consists of repeated antivascular endothelial growth factor (anti-VEGF) injections. Recent studies indicate that the presence and number of hyperreflective foci (HRF) could be a prognostic biomarker for treatment response in DME. Since the detection of HRF is laborious, manual foci quantifications seem infeasible. Therefore, an automated detection of HRF in optical coherence tomography (OCT) images is designed to assist ophthalmologists in their endeavour.
191 fovea-centred B-scans from 76 patients with DME were obtained from a clinical database and serve as the training set. A further data set with 88 B-scans from 39 patients forms the test set and contains annotations from two independent observers. HRF were only annotated in the layers ranging from the inner plexiform layer (IPL) to the outer nuclear layer (ONL), as manual detection is challenging in the remaining layers. A supervised fully convolutional neural network (CNN) trained on patches classifies the central pixel as hyperreflective focus or background. The CNN consists of 7 convolutional layers and 2 max-pooling layers. After providing the system with enough training samples to fit its parameters, it is capable of detecting HRF in OCT B-scans. The derived results were compared to manual annotations made by two human graders for the 3mm region surrounding the fovea in the central B-scan. The classifier has a free-response receiver operating characteristic (FROC) curve for the independent test set that lies above the operating point of the two independent graders, obtained by taking one grader as truth and the other as classifier. Comparing the classifier to a random forest with PCA components reveals that its performance is remarkably better. An image analysis algorithm for the automatic detection and quantification of HRF in OCT B-scans was developed. The experiments show promising results for using convolutional neural networks to obtain automated detection and foci-based biomarkers on which further medical studies can build.
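A rough R/keras sketch of a patch-based classifier of the kind described, i.e. a small stack of convolution and max-pooling layers that labels the central pixel of a patch as focus or background. This is not the network trained in the thesis: the patch size and layer widths are guesses, and running it requires the keras R package with a TensorFlow backend installed.
    library(keras)   # assumes a working TensorFlow installation

    # toy patch classifier: 33x33 grayscale patches -> P(central pixel is a focus)
    model <- keras_model_sequential() %>%
      layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu",
                    input_shape = c(33, 33, 1)) %>%
      layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu") %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
      layer_max_pooling_2d(pool_size = c(2, 2)) %>%
      layer_flatten() %>%
      layer_dense(units = 64, activation = "relu") %>%
      layer_dense(units = 1, activation = "sigmoid")

    model %>% compile(optimizer = "adam", loss = "binary_crossentropy",
                      metrics = "accuracy")
    # model %>% fit(patches, labels, epochs = 10, batch_size = 32)  # hypothetical inputs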
Lennart von Thiessen Linear Regression Based on Imputed Data Sets and a Further Look on missForest Prof. Dr. Peter Bühlmann
Dr. Daniel Stekhoven
May-2017
Abstract:
Johannes Göbel Analysis of Financial Data with Nonlinear Time Series Approaches Dr. Lukas Meier May-2017
Abstract: In this thesis we analyse three sets of financial time series with R. We first review parametric linear time series processes and show that they are not sufficient for analysing financial data. In Chapter 3 we introduce parametric nonlinear time series models, namely the ARCH processes, which were introduced by Engle in 1982, and their extension, the GARCH processes, introduced independently by Taylor and Bollerslev in 1986. Since it is essential for parametric time series analysis that the chosen model is the true data-generating model in order to provide good results, and choosing the wrong model will introduce bias, we present in Chapter 4 the additive nonlinear model as an example of nonparametric nonlinear time series models. The R code we used in each chapter for the simulations and the analysis of the data can be found in the Appendix. The following R packages were used: quantmod, FinTS, rugarch and mgcv.
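For orientation, a minimal sketch of the kind of GARCH fit discussed above, using the rugarch and quantmod packages named in the abstract (a plain GARCH(1,1) on S&P 500 log returns, not necessarily one of the exact models fitted in the thesis; the data download assumes an internet connection).
    library(rugarch)
    library(quantmod)

    getSymbols("^GSPC", from = "2010-01-01", to = "2016-12-31")  # downloads S&P 500
    ret <- na.omit(diff(log(Cl(GSPC))))                          # daily log returns

    spec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                       mean.model = list(armaOrder = c(0, 0)),
                       distribution.model = "std")               # t-distributed errors
    fit <- ugarchfit(spec, data = ret)
    coef(fit)                                # mu, omega, alpha1, beta1, shape
    plot(sigma(fit), main = "Fitted conditional volatility")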
Konrad Knuesel An ROC curve based comparison of multivariate classification methods Dr. Markus Kalisch Apr-2017
Abstract: The goal of this study is to compare classification methods that can be used to develop a diagnostic test based on multivariate data. The methods were evaluated based on their ROC curves. Following an introduction to the univariate ROC curve, estimators for the one-dimensional case were compared under a variety of simulation settings. These univariate estimators are: empirical, binormal, "log-concave," and kernel-smoothed. To evaluate the methods, data was simulated from known distributions under both small and large sample size settings. Comparing the accuracy of the estimators (defined by how well they approximate the true ROC curve), the binormal method performed best with small sample sizes while the log-concave and kernel-smoothed methods performed best with large sample sizes. The focus of the study then turns to the multivariate case. The following classification methods were compared in a simulation study: simple average, distribution-free, LDA, QDA, logistic regression, and SVM. In the two-dimensional case, every method performed similarly except for the simple average and SVM, which were considerably worse. In the six-dimensional simulation that followed, QDA and SVM were generally the best performing methods although in certain cases, LDA and logistic regression had somewhat better results. Finally, the classification methods were applied to a medical data set. In this case, LDA and logistic regression were found to have the best cross-validated performance.
Emmanuel Profumo Finding the number of clusters via standardization of validity plots using parametric bootstrap Dr. Martin Mächler Mar-2017
Abstract: In this thesis, we present and study a method to estimate the number of clusters in a data set. The calibration method consists in comparing cluster validity index values to the ones obtained under a reference distribution, yielding the so-called gap statistic. Then, we present a reference model for the absence of clusters for mixed-type
data, which can be seen as a generalisation of models for continuous data. We give R functions to implement this method and the null models, and run a simulation on mixed-type simulated data to test the performance of the calibration method depending on parameters such as the separability between clusters. We also propose a slightly modified version
of the gap statistic, and test it on our simulated data.
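For the continuous-data case, a reference implementation of the gap statistic is available as clusGap in the R package cluster; a minimal usage sketch (not the mixed-type extension developed in the thesis):
    library(cluster)

    set.seed(1)
    x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 4), ncol = 2))   # two well-separated clusters

    gs <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 100, nstart = 20)
    gs                                           # gap values for k = 1..8
    maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"])   # suggested number of clusters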
Nina Aerni Evaluation of Feature Selection Methods for Classification of Autism based on ABIDE II Prof. Dr. M. Maathuis
Pegah Kassraian Fard
Prof. Dr. N. Wenderoth
Mar-2017
Abstract: This thesis evaluates several feature selection methods in combination with the Support Vector Machine (SVM) classifier to distinguish between autistic and typically developed subjects. The Autism Brain Imaging Data Exchange II (ABIDE II) database is used for this thesis. This database includes 1044 resting state functional and structural MRI scans. First, the MRI scans were preprocessed using Statistical Parametric Mapping (SPM 12). In high dimensional data sets, such as the data at hand, the number of predictors p is much larger than the number of observations N. We reduce the feature space with feature selection methods to avoid overfitting. We achieved an accuracy of about 64% with the univariate filter selection methods t-test, chi-squared and difference in means on the feature set including functional and structural features and the covariates. For the multivariate selection method Principal Feature Analysis (PFA) we achieved lower accuracy for this high dimensional data set. In comparison to Kassraian Fard et al. (2016), where only functional MRI data is used for the classification, this thesis also considers the structural MRI scans for the analysis. We did not see the expected increase in accuracy from the addition of structural features to the functional features. In fact, the addition of the covariates sex, age and IQ score increased the accuracy to a greater extent.
Emiliano Díaz Online deforestation detection Prof. Dr. Marloes Maathuis Mar-2017
Abstract:
Deforestation detection using satellite images can make an important contribution to forest management. Current approaches can be broadly divided into those that compare two images taken at similar periods of the year and those that monitor changes by using multiple images taken during the growing season. The CMFDA algorithm described in Zhu et al. (2012) is an algorithm that builds on the latter category by implementing a year-long, continuous, time-series based approach to monitoring images. This algorithm was developed for 30m resolution, 16-day frequency reflectance data from the Landsat satellite. In this work we adapt the algorithm to 1km, 16-day frequency reflectance data from the MODIS sensor aboard the Terra satellite. The CMFDA algorithm is composed of two submodels which are fitted on a pixel-by-pixel basis. The first estimates the amount of surface reflectance as a function of the day of the year. The second estimates the occurrence of a deforestation event by comparing the last few predicted and real reflectance values. For this comparison, the reflectance observations for six different bands are first combined into a forest index. Real and predicted values of the forest index are then compared, and high absolute differences for consecutive observation dates are flagged as deforestation events. Our adapted algorithm also uses the two-model framework. However, since the MODIS 13A2 dataset used includes reflectance data for different spectral bands than those included in the Landsat dataset, we cannot construct the forest index. Instead we propose two contrasting approaches: a multivariate approach and an index approach similar to that of CMFDA. In the first, prediction errors (from the first model) for selected bands are compared against band-specific thresholds to produce one deforestation flag per band. The multiple deforestation flags are then combined using an "or" rule to produce a general deforestation flag. In the second approach, as with the CMFDA algorithm, the reflectance observations for selected bands are combined into an index. We chose to use the local Mahalanobis distance of the prediction errors for the selected bands as our index. This index measures how atypical a given multivariate prediction error is, thereby helping us to detect when an intervention to the data generating mechanism has occurred, i.e. a deforestation event. We found that, in general, the multivariate approach obtained slightly better performance, although the index approach, based on the Mahalanobis distance, was better at detecting deforestation early. Our training approach was different to that used in Zhu et al. (2012) in that the lower resolution of the reflectance data and the pseudo ground-truth deforestation data used allowed us to select a much larger and more diverse area, including nine sites with different types of forest and deforestation, and training and prediction windows spanning 2003-2010. In Zhu et al. (2012) reflectance and deforestation information from only one site and only the 2001-2003 period is used. This approach allowed us to draw conclusions about how the methodology generalizes across space (specifically pixels) and across the day of the year. In the CMFDA methodology and our adapted version, a single (possibly multivariate) threshold is applied to the prediction errors irrespective of the location or the time of the year.
By comparing the results when thresholds were optimized on a site-by-site basis to those when a single threshold was optimized for all nine sites, we found that optimal thresholds do not translate across sites; rather, they display a local behavior. This is a direct consequence of the local behavior of the prediction error distributions. This led us to try to homogenize the error distributions across space and time by applying transformations based on different observations and assumptions about the prediction error distributions and their dependence on time and space. However, our efforts in this sense did not improve performance, leading us to recommend the implementation of the multivariate approach without transforming the prediction errors.
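A tiny R sketch (not the thesis code) of the index approach described above: compute the Mahalanobis distance of each multivariate prediction-error vector with respect to the undisturbed behaviour and flag dates whose distance exceeds a threshold. The data here are random placeholders.
    set.seed(1)
    # toy prediction errors for 3 spectral bands over 200 observation dates
    E <- matrix(rnorm(200 * 3), ncol = 3)
    E[150:200, ] <- E[150:200, ] + 2          # a "disturbance" from date 150 onwards

    # Mahalanobis distance of each error vector w.r.t. the undisturbed period
    mu <- colMeans(E[1:100, ]); S <- cov(E[1:100, ])
    d2 <- mahalanobis(E, center = mu, cov = S)

    thr  <- qchisq(0.999, df = 3)             # chi-squared cutoff under normal errors
    flag <- d2 > thr                          # candidate deforestation dates
    which(flag)[1:5]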
Lukas Schiesser Causal Inference in the Presence of Hidden Variables using Invariant Prediction Prof. Dr. Nicolai Meinshausen Mar-2017
Abstract: This thesis extends causal inference using invariant prediction based on the ideas introduced by Peters, Bühlmann, and Meinshausen (2015) to settings allowing the presence of hidden variables. Invariant causal prediction exploits that given different experimental environments resulting e.g. from interventions on variables, the predictions from a causal model will be invariant. Hence the causal model has to be among those models fulfilling such an invariance property, or is accepted with high probability in the context of a statistical test for a corresponding hypothesis. A rather general linear model with hidden variables is introduced and an invariant causal prediction framework for such models is established. Testing the invariance assumption is then reformulated to a quadratically constrained quadratic program which in general is non-convex and therefore does not necessarily have an exact solution. Thus, the optimization problem is relaxed to the semi-definite programming (SDP) framework and its solution can then be approximated or sometimes even obtained exactly in polynomial time. One main focus of the thesis lies on describing different approaches to apply SDP relaxations to solve the non-convex optimization problem. This provides specific methods to obtain confidence statements for the causal relationships in such models, namely for the set of causal predictors and for their causal coefficients. These are applied to simulated and real world data and numerical experiments are conducted to study the empirical properties of the developed approaches.
David Zhao Scattering Convolution Networks and PCA Networks for Image Processing Prof. Dr. Nicolai Meinshausen Feb-2017
Abstract: The convolutional neural network's defining principle of parameter sharing over shifting receptive fields makes it well-suited for image processing tasks, as this structure enforces both sparsity and invariance to translations and deformations. However, neural networks are not theoretically well understood, and their standard training method involves an NP-hard non-convex optimization. In this thesis, we explore two alternative models for image processing: the scattering convolution network (SCNet) of Bruna and Mallat (2013) and the principal component analysis network (PCANet) of Chan et al. (2015). Both models use sets of transformations that are fully predetermined, while maintaining the benefits of a convolutional structure. SCNet is built from layers of wavelet transforms, and PCANet is built from layers of PCA-extracted filters. SCNet and PCANet can be thought of as elaborate pre-processing steps that transform images into more expressive feature vectors. To obtain class predictions, we run a classification algorithm on these features. Four types of classifiers are considered in this thesis: generative PCA classifiers, linear and rbf kernel SVM, multiclass logistic regression with lasso, and random forest for classification trees. Experiments on the MNIST dataset show that 2-layer SCNet and 2-layer PCANet consistently outperform a comparable convolutional neural network with 2 hidden layers. We also test variations on the MNIST dataset and on the PCANet filters.
Gabriel Espadas Parameter estimation and uncertainty description for state-space models Dr. Markus Kalisch Feb-2017
Abstract:
The present work seeks to address the statistical problem of non-linear regression, also known as calibration, for state-space models. In the classical literature, for example in G. A. F. Seber (1988), Douglas M. Bates (1988) or Gallant (1987), this problem has been studied almost entirely from either a Frequentist or a Bayesian perspective. Here, we present the theory that supports the basic models from both frameworks and carefully expose the probabilistic background needed for the use of transformations and the introduction of autocorrelation in the stochastic models. Furthermore, we demonstrate in detail the application of the methods to a real-world case study.
Keywords: State-space models, Non-linear regression, Parameter estimation, Frequentist estimation, Bayesian estimation, MCMC, Metropolis-Hastings Algorithm, Gibbs Sampler, Autocorrelated errors, Heteroscedastic errors.
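As a pointer to the Bayesian machinery listed in the keywords, a generic random-walk Metropolis sketch in R (illustrative only, on a toy Gaussian-mean problem, not the calibration code used in the thesis):
    set.seed(1)
    y <- rnorm(50, mean = 2, sd = 1)      # toy data; the unknown parameter is the mean

    log_post <- function(theta)           # N(0, 10^2) prior times Gaussian likelihood
      dnorm(theta, 0, 10, log = TRUE) + sum(dnorm(y, theta, 1, log = TRUE))

    n_iter <- 5000
    chain  <- numeric(n_iter)             # starts at 0
    for (i in 2:n_iter) {
      prop   <- chain[i - 1] + rnorm(1, sd = 0.5)    # random-walk proposal
      accept <- log(runif(1)) < log_post(prop) - log_post(chain[i - 1])
      chain[i] <- if (accept) prop else chain[i - 1]
    }
    mean(chain[-(1:1000)])                # posterior mean estimate after burn-in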
Christoph Buck Optimizing complementary surveys for mapping the spatial distribution of Mercury in soils near Visp, Canton of Valais, Switzerland Dr. Andreas Papritz
Dr. Lukas Meier
Feb-2017
Abstract: For a mercury pollution site near Visp, Canton of Valais, a geostatistical analysis was carried out for a sub-area of the entire study area. The aim of the analysis was to predict which parcels have a mercury content above a certain threshold. Analyses were made for the two soil layers separately and as a joint 3D analysis of both soil layers together. It was found that the joint 3D analysis produces a better prediction in the sub-area.
To make more accurate predictions, more samples must be taken. The aim of the additional sampling design is to reduce false negative decisions. Based on the ideas of the papers by Heuvelink et al. (2010) and Marchant et al. (2013), an optimisation algorithm was successfully implemented. It produces an optimised sampling design which reduces false negative and false positive decisions. The user can set the parameters of the loss function for making false negative and false positive decisions. Based on these parameters, the optimisation algorithm computes the expected loss of a design and searches for an optimised sampling design. The implementation was made with conditional simulations followed by an iteration process, which included kriging predictions, computation of the expected loss and spatial simulated annealing.

2016

Student Title Advisor(s) Date
Manuel Schürch High-Dimensional Random Projection Ensemble Methods for Classification Prof. Dr. Peter Bühlmann Nov-2016
Abstract: In this thesis, we investigate random projection ensemble methods for multiclass classification based on the combination of arbitrary base classifiers operating on appropriately chosen low-dimensional random projections of the feature space. These methods are particularly intended for high-dimensional data sets where the dimension of the variables is comparable to or even greater than the number of available training data samples. We extend a recent proposal of Cannings and Samworth (2015) in two directions. First, we generalize their idea for binary classification to multiple classes. Second, we present alternative approaches to their weighted majority voting for the aggregation of the individual predictions in the ensemble to a final assignment. For this newly developed methodology, we provide implementations and an empirical comparison to state-of-the-art methods on synthetic as well as real-world high-dimensional data sets. Its competitive prediction performance underpins the promising direction of aggregating randomized low-dimensional projections. Moreover, we examine analogous ideas for regression and semi-supervised classification.
Fan Wu On Optimal Surface Estimation under Local Stationarity Dr. Rita Ghosh
Dr. Markus Kalisch
Oct-2016
Abstract: Given a spatial dataset, consider a nonparametric regression model where the aim is to estimate the regression surface. By further assuming local stationarity of the error term, estimation of the variance of the Priestley-Chao kernel estimator can be done without estimating the various nuisance parameters. All the proofs about uniform convergence of terms are already addressed in Ghosh (2015). In this thesis, we use the proven properties and propose a semiparametric algorithm for optimal bandwidth selection. The findings are then applied to a dataset of the Swiss National Forest Inventory (http://www.lfi.ch).
Polina Minkina A new hybrid approach to learning Bayesian networks from observational data Dr. Markus Kalisch
Dr. Jack Kuipers
Sep-2016
Abstract: This work presents a new hybrid approach to learning Bayesian networks from observational data. The method is based on the PC-algorithm combined with a Bayesian-style MCMC search. There are several versions of the algorithm presented in this work. The base version of the algorithm limits the search space with a PC-skeleton and performs either a stochastic MAP search or sampling from the posterior distribution on the reduced search space. While this version yields relatively good results, in some cases the PC algorithm eliminates a large part of the true positive edges from the search space. To overcome this issue we also suggest an algorithm for iterative expansion of the search space, which helps to increase the number of true positives and as a result leads to much better estimates both in terms of the skeleton and the equivalence class.
We run simulation studies and compare the performance of our approach to other algorithms for structure learning, such as the PC-algorithm, greedy equivalence search (GES) and max-min hill climbing (MMHC). The advantages of our algorithm are more pronounced in a dense setting. In a sparse setting the algorithm performs similarly to GES, but better than PC.
We provide assessments of the computational complexity of the new approach, which grows polynomially with the size of the network and exponentially with the size of the maximal neighborhood; the latter is the main limitation of the method. For the PC-algorithm the lower bound for the computational complexity also grows exponentially with the size of the maximal neighborhood, hence we conclude that if the PC algorithm is feasible for some network, our approach should be feasible too.
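For readers who want to reproduce the PC-based restriction of the search space, a minimal sketch with the R package pcalg (standard PC usage only; the MCMC part of the hybrid method is not shown, and the gmG example data shipped with pcalg stand in for real observations):
    library(pcalg)

    data(gmG)                           # example Gaussian data shipped with pcalg
    d <- gmG$x
    suffStat <- list(C = cor(d), n = nrow(d))

    # estimated skeleton: its adjacencies can serve as a restricted search space
    skel <- skeleton(suffStat, indepTest = gaussCItest, alpha = 0.01, p = ncol(d))
    as(skel@graph, "matrix")            # adjacency matrix of the PC skeleton

    # full PC run, for comparison with the equivalence class found by a hybrid search
    pc.fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.01, p = ncol(d))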
Mun Lin Lynette Tay Statistical analysis of multi-model climate projections with a Bayesian hierarchical model Prof. em. Dr. Hans-Rudolf Künsch
Prof. Dr. Peter L. Bühlmann
Aug-2016
Abstract: This thesis applies a Bayesian hierarchical model as developed by Buser et al. (2009), Buser et al. (2010) and Kerkhoff et al. (2015) to heterogeneous multi-model ensembles of global climate models (GCM) and regional climate models (RCM). The Bayesian hierarchical framework is applied to data from the European arm of the project CORDEX and probabilistic projections of future climate are derived from the climate models.
This thesis is also a continuation of the CH2011 initiative which aims to provide scientifically-grounded information on a changing climate in Switzerland to aid decision-making and planning with regard to climate change strategies. It does so by assessing climate change in the course of the 21st century in Switzerland with a focus on projections of temperature and precipitation. Suitable priors for temperature and precipitation data are suggested and probabilistic projections for different regions in Switzerland, different seasons and different emission scenarios are illustrated and explained. Furthermore, a variant on the Bayesian model proposed by Kerkhoff et al. (2015) which weights data from RCMs more equally to their GCMs is introduced and the two models are compared against each other.
Ravi Mishra Gated Recurrent Neural Network Language Models Prof. Dr. Nicolai Meinshausen Aug-2016
Abstract: "Long term dependencies are difficult to learn with gradient descent in standard Recurrent Neural Networks due to vani-shing and exploding gradient problems. Long Short-Term Memory and other gated networks combined with gradient clipping strategies have been successful at addressing these issues. This work provides details on standard RNN and
gated RNN architectures. The focus lies on forward and backward pass using backpropagation through time. We train an implementation of a character level neural network language model on fine food review data. The goal is to model a probability distribution over the next character in a sequence when presented with the sequence of previous charac-ters. The results of our experiments indicate that for large datasets and increasing sequence length gated architectures have better performance than traditional RNNs. This is in line with previous research."
Janine Burren Outlier detection in temperature data by penalized least squares methods Prof. Dr. Nicolai Meinshausen Aug-2016
Abstract: Chernozhukov et al. (2015) proposed a new regularization technique called lava. In contrast to conventional methods like lasso or ridge regression, this method is able to discover signals which are neither sparse nor dense. It was shown that this method outperforms the conventional methods in simulations. The application to the temperature anomaly data for January in this thesis confirmed this.
The focus of this thesis lies on the comparison of the lasso method, the elastic net method and ridge regression with the lava method, in theory and application, and can be split into five main parts. Firstly, all considered regularization methods are described for a multiple linear regression setting and are brought into relation in the orthonormal design case. Secondly, for the application to the temperature data, the lava method and a corresponding cross-validation approach had to be implemented in R. Thirdly, the given temperature anomaly data (1940 - 2015) are analyzed and ordinary least squares models are fitted on temperature data resulting from a climate model, to assess how well temperature anomaly values can be predicted by the four nearest values. Fourthly, regularized linear regression models are fitted on the climate model data and predictions are made for an observed temperature anomaly data set. For this, a model fitting procedure was determined which is able to deal with the NA-structure in the observed temperature anomaly data and which has a reasonable computational time. The residuals produced by prediction are analyzed with respect to their spatial, temporal and probabilistic distributions. In addition, the functioning of the regularization methods on the temperature anomaly data is studied for some examples to compare the methods and to understand the distributions of the residuals. In the last part of the thesis, these residuals are used to detect outliers in the temperature anomaly data. An outlier detection procedure is proposed which takes into account the prediction error of the fitted linear models and the NA-structure in the observational data set. Furthermore, an artificial outlier study is conducted to assess the outlier detection power of the four considered regularization methods.
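A naive R sketch of the lava idea (my own, for illustration only; it is not the implementation used in the thesis): the coefficient vector is split into a dense part, updated by ridge, and a sparse part, updated by the lasso, alternating the two penalised fits on partial residuals until convergence.
    library(glmnet)

    set.seed(1)
    n <- 100; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    y <- as.numeric(X %*% c(3, -2, rep(0, p - 2)) +   # sparse part of the signal
                    X %*% rnorm(p, sd = 0.1) +        # dense part of the signal
                    rnorm(n))

    lam_ridge <- 0.5; lam_lasso <- 0.1                # fixed penalties for the sketch
    b_d <- rep(0, p); b_s <- rep(0, p)
    for (it in 1:50) {                                # alternate the two penalised fits
      fit_r <- glmnet(X, as.numeric(y - X %*% b_s), alpha = 0,
                      lambda = lam_ridge, intercept = FALSE)
      b_d   <- as.numeric(as.matrix(coef(fit_r)))[-1]
      fit_l <- glmnet(X, as.numeric(y - X %*% b_d), alpha = 1,
                      lambda = lam_lasso, intercept = FALSE)
      b_s   <- as.numeric(as.matrix(coef(fit_l)))[-1]
    }
    b_lava <- b_d + b_s                               # "lava"-type estimate: dense + sparse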
Elias Bolzern Stochastic Actor Oriented Models: An Approach Towards Consistency and Multi Network Analysis Prof. Dr. Marloes Henriette Maathuis Jul-2016
Abstract: Stochastic actor oriented models allow one to describe longitudinal social networks, i.e., social networks observed at various time points. This model can be fitted either by a method of moments approach or a maximum likelihood approach.
In this thesis we discuss two topics. Firstly, up to now there exists no proof of the consistency of the method of moments estimator. We discuss an approach that could lead to a consistency proof.
Secondly, the existing theory allows us to examine only a single social network. We want to examine the common behaviour that underlies several longitudinal networks. This allows us to gain deeper insights into the general behaviour of such networks. We propose to detect the commonalities by considering maximin effects, which can be estimated by a magging-type estimator. We will call our new estimator the multi group estimator. Simulations show that the multi group estimator performs well, especially for a large number of observed time points. Furthermore, the estimator has nice properties in terms of computational efficiency.
Solt Kovács Changepoint detection for high-dimensional covariance matrix estimation Peter Bühlmann May-2016
Abstract: In this thesis we pursue the goal of high-dimensional covariance matrix estimation for data with abrupt structural changes. We try to detect these changes and estimate the covariance matrices in the resulting segments. Our approaches closely follow a recent proposal of Leonardi and Bühlmann (2016) for changepoint detection in the case of high-dimensional linear regression. We propose two estimation approaches that directly build up on their regression estimator and a third procedure which is analogous to their regression estimator, but modified to match the likelihood arising in the case of covariance matrices. We mainly focus on the implementation, testing and comparison of these proposals. Moreover, we provide complementary material regarding the relevant literature on covariance matrix estimation and changepoint detection in similar settings, tuning parameter selection, models for simulations and error measures to evaluate performance. We also illustrate the developed methodology on a real-life example of stock returns.
José Luis Hablützel Aceijas Causal Structure Learning and Causal Inference Dr. Markus Kalisch Apr-2016
Abstract: This thesis presents the theory and main ideas behind some of the nowadays most popular methods used for causal structure learning, as well as the ICP algorithm, a new algorithm based on a method recently developed at ETH Zurich. Then, we measure and compare the performance of these algorithms in two different ways. In our first measure we consider the probability that each of the considered methods finds exactly all the parents of a randomly chosen target variable. In our second measure we consider the reliability of each method in not declaring a node to be a parent when it is not. Hereby, we focus on linear Structural Equation Models (SEM) and restrict ourselves to the situation where no hidden confounders are present. We start by reproducing and extending the results given in Peters, Bühlmann, and Meinshausen (2015) and after that, we change the generation process of the data in several ways in order to conduct further comparisons.
Pascal Kaiser Learning City Structures from Online Maps Markus Kalisch
Martin Jaggi
Thomas Hofmann
Mar-2016
Abstract: Huge amounts of remote sensing data are nowadays publicly available with
applications in a wide range of areas including the automated generation of maps, change detection in biodiversity, monitoring climate change and disaster relief. On the other hand, deep learning with multi-layer neural networks, which is capable of learning complex patterns from huge datasets, has advanced greatly over the last few years.

This work presents a method that uses publicly available remote sensing data to generate large and diverse new ground truth datasets, which can be used to train neural networks for the pixel-wise, semantic segmentation of aerial images.

First, new ground truth datasets for three different cities were generated, consisting of very-high resolution (VHR) aerial images with a ground sampling distance on the order of centimeters and corresponding pixel-wise object labels. Both the VHR aerial images and the object labels are publicly available and were downloaded from online map services over the internet. Second, the three newly generated ground truth datasets were used to learn the semantic segmentation of aerial images by using fully convolutional networks (FCNs), which have been introduced recently for accurate pixel-dense semantic segmentation tasks. Third, two modifications of the base FCN architecture were found that yielded performance improvements. Fourth, an FCN model was trained on the huge and diverse ground truth data of the three cities simultaneously and achieved good semantic segmentations of aerial images of a geographic region that had not been used for training.

This work shows that publicly available remote sensing data can be used to generate new ground truth datasets with which neural networks can be trained effectively for the semantic segmentation of aerial images. Moreover, the method presented here allows one to generate huge and in particular diverse ground truth datasets that enable neural networks to generalize their predictions to geographic regions that have not been used for training.
Sriharsha Challapalli Understanding the intricacies of the PC algorithm and optimising causal structure discovery Markus Kalisch
Mar-2016
Abstract: The PC algorithm is one of the most notable algorithms in causal structure discovery. Over the years various suggestions have been made to optimize the algorithm further, but there is still scope to probe the intricacies of the algorithm more deeply. The current study aims to examine the role of various factors such as the number of variables, the density of the true graph, the use of the conditional independence graph and the sequence in which conditional independence tests are carried out. The outcomes of the study contribute to the optimization of not just the PC algorithm but also causal structure discovery algorithms based on conditional independence tests in general.
The study suggests that skeleton-stable is the best of the studied algorithms for the discovery of the skeleton. The order-independent option is not the best for causal structure discovery and the BC variant is recommended. The study validates that the sequence of orders of the PC algorithm is integral to causal structure discovery. The study recommends avoiding the use of the conditional independence graph for very low values of p and very low densities. Algorithms based on conditional independence tests used in the study must be preferred to those based on greedy equivalence search, except for extremely low values of p or extremely high density.
Sonja Meier Causal analysis of proximal and distal factors surrounding the HIV epidemic in Malawi Marloes Maathuis
Olivia Keiser
Mar-2016
Abstract: The HIV epidemic in Malawi is a major cause of mortality and has a highly adverse impact on Malawi's health system as well as on its economy. It is therefore the aim of this thesis to identify causal associations between proximal and distal factors that may drive the HIV epidemic. The Malawi Demographic and Health Survey 2010 provides a wide variety of behavioral, socio-economical and structural variables as well as information on the HIV status of more than 12'000 participants. To find and display causal pathways, graphical models such as directed acyclic graphs are used. Amongst the numerous different causal structure learning methods, the RFCI algorithm and the GES algorithm are found to be suitable for the considered dataset. To include the sample weights from the survey, some modifications need to be made. The "weighted" versions of the two algorithms are repeatedly run on random subsets of all observations to obtain robust estimates. Finally, a summary graph is created, where only edges with a certain frequency are displayed. This analysis is carried out for three different sets of variables. Since the HIV prevalence amongst women is significantly higher than amongst men in Malawi, a stratification by gender provides further insight. The proposed method is able to detect various connections between proximal and distal variables in consideration of the provided sample weights. A group of variables robustly connected with the HIV status was found. However, the proposed method has difficulties determining causal directions as these are not robust under resampling.
Yannick Suter Implementation of different algorithms for biomarker detection and classification in breath analysis using mass spectrometry Marloes Maathuis
Renato Zenobi
Mar-2016
Abstract: We implement different algorithms for biomarker detection and classification for breath analysis studies using ambient ionization mass spectrometry. We test them on two studies done recently in the Zenobi research group at ETH Zürich on chronic obstructive pulmonary disease (COPD) and cystic fibrosis (CF). The studies investigate differences in molecules present in breath due to lung diseases.

The data sets contain many highly correlated variables, due to isotope patterns and biological pathways. We show that this is useful for the interpretation of the results, but has little effect on both biomarker detection and classification.

For biomarker detection, we use the Mann-Whitney U test, as well as subsampling with either the Mann-Whitney U test or the elastic net regression as selection method. For classification, we use prefiltering with the Mann-Whitney U test, followed by modern high-dimensional classification methods.

The best performing methods for both biomarker detection and classification are different for the two studies. Due to time drift effects, no significant molecules were found at an FDR control level of q = 0.05 for the COPD study with the Mann-Whitney U test. For the CF study, 127 molecules were found at an FDR control level of q = 0.05.

For classification, the best performing method for the COPD study was partial least squares regression followed by linear discriminant analysis (PLS-LDA), with an area under the ROC curve (AUC) value of 0.90. A second study on COPD is used as a validation set, which gives an AUC value of 0.71 for PLS-LDA.
Concerning the CF study, the best performing classification method was principal component analysis followed by linear discriminant analysis (PCA-LDA), with an AUC value of 0.73.

We show in simulations that hierarchical testing approaches given by Mandozzi (2015) do not work well in our setting.
Zhiying Cui Quantifying Subject Level Uncertainty Through Probabilistic Prediction for Autism Classification Based on fMRI Data Marloes Maathuis
Pegah Kassraian Fard
Feb-2016
Abstract: This thesis aims to quantify the subject level uncertainties of the classification between subjects with and without autism spectrum disorder using a type of brain image data, namely, the resting state functional magnetic resonance imaging data. The concerned subject level uncertainty measure for this study is based on the probabilistic predictions,
and the quality of the former is shown to be entirely dependent on the quality of the latter. A selected subset of the data from the Autism Brain Imaging Data Exchange is used for classification, and the quality of the label and probability predictions of nine conventional classifiers combined with the simple threshold feature selection are evaluated
through cross validation and by various evaluation metrics. The best achieved accuracy is 77% by logistic regression with L1 regularization. The best probability predictions are produced by logistic regression with L1 and L2 regularization for two of the three
probability evaluation metrics, and the best probability predictions are produced by both random forest and extremely randomized trees for the third evaluation metric. Considering both label and probability predictions, the best classifiers for this data set are logistic
regression with L1 and L2 regularization and adaptive boosting. To further improve the probability predictions, two probability calibration methods are applied to each of the above-mentioned best classifiers, and in the majority of the twelve examined cases the probability calibrations yield some improvement. Similar classification tasks are also performed on one other autism data set and two additional data sets to examine the performance in different settings.
Jakob A. Dambon Multiple Comparisons with the Best Methods and their Implementations in R Dr. Lukas Meier Feb-2016
Abstract: The simultaneous evaluation of multiple factors is required in many scientific experiments. Multiple comparisons account for the multiplicity and are a useful tool for giving simultaneous inference of those factors. There are several methods for multiple comparisons, in particular the multiple comparisons with the best (MCB), which is our main focus for this thesis. Here, we are trying to find the best treatment in comparison to the others.
The main purpose of this thesis was to implement in R Edwards-Hsu's MCB method, which is not part of the R package multcomp. The main achievements of this thesis are step-by-step derivations of the confidence intervals of Edwards-Hsu's MCB method in the balanced and unbalanced one-way ANOVA model as well as a successful implementation in R.
Maurus Thurneysen Performance Analysis of a Next Generation Sequencing Instrument Markus Kalisch
Harald Quintel
Feb-2016
Abstract: The complexity of processes and data output in molecular diagnostics is growing rapidly. In December 2015 QIAGEN AG entered the market with the first complete workflow in Next Generation Sequencing designed to deliver all the steps from Sample to Insight to the customer. This GeneReader NGS System features built-in sample preparation, sequencing of the genetic code as well as analysis of the gene sequences, and produces actionable insights for customers working in diagnostic fields.
The quality and reliability of such a workflow are crucial factors in assuring high performance standards. The statistical analysis of critical steps within the workflow provides a powerful means for achieving this goal. So far, this approach has not been exploited to its fullest in this context. Therefore, the aim of this master thesis in statistics is to analyze the performance of the newly developed GeneReader instrument, which carries out the sequencing substep of the workflow, with statistical learning techniques. Quality Control data from instrument production and data from test campaigns in the field are analyzed by an unsupervised learning approach and then combined into supervised learning problems to predict the performance quality of a GeneReader instrument from its Quality Control data.
It was found that the GeneReader instruments are calibrated well and that their contribution to the variability of the workflow is relatively small. However, the power of this approach was limited due to the small number of true replicates available. Nonetheless, this investigation demonstrates the potential lying in the systematic application of statistical analysis to assess and guarantee high quality and stability in QIAGEN's development and production processes, which is currently largely untapped.
Sven Buchmann High-Dimensional Inference: Presenting the major inference methods, introducing the Unbalanced Multi Sample Splitting Method and comparing all in an Empirical Study Martin Mächler Feb-2016
Abstract: Performing statistical inference in the high-dimensional setting is challenging and has become an important task in statistics over the last decades. In my thesis I first give a selective overview of the high-dimensional inference methods which have been developed to assign p-values and confidence intervals in linear models, including a graphical survey of every presented inference method. The overview is split into two parts: methods for detecting single predictor variables and methods for detecting groups of predictor variables.
Secondly, I introduce a new inference method in the high-dimensional setting, called Unbalanced Multi Sample Splitting, which is a modification of the Multi Sample Splitting Method of Meinshausen, Meier, and Bühlmann (2009). Furthermore, I prove its family-wise error control. Finally, I perform an Empirical Study using the R package simsalapar, which consists of three parts: designing the simulation study, actually performing the simulation and analyzing the various results.
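For reference, the (balanced) multi sample splitting procedure being modified here is available in the R package hdi; a minimal usage sketch (standard usage only, not the unbalanced variant developed in the thesis; the component name pval.corr is taken from the hdi documentation):
    library(hdi)

    set.seed(1)
    n <- 100; p <- 200
    x <- matrix(rnorm(n * p), n, p)
    y <- 2 * x[, 1] + 1.5 * x[, 2] + rnorm(n)   # two active variables

    # repeated 50/50 splits: lasso selection on one half, OLS p-values on the other,
    # aggregated over splits with family-wise error control
    fit <- multi.split(x, y, B = 50)
    head(sort(fit$pval.corr), 5)                # smallest corrected p-values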
Jürgen Zell Analyzing growth and mortality of Picea Abies for a growth simulator in Switzerland Martin Mächler Feb-2016
Abstract: The thesis is about modeling the growth and mortality of Picea abies. The data are complex and stem from experimental forest management trials all over Switzerland. In the first part, growth was modeled; 65% of the total variation can be explained by many different explanatory variables. The second part is about mortality and contains a logistic regression model, which is compared to a survival analysis approach.
Marc Stefani Lasso Chain Ladder Constrained Optimization for Claims Reserving Lukas Meier
Jürg Schelldorfer
Feb-2016
Abstract:
The Chain Ladder method is by far the most popular method for predicting non-life claims reserves in the insurance industry. Its simplicity induces two limitations: First, we do not have a robust estimation of old development factors, which is caused by only few available observations. Second, the Chain Ladder method is not able to deal with diagonal effects (i.e. claims inflation) which are often present in claims reserving data. Although many research papers present extensions to the classical Chain Ladder method, none has addressed the issue of using constrained optimization with Lasso-type estimators. Lasso-type estimators are primarily attractive for high-dimensional statistics and still useful in low-dimensional problems, either to obtain a smaller set of estimated parameters that exhibits the strongest effects or to obtain a robust estimator which reduces the variability of the estimated model parameters.
Since the Chain Ladder model can be understood as a regression problem, it was possible to develop Lasso-type estimators for three different models: a regression version of the Chain Ladder Time Series Model, an extension which allows modeling diagonal effects, and an Overdispersed Poisson Model which also considers diagonal effects. To solve the optimization problems, we build up a regression framework to transform the claims reserving data into appropriate data matrices. The application to real data sets shows that Lasso-type estimators predict plausible claims reserves. For simulated data sets we often achieve a better prediction accuracy with Lasso-type estimators compared to the Chain Ladder method, especially in situations where the Chain Ladder model assumptions are not fulfilled. However, the solution of Lasso-type estimators is sensitive to the choice of the optimal tuning parameter and the model selection criterion. Finally, we estimate the prediction accuracy of Lasso-type Chain Ladder estimators via model-based bootstrap. The implementation of the Lasso-type estimators is done in R.
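To illustrate the regression view that makes Lasso-type estimation possible, a toy R sketch (mine, not the thesis models): incremental claims in a simulated run-off triangle are modelled by an over-dispersed-Poisson-style GLM with origin-year and development-year factors, and the same design matrix can be handed to glmnet with an l1 penalty. In practice the penalty would be chosen by cross-validation or a model selection criterion, as discussed above.
    library(glmnet)

    set.seed(1)
    # toy 6x6 run-off triangle in long format: origin year i, development year j
    tri <- expand.grid(origin = 1:6, dev = 1:6)
    tri <- tri[tri$origin + tri$dev <= 7, ]              # keep the observed triangle
    tri$claims <- rpois(nrow(tri), lambda = exp(5 + 0.1 * tri$origin - 0.5 * tri$dev))

    # classical regression form of the chain ladder / ODP model
    glm_fit <- glm(claims ~ factor(origin) + factor(dev),
                   family = quasipoisson, data = tri)

    # same design matrix with an l1 penalty on the factor effects (Lasso-type estimator)
    Xmat <- model.matrix(~ factor(origin) + factor(dev), data = tri)[, -1]
    lasso_fit <- glmnet(Xmat, tri$claims, family = "poisson", alpha = 1)
    coef(lasso_fit, s = 0.01)          # coefficients at one (arbitrary) penalty level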
Benjamin Jakob Censored Regression Models Lukas Meier Jan-2016
Abstract: Empirically bounded distributions are investigated, and regression of these dependent variables on several independent variables is carried out. Different models (censored as well as uncensored) are used and programmed in the programming language R, such as the Logit model, the Beta distribution model, the Tree model, the Random Forest, a Censored Gamma model and two slight variations of it.
The conclusion is that the censored Gamma model and its extensions proposed by Sigrist and Stahel (2011) perform well, though not always, in comparison to the other models, and might therefore be an attractive option for banks and insurers to investigate further.

2015

Student Title Advisor(s) Date
Jakob Olbrich Screening Rules for Convex Problems Bernd Gärtner
Peter Bühlmann
Martin Jaggi
Sep-2015
Abstract: This thesis gives a general approach to deriving screening rules for convex optimization problems. It proceeds in three steps. In the first step, the Karush-Kuhn-Tucker conditions are used to derive necessary conditions that allow the problem size to be reduced; these conditions depend on the optimal solution itself. The second step is to gather information on the optimal solution from a known approximation. In the third and final step, this information is used to obtain conditions that do not depend on the optimal solution, which are then called screening rules. The thesis studies in particular the unit simplex, the unit box and polytopes as domains. The resulting screening rules can be applied to various problems, such as Support Vector Machines (SVM), the Minimum Enclosing Ball (MEB), LASSO problems and logistic regression, and are compared to existing rules for those problems.
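For orientation only, the sequential strong rule of Tibshirani et al. (2012) illustrates the general flavour of such screening for the Lasso (it is not one of the KKT-based rules derived in the thesis): a predictor j is discarded at penalty value lambda_k whenever |x_j' r(lambda_{k-1})| / n < 2*lambda_k - lambda_{k-1}, where r(lambda_{k-1}) is the residual at the previous penalty value on glmnet's scale. A small R sketch on simulated data:

    # Sequential strong rule for the Lasso (Tibshirani et al., 2012); shown only to
    # illustrate the idea of screening, not the rules derived in this thesis.
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 500
    X <- scale(matrix(rnorm(n * p), n, p))
    y <- drop(X[, 1:5] %*% rep(2, 5)) + rnorm(n)

    fit <- glmnet(X, y, standardize = FALSE)
    lam <- fit$lambda
    k <- 20                                                       # inspect the k-th penalty value
    r_prev <- y - drop(predict(fit, newx = X, s = lam[k - 1]))    # residual at the previous lambda
    score <- abs(crossprod(X, r_prev)) / n                        # |x_j' r| / n, glmnet's scaling
    keep <- which(score >= 2 * lam[k] - lam[k - 1])               # survivors of the screen

    active <- which(as.vector(coef(fit, s = lam[k]))[-1] != 0)
    all(active %in% keep)                                         # active variables are rarely discarded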
Nicolas Bennett Analysis of High Content RNA Interference Screens at Single Cell Level Peter Bühlmann
Anna Drewek
Aug-2015
Abstract: Infectious diseases are among the leading causes of death worldwide and the evolution of antimicrobial resistance poses a troubling development in cases where our only effective line of defense is based on distribution of antibiotic agents. One possible way out of this problematic situation comes by the alternative approach of host directed therapeutics, which in turn warrants the meticulous study of the human infectome. Therefore, large-scale studies such as genome-wide siRNA knockdown experiments as performed by the InfectX/TargetInfectX consortia are of great importance.

The richness of datasets resulting from image-based high-throughput RNAi screens permits a broad range of possible analysis approaches. The present study investigates cellular phenotypes induced by gene knock-down, with a focus on the effect of pathogen infection, by applying generalized linear models (GLMs) to single-cell measurements. In order to simplify handling of such datasets, an R package is presented that fetches queried data from a centralized data store and produces data structures capable of efficiently representing the logic of an assay plate. Convenience functions to preprocess, manipulate and normalize the resulting objects are provided, as is a caching system that helps to significantly speed up common operations.

GLM analysis of phenotypic response from knockdown and infection was attempted, but did not yield satisfactory results, most probably due to issues with data normalization. In order to facilitate the simultaneous study of measurements originating from multiple assay plates, several normalization schemes were explored, including Z- and B-scoring, as well as modeling technical artifacts with multivariate adaptive regression splines (MARS). While some improvements of data quality were observed, experimental sources of error could not be sufficiently controlled for meaningful GLM regression.
Marco Eigenmann A Score-Based Method for Inferring Structural Equation Models with Additive Noise Peter Bühlmann Aug-2015
Abstract: We implement and analyse a new score-based algorithm for inferring linear structural equation models with a mixture of Gaussian and non-Gaussian additive noise. After introducing some well-known algorithms, providing theory, pseudo-code, main advantages and disadvantages as well as some examples, we extensively cover the technical part that underpins the ideas behind our new algorithm. Finally, we present our algorithm in detail, describing its R implementation and showing its performance compared to the algorithms introduced in the previous chapters.
Patrick Welti Analysis of the Empirical Spectral Distribution of a Class of Large Dimensional Random Matrices with the Aid of the Stieltjes Transform Sara van de Geer
Alan Muro Jiminez
Aug-2015
Abstract: tba
Paweł Morzywołek Non-parametric Methods for Estimation of Hawkes Process for High-frequency Financial Data Peter Bühlmann
Vladimir Filimonov
Didier Sornette
Aug-2015
Abstract: Due to its ability to represent clustered data well, the popularity of the self-excited Hawkes model has steadily grown in recent years. After originally being applied to earthquake prediction, it has also been used to anticipate flash crashes in finance, epidemic-type behaviour in social media such as Twitter and YouTube, and outbursts of criminality in big cities.
The aim of this work is to conduct a comprehensive comparison study of the existing non-parametric techniques for estimating the Hawkes model, which provide insights into the data without making any a priori assumptions on the correlation structure of the observables. To the best of my knowledge, such work has not been done so far. The first method considered is the EM algorithm, widely used in non-parametric statistics and adjusted here to the case of a Hawkes process. The second procedure is based on estimating a conditional expectation of the Hawkes model's counting process and then solving a Wiener-Hopf type integral equation to obtain the kernel function of the model. The last estimation technique uses a representation of the Hawkes model as an integer-valued autoregressive model and subsequently applies tools from time series theory to obtain the parameters of the model.
The methods were tested on synthetic data generated from the Hawkes model with different kernels and different parameters. I investigated how the sample size and the overlapping of point clusters influence the performance of the different estimation methods. When conducting the analysis, I did not restrict myself to the most commonly used exponential and power-law kernels, but also considered less typical step and cut-off kernels. After the comparison on synthetic data was completed, I proceeded with the empirical data analysis. For this purpose I tested the estimation methods on high-frequency data of price changes of E-mini S&P 500 and Brent Crude futures contracts.
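As a purely parametric reference point (the thesis is devoted to non-parametric estimators), the following R sketch simulates a univariate Hawkes process with exponential kernel, intensity lambda(t) = mu + alpha * sum over t_i < t of exp(-beta (t - t_i)), by Ogata's thinning algorithm, and recovers (mu, alpha, beta) by maximum likelihood with optim; the parameter values are arbitrary.

    # Exponential-kernel Hawkes process: simulation by thinning and parametric MLE.
    # Illustrative baseline only; not one of the non-parametric estimators compared above.
    set.seed(1)
    mu <- 0.5; alpha <- 0.8; beta <- 1.5; Tend <- 500

    intensity <- function(t, hist) mu + alpha * sum(exp(-beta * (t - hist[hist < t])))

    simulate_hawkes <- function() {
      t <- 0; events <- numeric(0)
      repeat {
        lam_bar <- intensity(t, events) + alpha          # valid upper bound just after t
        t <- t + rexp(1, lam_bar)
        if (t > Tend) break
        if (runif(1) <= intensity(t, events) / lam_bar) events <- c(events, t)
      }
      events
    }
    ev <- simulate_hawkes()

    # Negative log-likelihood via the usual recursion A_i = exp(-b dt_i) (1 + A_{i-1})
    negloglik <- function(par) {
      m <- exp(par[1]); a <- exp(par[2]); b <- exp(par[3])   # enforce positivity
      A <- 0; ll <- log(m)
      for (i in 2:length(ev)) {
        A <- exp(-b * (ev[i] - ev[i - 1])) * (1 + A)
        ll <- ll + log(m + a * A)
      }
      -(ll - m * Tend - (a / b) * sum(1 - exp(-b * (Tend - ev))))
    }
    fit <- optim(log(c(1, 0.5, 1)), negloglik)
    round(exp(fit$par), 2)                                   # estimates of (mu, alpha, beta)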
Philip Berntsen Particle filter adapted to jump-diffusion model of bubbles and crashes with non-local crash-hazard rate estimation Markus Kalisch
Didier Sornette
Yannick Malevergne
Jul-2015
Abstract: Crashes in the financial sector probably represent the most striking events among all possible extreme phenomena. The impact of the crises has become more severe and their arrivals more frequent. The most recent financial crises shed fresh light on the importance of identifying and understanding financial bubbles and crashes.
The model developed by Malevergne and Sornette (2014) aims at describing the dynamics of the underlying occurrences and probability of crashes. A bubble in this work is synonymous with prices growing at a higher rate than what can be expected as normal growth over the same time period. A non-local estimation of the crash hazard rate takes unsustainable price growth into account and increases as the spread between a proxy for the fundamental value and the market price becomes greater. The historical evaluation of the jump risk is unique and expands the understanding of crash probability dynamics assumed embedded in financial log-return data.
The present work is mainly concerned with developing fast sequential Monte Carlo methods, using C++. The algorithms are developed for learning about unobserved shocks from discretely realized prices for the model introduced by Malevergne and Sornette (2014). In particular, we show how the best performing filter, the auxiliary particle filter, is derived for the model at hand. All code is provided in the appendix for reproducibility and research extensions. In addition, we show how the filter can be used for calibration of the model; the estimation of the parameters, however, is shown to be difficult.
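Only to illustrate the sequential Monte Carlo machinery in a much simpler setting (a toy linear Gaussian state-space model, not the jump-diffusion model of the thesis, and a plain bootstrap filter rather than the auxiliary particle filter implemented in C++), a short R sketch:

    # Bootstrap particle filter for a toy model:
    #   x_t = 0.9 x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.25)
    set.seed(1)
    Tn <- 200; phi <- 0.9; sw <- 1; sv <- 0.5
    x <- numeric(Tn); x[1] <- rnorm(1, 0, sw)
    for (t in 2:Tn) x[t] <- phi * x[t - 1] + rnorm(1, 0, sw)
    y <- x + rnorm(Tn, 0, sv)

    N <- 1000                                     # number of particles
    particles <- rnorm(N, 0, sw)
    xhat <- numeric(Tn)
    for (t in 1:Tn) {
      if (t > 1) particles <- phi * particles + rnorm(N, 0, sw)    # propagate
      w <- dnorm(y[t], mean = particles, sd = sv)                  # weight by likelihood
      w <- w / sum(w)
      xhat[t] <- sum(w * particles)                                # filtering mean
      particles <- sample(particles, N, replace = TRUE, prob = w)  # resample
    }
    cor(xhat, x)                                  # filtered means track the latent state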
Jakub Smajek Causal inference beyond adjustment Markus Kalisch Jul-2015
Abstract: Covariate adjustment is one of the most popular and widely used techniques to estimate causal effects. The method is easy to use, has a well-understood theory and can be combined with other statistical techniques for efficient estimation of a given causal effect. The problem is that the covariate adjustment method is not complete, in the sense that it may fail to identify a causal effect even if it is identifiable by other methods. The first goal of the thesis is to demonstrate this problem and present some alternative techniques, such as instrumental variables and a new identification method, that can be useful for estimating causal effects (chapter 2). The next goal and the main theme of the thesis is to answer the question: "How restrictive is it if we restrict causal inference to adjustment methods?"

The third chapter tries to answer this question from a theoretical perspective for single nodes X and Y. It presents important results from other authors and generalizes some of them for two types of graphs: acyclic directed mixed graphs (ADMGs, or latent projections) and maximal ancestral graphs (MAGs). The chapter shows that we cannot lose the ability to identify a causal effect by covariate adjustment through a conversion from a DAG to the corresponding latent projection, and it provides a criterion that characterizes when a given causal effect is identifiable at all (by any method) but not by covariate adjustment in an ADMG G. It also shows that the ability to estimate a causal effect can be lost purely due to a conversion from a latent projection to a corresponding MAG, and provides a criterion that specifies when this happens. Moreover, the third chapter provides a necessary, sufficient and constructive criterion to form an adjustment set in a given MAG M, if X and Y are single variables.

Finally, partially based on the theoretical results derived earlier in the thesis, the question is addressed in a simulation study in chapter 4. The chapter describes implementation issues, methodology and several different experiments. The experiments concentrate on a comparison of the complete identification algorithm and the covariate adjustment method in terms of the proportion of identifiable causal effects. The comparison on uniformly sampled ADMGs shows a big advantage of the former method. It turns out, however, that the difference is mainly caused by some simple cases that can be easily identified. This observation leads to a simple but very effective improvement of the covariate adjustment method that can significantly increase the proportion of identifiable causal effects. Finally, an experiment is performed that shows how much we lose through the conversion from an ADMG to a MAG. The problem is especially visible if we restrict the analysis to graphs that contain a causal path from X to Y.
Lukas Tuggener Analysis of Cross-Over Trials Markus Kalisch Jul-2015
Abstract: The goal of this thesis is to give the reader an introduction to cross-over trials. The first chapter explains the most basic cross-over design and, using it as an illustration, presents the necessary theory to analyse cross-over trials. It shows how this basic design is weak in many situations and introduces designs which are more versatile. Three computer simulations help to build an intuitive understanding of cross-over designs. The most important insight from this thesis is that a good design choice is always a multifactorial trade-off between subject recruiting, study duration and design complexity; if available, it takes information about the expected carry-over behaviour and the structure of the between- and within-subject variability into account.
Maria Elisabetta Ghisu A comparative study of Sparse PCA with extensions to Sparse CCA Marloes Maathuis Jul-2015
Abstract: In this thesis we compare different approaches to sparse principal component analysis (sparse PCA) and then extend our investigation to sparse canonical correlation analysis (sparse CCA).

First, we study sparse PCA methods, where regularization techniques are included in classical PCA to obtain sparse loadings. We compare different formulations by analyzing their theoretical foundations and algorithms. Moreover, we carry out simulation studies to evaluate the performance in a wide variety of scenarios. The optimal choice of method depends on the objective and on the specific parameter combination. Our results suggest that the SPC approach (Witten et al., 2009) usually outperforms the other techniques in recovering the true structure of the loadings, although the angle between the true and estimated vectors is generally large.

Subsequently, we examine the closely related problem of sparse CCA, where sparsity is imposed on the canonical correlation vectors. After a theoretical study of the methods, we run simulations to assess their quality. When the covariance matrices of the two sets of variables are not nearly diagonal, CAPIT (Chen et al., 2013) shows higher accuracy; otherwise, the performances are similar.

Finally, we consider applications of both sparse PCA and sparse CCA to real data sets, obtaining satisfactory results in most of the situations.
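For intuition, the following base-R sketch computes a sparse leading loading vector by a soft-thresholded power iteration, in the spirit of penalised matrix decompositions; it is a generic illustration and not a verbatim implementation of any of the methods compared in the thesis, and the data, penalty level and sparsity pattern are arbitrary.

    # Sparse first principal component via soft-thresholded power iteration
    set.seed(1)
    n <- 100; p <- 50
    v_true <- c(rep(1, 5), rep(0, p - 5)); v_true <- v_true / sqrt(sum(v_true^2))
    X <- matrix(rnorm(n * p), n, p) + rnorm(n, sd = 3) %o% v_true
    X <- scale(X, center = TRUE, scale = FALSE)

    soft <- function(z, lam) sign(z) * pmax(abs(z) - lam, 0)

    sparse_pc1 <- function(X, lam, iter = 50) {
      S <- crossprod(X) / nrow(X)                  # sample covariance matrix
      v <- svd(X, nu = 0, nv = 1)$v                # start at the ordinary first PC
      for (k in 1:iter) {
        v <- soft(S %*% v, lam)                    # threshold small loadings to zero
        if (all(v == 0)) stop("penalty too large")
        v <- v / sqrt(sum(v^2))                    # renormalise
      }
      drop(v)
    }
    v_hat <- sparse_pc1(X, lam = 0.5)
    which(v_hat != 0)                              # recovered support
    abs(sum(v_hat * v_true))                       # cosine of the angle to the truth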
Xiao Ye Zhan Modelling Operational Loss Event Frequencies Marloes Maathuis
Michael Amrein
Jul-2015
Abstract: In this thesis we study the application of count data modelling approaches to monthly counts of operational risk events recorded over 13 years at UBS. Assuming that the underlying distribution of the counts is Poisson, nonparametric and parametric regressions and a time series model are considered. A mean-matching variance stabilizing transformation (VST) is used to facilitate the nonparametric Poisson regression and reduce the problem to a homoscedastic Gaussian regression one. Poisson GLM regression and the generalized linear autoregressive moving average (GLARMA) model are applied to investigate the relationship between the number of operational losses observed and exogenous variables, as well as the dependence structure in the data. Our analysis shows significant connections between the loss count data and the financial and economic drivers. Notable serial correlations are also found in the data, with special attention paid to the Poisson distribution assumption and the over-dispersion issue. Simulation experiments are provided to examine numerical properties of the estimators.
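As a minimal illustration of the parametric route (with simulated counts and a single hypothetical exogenous driver, not the UBS data), one can fit a Poisson GLM and check for overdispersion and residual serial correlation, which is what motivates GLARMA-type models:

    # Simulated monthly counts with one hypothetical exogenous driver
    set.seed(1)
    months <- 156                                        # 13 years of monthly data
    driver <- as.numeric(arima.sim(list(ar = 0.7), n = months))
    counts <- rpois(months, lambda = exp(1 + 0.3 * driver))

    fit <- glm(counts ~ driver, family = poisson)
    summary(fit)$coefficients

    # Crude overdispersion check: Pearson statistic over residual degrees of freedom
    sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

    # Remaining serial correlation in the residuals (first three lags)
    acf(residuals(fit, type = "deviance"), plot = FALSE)$acf[2:4]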
Marcos Felipe Monteiro Freire Ribeiro Learning with Dictionaries Nicolai Meinshausen Jul-2015
Abstract: The method of dictionary learning was introduced by Olshausen and Field (1997) as a model for images based on the primary visual cortex. It has been successfully used for representing sensory data like images and audio, also providing an explanation for many observed properties in the response of cortical simple cells. In this thesis, we will show that the method can also be derived from an information theoretical point of view. The approach is similar to Bell and Sejnowski (1995) but substitutes the framework of neural networks by a probabilistic one. We also discuss how the learned representations can be used for classification and apply the theoretical results to two real world problems. In the first problem, we analyse GPS data in order to characterize driving styles. In the second, we analyse fundus images of the eye in order to diagnose diabetic retinopathy.
Oxana Storozhenko Maximin effects with tree ensembles Nicolai Meinshausen Jul-2015
Abstract: Non-parametric models, such as regression trees, are often used as a primary estimation method in prediction problems. Fitting the trees requires virtually no assumptions about the data, the learning algorithm requires almost no tuning and non-linear relationships in the data are handled well. The flexibility of trees has been exploited in ensemble learning, where the members of an ensemble are trees fit to different samples of the training data. One of the most popular off-the-shelf prediction algorithms is the random forest (Breiman (2001)), which constructs an ensemble of randomised trees trained on bootstrap samples of the data and averages over the predictions made by each tree. We propose to extend the aforementioned algorithm to prediction problems with inhomogeneous data. In particular, the estimators in the ensemble can be trained on different groups of the training data, as opposed to perturbations of the dataset with bootstrap sampling. If the data have outliers, contaminations, or time-varying or temporary effects that are present locally, dividing the dataset into groups in a sequential manner yields more diverse estimators. Another adjustment in the context of inhomogeneous data is finding a vector of weights for the estimators in the ensemble, such that future predictions are optimal whatever group the new data point comes from. Bühlmann and Meinshausen (2014) proposed to minimise the L2-norm of the convex combination of the fitted values of the estimators, and to use the resulting weights in order to maximise the minimum explained variance in every group. This scheme is called maximin aggregation, and we show how it works for inhomogeneous data.
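The aggregation step described above amounts to a small quadratic program: with the fitted values of the group-wise estimators as columns of a matrix, the maximin (magging) weights minimise the squared L2-norm of their convex combination over the simplex. A sketch using the quadprog package, with placeholder fitted values instead of an actual tree ensemble:

    # Maximin ("magging") aggregation weights; the group-wise tree ensembles are
    # replaced by placeholder fitted values here.
    library(quadprog)
    set.seed(1)
    n <- 200; G <- 4
    Fmat <- sapply(1:G, function(g) rnorm(n, mean = g / 2))   # hypothetical group-wise fits

    magging_weights <- function(Fmat) {
      G <- ncol(Fmat)
      D <- crossprod(Fmat) + 1e-8 * diag(G)      # small ridge keeps D positive definite
      A <- cbind(rep(1, G), diag(G))             # constraints: sum(w) = 1 and w >= 0
      b <- c(1, rep(0, G))
      solve.QP(Dmat = D, dvec = rep(0, G), Amat = A, bvec = b, meq = 1)$solution
    }
    w <- magging_weights(Fmat)
    round(w, 3); sum(w)                          # convex weights
    yhat_maximin <- Fmat %*% w                   # aggregated prediction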


Teja Turk Comparison of Confidence and Prediction Interval Approaches in Nonlinear Mixed-Effects Models Lukas Meier Jun-2015
Abstract: In this study we aim to assess the performance of various approaches for confidence and prediction intervals in single level nonlinear mixed-effects models. The evaluation is based on simulated samples of coverage rates for 13 nonlinear functions.

The bootstrap confidence intervals are constructed from parametrically, nonparametrically and case-resampled datasets. In addition, the confidence intervals from the intervals function and the Wald confidence intervals are included in the comparison. The performance of all methods is evaluated for all three types of parameters: the fixed effects, the variance-covariance components and the within-group standard deviation. Finally, the Wald confidence intervals are improved by empirically adjusting the degrees of freedom of the t-statistic. In general, the simulation speaks in favour of the non-bootstrap approaches.

The prediction interval methods are based on Wald's test and are derived separately for observed and unobserved groups. The derivation of the prediction error variance is based on various linear approximations of the prediction error. In pairwise comparisons with their bootstrap variants, no apparent differences are detected. When their performance is compared with prediction intervals based on the bootstrap prediction error distribution, the latter exhibit coverage rates closer to the nominal values.
Caroline Matthis Classifying Autistic Versus Typically Developing Subjects Based on Resting State fMRI Data Marloes Maathuis
Nicole Wenderoth
Pegah Kassraian Fard
Apr-2015
Abstract: In this thesis we investigate several classifiers to discriminate between autistic and typically developing subjects based on resting state fMRI data. We use data from the Autism Brain Imaging Data Exchange (ABIDE) database which consists of fMRI scans of 1112 subjects. First, we implement the Leave-One-Out (LOO) classifier designed by Anderson et al. [2] which attains an accuracy of 60 %. Next, we run various conventional classifiers on the data and compare their predictive performance to the LOO classifier. Most of the examined classifiers perform at least as well as the LOO classifier; a flexible formulation of discriminant analysis reaches an accuracy of 76 %. In a last step we attempt to attach a subject-specific uncertainty to the classification. Based on work by Fraley and Raftery [18] the posterior distributions of the flexible formulation of discriminant analysis are used to model these uncertainties. In a short simulation study we illustrate the informative value of the estimated uncertainties, given that the distributional assumptions are valid. Then, this uncertainty model is evaluated on the data, yielding satisfactory results.
Julia Brandenberg Statistical Analysis of Global Phytoplankton Biogeography in Mechanistic Models and Observations Nicolai Meinshausen Apr-2015
Abstract: After five months of intense work, I am proud to submit my Master's thesis. I would like to thank my advisor Dr. Meike Vogt for her constant support, her reliability and motivation, and congratulate her on her baby, which was one of the highlights during this period. Besides many fruitful on-topic discussions, I enjoyed the off-topic horse-related chats with her. Special thanks to my advisor Prof. Dr. Nicolai Meinshausen, whose support was competent, patient and committed. During several meetings I was able to deepen my statistical understanding, and his versatile approaches to problem solving motivated me to try different techniques. I would like to thank Prof. Dr. Nicolas Gruber for his advice and for having me in his group over the past months. Dr. Thomas Froelicher supported me in the interpretation of my results and was my contact during Meike's absence. Dr. Charlotte Laufkoetter and Dr. Chantal Swan contributed to this work by providing me with data and related information. Last but not least, I would like to thank all my colleagues from the environmental physics group for their advice and contributions, and especially for making this time such a pleasure to think back to.
At this point I would like to mention my parents, Barbara and Andreas Brandenberg, and thank them for their unconditional support over the last years. Their love and faith in me contributed greatly to all my achievements and made me the person I am today. Thank you!
Sonja Gassner Fitting and Learning of Bow-free Acyclic Path Diagram Models Marloes Maathuis
Preetam Nandy
Christopher Nowzohour
Mar-2015
Abstract: We consider the problem of learning causal structures from observational data, when the data are generated from a linear structural equation model. Under the assumption that the path diagram of the model is acyclic and the error variables are uncorrelated, one can apply a search and score technique to learn the underlying structure. However, the assumption of uncorrelated errors is often too restrictive. In this thesis we consider a more general subclass of linear structural equation models for structure learning, where correlation of the errors is allowed unless the corresponding random variables are in a direct causal relation. These models are called bow-free acyclic path diagram (BAP) models. BAP models are almost everywhere identifiable, which is in general not ensured for linear structural equation models with arbitrary correlation patterns. First, we consider two methods for estimating the parameters in BAP models. One results from the proof of the identifiability of BAP models and is implemented in this thesis. The other one is an iterative partial maximization algorithm for maximum likelihood estimation, for which an implementation was already available. Next, we use these two fitting methods in a greedy search algorithm for structure learning, which repeatedly fits and scores BAP models and chooses the model with the highest score. Finally, we evaluate the performance of these methods in a small simulation study.
Carolina Maestri Two approaches of causal inference for time series data Marloes Maathuis Mar-2015
Abstract: In this Master's thesis two approaches of causal inference for time series data are studied. The first one addresses non-linear deterministic systems, while the second one is designed for linear stochastic systems. For both methods the theoretical foundations are presented and the algorithms are analysed and described in detail. Applications to real data are also shown and various simulations are run to investigate the performances of the algorithms in different situations.
Kari Kolbeinsson Model Selection for Outcome Predictions of Professional Football Matches Markus Kalisch Mar-2015
Abstract: The subject of this thesis is to model and predict the outcome of professional football matches played in the premier leagues around the globe. For this purpose a number of statistical learning methods are employed and models are fit to publicly available data. After gathering the raw data from the relevant websites, numerous variables are constructed to further capture the relative strength of each team. The second chapter of the thesis is dedicated to explaining the dataset constructed from these variables and their relationship with the response variables. The statistical learning commences in the third chapter by fitting classification models to a training subset of the data. For these models the response variable is categorical, taking on three values: a win for either team or a draw. The models considered are linear and quadratic discriminant analysis, k-nearest neighbours, random forest, boosted classification trees and support vector machines. For each model, the fit to the training set is analysed using an estimate of the misclassification rate and calibration plots. The fourth chapter explores the use of regression models for this task. The response variable now is either the goals scored by each team or the goal difference. Models fit to the goal difference of each team are then combined into one unified prediction of the goal difference. The models tried for this task are generalized linear models, random forest and boosted regression trees. Prediction accuracies of the best performing models in these two chapters are the subject of the fifth and final results chapter. The goal count estimates of the regression models are translated into the same categorical results as were modelled by the classification models, allowing a comparison between all methods. The best performing model was found to be the boosted classification trees with a prediction accuracy of 50.5%.
Lin Zhu Confidence Curves in Medical Research Leonhard Held 
Markus Kalisch
Mar-2015
Abstract: This thesis briefly reviews the development of confidence distributions. It introduces the modern definitions of a confidence distribution, confidence density and confidence curve, along with point estimators based on a confidence distribution. Then different methods for constructing confidence curves are given for cases without and with nuisance parameters, respectively. The pivotal approach and the deviance-based approach are applied to both cases; the half-correction approach is applied to discrete data, and the simulation or bootstrap approach is applied to cases with nuisance parameters. We take the exponential distribution, the binomial distribution, the Weibull distribution, the gamma distribution and the comparison of two binomials as examples to study the differences between the approaches.
Anita Kaufmann Crime Linkage Jacob De Zoete
Marloes Maathuis
Mar-2015
Abstract: Crime linkage studies settings where similarities among several crimes suggest execution by the same offender. Due to their linkage, evidence from an individual case becomes relevant for the entire group of crimes. After a short introduction to Bayesian networks, we demonstrate how they can be used to model crime linkage settings. In a next step, a review of two research papers concerning this topic is provided; for a better understanding we outline the most important parts in detail. Moreover, the papers in focus only present examples for a small number of crimes, since the complexity increases exponentially with the number of crimes considered. We aim at avoiding this fast increase in complexity by proposing simplifying adaptations of the Bayesian network. Furthermore, we restrict the number of different offenders to m < n, where n is the number of crimes considered, since it is not very probable to have as many offenders as crimes in a crime linkage setting. The consequence is a reduction in the number of offender configurations, which should simplify the computation of settings with a larger number of crimes. We propose two possibilities to find a reasonable value for m. The problem we encounter is that our adapted function for n crimes with at most m different offenders is not efficient and hence cannot be used for larger numbers of crimes. Nonetheless, comparing the two approaches for small numbers of crimes, we obtain very similar results. Consequently, the second approach is, at least for small numbers of crimes, faster and thus better suited for determining the number m of different offenders which have to be taken into consideration. In order to maintain its relevance also for larger numbers of crimes, we furthermore propose a possible extension of the second approach.
Sheng Chen Random Projection in clustering classification and regression Markus Kalisch Feb-2015
Abstract: This thesis studies the performance of random projection, one of the relatively new dimensionality reduction techniques, when applied to clustering, classification and regression, by reproducing or testing the results in three papers by Boutsidis and Zouzias (2010), Paul and Boutsidis (2013) and Kaban (2014), each from one of the three domains. Firstly, a review of the Johnson-Lindenstrauss lemma and its extensions is given, which forms the theoretical foundation of random projection. Besides the early subgaussian and sparse matrices, newer random matrices based on the Fourier transform have been developed for faster computation. Secondly, the experiment in random-projection-based K-means (Boutsidis and Zouzias, 2010) is reproduced. The result shows that when the dimension of the embedded space is large, RP-based K-means is comparable to K-means on the original data in terms of misclassification rate. Comparisons are drawn between RP, PCA and LS, and it is found that PCA outperforms RP in terms of misclassification rate, but RP needs only 19% of the time needed by PCA. Thirdly, for classification, part of the experiment in RP-based support vector machines (Paul and Boutsidis, 2013) is run. The calculation shows that the misclassification rate of the RP-based SVM is not significantly larger than that of the SVM in the original space; however, the margin γ is significantly smaller. In the area of regression, Kaban (2014) proposed an upper bound on the excess risk of the OLS estimator in the embedded space, and proved that random projection applies to a larger class of matrices whose entries have mean 0, unit variance, a symmetric distribution and finite fourth moment. The last part of the thesis runs experiments to examine the necessity of these assumptions on random matrices and finds that each of them can be loosened without breaking the bounds.
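A tiny base-R check of the distance-preservation property behind all of these results (illustrative only; the k-means, SVM and OLS experiments of the thesis are not reproduced): project points with a Gaussian random matrix and compare pairwise distances before and after.

    # Empirical Johnson-Lindenstrauss check with a Gaussian random projection
    set.seed(1)
    n <- 50; p <- 2000; k <- 200
    X <- matrix(rnorm(n * p), n, p)

    R <- matrix(rnorm(p * k, sd = 1 / sqrt(k)), p, k)   # random projection matrix
    Z <- X %*% R                                        # the n points in k dimensions

    summary(as.vector(dist(Z) / dist(X)))               # distance ratios concentrate near 1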
Ioan Gabriel Bucur Structural Intervention Distance for Maximal Ancestral Graphs Markus Kalisch Jan-2015
Abstract: In the process of causal inference, we are interested in accurately learning the causal structure of a data generating process from observational data, so as to correctly predict the effect of interventions on variables. In order to assess how accurate the output of an estimation method is, we would like to be able to compare causal structures in terms of their causal inference statements. Peters and Bühlmann have proposed the Structural Intervention Distance as a premetric between DAGs that provides a partial solution to the issue. However, the causal DAG may not be able to predict certain intervention effects in the presence of confounders. In this paper, we attempt to emulate the results of Peters and Bühlmann in a more realistic setting, where we observe only part of all relevant variables. We propose a new premetric, the Structural Intervention Distance for Maximal Ancestral Graphs (SIDM). A MAG is a causal structure which, unlike the DAG, is closed under marginalisation and can incorporate uncertainty about the presence of latent confounders. The SIDM allows us, under the assumption of no selection bias, to compare and contrast two MAGs based on their capacity for causal inference. The SIDM is consistent with the SID in its approach and provides valuable additional information to other metrics.

2014

Student Title Advisor(s) Date
Lukas Weber Model selection techniques for detection of differential gene splicing Mark Robinson
Peter Bühlmann
Sep-2014
Abstract: Alternative splicing during the messenger RNA (mRNA) transcription stage of gene expression can generate vast sets of possible mRNA isoforms from individual genes. These mRNA isoforms can create functionally distinct proteins during subsequent protein translation, explaining the enormous diversity of proteins in organisms such as humans. Differential splicing experiments aim to use microarray or RNA sequencing (RNA-seq) technologies to detect genes exhibiting differences in splicing patterns between groups of biological samples, for example comparing diseased versus healthy samples, or treated versus untreated. In this thesis, we have tested whether model selection techniques can be used to improve the performance of existing statistical methods to detect differential gene splicing in RNA-seq data sets. The new methods were successful, and have been implemented as an R package available on GitHub.
Lucas Enz The Lasso and Modifications to Control the False Discovery Rate Sara van de Geer
Benjamin Stucky
Aug-2014
Abstract: Nowadays, a huge focus is set on high-dimensional data sets where the number of predictors p is a lot larger than the number of observations n. One example is detecting which genes are responsible for a specific biological function of our body. Because measuring microarray data is very costly, we normally have at most a few hundred observations, but thousands of possible genes which could control the process we want to study. Because we have many more predictor variables than observations, we cannot compute a unique least-squares solution. Tibshirani (1996) introduced a method called the Lasso, which deals precisely with this problem and sets some variables exactly to zero. In other words, the Lasso can ban some predictors from our model. Nevertheless, the Lasso sometimes picks many predictor variables which are in truth not responsible for the observed process. As a consequence, the false discovery rate (FDR), defined as the expected proportion of irrelevant predictor variables among all selected variables, is not even controlled in some models. In this thesis we focus on a new procedure which controls the FDR better, but does not ban too many predictor variables which are actually relevant for the process, i.e. we do not lose too much power. The thesis is mainly based on the works introducing the SLOPE procedure by Candès and co-authors (2013, and an updated version). We analyze the improvement of SLOPE in high-dimensional examples for the linear model with Gaussian and orthogonal design matrices. In the end, we adapt the idea of SLOPE to the group Lasso, which is very useful if we can group the predictor variables and select or ban a whole group of regression variables. We present an extension of the group Lasso named SIPE and evaluate its performance in sparse scenarios via a simulation study.
Hannes Toggenburger Joint Modelling of Repeated Measurement and Time-to-Event Data, with Applications to Data from the IeDEA-SA Marloes Maathuis 
Matthias Egger
Klea Panavidou
Aug-2014
Abstract: After the start of ART, the low CD4 count of an HIV-positive patient typically recovers to a regular level. By measuring the CD4 count repeatedly, a patient's individual CD4 trajectory is known at a discrete set of times. Different approaches have been used to model CD4 counts and obtain continuous trajectories. If ART stops working, the CD4 count will start to decay anew. Such a treatment failure, or in particular the time of its occurrence, is modelled by survival models. In this work, the repeated measurements of the CD4 count are modelled with a nonlinear mixed-effects (NLME) model with three random effects. The time-to-event data are modelled with a log-normal accelerated failure time (AFT) model. These two models are merged into a random-effects-dependent joint model; broadly speaking, this means that the random effects of the NLME model are used as continuous predictors in the AFT model. Different approaches to estimating the involved parameters via maximum likelihood, and their pitfalls, are discussed. The final model is applied to real data from the International epidemiologic Databases to Evaluate AIDS in sub-Saharan Africa (IeDEA-SA).
Andreas Puccio A review of two model-based spike sorting methods Marloes Maathuis Aug-2014
Abstract: In modern neuroscience, extracellular recordings play an important role in the analysis of neuron activity. Whereas earlier experiments were based on single electrodes, modern settings consist of a large number of channels that record data from multiple cells simultaneously. In such settings, every electrode will record action potentials from all nearby neurons, visible as spikes whose shape depends on various factors. The problem of spike sorting, in a nutshell, is to detect the occurrence of such spikes in multi-electrode voltage recordings and to classify them, i.e., to identify the corresponding neurons. A widely used approach is a so-called clustering method consisting of a thresholding step to detect the occurrence of a spike, a feature-reduction step (e.g. PCA) and a classification ("sorting") step based on these features. However, this method has several disadvantages, an important one being the inability to handle overlapping spikes. After an introduction to the problem of spike sorting and the data encountered in such settings, we review two modern spike sorting frameworks, one being binary pursuit (Pillow, Shlens, Chichilnisky, and Simoncelli, 2013) and the other relying on a method called continuous basis pursuit (Ekanadham, Tranchina, and Simoncelli, 2014). These frameworks use a statistical model for the recorded voltage trace and do not rely on a clustering procedure for the spike train estimation. We present an implementation of binary pursuit in MATLAB, conduct a performance assessment of this algorithm using simulated data and identify advantages and disadvantages of model-based spike sorting algorithms.
Laura Casalena Statistical inference for the inverse covariance matrix in high-dimensional settings Sara van de Geer
Jana Jankova
Aug-2014
Abstract: The focus of this work is the problem of estimating the inverse covariance matrix Θ∗ in a high-dimensional setting. High-dimensionality is reflected by allowing p to grow as a function of n, but for our results to hold we require p = o(exp(n)). We will propose four different estimation methods for Θ∗ and study their asymptotic properties under appropriate distributional assumptions as well as model assumptions on the concentration matrix Θ∗. In particular, whenever it is possible, we will give rates of convergence in various matrix norms and state results which prove asymptotic normality of each individual element Θ∗ij. Consequently, we will construct asymptotic confidence intervals for Θ∗ij. Finally, we will illustrate the theoretical results through numerical simulations.
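For orientation, one standard penalised estimator of a sparse inverse covariance matrix is the graphical lasso; the short R sketch below (using the glasso and MASS packages, with an arbitrary tridiagonal precision matrix) illustrates the estimation problem only and is not necessarily one of the four methods, nor the confidence interval construction, studied in the thesis.

    # Graphical lasso on data generated from a sparse (tridiagonal) precision matrix
    library(glasso)
    library(MASS)
    set.seed(1)
    p <- 20; n <- 200
    Theta <- diag(p); Theta[abs(row(Theta) - col(Theta)) == 1] <- 0.4
    Sigma <- solve(Theta)
    X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)

    fit <- glasso(cov(X), rho = 0.1)      # rho is the l1 penalty on the precision matrix
    Theta_hat <- fit$wi                   # estimated inverse covariance matrix
    sum(abs(Theta_hat[abs(row(Theta) - col(Theta)) > 1]) > 1e-6)   # spurious off-band entries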
Fabio Ghielmetti  Causal Effect Estimation of Structural Pricing Changes in the Airline Industry Peter Bühlmann
Karl Isler
Aug-2014
Abstract: Pricing changes in the airline industry occur on a daily basis, but their revenue effects are difficult to measure. This problem, namely inferring the causal effect of a pricing change on revenue, can be modeled by a structural equation model (SEM) and a causal graph. A recently published paper (Ernest and Bühlmann, 2014) showed that causal effects within SEMs can be inferred directly from an additive model, even if the true underlying relationships are not additive. After introducing the subject of airline revenue management and the mathematical tools to infer causal effects, this recent result is applied to actual airline data. Following the identification of the corresponding causal graph, multiple additive models are fitted: with several levels of data aggregation and a comparison of different subsets, the sensitivity of the causal effect estimation is tested. Finally, the results are discussed and interpreted.
Shu Li Causal Reasoning in Time Series Analysis through Additive Regression Peter Bühlmann
Jan Ernest
Aug-2014
Abstract: Causal inference has evolved from its fractionized early days towards a more unified and formal framework with diverse applications ranging from brain mapping to the modeling of gene regulatory pathways. In a time series setting causal reasoning revolves predominantly around Granger causality, disregarding recent advances in structural equation or graphical modelling. We use the former to explore the potential of intervention-based causal inference from observational time series data. Drawing its inspiration from a recent result by Ernest and Bühlmann (2014), we propose a novel approach for inferring causal effects in AR(p) models: Addtime, short for additive regression in time series analysis. Our method is theoretically sound, even for nonlinear or non-additive AR(p) models and computationally efficient, requiring on average 0.5s per intervention and enabling potentially high-dimensional applications. Empirically, Addtime is able to recover the true effect in simulated and real data. Within the scope of (nonlinear) time series the effect of interventions is largely unexplored. Our approach can be regarded as a safe benchmark for univariate time series and generalizes to the multivariate case without further constraints.
Anja Franceschetti Alternatives to Generalized Linear Models in Non-Life Pricing Lukas Meier
Christoph Buser
Jul-2014
Abstract:
Christina Heinze Random Projections in High-dimensional and Large-scale Linear Regression Nicolai Meinshausen Jul-2014
Abstract: We study the use of Johnson-Lindenstrauss random projections in different regression settings. First, we examine the high-dimensional case, where the number of variables p largely exceeds the number of observations n. Specifically, we consider so-called compressed least-squares regression (CLSR). CLSR reduces the dimensionality of the data by a random projection before applying ordinary least squares regression on this compressed data set. We perform an empirical comparison of predictive performance between CLSR and other widely used methods for high-dimensional least squares estimation, such as ridge regression, principal component and the Lasso. Our results suggest that an aggregation scheme which averages the predictions of CLSR over a number of independent random projections can greatly improve predictive accuracy. This extension of CLSR performs similarly to the competing methods on a variety of real data sets.

Subsequently, we experiment with two variable importance measures where one exploits the fact that omitting variables in the original high-dimensional data set does not necessarily have to change the projection dimension. This allows for the estimated regression coefficients to be directly compared in the compressed space. The second statistic is based on the change in mean squared prediction error. For both importance measures we explore whether the importance of clusters of highly correlated variables can be identified correctly. We find that the procedures work reasonably well for synthetic data sets with large signal-to-noise ratios (SNRs) and no inter-cluster correlations. However, the randomness in the projection matrix makes detection difficult for data sets with low SNRs. Also, different correlation structures between clusters pose significant challenges.

Lastly, we look at the large-scale setting where both p and n are very large, and possibly p > n. We develop a distributed algorithm, LOCO, for large-scale ridge regression. Specifically, LOCO randomly assigns variables to different processing units. The dependencies between variables are preserved using random projections of those variables that were assigned to the respective remaining workers. Importantly, the communication costs of LOCO are very low. In the fixed design setting, we show that the difference between the estimates returned by LOCO and the exact ridge regression solution is bounded. Experimentally, LOCO obtains significant speedups as well as good predictive accuracy. Notably LOCO is able to solve a regression problem with 5 billion non-zeros, distributed across 128 workers, in 25 seconds.
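A bare-bones version of compressed least-squares regression with averaging over independent projections can be written in a few lines of base R (dimensions and data are hypothetical, and this is not the LOCO algorithm):

    # Compressed least-squares regression: random projection, OLS in the compressed
    # space, and averaging of predictions over independent projections.
    set.seed(1)
    n <- 100; p <- 1000; k <- 30
    X <- matrix(rnorm(n * p), n, p)
    beta <- c(rep(1, 10), rep(0, p - 10))
    y <- drop(X %*% beta) + rnorm(n)
    Xnew <- matrix(rnorm(20 * p), 20, p)
    ynew <- drop(Xnew %*% beta) + rnorm(20)

    clsr_predict <- function(X, y, Xnew, k) {
      R <- matrix(rnorm(ncol(X) * k, sd = 1 / sqrt(k)), ncol(X), k)
      fit <- lm.fit(cbind(1, X %*% R), y)                  # OLS in the compressed space
      drop(cbind(1, Xnew %*% R) %*% fit$coefficients)
    }

    preds <- replicate(50, clsr_predict(X, y, Xnew, k))    # 50 independent projections
    yhat <- rowMeans(preds)                                # aggregated prediction
    mean((yhat - ynew)^2)                                  # test mean squared error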
Sabrina Dorn Local Polynomial Matching and Considerations with Respect to Bandwidth Choice Sara van de Geer Jul-2014
Abstract: This master's thesis considers local polynomial matching, which is a popular method in econometrics for estimating counterfactual outcomes and average treatment effects. We discuss identification of counterfactual expectations under conditional independence, give an overview of selected properties of the local polynomial matching estimator, and apply these to calculate the mean squared error of the corresponding two-step estimator for approximating polynomials of general order. Finally, this enables us to derive and implement a feasible mean squared error criterion that can be minimized numerically, and we provide some evidence of its reasonable performance in an empirical application to the NSW and PSID data.
Olivier Bachem Coresets for the DP-Means Clustering Problem Andreas Krause
Markus Kalisch 
Jul-2014
Abstract:
Valentina Lapteva Different Stability Selection Models for Structure Learning Nicolai Meinshausen Jul-2014
Abstract: Recent developments in analytics, high-performance computing, machine learning and databases have made it possible to collect and process web-scale datasets. Not only does the number of samples increase dramatically, but also the number of features observed and evaluated. Big data analysis, in turn, requires rare experts who fully understand all the attributes of the data and the connections between them, which can be costly if at all possible. This makes the problem of automated structure discovery all the more acute. The task of structure learning attracts a lot of attention, with many new algorithms being proposed in recent years. However, all of them depend heavily on the choice of a regularization parameter. To deal with this problem, the Stability Selection technique (Meinshausen and Bühlmann, 2010) was proposed; its original formulation controls the expected number of falsely selected variables. In this thesis we explore the problem of learning the structure of an undirected Gaussian graphical model. We extensively explore the properties of Stability Selection when applied in combination with different structure estimators, such as the Graphical LASSO, CLIME and TIGER. We also propose and explore, for the first time, a variety of different models that are based on the Stability Selection approach but rely on different types of assumptions or incorporate different types of constraints. For example, we show how to incorporate prior knowledge about the sparsity pattern, or topological constraints such as connectivity or the maximum number of edges adjacent to every node. We also explore assumptions based on the properties of an estimator, such as homogeneous type I and type II discrepancies, or an underlying logistic model as a function of the estimator output and the output of the method. We show that in some cases, either when the prior assumptions hold or when the graphical model structure is dense, the proposed models can serve as a better regularizer for Stability Selection than the original formulation.
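The basic stability selection loop is compact (shown here with the Lasso as base estimator on simulated regression data; the error bounds, the graphical-model estimators and the modified variants explored in the thesis are omitted):

    # Stability selection with the Lasso: subsample half of the data repeatedly and
    # keep variables whose selection frequency exceeds a threshold.
    library(glmnet)
    set.seed(1)
    n <- 200; p <- 100
    X <- matrix(rnorm(n * p), n, p)
    y <- drop(X[, 1:5] %*% rep(1.5, 5)) + rnorm(n)

    B <- 100; lambda <- 0.15
    sel_count <- numeric(p)
    for (b in 1:B) {
      idx <- sample(n, n %/% 2)                            # subsample of size n/2
      fit <- glmnet(X[idx, ], y[idx], lambda = lambda)
      sel_count <- sel_count + (as.vector(coef(fit))[-1] != 0)
    }
    which(sel_count / B >= 0.6)                            # stable set at threshold 0.6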
Gian Andrea Thanei Dimension reduction techniques in regression Nicolai Meinshausen Jul-2014
Abstract:
Maximilien Vila Statistical Validation of Stochastic Loss Reserving Models Lukas Meier
Jürg Schelldorfer
Jul-2014
Abstract: Claims reserving in non-life insurance is the task of predicting claims reserves for the outstanding loss liabilities. There are many methods and models to set the predicted claims reserves. However, in order to quantify the total prediction uncertainty of the full run-off risk (long-term view) or the one-year risk (short-term view), a corresponding stochastic model is needed. In practice, one usually compares the results of several stochastic models in order to determine the appropriate claims reserves and their uncertainties. From a statistical point of view, all these stochastic models require a thorough consideration of the data as well as checking whether the model assumptions are fulfilled. In this thesis we investigate these issues by focusing on four different models: the distribution-free Chain Ladder model, the Cumulative Log Normal model, a Bornhuetter-Ferguson model and generalized linear models. We present known statistical tools and some newly developed data plots and model checking graphics to support the decision for the appropriate stochastic model. Different numerical examples are used to illustrate the procedure of model checking. Public triangles and AXA triangles were considered and the conclusions coincide. Therefore, and for confidentiality, we only present the results for the publicly available data.
Colin Stoneking Bayesian inference of Gaussian mixture models with noninformative priors Peter Bühlmann May-2014
Abstract: This thesis deals with Bayesian inference of a mixture of Gaussian distributions. A novel formulation of the mixture model is introduced, which includes the prior constraint that each Gaussian component is always assigned a minimal number of data points. This enables noninformative improper priors such as the Jeffreys prior to be used for the component parameters. We demonstrate difficulties involved in specifying a prior for the standard Gaussian mixture model, and show how our new model can be used to overcome these. MCMC methods are given for efficient sampling from the posterior of this model.
Alexandra Ioana Negrut Traffic safety in Switzerland Hans R. Künsch May-2014
Abstract: More than 50000 car accidents occurred on Swiss roads in 2012. With new data at hand, the Traffic Engineering department of ETH Zurich was interested in finding out which factors determine the severity of a car accident. Moreover, they were interested in what determines the cause and type of a car crash. In order to answer this first set of questions, parametric and non-parametric methods were used and then compared in terms of misclassification errors and variable ranking. The results confirmed that in order to predict an accident's severity level, one also needs information about the events that did not happen. In the second part of the thesis, the frequency of severe crashes was investigated on five of Switzerland's motorways. It was shown that the higher the average daily traffic volume (DTV), the higher the number of severe accidents.
Yannick Trant Stock Portfolio Selection with Random Forests Peter Bühlmann
Thorsten Hens
May-2014
Abstract: Applications of machine learning algorithms to stock selection usually focus on technical parameters and limited sets of fundamental company ratios. In this study the complete balance sheet, income statement and cash flow statement information of US companies from 1989-2013 is used as model input. The amount and inhomogeneous distribution of missing values is a key characteristic and difficulty in working with these data. I present a structured way to prepare this challenging dataset for statistical learning methods. The fundamental data are complemented by a wide range of technical indicators. The predictive power of random forests with respect to stock return prediction is explored on a calibration period from 1989-2006 using this huge data set. My results show that a small but significant predictive power with respect to ranked returns can be attained for an ‘extreme’ random forest parametrization. The calibrated random forest parametrization raises interesting questions with respect to the nature of the data set. Based on the random forest predictions, simple investment strategies are formulated. They exhibit significant out-performance in an out-of-sample back test for the period from 2006-2013. The risk-adjusted performance measures are on a level with the latest stock selection criteria in the finance literature. Throughout my work I illustrate the challenging peculiarities of working with equity data and propose solutions originating both from finance and mathematics.
Annette Aigner Statistical Analysis of Lower Limb Performance Assessments in Patients with Spinal Cord Injuries Marloes Maathuis
Armin Curt
Lorenzo Tanadini
May-2014
Abstract: Based on longitudinal data from spinal cord injured patients participating in the European Multicenter Study about Spinal Cord Injury, the focus of this thesis lies on the assessments of lower limb performance. Initially, the performance measures' abilities to capture change in a patient's walking ability are measured and their relationships with each other assessed. Based on these results, two measures are identified to subsequently explore the possibility of modelling a patient's recovery in these two outcome measures. Finally, the potential of predicting the extent to which patients will regain their walking ability is examined. Choosing methods such that the results may best help answer the respective research questions, non-parametric two-sample testing, canonical correlation analysis, principal component analysis, latent class factor analysis, as well as linear mixed effects models and random forest were relied upon. The findings show that the scores, currently used on an equal footing for assessing lower limb performance, only apply to certain patients. Therefore, there are subgroups of scores associated with specific patient groups. Out of the six walking tests (6MWT, 10MWT, TUG, SCIM3a, SCIM3b, WISCI), 6MWT and SCIM3b exhibit the desired characteristic of responsiveness and turn out slightly better, and especially most consistent, with respect to the assessment of the interdependency of all scores. Regarding the potential of modelling recovery, i.e. the development over time, the effect of time on 6MWT exhibits a log-like trend. On the other hand, the recovery measured with SCIM3b has a different development, for which time alone may even have a negative influence. The results for the prediction of these two outcomes, six months after injury, showed that such an endeavor is very difficult and will therefore have low accuracy if applied to new patients.
Claude Renaux Confidence Intervals Adjusted for High- Dimensional Selective Inference Peter Bühlmann Apr-2014
Abstract: There is a growing demand for determining statistical uncertainty which is a largely unexplored field for high-dimensional data. The main focus of this thesis lies on confidence intervals adjusted for selective inference in the high-dimensional case. Selective inference denotes the selection of some co-variables and construction of the corresponding confidence intervals based on the same data. This results in a bias, namely the selection effect. One can correct for the selection effect by adjusting the marginal confidence level. We select some co-variables and apply this adjustment to Bayesian confidence intervals based on Ridge regression and frequentist confidence intervals based on de-sparsifying the Lasso. Furthermore, we summarize the theory of selective inference and of the methods used to construct confidence intervals. The methods are demonstrated on a real data set, and large simulations on synthetic and semi-synthetic data sets are carried out. Two of the three methods proposed to construct Bayesian confidence intervals based on Ridge regression perform well only in some set-ups. Furthermore, our simulations show that the False Coverage-statement Rate (FCR) criterion is controlled and the power takes high values for the confidence intervals based on de-sparsifying the Lasso. Moreover, the implementation of the de-sparsified Lasso can be changed for the purpose of selective inference which results in computations finishing in 1% to 6.5% of the time with only slight changes in the results. The results are useful for settings where selective inference is appropriate and high-dimensional data is present.
Christoph Dätwyler Causality in Time Series, a Time Series Version of the FCI Algorithm and its Application to Data from Molecular Biology Marloes Maathuis Apr-2014
Abstract: Among many other concepts, Granger causality has become popular to infer causal relations in time series. In the first part of this work we give a short introduction to this topic, whereby we see that Granger causality can be formulated in terms of conditional orthogonality or conditional independence and can be closely linked to path diagrams, which provide a convenient way of visualising causal relationships among the factors/variables of interest. A concept called m-separation then provides us with a graphical criterion to infer conditional orthogonality relations in path diagrams, and we conclude the first part with a precise statement linking m-separation and Granger causality. The second part then deals with the FCI algorithm, which has been designed to infer causal relations among systems of variables where possibly not all of them have been observed. Furthermore, we present an adaptation of the original FCI algorithm to the framework of time series data. In the last part of this thesis we apply the time series version of the FCI algorithm to a dataset from molecular biology, with the goal of inferring causal relations among the factors of interest and thereby gaining a better understanding of how the transcription of genes works.
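For orientation, here is a minimal R sketch (not the thesis implementation) of the standard FCI call in the pcalg package on i.i.d. Gaussian toy data; the time series adaptation described above would instead build the conditional independence tests from lagged copies of the variables. The simulated variables are invented for illustration.

## Minimal sketch: standard FCI on i.i.d. Gaussian data with pcalg.
library(pcalg)

set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)
x3 <- 0.5 * x2 + rnorm(n)
dat <- cbind(x1, x2, x3)

suffStat <- list(C = cor(dat), n = n)      # sufficient statistics for the Gaussian CI test
fci.fit <- fci(suffStat, indepTest = gaussCItest,
               alpha = 0.01, labels = colnames(dat))
fci.fit@amat                               # estimated PAG adjacency matrix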
Thomas Schulz A Clustering Approach to the Lasso in the Context of the HAR model Peter Bühlmann
Francesco Audrino
Apr-2014
Abstract: We discuss a covariate clustering approach to the Lasso and compare it to the standard Lasso in the context of the HAR model. We analyze the difference in forecasting error between these models on historical volatility data and find that the error tends to be slightly larger for the clustering approach. Subsequently, we employ the same data to compare the stability of the chosen coefficients for the considered models and we observe that the clustering approach achieves better results than the standard Lasso. Finally, we conduct a data simulation analysis to study stability issues in a synthetic HAR setting and conclude again that the coefficients selected by the clustering approach appear to be more stable.
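As a point of reference, here is a minimal R sketch (not the thesis code) of a plain HAR regression for realized volatility, fitted once by OLS and once by the Lasso via glmnet; the simulated volatility series and the absence of the covariate-clustering step are assumptions made purely for illustration.

## Minimal sketch: HAR regression (daily/weekly/monthly components), OLS vs Lasso.
library(glmnet)

set.seed(1)
rv <- abs(arima.sim(list(ar = 0.7), n = 1500)) + 0.1   # toy realized-volatility proxy

lag.mean <- function(x, k) {            # average of the k most recent past values
  sapply(seq_along(x), function(t) if (t <= k) NA else mean(x[(t - k):(t - 1)]))
}
har <- na.omit(data.frame(
  y = rv,
  d = lag.mean(rv, 1),     # daily component
  w = lag.mean(rv, 5),     # weekly component
  m = lag.mean(rv, 22)     # monthly component
))

ols   <- lm(y ~ d + w + m, data = har)
lasso <- cv.glmnet(as.matrix(har[, c("d", "w", "m")]), har$y)
coef(ols); coef(lasso, s = "lambda.min")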
Huan Liu Incorporating Prior Knowledge in CPDAGs Marloes Maathuis Mar-2014
Abstract: A causal model can be represented as a graphical model, with each node representing a variable and each edge representing a causal relationship. A completed partially directed acyclic graph (CPDAG) is such a causal model with no hidden variables, in which every undirected edge can be oriented either way. Causal prior knowledge is represented as the existence or absence of a directed path from one variable to another. This paper provides an algorithm to incorporate a set of causal prior knowledge into a CPDAG. It uses the chordal properties of a CPDAG to separate the undirected part of the graph into connected subgraphs and then incorporates all the prior knowledge with the help of Meek's rules and related theorems. This paper also proves the correctness of the incorporation for both positive and negative prior knowledge. Furthermore, a simulation is done to test and compare the performance of the algorithm.
Lana Colakovic Classification using Random Ferns Nicolai Meinshausen Mar-2014
Abstract: Random Ferns are a supervised learning algorithm for classification introduced recently by Özuysal, Fua, Calonder, and Lepetit (2010) as a simpler and faster alternative to Random Forests (Breiman (2001)), with specific application in image recognition. In contrast to trees, ferns have a non-hierarchical structure and the aggregation is performed by multiplication rather than averaging. Also, they rely on a completely random selection of features as well as split points. The aim of this master's thesis is to investigate general properties of Random Ferns and compare them to Random Forests. We want to see if, and under which circumstances, Random Ferns are comparable in performance to Random Forests. We implemented the Random Ferns algorithm in R and used simulated as well as real data sets to investigate Random Ferns' properties in more detail.
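To illustrate the fern idea itself, here is a toy from-scratch R sketch (not the thesis implementation): each fern draws a few completely random features and split points, encodes an observation as a binary code, stores Laplace-smoothed class frequencies per code, and ferns are aggregated by multiplying class likelihoods. The depth, number of ferns and use of the iris data are illustrative choices.

## Toy random-fern classifier built from scratch.
set.seed(1)
train_ferns <- function(x, y, n_ferns = 50, depth = 4) {
  lapply(seq_len(n_ferns), function(f) {
    feats  <- sample(ncol(x), depth, replace = TRUE)                 # random features
    thresh <- sapply(feats, function(j) runif(1, min(x[, j]), max(x[, j])))  # random splits
    code   <- 1 + as.vector((x[, feats, drop = FALSE] > rep(thresh, each = nrow(x))) %*% 2^(0:(depth - 1)))
    tab    <- table(factor(code, levels = 1:2^depth), y) + 1         # Laplace smoothing
    list(feats = feats, thresh = thresh, prob = prop.table(tab, 1))
  })
}
predict_ferns <- function(ferns, x) {
  depth <- length(ferns[[1]]$feats)
  logp <- Reduce(`+`, lapply(ferns, function(fern) {                 # multiply likelihoods = sum of logs
    code <- 1 + as.vector((x[, fern$feats, drop = FALSE] > rep(fern$thresh, each = nrow(x))) %*% 2^(0:(depth - 1)))
    log(fern$prob[code, , drop = FALSE])
  }))
  colnames(logp)[max.col(logp)]
}

x <- as.matrix(iris[, 1:4]); y <- iris$Species
ferns <- train_ferns(x, y)
mean(predict_ferns(ferns, x) == y)     # training accuracy of the toy classifier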
Christoph Kovacs Semi-supervised Label Propagation Models for Relational Classification in Dyadic Networks: Theory, Application and Extensions Marloes Maathuis
Lukas Meier
Feb-2014
Abstract: If a dataset not only comprises instance features but also exhibits a relational structure between its elements, it can be represented as a network with nodes defined by instances and links defined by relations. Data analysis can be performed on such a structure under the statistical relational learning (SRL) paradigm. Two of its basic cornerstones, collective classification and collective inference, can be carried out by semi-supervised label propagation (SSLP) algorithms, which allow for label information to be propagated and updated through the network to arrive at class affiliation predictions for unlabeled nodes. For this purpose, harmonic functions have been applied on Gaussian random fields and adapted accordingly, leading to the weighted-vote Relational Neighbor classifier with Relaxation Labeling (wvRNRL). Extending this approach to support social features, extractable from the network's topology, results in the Social Context Relational Neighbor (SCRN) classifier. Moreover, MultiRankWalk (MRW), a classifier which uses ideas from random walk with restart, is presented and discussed. These different semi-supervised classification models are being applied on nine dyadic networks and their prediction performances are being evaluated for various accuracy measures using the repeated Network Cross-Validation (rNCV) scheme. Ideas to relax certain model restrictions and to expand their applicability are outlined, together with a suggested measure of unlabeled node importance (MIUN statistic). In order to provide an adequate visualization of the obtained results, a new means of holistic visualization, the Circo-Clustogram, is proposed. A discussion of the advantages and disadvantages of semi-supervised label propagation and its applicability concludes this thesis.
Ambra Toletti Tree-based variational methods for parameter estimation, inference and denoising on Markov Random Fields Sara van de Geer Feb-2014
Abstract: The attention of statisticians and computer scientists to variational methods has increased considerably in the last few decades. While it has become (computationally) cheap to store huge amounts of multivariate data describing complex systems (e.g. in the natural sciences, sociology, etc.), processing this information for obtaining parameter estimates for the underlying statistical models, making inference or denoising is still infeasible in general. In fact, classical (exact) methods (e.g. computing maximum likelihood estimates via Iterative Proportional Fitting) need a huge amount of time to solve these problems if the complexity/size of the underlying model is sufficiently large. Markov Random Fields, which are widely used because of their convenient representations as both graphs and exponential families, are not immune to this problem. In this case it is possible to convert both inference and parameter estimation into constrained optimization problems connected with the exponential representation. Unfortunately this transformation does not provide any improvement in feasibility, because it is often impossible to write the objective function in an explicit way and even the number of constraints is prohibitive. One can obtain a computationally cheaper (approximate) solution by appropriately relaxing the constraints and by approximating the objective function. In this work the relaxation was made by considering all combinations of locally consistent marginal distributions, and the objective function was approximated with a convex combination of Bethe entropy approximations based on the spanning trees of the underlying graph. Wainwright (2006) proved that parameter estimates obtained with this method are asymptotically normal but do not converge toward the true parameter. However, if these estimates are used for purposes such as inference or denoising, their performance is comparable with that of exact methods. In this work some empirical evidence confirming these properties for an Ising model on a grid graph was produced, and general definitions and results about graphical models and variational methods were reviewed.
Tobia Fasciati Semi Supervised Learning Markus Kalisch Feb-2014
Abstract: The potential advantages of Semi-Supervised Learning compared to more traditional learning methods like Supervised and Unsupervised Learning have attracted many researchers in the recent past. The goal is to learn a classifier from data having both labeled and unlabeled observations by exploiting their geometrical position. The aim of this Master's thesis is to give an overview of SSL and to study two different methods, the Transductive Support Vector Machine and Anchor Graph Regularization. Finally, both approaches are tested on selected datasets.
David Bürge Causal Additive Models with Tree Structure: Structure Search and Causal Effects Peter Bühlmann
Jonas Peters
Feb-2014
Abstract: Drawing conclusions about causal relations from data is a central goal in numerous scientific fields. In this thesis we study a special case of a restricted structural equation model (SEM). In addition to the common assumptions of acyclicity and no hidden confounders, we assume additive Gaussian noise, non-linear functions and a causal structure represented by a directed acyclic graph (DAG) with tree structure. Given data from such a causal additive model with tree structure (CAMtree) we estimate the underlying tree structure and give characterisations of the causal effects from variables on others. This restricted model leads to several simplifications. Identifiability of the structure is guaranteed by a result from Peters et al. (2013). We present a method that efficiently finds a maximum likelihood estimator for the causal structure among all trees. As our method is based on local properties of the distribution, it extends without constraints to high-dimensional settings. Furthermore, we investigate how to characterise causal effects from one variable on others. The maximum mean discrepancy is used to quantify changes in the distribution of the effect variable when the potential cause is varied. Based on our estimate for the structure, we present a procedure which, given only observational data, predicts the strongest causal effects. All methods are implemented in R and we give experimental results for synthetic data and one set of real high-dimensional data.
Emilija Perkovic The FCI+ Algorithm Markus Kalisch Feb-2014
Abstract: The primary focus of this thesis was to understand and implement the FCI+ algorithm as described in "Learning Sparse Causal Models is not NP-hard" by Claassen, Mooij, and Heskes (2013a). In order to understand how this algorithm works, a short introduction to causality and some methods of dealing with causal data are given. First, we introduce the reader to the terminology and graphical representation of causal systems. Then we focus on methods for dealing with data from causal systems when there are no hidden variables (PC), as opposed to those where hidden variables are present (FCI, FCI+). Special attention is given to the theory behind the FCI+ algorithm. In the end a comparison between FCI and FCI+ is made, based on accuracy and computational time, and conclusions are drawn.
Xi Xia Comparison of Different Confidence Interval Methods for Linear Mixed-Effects Models Martin Maechler Feb-2014
Abstract: Our study is a simulation analysis of different confidence interval methods for the fixed-effect parameters in linear mixed-effects models. Two functions, the lmer function in the lme4 package and the lme function in the nlme package, are used to fit the linear mixed-effects models. Six different confidence interval methods from the packages lme4, nlme, lmerTest and boot are studied and compared. We conclude that both the lmer and lme functions give similar results when fitting the LME models, but the bias grows as the number of fixed effects increases. For the confidence interval methods, a general finding is that most of the intervals are too small. Among all methods, the lmerTest method performs best: it has the lowest confidence interval MP among all methods and its coverage rate is closest to the nominal level. The drawback of lmerTest is that it sometimes returns errors or intervals which do not make sense (e.g. with infinite boundaries), and it runs significantly slower than lme4-Wald and nlme-intervals. Lme4-Wald and nlme-intervals are both very stable and fast, but the intervals are nearly always too small. The profile method is not better than lmerTest, and bootstrap-type methods perform worst. Also, we found that a poor performance of the confidence intervals might sometimes indicate overfitting in the model design.
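For concreteness, the following minimal R sketch (not the thesis simulation) fits the same model with lme4 and nlme on the built-in sleepstudy data and requests the kinds of fixed-effect intervals compared above (Wald, profile likelihood, parametric bootstrap, and nlme's intervals); the data set and the number of bootstrap samples are illustrative choices.

## Minimal sketch: comparing confidence interval methods for fixed effects.
library(lme4)
library(nlme)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
confint(fm1, parm = "beta_", method = "Wald")      # Wald intervals
confint(fm1, parm = "beta_", method = "profile")   # profile likelihood intervals
confint(fm1, method = "boot", nsim = 200)          # parametric bootstrap intervals

fm2 <- lme(Reaction ~ Days, random = ~ Days | Subject, data = sleepstudy)
intervals(fm2, which = "fixed")                    # nlme's intervals for fixed effects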

2013

Student Title Advisor(s) Date
Vineet Mohan Grouped Regression in High Dimensional Statistics Sara van de Geer Oct-2013
Abstract: This work is devoted to clustered estimation in a sparse linear model where the parameters are highly correlated and far outnumber the observations. Three variants of the group lasso technique from the literature are examined. They are found to be equivalent to a weighted lasso after some dimension reduction. The priors they impose on the parameters are used to suggest which class of problems they work best with. Based on this analysis, a new estimator which bases dimension reduction on principal component analysis is proposed. Empirical experiments follow to confirm the theoretical results.
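As background, here is a minimal R sketch (not the thesis estimators) of an ordinary group lasso fit with the grpreg package on simulated data whose correlated predictors are arranged in blocks; the group structure, correlation level and choice of grpreg are illustrative assumptions.

## Minimal sketch: group lasso on correlated, grouped predictors.
library(grpreg)
library(MASS)

set.seed(1)
n <- 100; p <- 40
g <- rep(1:8, each = 5)                                     # 8 groups of 5 predictors
Sigma <- outer(1:p, 1:p, function(i, j) 0.8^abs(i - j))     # strongly correlated design
X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
beta <- c(rep(1, 5), rep(0, p - 5))                          # only the first group is active
y <- X %*% beta + rnorm(n)

cvfit <- cv.grpreg(X, y, group = g, penalty = "grLasso")
coef(cvfit)                                                  # coefficients at the CV-optimal lambda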
Vasily Tolkachev Parameter Estimation for Diffusion Processes Hans Rudolf Künsch Sep-2013
Abstract: This work considers the estimating-functions approach to calibrating parameters in stochastic differential equations based on discretely sampled observations. Since the likelihood function is not known in closed form in the discrete case, we have to rely on an approximation to the score function, the estimating function, and then take its root as the estimator. It turns out that roots of estimating functions enjoy a number of remarkable asymptotic properties. First, the rigorous regularity assumptions needed for the main results to hold are outlined. Then we consider a central result: when the conditional moments of the process are known in closed form, the roots of an estimating function are asymptotically normal. Secondly, a more general theorem, which uses sample moments instead of the conditional ones in the estimating function, is discussed. Under a suitable choice of the approximating scheme the roots are still asymptotically normal, but with bias and larger variance. Finally, estimation of both drift and diffusion coefficients is considered for the geometric Brownian motion and the Ornstein-Uhlenbeck process, generated from a Monte Carlo simulation. Important issues for various values of the parameters are emphasized, as well as advantages and difficulties of using estimating functions.
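To make the closed-form conditional-moment case concrete, the following minimal R sketch (not the thesis code) simulates an Ornstein-Uhlenbeck process exactly on a time grid and recovers the drift and diffusion parameters from its AR(1) conditional moments via a simple regression-based estimating equation; the parameter values and step size are illustrative.

## Minimal sketch: OU process dX = -theta*X dt + sigma dW, estimated from conditional moments.
set.seed(1)
theta <- 2; sigma <- 0.5; dt <- 0.1; N <- 5000

a <- exp(-theta * dt)                        # conditional mean factor
v <- sigma^2 * (1 - a^2) / (2 * theta)       # conditional variance over one step
x <- numeric(N)
for (t in 2:N) x[t] <- a * x[t - 1] + sqrt(v) * rnorm(1)   # exact simulation

fit <- lm(x[-1] ~ x[-N] - 1)                 # regression of X_{t+1} on X_t through the origin
a.hat     <- unname(coef(fit))
theta.hat <- -log(a.hat) / dt
sigma.hat <- sqrt(2 * theta.hat * var(resid(fit)) / (1 - a.hat^2))
c(theta = theta.hat, sigma = sigma.hat)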
Sarah Grimm Supervised and semi-supervised classification of skin cancer Sara van de Geer
Markus Kalisch
Chris Snijders
Aug-2013
Abstract: As skin cancer rates continue to grow, dermatologists will be overwhelmed by the number of patients seeking skin cancer diagnosis. This problem is being addressed in the Netherlands, where research with a hospital has been developing logistic regression models that may help train nurses to diagnose skin cancer and that are accessible via a mobile application. The present work investigated whether the logistic regression models could be improved or outperformed. A small simulation study explored the future potential for improving the models by incorporating information from patients who would use the application and who have not received a diagnosis. Logistic regression proved to be a competitive model. A smaller set of predictors with which the models performed practically as well was identified. Although incorporating information from undiagnosed cases did not improve performance, it also did not deteriorate it, and it is worth continuing to investigate the value of undiagnosed cases for model performance.
Lennart Schiffmann Measuring the MFD of Zurich: Identifying and Evaluating Strategies for an Efficient Placement of Detectors Marloes Maathuis
Markus Kalisch
Aug-2013
Abstract: In recent years the macroscopic fundamental diagram (MFD) was established in the traffic research community. It can describe the overall traffic state in homogeneously congested areas in cities. To facilitate a real world implementation of an MFD-based traffic control system, we are developing strategies for placement of fixed monitoring resources (e.g. loop detectors). These strategies to place detectors efficiently are based on univariate and bivariate distributions of street properties such as road length, number of lanes and occurrence of traffic lights. We find that the use of bivariate distributions including the length of streets can yield good results. Our research is based on a microsimulation of the city of Zurich implemented in VISSIM.
Reto Christoffel-Totzke Time Series Analysis Applied to Power Market Data Peter Bühlmann Aug-2013
Abstract: The object of study of the present thesis is the daily closing prices of the futures contract for the Base-13. The goal is to elaborate on their characteristics and to understand which influences determine their trend. By means of appropriate methods and procedures, the most important of numerous variables are selected and five different models are developed to describe the Base-13. These models simultaneously try to compute precise one-step-ahead forecasts for future closing prices. A short introduction equips the reader with the necessary basic knowledge about the functioning of power markets, so that the results of the analyses and their interpretation can be understood. The descriptive time series analysis of the closing prices then demonstrates that the volatility changes heavily over time, which complicates the development of the models. Furthermore, in the same section, the random walk hypothesis of independent and random price changes could not be confirmed for the Base-13 contracts. The next section focuses on GLMs. Based on a GLM, a model has been developed which includes the most important indicators for the closing prices: coal API2-13, EUA-13, Gas TTF-13, CLDS and the USD/EUR exchange rate. The resulting GLM forecasting model performed very accurately, with a precision in trend of 81%. A strong linear correlation appeared between the Base-13 and coal, EUA, gas and the exchange rates, which have the largest quantifiable impact, as shown in a graphical analysis of these effects. Thereafter, the impact analysis was deepened and produced some interesting insights into how the closing prices react to changing volatility of the input variables. All variables of the final GLM model are highly significant in the GAM as well and show identical features relating to their impact on the Base-13. The GAM forecasting model reaches a precision in trend of 78%. The research documented in the next section confirmed four important variables of the final model by applying MARS: coal, EUA, CLDS and the USD/EUR exchange rate. According to the graphical analysis, the effect of these most important variables is likewise almost linear. The MARS forecasting model reaches a precision in trend of 78%. Furthermore, another forecasting model has been developed with NNET, which captures non-linear effects to an acceptable extent. The corresponding effect plot illustrates this non-linearity quite clearly, especially for gas, the exchange rates, coal and CLSS. The NNET forecasting model achieves a precision in trend of 74%. The following section illustrates that the results with PPR confirm, to a considerable extent, the outcome obtained with the final GLM model. The forecasting model based on PPR shows a precision in trend of 75%. Various theoretical findings, as well as findings from applied experience, relating to the impact on the Base-13 closing prices have been confirmed by the empirical data. The straightforward linear model has proven very accurate as well as comprehensible thanks to its mathematical form. Furthermore, it has been demonstrated that complex non-linear models offer no advantage, due to the strong correlation between the most important variables and the Base-13. It can therefore be concluded that the goals set for this thesis have been achieved by providing substantial insight into theoretical and applied aspects of statistical models for forecasting futures closing prices.
Andrea Remo Riva Convex optimization for variable selection in high-dimensional statistics Sara van de Geer Jul-2013
Abstract: Which genes favor or oppose the formation of potentially fatal diseases such as prostate cancer, Crohn's disease or Huntington's disease? We are increasingly confronted with situations in which a large amount of collected data should be interpreted with the purpose of formulating specific hypotheses about the causes of phenomena of particular interest. Modern statistics therefore seeks to develop new tools that can effectively deal with this kind of problem. This Master's thesis first reviews the basic ideas behind the LASSO (Least Absolute Shrinkage and Selection Operator) introduced by Tibshirani in 1996 and the basics of convex optimization. The study then focuses on finding optimal solutions by regularizing the empirical risk with appropriate nonsmooth norms. Proximal methods tackle these optimization problems effectively and in a flexible manner, and are of considerable computational interest because the proposed algorithms have good convergence rates. We then explore the possibility of introducing structured sparsity into the solutions in order to greatly improve the quality of the regression coefficients. For this purpose we introduce new variational norms that require auxiliary vectors with positive components to belong to a set of our choice, which inductively determines the desired structure. Finally, some applications in image processing and medical research illustrate concretely how high-dimensional statistics is called upon today to assist researchers.
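To connect the proximal-method discussion to something concrete, here is a minimal from-scratch R sketch (not one of the thesis algorithms) of the basic proximal gradient (ISTA) iteration for the Lasso, where the proximal operator of the l1 norm is soft-thresholding; the simulated data, penalty level and number of iterations are illustrative choices.

## Minimal sketch: proximal gradient (ISTA) for the Lasso.
set.seed(1)
n <- 100; p <- 50
X <- matrix(rnorm(n * p), n, p)
beta.true <- c(3, -2, rep(0, p - 2))
y <- X %*% beta.true + rnorm(n)

soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)    # prox of t * ||.||_1

lambda <- 0.1
L <- max(eigen(crossprod(X) / n, only.values = TRUE)$values)   # Lipschitz constant of the gradient
beta <- rep(0, p)
for (k in 1:500) {
  grad <- crossprod(X, X %*% beta - y) / n               # gradient of the smooth squared-error part
  beta <- soft(beta - grad / L, lambda / L)              # proximal (soft-thresholding) step
}
head(beta)                                               # approximate Lasso solution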
Nilkanth Kumar  An Empirical Analysis of the Mobility Behaviour in Switzerland using Robust Methods  Werner Stahel
Massimo Filippini
May-2013
Abstract: In this thesis, the demand for personalized mobility by Swiss households has been studied using vehicle stock parameters and geographic and socio-economic characteristics. For this purpose, disaggregate household-level data from the latest Swiss travel micro-census for the period 2010-2011 has been used. In addition to the OLS approach, robust methods using MM-estimators have been incorporated to obtain improved model fits and estimation results. A few related demand questions, such as comparing the car usage of single- and multiple-car households, are also explored.
The estimated coefficients mostly have the expected signs. The demand for personal mobility is found to vary considerably across different locations and households. The non-availability of good public transport in an area is associated with a significantly higher demand for car utilization. Rich households appear to have a higher travel demand in general. Efficient cars are found to be driven more than those with poor energy ratings. In multi-car households, a vehicle usage disparity of as much as 21% is observed depending on the efficiency label. From a policy maker's point of view, further research into specific areas is advised in order to assess the feasibility of policy instruments that account for the differences in the vehicle utilization behaviour of the population.
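As a small illustration of the robust-regression step, the following R sketch (not the thesis models) compares OLS with an MM-estimator from robustbase::lmrob on simulated data containing a few gross outliers; the variable names and the contamination are invented and do not correspond to the micro-census variables.

## Minimal sketch: OLS vs MM-estimation on contaminated data.
library(robustbase)

set.seed(1)
n <- 200
km.per.year <- rnorm(n, 12000, 3000)
income      <- rnorm(n, 8000, 2000)
y <- 2000 + 0.5 * income + 0.1 * km.per.year + rnorm(n, sd = 500)
y[1:5] <- y[1:5] + 50000                           # a few grossly outlying responses

d <- data.frame(y, income, km.per.year)
coef(lm(y ~ income + km.per.year, data = d))       # OLS: pulled by the outliers
coef(lmrob(y ~ income + km.per.year, data = d))    # MM-estimator: resistant to them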
Nicolas Meng Optimal Portfolios - The Benefits of Advanced Techniques in Risk Management and Portfolio Optimization Sara van de Geer
Markus Kalisch
May-2013
Abstract: This Master's thesis deals with the most important challenges facing practitioners in portfolio and risk management. It embeds a variety of risk and optimization methodologies into a common framework and performs an empirical backtest on a typical sector rotation strategy in the US market. The objective of this study is to evaluate the impact of wrong assumptions in risk modeling and portfolio optimization, as a recent survey showed that practitioners still use simplified approaches based on such assumptions despite empirical evidence that contradicts them. First, we apply different risk forecast models to the empirical data. Apart from an unconditional model that is still prominently used in practice, a constant conditional correlation (CCC) and a dynamic conditional correlation (DCC) model are implemented, and the forecasting performance is evaluated on the risk measures volatility, VaR, and CVaR. There is clear empirical evidence that the unconditional model performs poorly and led to severe underforecasting and clustering of losses during the financial crisis of 2008. The more complex DCC model provided the most accurate forecasts, followed by the CCC model. This demonstrates that wrong model assumptions lead to unacceptable results in practice. Based on the forecasts from all risk models, two optimization approaches are tested. An adapted version of the traditional mean-variance optimization is employed. Additionally, a relatively new method of diversification optimization is implemented and compared against return maximization, subject to a CVaR constraint. Using this comparison, we examine the effect of estimation error on the expected returns and risk parameters. As a diversification approach is invariant to the estimates of expected returns, we expect it to provide more stability to an optimized portfolio. We were able to confirm the concerns about estimation error and found that return maximization does not lead to optimal portfolios out of sample. In contrast, the empirical results of the diversification-CVaR strategy are promising. Maximum diversification of independent risk factors leads to better performance in terms of both realized risk and returns. In light of these findings, we question the practice of using the traditional method of return maximization, as the cost of ignoring estimation error in the optimization appears to be significant. Finally, we conclude that the standard approach still followed by a majority of practitioners does not deliver satisfactory results, due to wrong assumptions about the statistical properties of financial markets, and that conditional risk estimates and the problem of estimation errors are important aspects that cannot be neglected solely for the sake of simplicity.
Cong Dat Huynh Semi-supervised learning methods for problems having positive and unlabeled examples Sara van de Geer
Markus Kalisch
Thomas Beer, Swisscom
Apr-2013
Abstract: A company can use upselling methods to upgrade the products its customers have bought from it. Besides increasing profit, the stronger dependency of the customers on the company through the upgraded products can help to reduce the churn rate. This is especially crucial in the telecommunication sector, in which volatility is high and customer loyalty is low. The easiest way to upsell is to offer the customers the upgraded products. However, the reason not to offer all products to all customers is that too much marketing information will annoy them. In this thesis we introduce and compare several methods that can support the decision of whether or not to offer a product to a customer. The effectiveness of the methods is validated through a simulation study based on real-world datasets. The results from the study indicate that several methods have great potential.
Ruben Dezeure P-values for high-dimensional statistics Peter Bühlmann Mar-2013
Abstract: In this work, recently published methods for hypothesis testing in high-dimensional statistics are studied. The methods are compared by testing for variable importance in linear models for a variety of test setups, including real datasets. For multiple testing correction, a procedure is used that is closely related to the Westfall-Young procedure, which has been shown to have asymptotically optimal power. The estimation performance of the regression coefficients is also examined to provide a different level of comparison. Finally, we also test in a logistic regression model to investigate whether testing in generalized linear models is reliable with state-of-the-art methods.
Harald Bernhard Parameter estimation in state space models Hans Rudolf Künsch Mar-2013
Abstract: We considered the effectiveness of a particle approximation procedure to the score function via filtered moments of artificially time-varying parameters in general state space models. To investigate this issue we considered a simple two state hidden Markov model where exact reference values are available. For this model we conducted simulation studies to estimate several diagnostic statistics about the score approximation procedure. The results were then used to perform maximum likelihood estimation in the same model, using the noisy score approximation in combination with a stochastic approximation procedure.
Mark Hannay Robust Testing and Robust Model Selection Werner Stahel
Manuel Koller
Mar-2013
Abstract: As the title suggests, this thesis belongs to the domain of robust statistics. There are three main chapters: testing in linear models, testing in generalized linear models, and model selection. We start in the linear model, where we describe classical estimation and classical tests. After describing the classical methods, we introduce robust estimation, namely the SMDM estimator. With our robust estimates we present robust tests, including a new robust score test. To improve the speed of our new robust score test, we develop methods to estimate the scale parameter $\sigma$ from the reduced model. The most prominent robust tests for the composite hypothesis are the robust Wald test and the $\tau$-test. Both these tests are computationally expensive, as they require fitting the full model. We develop new robust tests that only require fitting the reduced model. In generalized linear models (GLMs), we once again describe the classical estimation and the classical tests. By using robust scores, we introduce robust estimation. In GLMs, two prominent robust tests already exist: the quasi-deviance test and the robust saddlepoint test. However, they are computationally expensive, so we introduce the robust Wald test and the robust score test, which are both computationally cheaper. Here we compare the quasi-deviance test with the robust Wald test and the robust score test, while simultaneously comparing them to the classical saddlepoint test. In the chapter on model selection, we introduce an important method, the classical Mallows' $C_{p}$ criterion. By using the classical Mallows' $C_{p}$ criterion in an example, we discuss the importance of using robust methods for model selection. We therefore develop our own robust Mallows' $C_{p}$ criterion, which works well in the example. We compare the classical and the robust Mallows' $C_{p}$ criteria with each other in a simulation study. Another approach to model selection, based on testing, is also discussed. I have tried to make this thesis as self-contained and as comprehensive as possible, while keeping to the essentials. Chapters 2 and 4 should be accessible to readers with a good foundation in linear regression, while Chapter 3 should be accessible with a good foundation in generalized linear regression.
Benjamin Stucky Second-Level Significance Testing Sara van de Geer Feb-2013
Abstract: The emergence of all the modern-day information gathering technologies, amongst all their benefits, gave rise to some new problems and challenges. Nowadays we need to be able to handle huge data sets. We will often face the problem that some information of interest is very rarely contained in our data. At the same time this information is very hard to distinguish from every other observation. This thesis focuses on how to detect the presence of such sparse information with the aid of a method called Higher Criticism. This is a hypothesis test to determine whether we have a very small fraction of non-null hypotheses amongst many null hypotheses or whether this fraction is indeed zero. For the definition of this test we need a collection of different significance tests, hence the name Second-Level Significance Testing. Higher Criticism was suggested by Tukey in 1976 and then developed by Donoho and Jin [15] in 2004. This thesis is mainly based on their work as well as the work of Cai et al. [9]. The main focus lies on the detection of sparse signals, but some cases where the signals are dense are also discussed. Higher Criticism works very well for the adaptive detection of sparse and faint signals amongst background noise. Adaptive means that Higher Criticism is able to work without knowing the sparseness and the faintness of the detection problem. The case where the data is Gaussian distributed is the basis for developing the Higher Criticism test statistic. In this setting Higher Criticism is optimal. Optimality means that asymptotically Higher Criticism is able to detect all theoretically detectable signals. The detectable signals are described by the detection boundary. We also encounter the problem of correlated observations. There we can modify Higher Criticism and still get good results; this follows the work of Hall and Jin [19]. The notion of the detection boundary and Higher Criticism can even be generalized to a wide range of different settings due to Cai and Wu [12]. Higher Criticism thus solves one challenge that new technologies have posed to us. We discuss other important problems connected to the detection of sparse signals according to Cai et al. [10], such as estimating the fraction of sparse signals and discovering which observations are signals of interest.
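Since the test statistic itself is simple, here is a minimal from-scratch R sketch (not the thesis code) of the Donoho-Jin Higher Criticism statistic computed from sorted p-values for Gaussian data containing a small fraction of faint signals; the signal fraction, signal strength and the cut-off alpha0 = 0.1 are illustrative choices.

## Minimal sketch: Higher Criticism statistic from a vector of p-values.
set.seed(1)
n <- 1e4
mu <- c(rep(2, 50), rep(0, n - 50))          # sparse, faint signals
x  <- rnorm(n, mean = mu)
p  <- 1 - pnorm(x)                           # one-sided p-values

hc <- function(p, alpha0 = 0.1) {
  n  <- length(p)
  ps <- sort(p)
  i  <- seq_len(floor(alpha0 * n))
  max(sqrt(n) * (i / n - ps[i]) / sqrt(ps[i] * (1 - ps[i])))
}
hc(p)   # compare with its null distribution, e.g. by recomputing hc() on p ~ U(0,1)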

2012

Student Title Advisor(s) Date
Giacomo Dalla Chiara Factor approach to forecasting with high-dimensional data, an application to financial returns Peter Bühlmann Oct-2012
Abstract: This study considers forecasting a time series of financial returns in a linear regression setting using a number of macroeconomic predictors (N) which can exceed the number of time series observations (T). Usually, regression estimation techniques either consider only a handful of predictors or assume that the vector of parameters is sparse. Several recent papers advocate the use of a factor approach to deal with such high-dimensional data without discarding any of the predictors. Assuming an approximate-factor structure on the data, it is possible to summarize the large set of time series using a limited number of indexes, which can be consistently estimated using principal components. First, we review the recent theoretical developments in the construction and estimation of a forecasting procedure which uses the large-dimensional approximate factor model. The aim is to help bridge these studies and the empirical research, which presents mixed performance results of the factor model implemented on real-world data. In a second part we discuss four implementation techniques of the factor model, namely, (i) screening, (ii) estimation window size selection, (iii) factor selection, (iv) variable selection in a factor-augmented regression. We argue that these four methodologies, which have often been considered separately in the empirical literature, are paramount for the factor model to achieve a better forecasting performance than lower dimensional models. In the last part of this study, factor-augmented models, with and without the above mentioned methodologies, are implemented using the Stock and Watson (2006) dataset of macroeconomic and financial predictors to forecast the time series of monthly returns of the Standard and Poor 500 index. Indeed, the empirical results show that screening and estimation window size selection are needed, in the factor model, to outperform lower dimensional benchmarks. The main contribution of this work is to provide general guidelines for applying the large dimensional factor model to real-world data. All the practical methodologies discussed in the paper are coded in the R programming language, and are contained in Appendix E.
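For readers unfamiliar with the factor approach, the following minimal R sketch (not the thesis implementation) extracts factors from a simulated predictor panel with principal components and uses them in a one-step-ahead factor-augmented regression; the panel dimensions, number of factors and data-generating process are illustrative assumptions.

## Minimal sketch: factor-augmented one-step-ahead forecast via principal components.
set.seed(1)
TT <- 200; N <- 150; k <- 3
F.true <- matrix(rnorm(TT * k), TT, k)                      # latent factors
X <- F.true %*% matrix(rnorm(k * N), k, N) + matrix(rnorm(TT * N), TT, N)
y <- c(F.true %*% c(1, -0.5, 0.3) + rnorm(TT))              # target series

pc   <- prcomp(scale(X))
Fhat <- pc$x[, 1:k]                                          # estimated factors
fit  <- lm(y[-1] ~ Fhat[-TT, ])                              # regress y_{t+1} on factors at t
fcst <- sum(c(1, Fhat[TT, ]) * coef(fit))                    # forecast of y_{TT+1}
fcst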
Raphael Gervais Predicting the Effect of Joint Interventions from Observational Data Marloes Maathuis Sep-2012
Abstract: It is commonly believed that causal knowledge discovery is not possible from observational data and requires the use of experiments. In fact, it is indeed impossible to learn causal information from observational data when one is not willing to make any assumptions. However, under some fairly general assumptions, IDA (Intervention calculus when the DAG is Absent) is a methodology that can deduce information on causal effects from observational data. The present work extends the IDA methodology in two ways. Firstly, in the case of single outside interventions on a system, two new algorithms are presented: IDA Path and IDA Semi-Local. These algorithms compare favourably in simulation studies in terms of both statistical properties and computational efficiency. Secondly, the IDA methodology is extended to cases where one seeks information about the causal effect of joint outside interventions on a system. Here, two algorithms are introduced, IDA IPW Joint and IDA Path Joint, that show encouraging results in simulation studies. These new algorithms for joint interventions may easily be extended to the creation of IDA-type algorithms for arbitrarily many outside interventions on a system.
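As context for the extensions described above, here is a minimal R sketch (not the thesis's new algorithms) of the basic IDA workflow in the pcalg package: estimate a CPDAG with the PC algorithm, then compute the multiset of possible single-intervention effects with ida(); the jointIda() function in more recent pcalg versions addresses the joint-intervention setting the thesis extends. The simulated variables are invented for illustration.

## Minimal sketch: PC algorithm followed by IDA for possible causal effects.
library(pcalg)

set.seed(1)
n <- 1000
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
x3 <- 0.5 * x1 + 0.6 * x2 + rnorm(n)
dat <- cbind(x1, x2, x3)

suffStat <- list(C = cor(dat), n = n)
pc.fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.01,
             labels = colnames(dat))
ida(1, 3, cov(dat), pc.fit@graph, method = "local")   # possible effects of x1 on x3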
Laura Buzdugan High-dimensional statistical inference Peter Bühlmann
Markus Kalisch
Aug-2012
Abstract: The present work seeks to address the issue of error control in high-dimensional settings. This task has proven challenging due to: 1) the difficulty of deriving the asymptotic normal distribution of the estimators, and 2) the high degree of multicollinearity commonly exhibited by the predictor variables. These two issues were addressed by combining Bühlmann's (2012) method of constructing p-values based on Ridge estimation with an additional bias correction term, and Meinshausen's (2008) proposal of a hierarchical testing procedure that controls the FWER (family-wise error rate) at all levels. This led to an extension of the construction of p-values to cases in which the response variable is multivariate. The new method was tested on an SNP phenotype association dataset, which also allowed for the investigation of different approaches to bias correction.
Michel Philipp Cost Efficiency of Managed Care Programs in Health Care Insurance Werner Stahel Aug-2012
Abstract: Managed care (MC) plans in health care systems promise an improved quality of medical service at significantly lower expenses. Therefore, politicians and health insurers have a strong incentive to estimate the cost efficiency of such alternative insurance plans from historical data on health care expenditure (HCE). However, estimating cost effects between basic and alternative insurance plans in an observational study is particularly challenging. Differences between the baseline characteristics of the different insurance collectives result in selection biases. This occurs notably when insurance companies offer discounts to MC plan policyholders, effectively creating an economic incentive that appeals mainly to young and healthy people. This thesis first discusses the statistical challenges that arise when estimating the cost efficiency of MC plans. To draw causal conclusions from estimates based on observed health care data, the MC plan assignment must be independent of the HCE within subgroups of relevant confounders. Unfortunately, not every potential confounder can be observed by the insurance companies, and therefore we conclude that it is not practicable to estimate causal effects from the available health care data. However, insurance companies are similarly interested in monitoring the HCE between different insurance plans. Therefore, we analyse data from a large Swiss insurance company using Tobit regression to estimate differences in (left-censored) HCE between basic and MC insurance plans, particularly within regions and pharmaceutical cost groups. Further, we attempt to improve the models using a propensity score, the probability of choosing MC insurance, and calculate the confidence bands of the resulting differences in HCE between insurance plans from 100 bootstrap replications. To avoid additional bias we excluded covariates that are potentially affected by the MC plan. The estimates that we obtain with our models vary significantly between regions. However, in total we obtain lower HCE compared to basic insurance of (with 95% confidence limits). Since it is unknown whether the requirements for causal inference are met, our conclusion is that one cannot absolutely exclude remaining selection bias from these estimates.
Rainer Ott A Wavelet Packet Transform based Stock Index Prediction Algorithm Hans Rudolf Künsch 
Kilian Vollenweider
Evangelos Kotsalis
Aug-2012
Abstract: In this Master's thesis we develop prediction algorithms which optimize a performance measure over a specified set of wavelet packet trees and smoothing parameters. The performance of the algorithms is evaluated on the daily DAX prices from 18th December 2003 to 30th December 2011. Using a quantitative return quality measure with an algorithm based on a delayed version of the discrete wavelet packet transform (DWPT), we were able to outperform the exponentially weighted moving average trend follower. For the same algorithm, 3 out of 25 wavelet packet trees were observed to be favourable. Furthermore, the DWPT was found to consistently outperform the discrete wavelet transform if the Haar basis is used.
Sylvain Robert Sequential Monte Carlo methods for a dynamical model of stock prices Hans Rudolf Künsch 
Didier Sornette
Aug-2012
Abstract: Stock markets often exhibit behaviours that are far from equilibrium, such as bubbles and crashes. The model developed in Yukalov et al. (2009) aims at describing the dynamics of stock prices, and notably the way they deviate from their fundamental value. The present work was concerned with estimating the parameters of the model and with filtering the underlying mispricing process. Various Sequential Monte Carlo methods were applied to the problem at hand. In particular, a fully adapted particle filter was derived and showed the best performance. While the filtering was well handled by the different methods, the estimation of the parameters was much more difficult. Nevertheless, it was possible to identify the market type, which qualitatively describes the dynamics of a stock. The methods were first tested on simulated data before being applied to the Dow Jones Industrial Average. The latter application led to very interesting results. Indeed, the estimated model provided insight into the underlying dynamics, and the filtering of the mispricing process allowed us to shed new light on some important financial events of the last 40 years.
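To illustrate the filtering machinery on a toy example, here is a minimal from-scratch R sketch (not the thesis model, which is considerably more involved) of a bootstrap particle filter with multinomial resampling for a simple linear-Gaussian state space model; all parameter values, and the model itself, are illustrative.

## Minimal sketch: bootstrap particle filter for a toy AR(1)-plus-noise model.
set.seed(1)
TT <- 100; M <- 1000                   # time steps and number of particles
phi <- 0.9; sw <- 0.5; sv <- 1         # state AR coefficient, state and observation noise sds

x <- numeric(TT); y <- numeric(TT)     # simulate latent state and observations
x[1] <- rnorm(1, sd = sw)
y[1] <- x[1] + rnorm(1, sd = sv)
for (t in 2:TT) {
  x[t] <- phi * x[t - 1] + rnorm(1, sd = sw)
  y[t] <- x[t] + rnorm(1, sd = sv)
}

xf   <- numeric(TT)                    # filtered means
part <- rnorm(M, sd = sw)              # initial particles
for (t in 1:TT) {
  if (t > 1) part <- phi * part + rnorm(M, sd = sw)    # propagate particles
  w <- dnorm(y[t], mean = part, sd = sv)               # weight by observation likelihood
  w <- w / sum(w)
  xf[t] <- sum(w * part)                               # filtered mean
  part <- sample(part, M, replace = TRUE, prob = w)    # multinomial resampling
}
cor(xf, x)    # how well the filtered means track the true state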
Radu Petru Tanase Learning Causal Structures from Gaussian Structural Equation Models Jonas Peters 
Peter Bühlmann
Aug-2012
Abstract: Traditional algorithms in causal inference assume the Markov and faithfulness conditions and recover the causal structure up to the Markov equivalence class. Recent advances have shown that by using structural equation models it is possible to go even further and in some cases identify the underlying causal DAG from the joint distribution. We focus on an identifiability result for linear Gaussian SEMs with the same noise variances and propose an algorithm that estimates the causal DAG from such models. We evaluate the performance of the algorithm in a simulation study and compare it to the performance of two other existing methods: the PC algorithm and Greedy Equivalence Search.
Matteo Tanadini Regression with Relationship Matrices using partial Mantel tests Werner Stahel  Jul-2012
Abstract: Relationship matrices and the statistical methods used to analyse them are of growing importance in science because of the increasing number of systems that are represented by networks. Relationship matrices are often used in fields such as the social sciences, biology or economics. In the context of multiple linear regression with relationship matrices, partial Mantel tests represent the standard statistical framework for inference. Several approaches of this kind can be found in the literature. In order to evaluate the performance of these methods, a sensible way to simulate datasets is indispensable. Unfortunately, studies conducted so far comparing the performance of partial Mantel tests rely on inadequate simulated datasets and are therefore questionable. The goals of this master thesis were to compare the performance (measured as level and power) of widely used partial Mantel tests using state-of-the-art simulation techniques and to describe new implementations with improved performance. In a first phase, we focused on improving the quality of the models used to simulate datasets for multiple linear regression with relationship matrices. We were able to propose two convenient procedures for simulating predictors (i.e. relationship matrices). We could also show a more appropriate way to simulate the error term for linear regression with relationship matrices. In a second phase, we described three modifications of partial Mantel tests that are supposed to improve performance. The implementation of these improvements in R code will be the object of future research. Finally, we compared the performance of three partial Mantel tests using datasets simulated according to our improved technique. The results agree with previous studies and confirm that the method proposed by Freedman & Lane has the best overall performance.
Markus Harlacher Cointegration Based Statistical Arbitrage Sara van de Geer
Markus Kalisch
Jul-2012
Abstract: This thesis analyses a cointegration-based statistical arbitrage model. Starting with a brief overview of the topic, a simulation study is carried out that is intended to shed light on the mode of action of such a model and to highlight some potential flaws of the method. The study continues with a back-test on the US equity market for the time period from 1996 up to 2011. The results of all the different model versions that were tested look quite promising. "Traditional" mean-variance based performance measurements attest very good results to the employed cointegration-based statistical arbitrage model. The more advanced dependence analysis between the returns of the S&P 500 index and the returns obtained from the back-testing shows a very favourable structure and indicates that such a model can provide returns that are only very weakly related to the returns of the S&P 500 index.
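For illustration, the following minimal R sketch (not the thesis model) shows the basic cointegration step underlying such strategies: regress one simulated log-price series on another, test the residual spread for stationarity with an augmented Dickey-Fuller test from the tseries package, and standardize the spread into a trading signal. The simulated prices and the signal construction are illustrative.

## Minimal sketch: cointegrating regression and stationarity test of the spread.
library(tseries)

set.seed(1)
n <- 500
common <- cumsum(rnorm(n))                    # shared stochastic trend
pA <- common + rnorm(n, sd = 0.5)             # log prices of two cointegrated assets
pB <- 0.8 * common + rnorm(n, sd = 0.5)

hedge  <- coef(lm(pA ~ pB))                   # cointegrating regression
spread <- pA - (hedge[1] + hedge[2] * pB)
adf.test(spread)                              # small p-value: spread looks stationary
z <- (spread - mean(spread)) / sd(spread)     # z-score used to generate trading signals
head(z)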
Yongsheng Wang Numerical approximations and Goodness-of-fit of Copulas Martin Mächler
Werner Stahel
Jul-2012
Abstract: The author first gives an introduction to copulas and derives the Rosenblatt transform of elliptical copulas. To circumvent numerical challenges in estimating the density of the Gumbel copula, several approaches are presented. The author finds an algorithm for choosing appropriate methods under various conditions; it is obtained by first determining the bit precision when using the benchmark method dsSib.Rmpfr and then conducting a simulation study for comparisons. This is followed by a review of goodness-of-fit methods for copulas, including tests based on the empirical copula, the Rosenblatt transform, the Kendall transform and the Hering-Hofert transform. The author conducts a large simulation experiment to investigate the effect of the dimension on the level and power of goodness-of-fit tests for various combinations of null hypothesis copulas and alternative copulas. The results are interpreted via graphs of confidence intervals and power ratios. Also, the relationships among computational time, dimension, sample size and number of bootstrap replications are explored. Last, the dependence structure of the Dow Jones 30 is investigated using a graphical goodness-of-fit test under various types of Student-t, Gumbel and Clayton copula families. A Student-t copula with unstructured correlation matrix and the degrees of freedom estimated by maximum likelihood gives the best fit.
Amanda Strong A review of anomaly detection with focus on changepoint detection Sara van de Geer
Markus Kalisch
Jul-2012
Abstract: Anomaly detection has the goal of identifying data that is, in some sense, not "normal." The definition of what is anomalous and what is normal is heavily dependent on the application. The unifying factor across applications is that, in general, anomalies occur only rarely. This means that we do not have much information available for modeling the anomaly-generating distribution directly. We will describe several ways of approaching anomaly detection and discuss some of the properties of these approaches. Changepoint detection can be considered a subtopic of anomaly detection. Here the problem setting is more specific: we have a sequence of observations and we would like to detect whether their generating distribution has remained stable or has undergone some abrupt change. The goals of a changepoint analysis may include both detecting that a change has occurred as well as estimating the time of the change. We will discuss some of the classic approaches to changepoint detection. As very large datasets become more common, so do the instances in which it is difficult or impossible for humans to heuristically monitor for anomalous observations or events. The development and improvement of anomaly detection methods is therefore of ever-increasing importance.
Peter Fabsic Comparing the accuracy of ROC curve estimation methods Peter Bühlmann Jul-2012
Abstract: The aim of this study is to compare the accuracy of commonly used ROC curve estimation methods. The following ROC curve estimators were compared: empirical, parametric, binormal, "log-concave" together with its smoothed version (as introduced in Rufibach (2011)), and the estimator based on kernel smoothing. Two simulations were carried out, each assessing the performance of the estimators in a range of scenarios. In each scenario we simulated data from known distributions and computed the true and the estimated ROC curves. Using various measures we assessed how close the estimates were to the true curve. In the first simulation, a large sample size was used to compute the estimated ROC curves. A substantially smaller sample size was used in the other simulation. The "log-concave" estimator was found to perform the best when a large sample was available. On the other hand, the estimator based on kernel smoothing outperformed all other competitors in the simulation with the small sample size.
Edgar Alan Muro Jimenez About Statistical Learning Theory and Online Convex Optimization Sara van de Geer
Jul-2012
Abstract: This work is divided into two parts. In the first part, we present a relationship between an empirical process and the minimax regret of a game from Prediction with Expert Advice (PWEA). We use this expression to show how the lower bound of a PWEA minimax regret can give us some information about the form of the expert class being used, in particular whether it is a VC class or not. In the second part, we analyse from a theoretical point of view the similarities in the performance of algorithms from Statistical Learning, PWEA, and Online Convex Optimization (OCO). We present results for the three methods showing that the rate of decay of the prediction error depends on the curvature of the loss function over the space of the predictor's choices. In addition, we provide theorems for Statistical Learning and OCO showing that similar lower bounds for their regrets can be obtained assuming that the minimizer of the expected loss is not unique. This provides more evidence of the resemblance between the performance of algorithms from Statistical Learning and OCO. Finally, we show that any PWEA game can be seen as a special case of an OCO game. Even though this represents an advantage for finding upper bounds for PWEA, we present an example where the upper bounds for the regret originally derived for OCO are not better than those found for PWEA.
Elena Fattorini Estimating the direction of the causal effects for observational data Marloes Maathuis Jul-2012
Abstract: In many scientific studies, causal relationships are of crucial importance. Unfortunately, it is not possible, without making some assumptions, to calculate causal effects from observational data alone. In this thesis, the observational data are assumed to be generated from an unknown directed acyclic graph (DAG). Under such a model, bounds on causal effects can be computed with the approach of Maathuis, Kalisch, and Bühlmann (2009). The idea behind this approach is as follows. First, one tries to estimate the DAG that generated the data and then one computes the causal effects for the obtained DAG. However, under our assumptions we can generally only identify an equivalence class of DAGs that are compatible with the data. Due to the existence of these different possible generating DAGs, the causal effect of a variable X on a variable Y cannot always be identified uniquely. However, one can identify the causal effect for each DAG in the equivalence class and collect all these effects in a multiset. These multisets can be summarized using summary measures. For example, in the paper of Maathuis et al. (2009) the minimum absolute value is used as a summary measure, which gives a lower bound on the size of the causal effect. In this thesis, we focus on the problem of how to derive the sign of the causal effects. Clearly, the minimum absolute value is not appropriate for this purpose. Eight new summary measures are proposed and simulation studies are performed to identify the summary measure that best detects the largest positive causal effects among a set of given variables. The summary measures are compared using averaged ROC curves. The maximum and the mean turn out to be the best summary measures. In the estimated graphs, some edges are oriented in the wrong direction. A large positive causal effect can be estimated as zero due to a wrongly directed edge. Therefore, in order to detect all the largest positive causal effects, one should also investigate the effects which are estimated as zero.
David Schönholzer Geostatistische Kartierung der Waldbodenversauerung im Kanton Zürich Andreas Papritz
Hans Rudolf Künsch 
Jul-2012
Abstract: Against the background of the increased awareness of the progressive acidification of forest soils in Switzerland and in the canton of Zurich, this thesis maps, for the first time and with nearly complete spatial coverage, the degree of acidification of the forest soils in the canton of Zurich based on the estimated pH value of the topsoil. To this end, a number of national and cantonal data sets on soil acidification, climate, vegetation, topography and geology are processed and used for the statistical estimation of soil acidification. To address several difficulties in the statistical estimation of environmental quantities, a combination of different statistical methods is employed, in particular geostatistics and robust statistics.
Myriam Riek Towards Consistency of the PC-Algorithm for Categorical Data in High-Dimensional Settings Marloes Maathuis Jun-2012
Abstract: The PC-algorithm is an algorithm used to learn about or estimate the causal structure among a causally sufficient set V of random variables from data. Under the assumption of faithfulness, the PC-algorithm yields an estimate of the graph representing the Markov equivalence class of causal structures over V that are compatible with the probability distribution defined over V. Consistency of an estimator is a crucial property. It has been proven to hold for the PC-algorithm applied to multivariate normal data in high-dimensional settings where the number of variables is increasing with sample size, under some conditions ([10]). In this master thesis, an attempt was made to prove consistency of the PC-algorithm applied to categorical data in low- and high-dimensional settings.
Stephan Hemri Calibrating multi-model runoff predictions for a head catchment using Bayesian model averaging Hans Rudolf Künsch
Felix Fundel
May-2012
Abstract: One approach to quantify uncertainty in hydrological rainfall-runoff modeling is to use meteorological ensemble prediction systems as input for a hydrological model. Such ensemble forecasts consist of a possibly large number of deterministic forecasts, and uncertainty is given by their spread. As such ensemble forecasts are often under-dispersed and biased, and do not account for other sources of uncertainty, like the hydrological model formulation, statistical post-processing needs to be applied to achieve sharp and calibrated predictions. In this thesis, post-processing of runoff forecasts from summer 2007 to the end of 2009 for the river Alp in Switzerland is done by applying Bayesian model averaging (BMA). A total of 68 ensemble members coming from one deterministic and two ensemble forecasts are used as input for BMA. These forecasts cover lead-times from 1h to 240h. First, BMA based on univariate normal and inverse gamma distributions is performed under the assumption of independence between lead-times. Then, the independence assumption is relaxed in order to simultaneously estimate multivariate runoff forecasts over the entire range of lead-times. This approach is based on a BMA version that uses multivariate normal distributions. Since river discharge follows a highly skewed distribution, a Box-Cox transformation is applied in order to achieve approximate normality. Back-transformation, combined with data quality issues, leads in some cases to too high predicted probabilities of extremely high runoffs; using the inverse gamma distribution instead does not remove this problem either. Nevertheless, both the univariate and the multivariate BMA approaches are able to generate well-calibrated forecasts that are considerably sharper than climatology.
Linda Staub On the Statistical Analysis of Support Vector Machines Sara van de Geer
Mar-2012
Abstract: We analyze Support Vector Machines from a theoretical and computational point of view by explaining every building block of this algorithm separately, where we mainly restrict ourselves to binary classification. We start with loss functions and risks and then make a digression to the theory of kernel functions and their Reproducing Kernel Hilbert Spaces. We are then ready to perform the statistical analysis, where we assume in a first part the data to be independent and identically distributed. This analysis aims to investigate under which conditions on the regularization sequence the method is consistent and, more interestingly, to find the optimal learning rate and a way of nearly reaching it. We thereby explain the results given in [21] and add the missing proofs. Next, we briefly discuss the computational aspects of support vector machines, where we show that numerically the problem is reduced to solving a finite dimensional convex program. Subsequently, we explain how to use support vector machines in practice by applying the R function svm() from the package e1071 to independent and identically distributed data. We then slightly violate this assumption and generate data of a GARCH process which naturally carries a dependence structure and observe that the algorithm still produces good results for this kind of data. We finally find the theoretical explanation for this by performing a statistical analysis of support vector machines for weakly dependent data following the work of [22].
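For illustration, a minimal R sketch of fitting a support vector machine with the svm() function from the e1071 package mentioned above; the simulated data, kernel and tuning parameters are assumptions made for this example, not the settings used in the thesis:
    library(e1071)
    set.seed(1)
    n <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- factor(ifelse(x1^2 + x2^2 > 1, "A", "B"))      # non-linear class boundary
    dat <- data.frame(x1 = x1, x2 = x2, y = y)
    fit <- svm(y ~ x1 + x2, data = dat, kernel = "radial", cost = 1, gamma = 0.5)
    table(predicted = predict(fit, dat), truth = dat$y)  # in-sample confusion matrix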
Christian Haas Analysis of market efficiency: Post earnings drift in Swiss stock prices   Peter Bühlmann Mar-2012
Abstract: The study of stock market behaviour and market efficiency is a very active topic within probability theory and statistics. Market models and their implications have recently been in the focus not only of the mathematical and economic community. In this thesis, we take a look at some market models and studies of market efficiency and establish the theory behind efficiency and regression. In chapter 7 we then study the post earnings drift for Swiss stocks. We find a significant intraday drift in the direction of the first reaction after the earnings release. We then look at a trading strategy based on this result and try to answer whether we have found a market inefficiency or not.
Ana Teresa Yanes Musetti Clustering methods for financial time series Martin Mächler
Werner Stahel
Mar-2012
Abstract: The purpose of this thesis is to study a set of companies from the S&P 100 and determine whether share closing prices that move together correspond to companies belonging to the same economic sector. To verify this, different clustering methods were applied to a dissimilarity matrix corresponding to the degree of dependence between the companies. Since financial data do not follow a multivariate normal distribution, non-parametric dependence measures were needed; for this, the theory of Hoeffding's D, Kendall's τ and Spearman's ρ was reviewed. Then, in order to choose the best clustering solution, a set of validation statistics was applied. To compare in advance the performance of the different clustering methods and the validation statistics under different degrees of cluster overlap, two simulation studies were carried out: the first based on correlation matrices computed from covariance matrices sampled from a Wishart distribution, and the second based on models with Gaussian mixture distributions. This study showed that the transformation of the data, either from dependence measures or from distances to (dis)similarities, has an impact on the performance of the clustering methods. Additionally, some of the validation statistics showed a poor performance in extreme scenarios where the clusters are very well separated. Finally, when the companies belonging to the S&P 100 were clustered, the method PAM applied to the dissimilarity matrix estimated with Hoeffding's D gave the best solution compared to the clustering methods AGNES, DIANA and DSC, which agreed with the results from the simulation studies and the reviewed theory.
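As a rough illustration of the kind of workflow described above, the following R sketch clusters simulated return series with PAM applied to a dissimilarity derived from Kendall's τ; the data, the number of clusters and the dissimilarity transform are assumptions for this example only:
    library(cluster)
    set.seed(1)
    returns <- matrix(rnorm(500 * 12), ncol = 12)   # placeholder for log-return series
    colnames(returns) <- paste0("firm", 1:12)
    tau <- cor(returns, method = "kendall")         # non-parametric dependence measure
    d   <- as.dist(sqrt(1 - tau))                   # one possible dissimilarity transform
    fit <- pam(d, k = 3, diss = TRUE)               # partitioning around medoids
    split(colnames(returns), fit$clustering)        # cluster membership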
Lukas Patrick Abegg Analysis of market risk models Werner Stahel
Evangelos Kotsalis
Lukas Wehinger
Mar-2012
Abstract: In this thesis, risk models are evaluated in a joint project with the swissQuant Group AG and a major Swiss bank. Risk models with high complexity, e.g. based on GARCH models and different distribution assumptions, as well as simpler models, e.g. based on EWMA models and normal distributions, are assessed and compared for weekly data. The out-of-sample results are assessed graphically and the evaluation is performed with statistical tests applied to large-scale data. At the 95% confidence level, the quality of the Value-at-Risk estimates under the simple and the complex models is assessed to be similar. If the Expected Shortfall and the Value-at-Risk at higher confidence levels are considered, however, the sophisticated methods improve the risk estimates. A risk model based on copulas, GARCH models and non-parametric distribution estimates is developed additionally and found to outperform the risk models provided.
Martina Albers Boundary Estimation of Densities with Bounded Support Geurt Jongbloed
Marloes Maathuis
Mar-2012
Abstract: When estimating a density supported on a bounded or semi-infinite interval by kernel density estimation, problems may arise at the boundary. In the past, many variations of the 'standard' kernel density estimator have been developed to achieve boundary corrections. Smooth estimates of the distribution and density functions have recently been derived for current status censored data. This topic is closely related to kernel density estimation, and the mentioned boundary problem can also appear in this context. In this Master's thesis, some boundary corrections were combined with the smooth distribution estimates for current status censored data. Simulations to analyze the performance of these new constructions were carried out using R.
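For illustration, a minimal base-R sketch of one simple boundary correction (reflection at zero) for a density supported on [0, ∞); this is a generic textbook device, not necessarily one of the corrections studied in the thesis:
    set.seed(1)
    obs <- rexp(500)                                # data supported on [0, Inf)
    h   <- bw.nrd0(obs)                             # rule-of-thumb bandwidth
    fhat <- function(t)                             # reflection estimator evaluated at t
      sapply(t, function(ti)
        (ti >= 0) * mean(dnorm(ti - obs, sd = h) + dnorm(ti + obs, sd = h)))
    curve(fhat(x), from = 0, to = 4, ylab = "density estimate")
    curve(dexp(x), add = TRUE, lty = 2)             # true density, for comparison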
Alexandros Gekenidis Learning Causal Models from Binary Interventional Data Peter Bühlmann Mar-2012
Abstract: The goal of this thesis is to provide and test a method for causal inference from binary data. To this end, we first introduce the mathematical tools for describing causal relationships between random variables, such as directed acyclic graphs (DAGs for short), in which the random variables are represented by vertices whereas edges stand for causal influences. A DAG can, however, only be identified up to Markov equivalence, which roughly means that one can estimate its skeleton, but not the direction of most of the edges. This can be improved by performing interventions, i.e. by forcing a certain value upon one or several random variables and observing the change in the values of the other variables to obtain additional data. The resulting Markov equivalence classes form a finer partitioning of the space of DAGs than the non-interventional ones, thus improving the estimation possibilities. Based upon this theory we adapt the existing Greedy Interventional Equivalence Search algorithm (GIES, [1]) to the case of binary random variables and test it on simulated data.
Eszter Ilona Lohn Estimating the clinical score of coma patients - a comparison of model selection methods Werner Stahel
Markus Kalisch
Mar-2012
Abstract: The aim of this thesis is to explore the possibility of estimating coma patients' clinical awareness score from objective clinical measurements, in order to substitute the rather subjective doctors' examination, which is expensive and time consuming. Variable selection and model fitting methods are compared by cross-validation. The basic analysis is extended towards block subset analysis, alternative cross-validation schemes and an analysis of the dynamics of the clinical score. As only a small sample is available, overfitting is a serious concern throughout the analysis, which is seen in the difference between in-sample and cross-validated model fits. In general we observe that low-variance (higher-bias) methods perform better at this sample size. In the end it is concluded that, based on this sample, the clinical measurements contain little information about the clinical awareness score.
Lisa Borsi Estimating the causal effect of switching to second-line antiretroviral HIV treatment using G-computation Marloes Maathuis
Markus Kalisch
Thomas Gsponer
Mar-2012
Abstract: Understanding causal effects between exposure and outcome is of great interest in many fields. In this work, the causal effect of switching to second-line antiretroviral treatment on death is estimated for a study population of HIV-infected patients experiencing immunological failure in Southern Africa (Zambia and Malawi). CD4 cell count is considered as a time-varying confounder of treatment switching and death, while it is itself affected by previous treatment. Given the impossibility of conducting a randomised experiment, we address the problem of time-varying confounding by G-computation. Under certain conditions, G-computation yields consistent estimates of the causal effect by simulating what would happen to the study population if treatment were set to a certain regime by intervention. In our analysis we compare the intervention "always switch to second-line treatment" to the intervention "always remain on first-line treatment". We find the resulting risk ratio to be 0.24 (95% CI 0.14-0.33), indicating that the risk of dying is smaller in the population that switched to second-line treatment than in the population that stayed on first-line treatment. Thus, we conclude that there is a beneficial causal effect of switching to second-line treatment among HIV patients experiencing immunological failure.
Gardar Sveinbjoernsson Practical aspects of causal inference from observational data Peter Bühlmann Mar-2012
Abstract: In this thesis we study methods to infer causal relationships from observational data. Under some assumptions, causal effects can be estimated using Pearl's intervention calculus, provided that the data are supplemented with a known causal influence diagram. We study the IDA algorithm, which estimates the equivalence class of this diagram and uses the intervention calculus to get a lower bound on the size of the causal effects. Since structure discovery can be difficult, especially in high-dimensional settings, we combine the IDA algorithm with stability selection, a subsampling method, to select the most stable causal effects. We verify our results on a data set where the true causal effects are known from experiments. We also investigate the robustness of our method in a simulation study where we look at violations of the assumptions.
Simon Kunz Simulated Maximum Likelihood Estimation of the Parameters of Stochastic Differential Equations Lukas Meier
Werner Stahel
Mar-2012
Abstract:
Marcel Freisem Estimating rating transition probabilities and their dependence on macroeconomic conditions for a bank loan portfolio Peter Bühlmann Feb-2012
Abstract:
Tulasi Agnihotram Statistical Analysis of Target SNPs and their Association with Phenotypes Peter Bühlmann
Markus Kalisch
Feb-2012
Abstract: Genomics is influencing not only the field of medicine, but also distantly related fields such as the behavioural sciences and economics. The primary goal of this thesis is to investigate the relation between the genome, represented by SNPs, and behavioural characteristics (such as risk aversion) of an individual, using supervised learning techniques. The human genome has 23 chromosomes, which contain information on millions of SNPs. Applying supervised learning techniques to millions of SNPs is difficult and may not be efficient. To simplify the analysis we select Target SNPs, which can represent all the surrounding SNPs. Target SNPs can be found by linkage disequilibrium with our modified Carlson's algorithm. Applying random forest (a supervised learning technique) with the genotype data at the Target SNPs as predictors and the categorized phenotype data as response, the error rates obtained for each phenotype were not informative. Using a heuristic approach, we then select Best SNPs from all SNPs on the chromosomes according to their rank correlation with the phenotype. With the test data at the Best SNPs as predictors and the categorized test data of the phenotype as response, the error rate of the random forest did not suggest a relationship between genotype and phenotype. Furthermore, we apply this procedure to random SNPs, compare the results with those for the Target SNPs and Best SNPs, and provide directions for future work.

2011

Student Title Advisor(s) Date
Sung Won Kim Study on Empirical Process, based on Empirical Process Theory and Applications in Nonparametric Statistics Sara van de Geer Dec-2011
Abstract: Any estimator is a function of the empirical measure, while what we want to estimate is a function of the theoretical measure. To justify our estimator we want to see that the estimator, a function of the empirical measure, converges to the parameter, a function of the theoretical measure, as the sample size grows. In general, however, the function to be estimated is unknown, and one wants simultaneous convergence over the class of all possible functions. Thus, we present uniform laws of large numbers, which show that the empirical measure on a class of functions converges to the theoretical measure on that class. This requires an entropy condition on the class, which ensures that the class of functions to be estimated is not too large, and a finite envelope condition, where the envelope is the supremum of the class of functions. Furthermore, we address the uniform central limit theorem, which describes how well the empirical measure converges to the theoretical measure. If one can show equicontinuity of the empirical process indexed by the class of functions, and if this indexing class is totally bounded, then the class of functions is P-Donsker; equivalently, the process satisfies the uniform central limit theorem, that is, the empirical process converges to a Gaussian limit. Equicontinuity, derived for showing the P-Donsker property, opens the way to deducing rates of convergence, in our case of least squares estimators. Therefore, as an application, we derive the rates of convergence of the least squares estimators for different classes of functions. We also consider the rates of convergence of the least squares estimators when a penalty is imposed for the complexity of the class of models. Even if one is not aware of the optimal model in the class, a proper choice of penalty allows one to attain the optimal rate of convergence, as if one knew the optimal model. As further applications of the uniform law of large numbers and the uniform central limit theorem, convergence and asymptotic normality of M-estimators are introduced as well; there, one can see how empirical process theory is applied in proving these properties. Furthermore, in order to see whether a class satisfies a ULLN or UCLT, it is convenient to use the Vapnik-Chervonenkis (VC) index: Vapnik-Chervonenkis classes, whose VC index is finite, satisfy both the ULLN and the UCLT under an envelope condition, and they play an important role in empirical process theory.
Andre Meichtry Back pain and depression across 11 years: Analysis of Swiss Household Panel data Werner Stahel
Thomas Läubli 
Dec-2011
Abstract: Design and objective: In this longitudinal retrospective cohort study, we analysed back pain and depression data across 11 years in the general population of Switzerland. The main objective was to investigate the association between back pain and depression. Methods: We used data from the Swiss Household Panel. 7799 individuals (aged 13-93, mean 42.9 years, 56.2% women) were interviewed between 1999 and 2009. Observed depression and back pain were described across 11 years. Missingness was assumed to be independent of unobserved data. We estimated marginal structural models using inverse-probability-of-exposure-and-censoring weights to assess the (causal) association between back pain history and depression. Correlated data were analysed by fitting marginal and transition models with generalised estimating equations, yielding robust sandwich variance estimates. Results: Cross-sectional analysis adjusting for other time-fixed covariates showed that back pain was associated with a 42% increase in the odds of depression over time. The association of continuous past back pain up to time t-1 with depression at time t was 0.65 on a linear logistic scale (95% CI: 0.48-0.82), corresponding to a 92% (62-127%) increase in the odds of depression. Assuming a causal model accounting for confounding of back pain by past depression, a marginal structural model (an inverse-probability-of-exposure-and-censoring weighted model) regressing depression on past back pain showed an association of 0.63 (0.44-0.81) on a linear logistic scale, corresponding to an 87% (55-126%) increase in the odds of depression. Expressing exposure history as cumulative back pain up to time t-1, the marginal structural model estimated a causal effect on depression at time t that increased with age at baseline and decreased for individuals with depression at baseline. Conclusion: Marginal structural models are well suited for the analysis of observational longitudinal data with time-dependent potential causes of depression; however, they do not address all issues of causal inference. Back pain history is one of many possible causes of depression. Future work should collect more socio-economic and health-related covariates, investigate possible non-ignorable missingness and investigate other functions of back pain history.
Jongkil Kim Heavy Tails and Self Similarity of Wind Turbulence Data (corrected version July 2012) Hans Rudolf Künsch Nov-2011
Abstract: In this thesis, we perform a statistical analysis in order to characterize wind turbulence. We estimate the pdf of the increments of wind velocities, which has heavy tails. We also estimate the autocovariances and autocorrelations of the increments of wind velocities, revealing their second-order properties in order to demonstrate self-similarity. Parsimonious properties of wind turbulence are discussed in terms of the estimated parameters. Under reasonable assumptions, relations between the lag of the wind increments and the estimated parameters are suggested, and interpretations of the results are given. In addition, the dependency between the wind increments and the mean velocities is discussed: non-parametric tests are performed to assess whether a dependency exists between the increments of wind velocities and the block mean velocities, and the dependency of two consecutive increments on the block mean velocities is investigated. Key words: Wind turbulence, Generalized Hyperbolic distribution, Normal Inverse Gaussian distribution, Self-similarity
Evgenia Ageeva Bayesian Inference for Multivariate t Copulas Modeling Financial Market Risk Martin Mächler
Peter Bühlmann
Sep-2011
Abstract: The main objective of this thesis is to develop a Markov chain Monte Carlo (MCMC) method under the Bayesian inference framework for estimating meta-t copula functions for modeling financial market risks. The complete posterior distribution of the copula parameters resulting from Bayesian MCMC allows further analysis such as calculating the risk measures that incorporate the parameter uncertainty. The simulation study of the fictitious and real equity portfolio returns shows that the parameter uncertainty tends to increase the risk measures, such as the Value-at-Risk and the Expected Shortfall of the profit-and-loss distribution.
Emmanuel Payebto Zoua Subsampling estimates of the Lasso distribution. Peter Bühlmann  Sep-2011
Abstract: We investigate the possibilities offered by subsampling to estimate the distribution of the Lasso estimator and to construct confidence intervals and hypothesis tests. Despite being inferior to the bootstrap in terms of higher-order accuracy in situations where the latter is consistent, subsampling offers the advantage of working under very weak assumptions. Thus, building upon Knight and Fu (2000), we first study the asymptotics of the Lasso estimator in a low-dimensional setting and prove that, under an orthogonal design assumption, the finite-sample component distributions converge to a limit in a mode allowing for consistency of subsampling confidence intervals. We give hints that this result holds in greater generality. In a high-dimensional setting, we study the adaptive Lasso under the assumption of partial orthogonality introduced by Huang, Ma and Zhang (2008) and use the partial oracle result in distribution to argue that subsampling should provide valid confidence intervals for nonzero parameters. Simulation studies confirm the validity of subsampling for constructing confidence intervals, testing null hypotheses and controlling the FWER through subsampled p-values in a low-dimensional setting. In the high-dimensional setting, confidence intervals for nonzero coefficients are slightly anticonservative and false positive rates are shown to be conservative.
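For illustration, a minimal R sketch of a subsampling confidence interval for one Lasso coefficient in a low-dimensional toy model; the design, the fixed penalty and the subsample size are assumptions for this example, not the settings of the thesis:
    library(glmnet)
    set.seed(1)
    n <- 200; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    y <- 1.5 * X[, 1] + rnorm(n)
    lam <- cv.glmnet(X, y)$lambda.min                   # penalty chosen once, then held fixed
    theta_n <- coef(glmnet(X, y, lambda = lam))[2]      # full-sample Lasso estimate of beta_1
    b <- 50; B <- 500
    root <- replicate(B, {
      idx <- sample(n, b)                               # subsample without replacement
      theta_b <- coef(glmnet(X[idx, ], y[idx], lambda = lam))[2]
      sqrt(b) * (theta_b - theta_n)
    })
    theta_n - rev(quantile(root, c(0.025, 0.975))) / sqrt(n)   # approximate 95% interval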
Hesam Montazeri Nonparametric Density and Mode Estimation for Bounded Data Rita Ghosh
Werner Stahel
Aug-2011
Abstract: This thesis investigates the performance of various estimators in density estimation and mode estimation for bounded data. It is shown that many nonparametric estimators have boundary bias when the true probability density function has compact support. Because the boundary region might be a large percentage of the whole support, the boundary bias problem can be very serious in many complex, real-world applications. A widely accepted method for boundary bias correction in regression and density estimation is Automatic Boundary Correction [1]. This method is based on local polynomial fitting, and no explicit correction for boundary effects is needed. In the first part of this thesis, we consider applications of this method and of Parzen's method to density estimation for some bounded univariate and bivariate data examples. It is shown that the local polynomial based method has no significant boundary bias in the considered examples. In addition, we give a new formula for the asymptotic bias of the density estimate based on local polynomial fitting that includes the bin width parameter. In the second part of this thesis, we consider mode estimation, and several methods are examined for bounded data. We show that many nonparametric mode estimation methods have boundary bias if the true global mode is located in the boundary region. Among the considered methods, mode estimation based on local polynomials shows superior performance and does not seem to suffer from a considerable boundary bias problem.
Xiaobei Zhou Prediction Models for Serious Outcome and Death in Patients with Non-specific Complaints Presenting to the Emergency Department Werner Stahel
Markus Kalisch
Aug-2011
Abstract: This thesis is based on the Basel Non-specific Complaints (BANC) study by Nemec, Koller, Nickel, Maile, Winterhalder, Karrer, Laifer, and Bingisser [2010]. Non-specific complaints (NSCs) are very common in emergency departments (EDs); however, emergency physicians have little experience in treating patients with NSCs. My research focuses on the outcome variables "serious condition" (o ser) and death in ED patients with NSCs. The primary goal is to find a set of methods (classifiers) which classify o ser and death with high accuracy. Moreover, we try to find a set of risk factors (explanatory variables) which are highly correlated with the outcome variables. We do not find a classifier that clearly outperforms all others in all aspects; Random Forest, logistic regression and AdaBoost turn out to be favorable according to different criteria. We find that dealing with missing values using imputation increases classification performance. Finally, we discuss SMOTE as an interesting but not fully satisfying method for dealing with highly unbalanced data.
Marc Lickes Portfolio optimization if parameters are estimated Hans Rudolf Künsch Aug-2011
Abstract: In the following we discuss the effect of parameter estimation in the context of mean-variance portfolio optimization. We compare the efficient frontier under a certainty-equivalent approach and under the Bayesian predictive posterior distribution. We show that the sample estimators lead to an underestimation of risk and provide corrected estimators. In addition, we relax the assumption of identically distributed returns and introduce dynamic linear models for time-varying means and covariance matrices. The study concludes by analysing the performance of these estimators on a simulated multivariate normal data set and on a sample of returns drawn from either the Dow Jones 30 or the S&P 500.
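For illustration, a minimal R sketch of the plug-in (certainty equivalent) mean-variance weights computed from sample estimates, the kind of naive estimator whose risk underestimation is discussed above; the simulated returns are placeholders:
    set.seed(1)
    R <- matrix(rnorm(1000 * 5, mean = 0.001, sd = 0.02), ncol = 5)  # placeholder returns
    mu    <- colMeans(R)                             # sample mean vector
    Sigma <- cov(R)                                  # sample covariance matrix
    ones  <- rep(1, ncol(R))
    w_gmv <- solve(Sigma, ones); w_gmv <- w_gmv / sum(w_gmv)   # global minimum-variance weights
    w_tan <- solve(Sigma, mu);   w_tan <- w_tan / sum(w_tan)   # tangency weights (zero risk-free rate)
    rbind(gmv = w_gmv, tangency = w_tan)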
David Lamparter Stability Selection for Error Control in High-Dimensional Regression Peter Bühlmann Aug-2011
Abstract: In the recent past, the development of statistical methods for high-dimensional problems has greatly advanced, leading to model selection methods such as the lasso. However, error control in high-dimensional settings has proven to be difficult. Recently, an approach called stability selection has been proposed to tackle this problem. It combines a model selection method with subsampling to deliver a form of error control. In this thesis, some variants of stability selection are introduced. We tested whether error control actually holds for these variants and isolated some conditions under which using them might be beneficial.
Marco Läubli Particle Markov Chain Monte Carlo for Partially Observed Markov Jump Processes Hans Rudolf Künsch Aug-2011
Abstract: The goal of the thesis was to investigate, understand and implement the so-called particle Markov chain Monte Carlo (PMCMC) algorithms introduced by Andrieu, Doucet, and Holenstein (2010) and to compare them to classical MCMC algorithms. The PMCMC algorithms are introduced in the framework of state space models. Their key idea is to use sequential Monte Carlo (SMC) algorithms to construct efficient high-dimensional proposals for MCMC algorithms. The performance of the algorithms is examined on a simple birth-death process in discrete time as well as on the stochastic Oregonator, an idealized model of the Belousov-Zhabotinskii non-linear chemical oscillator. In summary, the PMCMC algorithms produce satisfactory results even when using only standard components, and they require comparably little problem-specific design effort from the user's side. On the other hand, the computational effort, compared to classical methods, is tremendous and a serious drawback.
Christian Sbardella High dimensional regression and survival models Peter Bühlmann
Patric Müller
Aug-2011
Abstract: In high-dimensional regression we have many parameters relative to the number of observations, which can lead to overfitting. One method to address this problem is the Lasso (Least Absolute Shrinkage and Selection Operator) for estimating the regression coefficients. This estimator has become very popular because, among other properties, it performs variable selection, in the sense that some estimated coefficients are exactly zero. We study the Lasso estimator, proving its consistency and deriving an oracle inequality in the case of squared error loss. This thesis also covers survival analysis: this branch of statistics studies the failure times of individuals, for example to conclude whether a new treatment is effective, or whether a certain group of individuals has a higher survival probability than another. We mainly focus on the Cox proportional hazards model and the Weibull proportional hazards model. A natural question is: "Can the theory of the Lasso estimator be used in survival analysis?" We try to answer this question in the last chapter of this thesis (Chapter 5).
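For illustration, a minimal R sketch of one standard way to combine the Lasso with the Cox model, via glmnet with family = "cox"; the simulated data are invented for this example, and this is not necessarily the construction analysed in the thesis:
    library(glmnet)
    set.seed(1)
    n <- 150; p <- 50
    X <- matrix(rnorm(n * p), n, p)
    time   <- rexp(n, rate = exp(0.7 * X[, 1]))      # hazard depends on the first covariate
    status <- rbinom(n, 1, 0.8)                      # 1 = event observed, 0 = censored
    y <- cbind(time = time, status = status)
    fit <- cv.glmnet(X, y, family = "cox")           # penalized Cox partial likelihood
    head(coef(fit, s = "lambda.min"))                # sparse coefficient estimates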
Alexandra Federer Estimating networks using mutual information Marloes Maathuis
Markus Kalisch
Jul-2011
Abstract: Identifying the relations between the variables of a dataset and visualizing these relationships in an independence network is important in many applications. We use the concepts of entropy and mutual information to estimate the dependency between two random variables. An advantage of this method over a correlation test is that mutual information also measures non-linear dependency. To estimate the correlation graph of a dataset, we construct a statistical test for zero mutual information. We analyze the performance of this method, compared with the well-known approach of estimating the correlation graph by thresholding the mutual information, by means of ROC curves.
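For illustration, a minimal base-R sketch of a permutation test for zero mutual information between two variables after discretizing them into bins; the binning and the number of permutations are assumptions for this example:
    mutual_info <- function(x, y, bins = 5) {
      p  <- table(cut(x, bins), cut(y, bins)) / length(x)   # joint cell probabilities
      px <- rowSums(p); py <- colSums(p)
      sum(p[p > 0] * log(p[p > 0] / outer(px, py)[p > 0]))  # plug-in mutual information
    }
    set.seed(1)
    x <- rnorm(300); y <- x^2 + rnorm(300)                  # non-linear dependence
    obs  <- mutual_info(x, y)
    null <- replicate(999, mutual_info(x, sample(y)))       # permuting y breaks the dependence
    mean(c(obs, null) >= obs)                               # permutation p-value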
Oliver Burkhard The Effect of Managed Care Models on Health Care Expenditure Marloes Maathuis
Markus Kalisch
Jul-2011
Abstract: In this thesis we estimate the cost reduction effects of the managed care plans that were introduced in Swiss health insurance in 1996. Those plans limit the free choice of health care provider and come with reduced premiums. The data come from one insurer and cover the years 1997-2000. The challenge we face comes from the unobserved health of the insured: it can influence both the choice of a managed care plan and the costs incurred. We tackle the problem by generating an estimate of an auxiliary variable "latent health" using Tobit regression, which allows us to estimate the causal effect of managed care plans on costs using a two-part model. We then look at different possibilities to improve the results. We find that the total effect of managed care consists of a part that can be explained through the auxiliary variable and a part that cannot, indicating true cost reduction effects of the managed care models.
Niels Hagenbuch A Comparison of Four Methods to Analyse a Non-Linear Mixed-Effects Model Using Simulated Pharmacokinetic Data Martin Mächler
Werner Stahel
Jun-2011
Abstract: Our study characterizes the behaviour of four different methods for estimating a non-linear mixed-effects model in R. Three methods used a closed-form analytical solution of a system of ordinary differential equations (ODEs); the fourth method used the system of ODEs directly. The three methods were nlme() from the package of the same name, nlmer() from package lme4a and nlmer() from package lme4. For the ODEs, we used nlmeODE() along with nlme(). The two methods using nlme() do not differ much in their estimates, although non-convergences occurred. lme4a and lme4 provide fast and reliable (in terms of convergence) routines nlmer(), which have shortcomings as well: the standard errors of the fixed-effects parameters are over- or underestimated, inconsistently across the parameters, and the estimation of the standard deviations of the random effects does not always profit from an increase in observations. The results across three simulations reveal unpredictable patterns for the estimators of lme4a and lme4 in terms of coverage ratios, bias and standard error as functions of the number of observations. A limitation of this study is its limited number of simulation runs (250).
Stephanie Werren Pseudo-Likelihood Methods for the Analysis of Interval Censored Data Marloes Maathuis Mar-2011
Abstract: We study the work of Sen and Banerjee (2007), focusing on their method based on a pseudo-likelihood-ratio statistic to obtain point-wise confidence intervals for null hypotheses on the distribution function of the survival time in a mixed-case interval censoring model. Mixed-case interval censored data arise naturally in clinical trials and a variety of other applied fields. The setting of such a model is one where n independent individuals are under study and each individual is observed a random number of times at possibly different observation time-points. At each observation time it is recorded whether an event happened or not, and one is interested in estimating the distribution function of the time to such an event, also called failure. However, the time to failure cannot be observed directly, but is subject to interval censoring. That is, one only obtains the information whether failure occurred between two successive observation time-points or not. We extend the results of Sen and Banerjee (2007) to mixed-case interval censored data with competing risks, i.e. data where the failure is caused by one of R risks, where R ∈ N is fixed. We define a naive pseudo-likelihood estimator for the distribution function of the event that the system failed from risk r, for each r = 1, 2, ..., R, analogous to Jewell, Van der Laan, and Henneman (2003). We prove consistency and derive the asymptotic limit distribution of the naive estimators, and present a method to construct point-wise confidence intervals for these sub-distribution functions based on the pseudo-likelihood-ratio statistic introduced by Sen and Banerjee (2007).
Karin Peter Marginal Structural Models and Causal Inference Marloes Maathuis Feb-2011
Abstract: We analyze data from an observational treatment study of HIV patients in Africa, collected by the Institute for Social and Preventive Medicine (ISPM) in Bern. In particular, we focus on patients who received first-line treatment and experienced immunologic failure, where immunologic failure might be an indication that the current treatment is no longer effective. Some of these patients were switched to a second-line treatment, according to the decision of their doctor (i.e. non-randomized). Based on these data, we are interested in estimating the causal effect of the switch to second-line treatment on survival. The data contain information on the treatment regime and the CD4 counts of the patients, where both of these are time-dependent. A main challenge in the analysis is the CD4 count, which indicates how well the immune system is working. The CD4 count may influence future treatment and survival, making it a confounder that one should control for. On the other hand, the CD4 count is likely to be influenced by past treatment, making it an intermediate variable that one should not control for. We address this problem by using marginal structural models. Conceptually, this method weights each data point by its inverse probability of treatment weight (IPTW), creating data for an unconfounded pseudo-population. Our results indicate that switching to second-line treatment is beneficial, and slightly more so than an analysis with classical methods would imply.
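For illustration, a minimal R sketch of inverse probability of treatment weighting at a single time point; the variable names and the simulated relationships are assumptions, and the actual analysis deals with time-varying treatment and confounding:
    set.seed(1)
    n <- 500
    cd4      <- rnorm(n, 350, 100)                                   # confounder
    switched <- rbinom(n, 1, plogis(-2 + 0.005 * (350 - cd4)))       # switching depends on CD4
    death    <- rbinom(n, 1, plogis(-1 - 0.8 * switched + 0.004 * (350 - cd4)))
    ps <- fitted(glm(switched ~ cd4, family = binomial))             # probability of switching
    w  <- ifelse(switched == 1, 1 / ps, 1 / (1 - ps))                # inverse probability weights
    msm <- glm(death ~ switched, family = quasibinomial, weights = w)   # weighted outcome model
    summary(msm)$coefficients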
Reto Bürgin Pain after an intensive care unit stay Werner Stahel
Marianne Müller
Feb-2011
Abstract: The present study examines pain occurring within twelve months after an intensive care unit (ICU) stay by focussing on three aspects: i) Which variables relate to pain after an ICU stay? ii) What is the longitudinal association of ICU-related variables and pain? And iii) do former ICU patients suffer more severe pain than comparable people who have not been in an ICU recently? The first two aspects are examined with statistical analyses of data on 149 former ICU patients: these data contain three repeated pain measurements per patient - immediately after as well as six and twelve months after the ICU stay - while the provided explanatory variables are physiological, emotional and sociodemographic and were measured before, during and after the ICU stay. The third aspect is examined using additional data on a control group of 153 subjects. Concerning the first aspect, stepwise regression model selection identified gender, pain before the ICU stay, four ICU-related variables, agitation and other illnesses as useful explanatory variables for pain after an ICU stay. Moreover, anxiety before the ICU stay and the length of stay in the ICU showed significant associations too. For the second aspect, the longitudinal association was examined using a repeated measurement regression model. This model showed a significant association between ICU-related variables and pain, both six and twelve months after the ICU stay (p-values: 0.005 and 0.025). Whilst the significance of these associations tends to decrease with the time that has elapsed since the ICU stay, the effect of variables which are not directly ICU-related, particularly that of pain before the ICU stay, tends to increase. The third aspect was again analysed with a repeated measurement regression model. This model demonstrated that ICU patients tend to suffer more severe pain than the subjects of the control group. However, this difference decreases as time passes from the initial ICU stay, so that twelve months after the ICU stay the difference is no longer significant (p-value: 0.3). Finally, the identification of explanatory variables for pain turned out to be the principal challenge of this study. As the discovered explanatory variables are indicators which leave room for interpretation, both an extended discussion of the study results - also with experts from the medical sciences - and a comparison with similar studies are essential.
Weilian Shi Distribution of Realized Volatility of Long Financial Time Series Werner Stahel
Dr. Michel Dacorogna
Feb-2011
Abstract: Insurance companies face a difficult situation as the regulators ask for the same level of solvency during a crisis [Zumbach et al., 2000]. This master thesis focuses on the log returns and volatilities of very long financial time series. We investigate the distributions and tail behaviors of both log returns and volatilities, where the Hill estimator is used for the tail index estimation of the volatility distribution. Taking the definition that a crisis occurs when GDP drops for two consecutive quarters, the recent financial crisis is identified as the biggest crisis since the Second World War. A linear regression model is fitted to analyze the connection between realized volatilities and the GDP log return, before and after 1947 respectively. The negative correlation between them suggests that volatility tends to increase when the economy is experiencing a recession.
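For illustration, a minimal base-R sketch of the Hill estimator for the tail index of absolute returns; the heavy-tailed placeholder data and the choice of k (the number of upper order statistics) are assumptions for this example:
    hill <- function(x, k) {                       # Hill estimate of 1/alpha
      xs <- sort(abs(x), decreasing = TRUE)        # descending order statistics
      mean(log(xs[1:k])) - log(xs[k + 1])
    }
    set.seed(1)
    ret <- 0.01 * rt(5000, df = 3)                 # placeholder heavy-tailed returns
    k <- 200
    c(inv_alpha = hill(ret, k), alpha = 1 / hill(ret, k))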

2010

Student Title Advisor(s) Date
Alain Helfenstein Forecasting OD-Path Booking Data for Airline Revenue Management using a Random Forest Approach  Peter Bühlmann Aug-2010
Abstract: A main issue of an airline's revenue management is to calculate an accurate forecast of the future demand for bookings. Poor estimates of demand lead to inadequate inventory controls and sub-optimal revenue performance. Within this thesis we describe the structure of the booking data within the airline industry that needs to be forecasted and discuss the current Bayesian forecasting model implemented by Swiss Revenue Management. We then implement new forecasting models using different random forest (regression) approaches and discuss the accuracy of the predicted demand of all models. As a further result we illustrate how an implementation of a regression using the random forest algorithm can fail.
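For illustration, a minimal R sketch of a random forest regression forecast on simulated booking data; the predictors and the demand model are invented for this example and do not reflect the airline's actual booking attributes:
    library(randomForest)
    set.seed(1)
    n <- 1000
    dat <- data.frame(
      days_to_departure = sample(0:90, n, replace = TRUE),
      weekday           = factor(sample(1:7, n, replace = TRUE)),
      bookings_so_far   = rpois(n, 20)
    )
    dat$final_demand <- with(dat, bookings_so_far + 0.3 * (90 - days_to_departure) + rnorm(n, sd = 5))
    train <- dat[1:800, ]; test <- dat[801:1000, ]
    rf <- randomForest(final_demand ~ ., data = train, ntree = 300)
    sqrt(mean((predict(rf, test) - test$final_demand)^2))   # out-of-sample RMSE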
Fabio Valeri Sample Size Calculation for Malaria Vaccine Trials with Attributable Morbidity as Outcome  Marloes Maathuis Aug-2010
Abstract: Malaria is a major public health issue, and big efforts have been put into research to develop a vaccine against it. Problems arise in estimating vaccine efficacy: standard methods such as the cutoff method and logistic regression may give biased efficacy estimates. An alternative approach which avoids this bias is to apply a Bayesian latent class model to estimate attributable risk. One problem with this probabilistic approach is that it is not clear how big a trial would need to be in order to have power comparable to that of the cutoff method. To assess the size of a trial using this approach, a hypothetical parasite density distribution for a population was constructed based on a latent class model and some other constraints. Samples were drawn from these true values, measurement errors were simulated and vaccine efficacy was estimated. This was done for three different vaccine mechanisms. For the vaccine we considered, the probabilistic method needs 3 to 12 times more individuals than the cutoff method to reach a power of 80%. Whereas the probabilistic method yields unbiased efficacy estimates, two vaccine types show large or very large bias. If the vaccine type is not well defined, standard methods for estimating vaccine efficacy can thus produce heavily biased estimates, which can result in a rejection of the vaccine. The probabilistic approach avoids this bias, but due to the larger sample size required for the same power the costs will be higher.
Doriano Regazzi The Lasso for Linear Models with within Group Structure  Sara van de Geer  Aug-2010
Abstract: In a high-dimensional regression model, we consider the problem of estimating a grouped parameter vector. We assume there is within-group structure, in the sense that the ordering of the variables within groups expresses their relevance. In this setting, we study two group lasso methods: the structured group lasso and the weighted group lasso. Our work consists in the implementation of these two methods in R. First, we prove the convergence of their algorithms. Then, we run simulations and compare the two estimators in various situations.
Anna Drewek A Linear Non-Gaussian Acyclic Model for Causal Discovery Marloes Maathuis Jul-2010
Abstract: The discovery of causal relationships between variables is important in many applications. Shimizu et al. proposed a method to discover the causal structure from observational data in linear non-Gaussian acyclic models, abbreviated LiNGAM (see Shimizu et al. 2006). We analyze their approach and empirically test how strict the non-Gaussianity requirement is by approximating the Gaussian distribution with the t-distribution. Moreover, we compare the performance of the LiNGAM algorithm to that of the PC algorithm (Spirtes et al. 2000). Finally, a combination of both algorithms is discussed (Hoyer et al. 2008) that enables the detection of causal structure in linear acyclic models with arbitrary distributions.
Rita Achermann Effect of proton pump inhibitors on clopidogrel therapy Werner Stahel Mar-2010
Abstract: In the present study, the interaction between clopidogrel and proton pump inhibitors (PPI) is investigated. A PPI might reduce the antiplatelet function of clopidogrel and increase the risk of a second myocardial infarction. Patients with both drugs prescribed have a higher risk for such an event, but whether this is due to individual risk factors or to a reduced effect of clopidogrel is an open question. The present study aims to assess the effect due to an interaction between the two drugs using health insurance data. Methods to adjust for confounders in observational data were applied, and new developments in graph theory in combination with probability theory were evaluated. The study population consisted of 4 623 patients with prescribed clopidogrel, a hospital stay of at most 30 days before the first administration of clopidogrel, and health insurance coverage with Helsana. Hospitalization due to a cardiac event and death were used as the clinical endpoints to assess whether proton pump inhibitor prescription was associated with a higher risk of rehospitalization. A graph was constructed based on prior knowledge to derive theoretically whether the effect is identifiable. Causal inference rules applied to this knowledge-based graph showed that the effect is identifiable when observational data are used, and graphs estimated from data did not disprove these findings. The effect of PPI on clopidogrel was calculated from the interventional distribution defined by the graph. In addition, a standard statistical technique, Cox proportional hazards regression, was applied, once with covariates to adjust for confounding and once with a propensity score. An instrumental variable approach was not feasible, since no instrument was found. Patients with concomitant use of clopidogrel and proton pump inhibitors had a higher risk of rehospitalization due to a cardiac event by a factor of 1.33 (95% CI: 1.10, 1.61) compared to patients with no prescription for PPI. Important for the analysis was that some patients had PPI administered together with clopidogrel but had no prescription before. Treatment guidelines recommend PPI to prevent stomach bleeding, a side effect caused by clopidogrel. It is assumed that these patients had no higher individual risk for a recurrent myocardial infarction compared to patients with no PPI prescription; hence, they can be compared to patients with no PPI prescription before and during the study phase to estimate the effect. Comparison of the baseline characteristics for 23 drug groups, as well as age and gender, revealed only minor differences. Results calculated from the interventional distribution defined by the graph were similar to those of the Cox regression. Finally, the propensity score used as a stratifier in a Cox proportional hazards regression yielded similar results as well. As alternative treatments for PPI are available, patients should not take these two drugs together.
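For illustration, a minimal R sketch of a propensity-score-adjusted Cox regression of the kind mentioned above, on simulated data; the covariates, effect sizes and the use of the propensity score as a covariate (rather than a stratifier) are assumptions for this example:
    library(survival)
    set.seed(1)
    n <- 2000
    age   <- rnorm(n, 65, 10)
    ppi   <- rbinom(n, 1, plogis(-3 + 0.04 * age))        # PPI prescription depends on age
    time  <- rexp(n, rate = 0.1 * exp(0.03 * (age - 65) + 0.28 * ppi))
    event <- rbinom(n, 1, 0.7)                            # 1 = cardiac event, 0 = censored
    ps  <- fitted(glm(ppi ~ age, family = binomial))      # propensity score
    fit <- coxph(Surv(time, event) ~ ppi + ps)            # Cox PH adjusted for the PS
    summary(fit)$conf.int["ppi", ]                        # hazard ratio with 95% CI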
Armin Zehetbauer A Statistical Interest Rate Prediction Model  Werner Stahel Mar-2010
Abstract:

2009

Student Title Advisor(s) Date
Nicoletta Andri Using Causal Inference for Identifying Coresets of the ICF Marloes Maathuis Sep-2009
Abstract: The World Health Organisation (WHO) has a strong interest in reducing the ICF-catalogue to a smaller set of items for different reasons such as time management and complexity. In this context, we analyse two data sets of the WHO concerning rheumatism/arthritis and chronic widespread pain consisting of variables from the ICF-catalogue. For this variable selection process we use the approach of Maathuis, Kalisch and Bühlmann which uses graph estimation techniques in combination with a causal method called back door adjustment. We show under which conditions this approach can be applied also to dichotomized data sets and how interactions between the variables can be handled. Significance of the estimates is assessed using permutation tests and a method called stability selection presented by Meinshausen and Bühlmann. Finally, the causal results are discussed and compared to associational results.
Simon Figura Response of Swiss groundwaters to climatic forcing and climate change: A preliminary analysis of the available historical instrumental records Werner A. Stahel
Rolf Kipfer
David M. Livingstone
Sep-2009
Abstract: Research on groundwater quality over long time periods has scarcely been done in the past. In this thesis, groundwater temperature is used as an indicator of groundwater quality. Temperature measurements of 8 river-recharged and 6 rain-fed groundwaters were analysed. Some data sets also contained records of water level, spring discharge, pumping amount and oxygen concentration. The length of the records ranged from 20 to 52 years. Plots as well as trend and changepoint tests were used to describe the temperature developments. Correlations and regression models were established to analyse the impact of climatic forcing, in the form of air temperature, and the impact of groundwater quantity variables on groundwater temperature. The behaviour of the oxygen concentrations was also briefly analysed. Most of the river-recharged groundwaters showed an increase in temperature of 1-1.5 °C in the last 30 years, more than half of which took place in the period 1987-1993. Results indicate that this warming was due to climatic forcing. The temperature of the rain-fed groundwaters showed small to no increase. Some properties of the air temperature development can be recognized in the temperature of these groundwaters, but a possible response of rain-fed groundwaters to climatic forcing is outweighed by other factors. Measurements of oxygen concentrations were available at 4 sites. Decreasing concentrations at 3 measurement sites are likely caused by higher microbiological activity and lower oxygen solubility as a result of higher temperatures; this explanation is contradicted by the increasing oxygen concentration at the fourth measurement site.
Lukas Rosinus Fehlende Werte EM-Algorithmus und Lasso in hochdimensionaler linearer Regression Peter Bühlmann Aug-2009
Abstract: Several estimators for high-dimensional linear regression problems with missing values are proposed and investigated [[?]]. Using the EM algorithm, the observed negative log-likelihood together with a Lasso penalty on the regression parameters β is minimized. Due to the Lasso penalty, the regression coefficients are estimated sparsely. In simulation studies, the methods are examined on various multivariate normal models; the MissRegr method achieves the best results. With the EM algorithm, the inverse covariance matrix K = Σ^(-1) is estimated optimally in the likelihood sense. With the Lasso penalty, the regression parameters are then also estimated well, even with a high proportion of missing data.
Philipp Stirnemann Unmatched Count Technik: Zum Zusammenhang zwischen Anonymität und statistischer Effizienz Werner A. Stahel 
B. Jann
Aug-2009
Abstract:
Rudolf Dünki Robuste Variogrammschätzung und robustes Kriging  Hans Rudolf Künsch  Aug-2009
Abstract: The thesis describes the development of robust algorithms for the analysis of geostatistical data. Three algorithms were implemented in R, each of which allows for a simultaneous estimation of the regression parameters and the covariance parameters. All three algorithms returned consistent results, and two of them are implemented as a package of R functions. The treatment of the nugget effect makes the essential difference between the two algorithms: the first algorithm treats the nugget as part of the estimation of the covariance parameters, while the other treats the nugget as part of the regression problem, which has advantages in the analysis of contaminated data. Sets of 50 simulations with different degrees of added contamination were analysed. The resulting parameter estimates agreed with the true values within the statistically tolerable range, except for the set containing the most contaminated data. The estimation of the range parameter was somewhat problematic when performed with small Huber constants, i.e. the resulting range displayed an upward bias. In contrast to this, the nugget estimate was improved when choosing a small Huber constant. The algorithm treating the nugget effect as part of the regression problem returned more stable results in the case of a high degree of contamination; a Huber constant of 1.333 ... 1.666 appeared appropriate in these cases. An increase in stability was also visible in the behaviour of the influence functions. The algorithms were applied to data on the contamination of soils with Cu in the surroundings of a metal smelter in Dornach. It could be shown that the estimated parameters allowed for kriging estimates which are comparable with earlier analyses. Despite this, it was not possible to obtain unambiguous parameter estimates; the reason lies in the existence of a very flat and extended optimum region, which allows models with comparable goodness-of-fit characteristics to be fitted for clearly distinct parameter sets.
Thomas André Rauber Parameter risk in reinsurance Peter Bühlmann Jul-2009
Abstract: In this thesis we consider the parameter uncertainty that arises in different pricing areas in reinsurance. By parameter risk we mean the risk of not estimating the parameters properly; we mainly look at parameter risk in the severity distribution. We differentiate three ways of characterising this uncertainty. We first replace the parameter that has to be estimated by a random variable and derive some analytical results. Then we look into maximum likelihood estimators and use the result that they are asymptotically normally distributed. For some examples these asymptotic results are not accurate enough; in these cases we characterise the uncertainty using the bootstrap. Finally, we specify where uncertainty arises in Experience, Exposure and Credibility Rating in practice, and we see an example of Credibility Rating which blends Experience and Exposure Rating by minimizing the parameter risk.
Alessia Fenaroli Propagating Quantitative Traits in Gene Networks Marloes Maathuis Feb-2009
Abstract: Gene networks have been created to extend the knowledge of gene functions in a specific organism. Such networks describe connections between genes involved in the same biological process. McGary, Lee and Marcotte related a gene network of baker's yeast, called YeastNet, to a morphological trait variation dataset, the SCMD, and defined a method which assigns scores to each gene of the network in order to predict its activity. The researchers tested the predictive power of YeastNet with ROC curves and the corresponding AUC values computed by leave-one-out cross-validation, obtaining a median value of 0.615. Our contribution to this study includes: the definition of other score methods that take into account the quantitative data given by the SCMD dataset, as opposed to the dichotomization of these data applied by McGary et al.; some new rules to predict the activity of each gene based on its scores, more involved than the simple idea of comparing the scores with a cutoff adopted by McGary et al., but more efficient; and a different procedure, 10-fold cross-validation, to assess the predictability of the network. Thanks to these changes we improved the YeastNet prediction quality by 5%, with a median value of 0.665.
Simon Lüthy Merkmalswichtigkeit im Random Forest Peter Bühlmann Feb-2009
Abstract: In bioinformatics and related fields, such as statistical genetics and genetic epidemiology, the prediction of categorical response variables (such as the disease status of a patient or the properties of a molecule) on the one hand, and the reliable identification of the relevant features on the other, are important tasks. In genetic research, typical data sets contain hundreds or even thousands of genes (features), while often only comparatively few observations are available on which to base predictions and identification. The random forest algorithm handles this problem very well. In this thesis, we first explain the construction of a decision tree, from which entire prediction forests (so-called random forests) are generated. We briefly describe how such a forest is grown and define the permutation accuracy importance as a measure of variable importance. In a second step, we point out the problems that arise when the permutation accuracy importance is applied to data sets with strongly correlated variables or with variables that differ in their number of categories. We present the solution proposed by Strobl et al. (2007), who advocate a different algorithm for growing the forest. We introduce two further measures of variable importance, show their behaviour on various data models in simulations, and compare them with the permutation accuracy importance. In our opinion, the permutation accuracy importance in random forests remains a strong and credible tool for variable selection.
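For illustration, a minimal R sketch of permutation accuracy importance computed by hand on a hold-out set (the randomForest package also provides built-in importance measures via importance()); the simulated data are placeholders:
    library(randomForest)
    set.seed(1)
    n <- 400; p <- 6
    X <- data.frame(matrix(rnorm(n * p), n, p))
    y <- factor(ifelse(X[, 1] + X[, 2] + rnorm(n) > 0, "case", "control"))
    train <- 1:300; test <- 301:400
    rf  <- randomForest(X[train, ], y[train], ntree = 500)
    acc <- mean(predict(rf, X[test, ]) == y[test])        # baseline hold-out accuracy
    imp <- sapply(names(X), function(v) {
      Xp <- X[test, ]; Xp[[v]] <- sample(Xp[[v]])         # permute one variable at a time
      acc - mean(predict(rf, Xp) == y[test])              # drop in accuracy
    })
    sort(imp, decreasing = TRUE)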
Patric Müller Image restoration: Blind deconvolution for noised Gaussian blur Sara van de Geer Feb-2009
Abstract: Blind deconvolution is an inverse problem with one or more unknown parameters. Nowadays, one of the more common practical applications of deconvolution is in image analysis, where it is used to determine how to restore blurred images. To recover the original image, however, we first have to estimate the unknown parameters with which the image was blurred. In recent years, this topic has attracted significant attention, resulting in numerous studies. This thesis studies blind deconvolution from a theoretical and a practical point of view. We provide the necessary tools to improve the quality of blurred and noisy pictures; our results give rise to algorithms computing estimates of the aforementioned unknowns. The applicability of the explored techniques is then demonstrated by means of several practical examples. The thesis is concluded by a brief qualitative analysis of the limits of deconvolution with regard to image restoration. To this end we show that the process is ill-conditioned: it might be at best inefficient, and at worst impossible, to retrieve the original picture from a blurred one.
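To illustrate the ill-conditioning mentioned above (a simplified one-dimensional sketch of our own, not code from the thesis), the following base-R example blurs a signal with a known Gaussian kernel and shows how naive Fourier inversion explodes once a small amount of noise is added.

    set.seed(1)
    n <- 256
    signal <- as.numeric(abs(seq_len(n) - n / 2) < 30)         # a simple box signal

    # Circular Gaussian blur kernel, normalised to sum to one.
    h <- dnorm(c(0:(n / 2 - 1), -(n / 2):-1), sd = 5)
    h <- h / sum(h)

    blur  <- Re(fft(fft(signal) * fft(h), inverse = TRUE)) / n  # convolution via FFT
    noisy <- blur + rnorm(n, sd = 1e-3)                         # tiny additive noise

    # Naive inverse filtering divides by near-zero Fourier coefficients of h ...
    naive <- Re(fft(fft(noisy) / fft(h), inverse = TRUE)) / n
    # ... so the reconstruction error blows up, illustrating the ill-conditioning.
    c(noise_size = sd(noisy - blur), error_naive = sd(naive - signal))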

2008

Student Title Advisor(s) Date
Diego Colombo Goodness of fit test for isotonic regression Marloes Maathuis Jul-2008
Abstract: We study the work of Durot and Tocquet (2001), who proposed a new test of the hypothesis H0: "f = f0" versus the composite alternative Hn: "f ≠ f0", under the assumption that the true regression function f is monotone decreasing on [0, 1]. The test statistic is based on the L1-distance between the isotonic estimator f̂n of f and the given function f0, since a centered and normalized version of this distance is asymptotically standard normally distributed under the null hypothesis H0, provided that the given function f0 satisfies some regularity conditions. The main reason for studying the asymptotic normality of the isotonic estimator lies in the study of its asymptotic power under the alternative Hn: "f = f0 + c_n Δ_n". The idea is to study the minimal rate of convergence for c_n such that the test has a prescribed asymptotic power. Durot and Tocquet show that this minimal rate is n^(-5/12) if Δ_n does not depend on n and n^(-3/8) if it does. Our contribution is a more detailed explanation of the models and of the main results, and the insertion of some additional intermediate steps in the proofs. To check these theoretical results in simulations, as Durot and Tocquet did, we write new R code. Namely, we perform a simulation study to compare the power of this test with that of the likelihood ratio test for the case where f0 is linear, and we also compare these simulation results to the ones obtained by Durot and Tocquet. Moreover, we propose extra simulations for the power of another test not treated by Durot and Tocquet, and we will see that it is always more powerful than the one they studied. Finally, we conduct a new simulation study in the case where the given monotone function f0 is quadratic.
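A small R sketch (our own illustration, not the simulation code referred to in the abstract) of the quantity the test is built on: the L1-distance between the decreasing isotonic estimator and the hypothesised f0, for data simulated under H0.

    set.seed(1)
    n  <- 200
    x  <- (1:n) / n
    f0 <- function(t) 1 - t              # hypothesised decreasing regression function
    y  <- f0(x) + rnorm(n, sd = 0.1)

    # Decreasing isotonic estimator: fit an increasing one to -y and flip the sign.
    fhat <- -isoreg(x, -y)$yf

    # L1-distance between the isotonic estimator and f0 (Riemann approximation);
    # a centered and scaled version of this distance is the test statistic.
    Tn <- mean(abs(fhat - f0(x)))
    Tn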
Alain Weber Probabilistic predictions of the future seasonal precipitation and temperature in the Alps Hans Rudolf Künsch Jul-2008
Abstract: This work presents probabilistic predictions of future (2071-2100) seasonal precipitation and temperature in the Alps. The predictions combine climate forecasts from different numerical simulations in a Bayesian ensemble approach. It is well known that these climate simulations have systematic errors, which should be taken into account. Unfortunately, the simulations are driven by boundary conditions that are very different from those of the last century. This is a problem because no comparable data from the past exist to estimate the bias of a climate model under similar boundary conditions. It becomes necessary to rely on assumptions which can hardly be proven right or wrong. Recently, Christoph Buser showed that predictions of seasonal temperature in the Alps differ under two reasonable assumptions. In this work we compare predictions of precipitation under the same two assumptions. In addition, one of the corresponding Bayesian models is extended to predict the bivariate distribution of precipitation and temperature.
Patricia Hinder Additive Isotonic Regression Sara van de Geer Mar-2008
Abstract: In this master thesis we study the isotonic regression model for one or more covariates. We first give an introduction to the one-dimensional regression problem, whose solution can be calculated using the pool-adjacent-violators algorithm (PAVA). We then extend the regression problem to multiple covariates and assume an additive model. The component functions are estimated with a classical backfitting estimator. We compare the backfitting estimator with an oracle estimator and discuss how the estimates can be smoothed by applying a kernel smoother to the isotonized data; the monotonicity of the kernel smoother is guaranteed by using a log-concave kernel. We also study another approach to the additive isotonic regression problem that is based on boosting: the functions are expanded into a sum of basis functions and a component-wise boosting algorithm is applied.
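The following R sketch (a simplified illustration, not the thesis implementation) shows the backfitting idea for an additive isotonic model with two covariates, using isoreg as the PAVA solver for each component.

    set.seed(1)
    n  <- 300
    x1 <- runif(n); x2 <- runif(n)
    y  <- x1^2 + log(1 + 3 * x2) + rnorm(n, sd = 0.2)   # both components increasing

    f1 <- f2 <- rep(0, n)
    for (it in 1:20) {
      # Update each component by isotonic regression on the partial residuals.
      r1 <- y - mean(y) - f2
      f1 <- isoreg(x1, r1)$yf[order(order(x1))]   # map fit back to original order
      f1 <- f1 - mean(f1)
      r2 <- y - mean(y) - f1
      f2 <- isoreg(x2, r2)$yf[order(order(x2))]
      f2 <- f2 - mean(f2)
    }
    fitted <- mean(y) + f1 + f2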
Manuel Koller Robust Statistics: Tests for Robust Linear Regression Werner Stahel Mar-2008
Abstract: Analyzing data using statistical methods means breaking reality down to a mathematical framework, a model. Often this model is based on strong assumptions, for example normally distributed data. Classical statistics provides methods that fit the chosen model perfectly. But in reality the model assumptions usually hold only approximately: anomalies and untrue assumptions might render the statistical analysis useless. Robust statistics aims for methods that are based on weaker assumptions and thus allow small deviations from the classical model. However, robust statistics is not restricted to the use of robust estimation methods alone; it also extends to the methods used to draw inference. In the past, there has not been much research focused on robust tests. In this thesis we study the quality of inference performed by two state-of-the-art robust regression procedures. We then propose a design-adapted scale estimator and use it as part of a new robust regression estimator, the MMD-estimator. This new estimator improves the quality of robust tests considerably. A simulation study is performed to compare the performance of the mentioned regression procedures in combination with various covariance matrix estimators. We found large differences between the tested methods: some methods were able to approximately reach the desired level in the corresponding tests for most tested scenarios, while others produced estimates that were only useful in specific large-sample settings. We infer that the covariance matrix estimator needs to be carefully selected for every new scenario.
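For readers who want to experiment with robust regression tests in R, a minimal sketch using the robustbase package is given below; lmrob with the "KS2011" setting uses an SMDM-type estimator with a design-adapted scale in the spirit of the work described above (this is not the thesis code, and the data are simulated and purely illustrative).

    # Assumes the robustbase package is installed.
    library(robustbase)
    set.seed(1)
    n <- 50
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)
    y[1:5] <- y[1:5] + 10          # a few gross outliers

    # SMDM-type robust regression with a design-adapted scale estimator.
    fit <- lmrob(y ~ x, setting = "KS2011")

    # Robust coefficient tests; compare with the non-robust least-squares fit.
    summary(fit)
    summary(lm(y ~ x))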
Philipp Rütimann Variablenselektion in hochdimensionalen linearen Modellen mittels Schrumpfvarianten des PC-Algorithmus Peter Bühlmann Mar-2008
Abstract: This thesis deals with variable selection in high-dimensional linear models. It adopts the approach of Professor Peter Bühlmann and Markus Kalisch based on the PC algorithm. This approach is modified so that the correlations are computed with various shrinkage estimators instead of the maximum likelihood estimator. These new variants of the PC algorithm are compared with the standard version using ROC plots and further graphical comparison methods. The thesis also addresses dimension reduction, which is used to reduce the dimension of the high-dimensional linear models. It turns out that this clearly improves the shrinkage variants of the algorithm. This suggested using dimension reduction for the robust PC algorithm as well, but there it does not yield the same positive results as for the shrinkage variants.
Bruno Gagliano Asymptotic theory for discretely observed stochastic volatility models Sara van de Geer Feb-2008
Abstract: This thesis investigates the estimation of parameters for discretely observed stochastic volatility models. The main concern is to give a general methodology for estimating the unknown parameters from a discrete set of observations of the stock price. Two estimation methods, the minimum contrast and estimating functions, are introduced and it is shown that, under certain assumptions, the estimators obtained are consistent and asymptotically normal. Finally, a series of simulations is performed to confirm the results and an application to real-world stock data is made.
Sandra König Analyse von Skisprungdaten Sara van de Geer Feb-2008
Abstract: This thesis investigates which factors significantly increase jump distance in ski jumping. As the simplest model, a linear regression is fitted; as expected, it shows that wind, in-run speed and weight influence the distance of a jump. For a more detailed analysis, generalizations of the linear model are used. In particular, a mixed-effects model shows that there are jumper-specific effects (such as the feeling for flight); isotonic regression is also considered, as well as the possibility of checking the isotonicity of a function by multiscale testing. Since the wind in particular seems to repeatedly influence competitions, its effect is examined in more detail using measurements at additional locations. It turns out that the wind plays an important role especially at the take-off table. A further open question was whether podium places at the junior world championship are an indicator of later success. Since there are as many examples for as against this hypothesis, no intuitive answer was available, unlike in the previous analyses. The nature of the data makes formal testing difficult, so a regression analysis is carried out again. Mathematically difficult to assess is the question of when judges, who score a jump subjectively, are biased; a mixed-effects model provides one possible description of this very complex situation.
Francesco Croci The World of Volatility Estimators Sara van de Geer Jan-2008
Abstract: This thesis investigates the estimation of the volatility of an asset return process. The main concern is to give a general overview of how to estimate volatility non-parametrically and efficiently. First, I introduce the basic notions of stochastic theory and a special and unusual limit theorem that is used throughout the thesis. Then I deal with several volatility estimators, from the simplest and worst one, the so-called realized volatility (RV) estimator, to the best estimator so far, the so-called multi-scale realized volatility (MSRV) estimator, which converges to the true volatility at the rate n^(-1/4). Finally, in the last section, we consider microstructure noise as an arbitrary contamination of the underlying latent securities price, through a Markov kernel Q. The main result there is that, subject to smoothness conditions, the two-scales realized volatility (TSRV) is robust to the form of the contamination Q.
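A compact R sketch (our own illustration under simplified assumptions, not code from the thesis) of two of the estimators named above: the realized volatility computed from noisy high-frequency prices, and a two-scales realized volatility that averages subsampled RVs and subtracts a bias correction for the microstructure noise.

    set.seed(1)
    n   <- 23400                                  # one trading day of 1-second returns
    sig <- 0.2 / sqrt(252 * n)                    # per-tick volatility of the latent price
    p   <- cumsum(rnorm(n, sd = sig))             # latent log-price
    y   <- p + rnorm(n, sd = 5e-4)                # observed price with microstructure noise

    rv <- function(z) sum(diff(z)^2)

    RV_all <- rv(y)                               # biased upwards by the noise

    # Two-scales estimator: average RV over K subsampled grids, then bias-correct.
    K <- 300
    RV_avg <- mean(sapply(1:K, function(k) rv(y[seq(k, n, by = K)])))
    nbar   <- (n - K + 1) / K
    TSRV   <- RV_avg - (nbar / n) * RV_all

    c(true = sum(diff(p)^2), RV = RV_all, TSRV = TSRV)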

2007

Student Title Advisor(s) Date
Sonja Angehrn Random Forest Klassifikator zur Erkennung von Alarmsignalen in Sicherheitssystemen Peter Bühlmann Aug-2007
Abstract: In this diploma thesis the three classifiers logistic regression, CART and random forest are examined for their suitability in a detection algorithm that has to decide whether various sound signals belong to the class alarm or normal. It turns out that the random forest algorithm is the best suited of the three classifiers for this problem. This classifier is then compared with an existing HMM algorithm in various scenarios. Several features are available for implementing the classifiers; for the random forest and the HMM algorithm, the thesis examines which selection of these features yields the smallest possible error rate.
Sarah Gerster Learning Graphs from Data: A Comparison of Different Algorithms with Application to Tissue Microarray Experiments Peter Bühlmann Aug-2007
Abstract: A new algorithm (logilasso) to learn network structures from data has been introduced in "Penalized Likelihood and Bayesian Methods for Sparse Contingency Tables with an Application to Full-Length cDNA Libraries" (Dahinden, Parmigiani, Emerick and Buehlmann, 2007). The main idea is to study the interactions between the variables by performing model selection in log-linear models. In this master thesis, a few other graphical model fitting algorithms are compared to the logilasso. The chosen algorithms are the PC, the Max-Min Hill-Climbing (MMHC) and the Greedy Equivalent Search (GES). They are all based on different approaches to fitting a graphical model. These methods are presented and the algorithms are described. Their performance, in the sense of their ability to reconstruct a graph, is tested on simulated data. The algorithms are also applied to Renal Cell Carcinoma data to illustrate a typical domain of application for such algorithms.
Lorenza Menghetti Density estimation, deconvolution and the stochastic volatility model Sara van de Geer Aug-2007
Abstract: In the stochastic volatility model, the stochastic volatility process is observed at discrete time instances with vanishing gaps, and its density is to be estimated. The volatility density, based on the logarithm of the squared process, is estimated with the deconvolving kernel density estimator. Since the error density is supersmooth, the convergence is very slow. This thesis studies the theoretical and empirical behaviour of the bias and the variance of the estimator. The empirical study suggests choosing a bandwidth smaller than the theoretical bandwidth and confirms the slow rate of convergence.
Giovanni Morosoli Optimale Anpassung einer Portfolioschadenhöhenverteilung an ein individuelles Risiko Peter Bühlmann Aug-2007
Abstract: The introduction explains the goal of this diploma thesis and presents the available claims data. Essentially, our task consists of computing an optimal weight for the individual and the portfolio claim size distributions. Chapter 2 analyses the problem of so-called data fitting: in other words, given a sample of claim sizes, one tries to find a suitable distribution that could have generated the observed claims. In particular, we examine two insurance-specific methods: the extension method, which is a generalization of the maximum likelihood method, and the join operation, which can be interpreted as a first "rough" adjustment of a portfolio claim size distribution to an individual risk. In the third chapter, the chi-squared test is used to determine an adjustment of a portfolio distribution to an individual risk. This adjustment, however, depends strongly on the chosen significance level; therefore, in Chapter 4 we analyse the problem of choosing a suitable significance level by using a kind of credibility approach. In the last chapter we discuss the results obtained and give some pointers for possible future developments.
Nicolò Valenti Regression under shape restriction and the option price model Sara van de Geer Aug-2007
Abstract: Many types of problems are concerned with identifying a meaningful structure in real-world situations. A structure involving orderings and inequalities is often useful since it is easy to interpret, understand and explain. In many settings, economic theory only restricts the direction of the relationship between variables, not the particular functional form of their relationship. Let c(X) denote the call price as a function of the strike price X. By the no-arbitrage principle, c is a convex, decreasing function of X, i.e. it satisfies certain shape constraints. It can be argued that economic theory places virtually no other restrictions on c, and that the estimation of the state-price density should be carried out using only these shape restrictions (and some bounds on the first and second derivative). Further smoothness assumptions or parametric assumptions may not be justified and carry the risk of misspecifying the state-price density. Our work consists of studying estimation under such shape restrictions. We first consider monotone regression function estimation, the so-called isotonic regression problem. Second, we analyse the problem of convex regression estimation. Then we build a nonparametric estimator of the call price function that is decreasing and convex for small samples.
Enrico Berkes Statistical Analysis of ChIP on Chip Experiments Peter Bühlmann Jul-2007
Abstract: With the end of the Human Genome Project, the challenge of the emerging discipline of modern biology is to determine the role of the newly characterised genes in man and model organisms. This new sequence data represents, for the first time, a realistic opportunity to link the function (and dysfunction) of specific tissues and cell types to the activity of the genes expressed within them, and so to identify genes and gene products that could act as therapeutic targets. The underlying strategy in the identification and functional characterisation of target genes will rely heavily on the ability to perform high-throughput analysis of gene expression, at both the tissue and the cellular level. Gene expression is regulated by proteins, specific to every gene, that bind to the target gene and promote or repress its transcription. In recent years two methods have been refined in order to study the gene regulation mechanism: microarray and ChIP-on-chip experiments. However, the large quantity of data and the uncertainties due to noise make the interpretation of the results of these experiments difficult and laborious. For this reason many statistical methods have been developed to extract the most relevant information from these data. Our work consists of modifying Motif Regressor, an existing method for analysing data from microarray experiments, and using this new algorithm to search for the transcription factor DNA-binding motifs of HIF1-alpha, a protein involved in gene regulation under hypoxia. The results show that our algorithm is fast, effective, does not require many biological experiments and gives important suggestions on where future biological research could be directed.
Jürg Schelldorfer Multivariate Analyse linearer Mischungen mit bekannten potenziellen Quellenprofilen Werner Stahel Jul-2007
Abstract: The concentration of certain air pollutants can be approximated mathematically by a linear mixture of contributions from different sources. In this context, the task of multivariate statistics is to use suitable methods to estimate the number of sources present, their emission profiles and their activities (as functions of time). In this thesis we present methods for using knowledge about possible predefined source profiles to improve the data analysis in a linear mixture model.
Miriam Blattmann-Singh Nonparametric volatility estimation with a functional gradient descent algorithm for univariate financial time series Peter Bühlmann Mar-2007
Abstract:
Claudia Soldini Variablenselektion in hochdimensionalen Regressionsmodellen bei nicht-homogenen Daten: Die Nutzung von Bacillus subtilis zur Synthese von Ribaflavin Peter Bühlmann Mar-2007
Abstract: This thesis is part of an interdisciplinary research project. Its goal is to identify important mechanisms involved in the production of a vitamin by a bacterium. It is based on data from a gene expression study of a pharmaceutical company. Since a large number of genes is involved, regression methods are applied that are suitable for high-dimensional problems and can select variables. The genes are treated as predictors and the amount of vitamin produced as the response variable. The experiments were carried out under different conditions, so that the data set is not homogeneous: the amount of vitamin produced varies with the bacterial strains studied and with the time point at which the measurements were taken. It is therefore of interest to identify the most important genes, or groups of genes, responsible for such differences. For this purpose, statistical tests are carried out both on the individual genes and on groups of genes, which are formed by a cluster analysis using correlation as the similarity measure.
Nicolas Städler  Statistische Modellentwicklung für nichtinvasive Blutzuckermessung mittels Sensoren Werner Stahel Mar-2007
Abstract: Impedance signals for the non-invasive measurement of blood glucose concentration are affected by a multitude of other factors (temperature, sweat, blood circulation, etc.). To quantify the influence of such nuisance parameters on the impedance signals, they are measured with sensors. The goal of this thesis is to use linear regression and various variable selection methods to develop models, as generally valid as possible, that predict the glucose concentration as a function of the impedance signals and other factors. In a first part of the thesis, the classical selection methods forward stepwise, backward stepwise and all-subsets selection are applied. Since the explanatory variables show enormous measurement inaccuracies, they are smoothed in a next step. In the course of the thesis it becomes apparent that standard selection criteria such as AIC and Cp lead to severely overfitted models. In a decisive step, a criterion better adapted to the special structure of the data is proposed as an alternative to AIC and Cp. With the new criterion, both an adaptive lasso and a forward stepwise selection are carried out. Both methods lead to very similar and reasonable models with an R2 of 0.73. Particular attention is paid to the adaptive lasso: the analysis shows that a data-dependent weighting in the adaptive lasso is a considerable improvement over the ordinary lasso. Since the functional form of the model is unknown a priori, an analysis with multivariate adaptive regression splines (MARS) is also carried out; this method, however, turns out to be unsuitable.
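As a hedged illustration of the data-dependent weighting mentioned above (not the thesis code, and with purely simulated data), an adaptive lasso can be obtained in R with the glmnet package by supplying per-coefficient penalty factors derived from an initial fit.

    # Assumes the glmnet package is installed.
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    y <- as.numeric(x[, 1:3] %*% c(2, -1.5, 1)) + rnorm(n)

    # Initial ridge fit provides the data-dependent weights.
    init <- cv.glmnet(x, y, alpha = 0)
    w <- 1 / abs(as.numeric(coef(init, s = "lambda.min"))[-1])   # drop the intercept

    # Adaptive lasso: lasso with coefficient-specific penalty factors.
    alasso <- cv.glmnet(x, y, alpha = 1, penalty.factor = w)
    coef(alasso, s = "lambda.min")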

2006

Student Title Advisor(s) Date
Massimo Merlini Identifikation relevanter Mechanismen der Vitaminproduktion Peter Bühlmann  Sep-2006
Abstract: Systems biology is a young interdisciplinary science whose goal is to understand biological organisms in their entirety. This thesis presents a research project that investigates the production of a particular vitamin by a microorganism. The aim is to identify the essential mechanisms taking part in the fermentation process in order to optimize production.
Michael Amrein Parameterschätzung in zeitstetigen Markovprozessen Hans Rudolf Künsch Aug-2006
Abstract: This thesis deals with parameter estimation in a particular class of continuous-time, homogeneous Markov chains that is especially suitable for modelling certain chemical reactions or systems from population dynamics. The data are assumed to be available as a time series, i.e. the values of the process are known at discrete time points. The transition probabilities between two consecutive observations are approximated by Poisson distributions. The quality of this approximation is improved by introducing additional time points (and latent data) between the actual observation times. To approximate the maximum likelihood estimator, the EM algorithm is combined with Monte Carlo and Markov chain Monte Carlo methods. This finally results in two algorithms, which are tested on various examples, in particular on artificial data sets.
Andrea Cantieni Effiziente Approximation der a posteriori Verteilung für komplexe Simulationsmodelle in Umweltwissenschaften Hans Rudolf Künsch Aug-2006
Abstract:
Elma Rashedan Models for Emission Factors of Passenger Cars linking them to Driving Cycle Characteristics Werner Stahel Jul-2006
Abstract:
Carmen Casanova  Vorhersage von Partikelgrössen-Verteilung anhand Bildananlyse-Daten Werner Stahel Mar-2006
Abstract:
Andreas Elsener  Statistical Analysis of Quantum Chemical Data; Using Generalized XML/CML Archives Peter Bühlmann Mar-2006
Abstract:
Simone Elmer  Sparse Logit-Boosting in hochdimensinalen Räumen Peter Bühlmann Mar-2006
Abstract: The goal of my diploma thesis is to develop the classification method sparse LogitBoost and to implement it in R. The method is then applied to simulated and real data, and its prediction accuracy is compared with that of other classification methods.

2005

Student Title Advisor(s) Date
Roman Grischott Robuste Geostatistik im Markovmodellen am Beispiel eines Schwermetalldatensatzes Hans R. Künsch Sep-2005
Abstract:
Michael Hornung  Klassifikation hochdimensionaler Daten unter Anwendung von Box-Cox Transformationen Peter Bühlmann Aug-2005
Abstract: The regression methods lasso, relaxed lasso and boosting are used to predict and classify both simulated and real high-dimensional data. The data considered consist not only of the explanatory variables but also of their Box-Cox transformations, which is intended to increase prediction accuracy. Since the response variable is discrete in the real data sets, we focus mainly on the misclassification error. It turns out that for individual data sets the use of Box-Cox transformations can indeed improve predictive power, but deteriorations frequently have to be accepted as well. In the second part of this thesis, the correlation of the model variables selected by the three regression methods is examined and we attempt to reduce it, following two different approaches. First, a lasso-like method that uses additional weights in the penalty term reduces the correlation, in some cases considerably. In a second step, new explanatory variables are constructed from the given ones by averaging groups of strongly correlated variables; these are then used for further classifications. This method, too, partly reduces the correlation of the variables substantially. However, no general conclusions can be drawn, and both ideas usually lead to an increase in the misclassification error.
Stefan Oberhänsli  Robustheit bei Multiplem Testen: Differentielle Expression bei Genen Peter Bühlmann Aug-2005
Abstract: Why multiple testing? Since data collection of all kinds is no longer done by hand but with the support of computers, the amount of data has increased sharply. This has created the need for methods that can handle such extensive data sets while making as few errors as possible. Data sets typically comprise hundreds of factors, which makes it possible to test many different (possibly already suspected) associations. Such extensive data sets also allow an exploratory approach, i.e. one looks at the data essentially without prior knowledge and sees which associations suggest themselves. Statistically, this approach is delicate, because with a clever choice of test procedures or prior "data cleaning" almost any association can be "demonstrated". The limiting factor in scientific studies is very often the fixed budget. Nevertheless, one would like to extract as much information as possible from the collected data: it is much cheaper to ask a test subject one more question than to recruit an additional test subject. From a financial point of view, it is therefore better to measure more variables in an experiment than to repeat the whole experiment more often. There are then fewer observations but more factors whose relationships need to be analysed. In such cases it is unavoidable to carry out very many tests simultaneously (multiple tests). The analysis and interpretation of multiple tests involves considerable difficulties that do not exist in single testing; how these difficulties can be overcome is described in the course of the thesis.
Rahel Liesch Statistical Genetics for the Budset in Norway Spruce Peter Bühlmann Mar-2005
Abstract: Genetic variation is needed for plants to respond and adapt to environmental challenges. Understanding the genetic variation of adaptive traits and the forces that shaped it is one of the main goals of evolutionary biology. This is a difficult task, as most adaptive traits are quantitative traits, i.e. traits that are controlled by many loci interacting with the environment. The aim of this thesis was (i) to analyze the genetic variation of the timing of budset of Norway spruce (Picea abies L.) within and among 15 populations covering the natural range of the species and (ii) to relate the variation among populations in the timing of budset to the variation observed at both neutral and candidate genes. The former was done through a classical ANOVA after choosing an adequate model. The latter was achieved by estimating and calculating confidence intervals for Wright's fixation indices (a measure of among-population differentiation) for budset, on the one hand, and for neutral or candidate genes, on the other hand. Estimating confidence intervals for Wright's fixation index for a quantitative trait, such as the timing of budset, has been and can be done in many different ways. In some studies the delta method has been used, whereas in others nonparametric bootstrapping was favored. In almost all studies, the choice of a certain method was not justified or discussed, nor, when the bootstrap was retained, was the choice of a particular bootstrap strategy or type warranted. We therefore simulated several datasets and applied various methods to find the most appropriate one. We concluded that either a semiparametric or a parametric bootstrap gave the best results in the case of the spruce dataset. Using a nonparametric bootstrap, sampling over populations and families would definitely be the most adequate way of obtaining a confidence interval. Finally, Wright's fixation index for budset was significantly larger than the differentiation at both candidate and neutral loci, suggesting strong local adaptation.
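The following R sketch illustrates the kind of nonparametric bootstrap over populations discussed above, using a simplified among-population differentiation statistic as a stand-in for Wright's fixation index; the data, the group structure and the statistic itself are all hypothetical and only meant to show the resampling scheme.

    set.seed(1)
    # Illustrative data: 15 populations, 10 observations each (stand-in for the real design).
    pops   <- factor(rep(1:15, each = 10))
    budset <- rnorm(150, mean = as.numeric(pops) * 0.1, sd = 1)
    dat    <- data.frame(pops, budset)

    # Simplified differentiation statistic: among-population variance component
    # divided by total variance, from a one-way ANOVA (a stand-in only).
    diff_stat <- function(d) {
      a  <- anova(lm(budset ~ pops, data = d))
      n0 <- mean(table(d$pops))
      va <- max(0, (a["pops", "Mean Sq"] - a["Residuals", "Mean Sq"]) / n0)
      va / (va + a["Residuals", "Mean Sq"])
    }

    # Nonparametric bootstrap resampling whole populations with replacement.
    B <- 500
    stats <- replicate(B, {
      ids <- sample(levels(dat$pops), replace = TRUE)
      d <- do.call(rbind, lapply(seq_along(ids), function(i) {
        di <- dat[dat$pops == ids[i], ]
        di$pops <- factor(i)          # relabel so duplicated populations stay distinct
        di
      }))
      diff_stat(d)
    })
    quantile(stats, c(0.025, 0.975))  # percentile bootstrap confidence interval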

2004

Student Title Advisor(s) Date
Lukas Meier Extemwertanalyse von Starkniederschlägen Hans R. Künsch Mar-2004
Abstract: Climate change is of great interest, since it can have a considerable impact on people and the environment. In this thesis we use methods from extreme value theory to investigate the temporal evolution of heavy precipitation at 104 measurement stations in Switzerland. We model the station-wise exceedances of sufficiently high thresholds with a two-dimensional Poisson point process and non-stationary models for the location and scale parameters. For many stations we find strong evidence of a positive trend. To combine the individual trend estimates, we use an analogue of a hierarchical model. The spatial analysis of the results, however, reveals anomalies that make combining the stations difficult. We therefore investigate alternative approaches, mainly to model seasonal features better. Large seasonal differences appear in the spatial dependence of the trend estimates of the different stations, which could be investigated in more detail.
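As a minimal, self-contained illustration of threshold-exceedance modelling (a simplified stand-in for the non-stationary point-process models used in the thesis, with artificial data), the following base-R code fits a generalized Pareto distribution to exceedances over a high threshold by maximum likelihood.

    set.seed(1)
    # Illustrative daily precipitation amounts (gamma-distributed stand-in data, in mm).
    precip <- rgamma(10000, shape = 0.8, scale = 8)

    u   <- quantile(precip, 0.95)          # high threshold
    exc <- precip[precip > u] - u          # threshold exceedances

    # Negative log-likelihood of the generalized Pareto distribution (sigma, xi != 0).
    nll <- function(par) {
      sigma <- exp(par[1]); xi <- par[2]
      z <- 1 + xi * exc / sigma
      if (any(z <= 0)) return(Inf)
      length(exc) * log(sigma) + (1 / xi + 1) * sum(log(z))
    }

    fit <- optim(c(log(sd(exc)), 0.1), nll)
    c(sigma = exp(fit$par[1]), xi = fit$par[2])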
Andreas Greutert Methoden zur Schätzung der Clusteranzahl Peter Bühlmann Mar-2004
Abstract: In the context of microarray experiments, new methods of cluster analysis are constantly being developed. Three such methods are presented in the technical report by Fridlyand and Dudoit. Fridlyand and Dudoit pursue two goals: first, to estimate the number of clusters with the resampling method clest, and second, to improve the accuracy of the clustering, for which they propose two bagging methods for clustering algorithms. We focus on the clest algorithm. Some theory is needed to understand and apply it: Chapter 2 starts with a short introduction to cluster analysis, Chapter 3 presents further methods for estimating the number of clusters, and Chapters 4, 5 and 6 introduce clest and its parameters. The goal of this diploma thesis is to understand the clest algorithm and, if possible, to improve it. For this purpose it was necessary to implement clest in R (see Appendix B). We try to achieve the main goal of improving clest by varying its parameters. A further task is to construct a measure for the confidence of the estimated number of clusters (see Chapter 7). Existing estimation methods are also compared with clest.
Käthi Schneider Mischungsmodelle für evozierte Potenziale in Nervenzellen Hans R. Künsch Mar-2004
Abstract: This thesis is based on 18 data sets of neurobiological data on evoked potentials. Each data set contains amplitude and noise values, where the amplitude values represent the evoked potentials. Since the data are neurobiological, Chapter 2 first explains some biological terms and processes, which play a role in the data collection that is also described there. Besides a first overview of the data, the quantal hypothesis is discussed, since it plays an essential role in the analysis. Objective: mixture densities are fitted to the individual amplitude values of the data sets. Various models are considered, and at the same time it is examined which model is best suited for this purpose. As a first focus, mixture models that assume dependent data are considered. It must therefore first be checked whether dependencies between evoked potentials exist at all. If they do, it is investigated how they can be modelled and whether these models provide better estimates of the mixture densities. The second focus is on the quantal hypothesis: one would like to know whether evoked potentials can be modelled as a superposition of a randomly released number of quanta or not.
Jeannine Britschgi Analyse einer Brustkrebsstudie Hans R. Künsch Feb-2004
Abstract: The goal of this diploma thesis is to carry out a survival analysis for a group of breast cancer patients whose tumour was surgically removed. We are less interested in the time until a patient dies of breast cancer than in the time until a recurrence (reappearance of the tumour). We want to construct a good prognostic model for the patients that predicts the time to tumour recurrence. This model will be a function, and we want to find out which of the many explanatory variables are necessary to characterize it well. The question arises whether the information about the lymph nodes, which were also surgically removed and examined for tumour metastases, is necessary, or whether good prognostic models can be found without this information.

2003

Student Title Advisor(s) Date
Corinne Dahinden Schätzung des Vorhersagefehlers und Anwendungen auf Genexpressionsdaten Peter Bühlmann Nov-2003
Abstract: Chapter 2 (microarray predictors) presents various methods that we later use to classify microarrays. Chapter 3 introduces estimates of the prediction error. Chapter 4 (estimation of confidence intervals) discusses estimates of the standard deviations for the estimators introduced in Chapter 3. In Chapters 5-7 the various estimates of the errors and of the confidence intervals are compared with each other in simulations. These findings are applied in Chapter 8, a comparison of microarray predictors with and without clinical variables. Chapter 9 introduces the technique of prevalidation and applies it to various microarray predictors in order to determine the relevance of the clinical variables. For this diploma thesis I carried out extensive simulations and programmed some of the error estimators myself in the statistical software R. The code of the most important programs can be found under /u/dahinden/Diplomarbeit/RCode.
Christof Birrer Konstruktion von Vorschlagsdichten für Markovketten Monte Carlo mit Sprüngen zwischen Räumen unterschiedlicher Dimension Hans R. Künsch Sep-2003
Abstract: This diploma thesis is concerned with constructing proposal densities for Markov chain Monte Carlo with jumps between spaces of different dimension, working mainly in the AR model. The thesis builds on the paper by Brooks, Giudici and Roberts (2003). The task was to investigate the suggestion made in the discussion contribution by H.R. Künsch, who recommends a more carefully chosen jump function than the obvious one used in the paper. For this purpose, simulations with the statistical software R were also to be carried out. In a second part, it was to be investigated whether more suitable jump functions than the obvious ones can also be found for ARCH models and Gaussian graphical models, working with the Kullback-Leibler distance.
Christoph Buser Differentialgleichungen mit zufälligen zeitvariierenden Parametern Hans R. Künsch Mar-2003
Abstract: Biological processes are described by differential equations. The assumption that the parameters are constant in time simplifies the solution and is often made in practice; the resulting systematic errors are accepted as long as they are not too large. In our example there are three quantities: the biomass (bacteria), the substrate (food) and the oxygen, all given as concentrations. Only the oxygen concentration is measurable, and we reconstruct the other quantities from these measurements. To do so we work with a smoother, which uses data from both the future and the past. We give up the constancy of the parameters and model them with time-varying stochastic processes, more precisely with the mean-reverting Ornstein-Uhlenbeck process, which makes the model more flexible. The approach is Bayesian: we do not look for the best parameters, but construct the conditional distribution of the parameters given the oxygen measurements. This is not possible in closed form. We use the Metropolis-Hastings algorithm and generate a Markov chain that asymptotically has the desired distribution. To avoid two-dimensional proposal densities, we work with the Gibbs sampler, which in each step selects one of the two parameters to be proposed anew. In the first simulation we take conditional Ornstein-Uhlenbeck processes as proposals for the new parameter values in the Metropolis-Hastings algorithm; the data are not used in the proposal density. We divide the time interval [0,T] into random intervals of equal average length and change the parameter only on one such interval, which is necessary to obtain reasonable acceptance probabilities. In the second simulation we use the squared deviations of the oxygen data to construct a proposal on an interval. The additional information reduces the variance of the proposal density, at the cost of more computation. During the procedure we are confronted with one problem: as long as substrate is present, the growth parameter dominates the death parameter. This masking effect increases the uncertainty in determining the death parameter in the first time period, and the uncertainty carries over to the main processes. In both simulations we usually succeed well to very well in determining the distributions of all processes. Problems of the filter, which only uses measurements from the past, are resolved by the smoother; the smoother brings more data into the procedure and is preferable to the filter. The algorithm is computationally intensive: on the one hand, a long burn-in phase is required to reach the stationary distribution; on the other hand, we reduce the dependencies in the Markov chain by not using every element, so a large number of steps is necessary. There are variants of the proposal density: we can dispense with the Gibbs sampler and work in two dimensions, which may capture the interplay of the two parameters better and compensate for the masking effect. Another algorithm tries to extract more information from the oxygen deviations by taking their sign into account.
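A short R sketch of the mean-reverting Ornstein-Uhlenbeck process used above as a model for the time-varying parameters (our own illustration; the parameter values and grid are purely hypothetical). It uses the exact Gaussian transition of the OU process on a regular grid.

    set.seed(1)
    # Exact simulation of dX_t = theta * (mu - X_t) dt + sigma dW_t on a regular grid.
    simulate_ou <- function(n, dt, theta, mu, sigma, x0 = mu) {
      x <- numeric(n)
      x[1] <- x0
      a <- exp(-theta * dt)                          # autoregression coefficient
      s <- sigma * sqrt((1 - a^2) / (2 * theta))     # exact conditional sd per step
      for (i in 2:n) x[i] <- mu + a * (x[i - 1] - mu) + s * rnorm(1)
      x
    }

    growth <- simulate_ou(n = 500, dt = 0.01, theta = 2, mu = 1, sigma = 0.3)
    plot(growth, type = "l", xlab = "time step", ylab = "parameter value")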
Eric André Graf Vorhersage des Luftqualitätsindexes Hans R. Künsch Mar-2003
Abstract: This thesis is concerned with developing a model for predicting an air quality index (LQI). The LQI describes the state of the air in words and is published hourly on the internet (www.in-luft.ch). The air quality index LQI indicates the effect of the current air quality on health. Measuring air pollutants (ozone O3, nitrogen oxides NOx, nitrogen monoxide NO and particulate matter PM10) yields numbers that give information about the concentration of the individual substances in the outdoor air. The LQI is computed from these concentrations and indicates the influence of the pollutants on physical well-being. The statement of the LQI is strongly generalized, but it corresponds to current knowledge about the short-term effects of the pollutants on the human organism. For each pollutant, index levels from 1 to 6 are assigned according to its concentration.