ZüKoSt: Seminar on Applied Statistics

Autumn Semester 2012

Date / Time Speaker Title Location
* 18 September 2012
15:00-16:15
Jodi Lapidus
Oregon Health & Science University, Portland, OR, USA
Title: A Variable Selection Method for Logistic Regression Models Based on the Receiver Operating Characteristic Curve
Speaker, Affiliation: Jodi Lapidus, Oregon Health & Science University, Portland, OR, USA
Date, Time: 18 September 2012, 15:00-16:15
Location: HG G 19.1
Abstract: Recently, investigations dedicated to identifying and evaluating biomarkers have increased considerably. Rather than searching for a single biomarker to diagnose disease or predict an outcome, studies often focus on combining information from multiple sources to improve classification. While there are ample classification methods proposed in the statistical literature, McIntosh and Pepe (2002) showed that decision rules based on the likelihood ratio function, or equivalently, the risk score, are optimal. Logistic regression can be used to generate a risk score, and the c-statistic or area under the receiver operating characteristic curve (AUC) based on that risk score can then be used to assess classification performance. When several candidate biomarkers are collected (for example, a multi-analyte assay panel that contains hundreds of proteins), it is labor-intensive to check all possible combinations. Additionally, large-scale cohort studies often collect a host of demographic, medical history, and clinical information, as well as serum or other laboratory-based biomarkers, and these measures may be used to predict subsequent health outcomes. One could utilize standard variable selection methods (e.g. best subsets) to build classification/prediction models, but these do not guarantee optimal performance based on AUC. We propose a new procedure to select markers for inclusion in a logistic regression model based on improvement in AUC. The procedure begins by noting the equivalence of the non-parametric two-sample test statistic (Mann-Whitney U) and AUC. We make use of the jagged ordered multivariate optimization algorithm for partial ROC curves outlined in Baker (2000) to select additional markers. We built in a stopping rule based on the category-free version of the Net Reclassification Index (NRI) proposed by Pencina (2011). We will illustrate the algorithm using various datasets, including a protein biomarker discovery project based on a small preterm labor cohort, and predictors of fracture in a large multi-site cohort of community-dwelling aging US men.
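The equivalence between the Mann-Whitney U statistic and AUC mentioned in the abstract can be checked numerically. The following is only a minimal sketch, not the speaker's procedure: the function names and the risk scores are hypothetical, and ties are assumed to count as one half.

```python
def mann_whitney_u(cases, controls):
    """Count case/control pairs in which the case has the higher score.

    Tied pairs contribute 1/2 (an assumption of this sketch).
    """
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def auc_from_u(cases, controls):
    """AUC as the Mann-Whitney U statistic divided by the number of pairs."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


def auc_trapezoid(cases, controls):
    """Area under the empirical ROC curve, computed with the trapezoidal rule."""
    # Sweep the threshold from the highest observed score to the lowest.
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(x >= t for x in cases) / len(cases)
        fpr = sum(y >= t for y in controls) / len(controls)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical risk scores, e.g. fitted probabilities from a logistic model.
case_scores = [0.9, 0.8, 0.6, 0.55]
control_scores = [0.7, 0.6, 0.3, 0.2]

print(auc_from_u(case_scores, control_scores))
print(auc_trapezoid(case_scores, control_scores))
```

Both functions return the same value for any set of scores, which is why a selection procedure that targets AUC can work directly with the rank-based statistic.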
27 September 2012
16:15-17:30
Nicolas Städler
Nederlands Kanker Instituut, Amsterdam
Title: Penalized hidden Markov models for high-dimensional genome analysis
Speaker, Affiliation: Nicolas Städler, Nederlands Kanker Instituut, Amsterdam
Date, Time: 27 September 2012, 16:15-17:30
Location: HG G 19.1
Abstract: Early, pioneering applications of hidden Markov models (HMMs) to genome data (see [1]) considered univariate or low-dimensional observations (such as the gene sequence itself). However, in recent years technological advances have begun to permit truly multivariate studies. For example, using technologies such as DamID [2] or ChIP-seq [3], it is now possible to measure the binding of proteins to the DNA across the entire genome for hundreds of proteins, and the dimensionality of such approaches continues to increase. In the moderate-to-large dimensional setting, estimation for HMMs remains challenging in practice, due to several concerns arising from the hidden nature of the states. We consider penalized estimation in HMMs with multivariate Normal observations. Penalization and setting of associated parameters is non-trivial in this latent variable setting: we propose a penalty that automatically adapts to the number of states K and the state-specific sample size and can cope with scaling issues arising from the unknown states. The methodology is adaptive and very general, applying in particular to both low- and high-dimensional settings. Furthermore, our approach explores the number of states K in an efficient manner by exploiting the relationship between parameter estimates for successive candidate values for K. We consider genome-wide binding data of 53 chromatin proteins in the embryonic Drosophila cell line Kc167 (data from [4]). We demonstrate the ability of our approach to yield huge gains in predictive power and deliver far richer estimates than currently used methods.
[1] Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. J. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
[2] van Steensel, B. and Henikoff, S. (2000) Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nature Biotechnology, 18, 424-428.
[3] Park, P. (2009) ChIP-seq: advantages and challenges of a maturing technology. Nature Reviews Genetics, 10, 669-680.
[4] Filion, G. J., van Bemmel, J. G., Braunschweig, U., Talhout, W., Kind, J., Ward, L. D., Brugman, W., de Castro, I. J., Kerkhoven, R. M., Bussemaker, H. J. and van Steensel, B. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell, 143, 212-224.
25 October 2012
16:15-17:30
Simon Barthelmé
Bernstein Center for Computational Neuroscience Berlin
Title: Point process models for eye movements
Speaker, Affiliation: Simon Barthelmé, Bernstein Center for Computational Neuroscience Berlin
Date, Time: 25 October 2012, 16:15-17:30
Location: HG G 19.1
Abstract: Measuring eye movements is central to neuroscience and psychology, not only because of what eye movements reveal about the distribution of attention, but also for their own sake as a central aspect of motor behaviour. Eye movements are quite complex, but analysis often focuses on fixation locations: the areas in which the eyes stayed still. We show how the analysis of fixation locations can be thought of as a spatial statistics problem, and how point process models can be used to characterise patterns of fixation. We also discuss how the time dimension can be integrated into the analysis through non-parametric Markov (in time) models, and how these can be treated in essentially the same way as inhomogeneous Poisson point process models. Joint work with Hans Trukenbrod (U Potsdam), Ralf Engbert (U Potsdam), and Felix Wichmann (U Tübingen).
1 November 2012
16:15-17:30
Oliver Sander
Novartis Basel
Title: Non-linear mixed effects models in drug development
Speaker, Affiliation: Oliver Sander, Novartis Basel
Date, Time: 1 November 2012, 16:15-17:30
Location: HG G 19.1
Abstract: Clinical development of a new drug requires a series of complex decisions, e.g. which study designs, doses, and dosing regimens to use, which patient characteristics to include, and, importantly, whether to continue or stop development at key milestones. These decisions can be best supported by continuously integrating all available information along the development process. Non-linear mixed effects models provide an elegant framework for integrating relevant information, for example on drug dose and timing of doses, exposure, and clinical response, across multiple studies. Such a model-based approach makes the best use of the available longitudinal data, accounts for typical trends as well as for different sources of variability, takes an integrated rather than a study-by-study perspective, and allows for simulation in order to explore what-if scenarios. The talk will present the basics of non-linear mixed effects models, frequently used model types, and their applications in drug development projects.
20 December 2012
16:15-17:30
Stefano Castruccio
Department of Statistics, The University of Chicago
Title: Space time global models for climate ensembles
Speaker, Affiliation: Stefano Castruccio, Department of Statistics, The University of Chicago
Date, Time: 20 December 2012, 16:15-17:30
Location: HG G 19.1
Abstract: Climate sensitivity to anthropogenic forcing can be investigated using global climate models, which reproduce physical processes on a global scale and predict variables such as temperature. A collection of different runs (a model ensemble) can be obtained by setting different initial conditions and greenhouse gas concentrations. The purpose of this work is to show how the runs of a precomputed ensemble can be reproduced (emulated) with a global space/time statistical model that captures nonstationarities in latitude more effectively than current alternatives in the literature. Exploiting the gridded geometry of the data, the proposed algorithm is able to fit massive datasets with millions of observations within a few hours. In the last part of the talk, an application to the recent CMIP5 multi-model ensemble will be introduced and compared with reanalysis data. An extension to modeling land/ocean nonstationarities will also be discussed.

Notes: events marked with an asterisk (*) take place at a time and/or location different from the usual one. You can also subscribe to the iCal/ics calendar.

Organisers: Peter Bühlmann, Leonhard Held, Markus Kalisch, Hans Rudolf Künsch, Marloes Maathuis, Martin Mächler, Maria Mathis, Lukas Meier, Werner Stahel, Sara van de Geer