ZüKoSt Zürcher Kolloquium über Statistik

Spring Semester 2025

Date / Time

Speaker

Title

Location

20 February 2025
15:15-16:00

Rafael M. Frongillo
CU Boulder

Details

ZueKoSt: Seminar on Applied Statistics

Title	Incentive problems in data science competitions, and how to fix them
Speaker, Affiliation	Rafael M. Frongillo, CU Boulder
Date, Time	20 February 2025, 15:15-16:00
Location	HG G 19.2
Abstract	Machine learning and data science competitions, in which contestants submit predictions about held-out data points, are an increasingly common way to gather information and identify experts. One of the most prominent platforms is Kaggle, which has run competitions with prizes of up to 3 million USD. The traditional mechanism for selecting the winner is simple: score each prediction on each held-out data point, and the contestant with the highest total score wins. Perhaps surprisingly, this reasonable and popular mechanism can incentivize contestants to submit wildly inaccurate predictions. The talk will begin with intuition for the incentive issues and for what sort of strategic behavior one would expect, and when. One takeaway is that, despite conventional wisdom, large held-out data sets do not always alleviate these incentive issues, and small ones do not necessarily suffer from them, as we confirm with formal results. We will then discuss a new mechanism which is approximately truthful, in the sense that rational contestants will submit predictions which are close to their best guess. If time permits, we will see how the same mechanism solves an open question for online learning from strategic experts.
Bio	Rafael (Raf) Frongillo is an Associate Professor of Computer Science at the University of Colorado Boulder. His research lies at the interface between theoretical machine learning and economics, primarily focusing on information elicitation mechanisms, which incentivize humans or algorithms to predict accurately. Before Boulder, Raf was a postdoc at the Center for Research on Computation and Society at Harvard University and at Microsoft Research New York. He received his PhD in Computer Science at UC Berkeley, advised by Christos Papadimitriou and supported by the NDSEG Fellowship.

Incentive problems in data science competitions, and how to fix them

HG G 19.2

11 April 2025
15:15-16:15

Victoria Stodden
University of Southern California

Details

ZueKoSt: Seminar on Applied Statistics

Title	Levering AI in Scientific Research: Transparency, Reproducibility, and Trust
Speaker, Affiliation	Victoria Stodden, University of Southern California
Date, Time	11 April 2025, 15:15-16:15
Location	HG G 19.1
Abstract	In the last 10 years, the colossal cloud infrastructure investments behind the rise of near-ubiquitous global mobile technologies have trickled down to scientific research through innovative infrastructure, including cloud compute and storage, I/O tools, and data analysis and modeling frameworks, which in turn have generated broad and expanding communities of users and supporters. Arguably, the recent success of Large Language Models was catalyzed by the resulting technological innovations of 1) open and accessible massive data, and 2) re-executable discovery pipelines for model estimation and prediction. These changes are deeply disruptive to the research community, since they open new paths to knowledge creation that were previously inaccessible and largely culturally unknown. The scientific community is faced with the challenge of responding to changes in research modalities due to these technological innovations. Research is now conducted as an “Olympics” of benchmarked competitions between Machine Learning models, leveraged by the opaque results of Large Language Models, access to massive data, and redeployment of complex scientific discovery workflows. In this seminar I provide a roadmap of challenges and responses by various stakeholders in the research community to ensure that scientific results remain reliable and reproducible, and secure within a position of trust in broader society.

Levering AI in Scientific Research: Transparency, Reproducibility, and Trust

HG G 19.1

8 May 2025
15:15-16:15

Toby Hocking

Details

ZueKoSt: Seminar on Applied Statistics

Title	Using and contributing to the data.table package for efficient big data analysis
Speaker, Affiliation	Toby Hocking
Date, Time	8 May 2025, 15:15-16:15
Location	HG
Abstract	data.table is an R package with C code that is one of the most efficient open-source in-memory database packages available today. First released to CRAN by Matt Dowle in 2006, it continues to grow in popularity, and over 1500 other CRAN packages now depend on data.table. This talk will discuss basic and advanced data manipulation topics, and end with a discussion of how you can contribute to data.table.

Using and contributing to the data.table package for efficient big data analysis

