IFoDS 2026: Training Workshops

Training Workshops

IFoDS 2026 offers the following workshops on Friday, July 3, 2026:

Jian Huang: Generative Modeling with Diffusion and Flow Models: Theoretical Foundations and Emerging Applications
Jun Yan: Academic Writing in Statistics and Data Science

Generative Modeling with Diffusion and Flow Models: Theoretical Foundations and Emerging Applications

Outline

Diffusion models and Continuous Normalizing Flows (CNFs) currently represent the forefront of generative learning in artificial intelligence. Distinguished by their mathematical elegance and unprecedented empirical success, these continuous-time frameworks leverage stochastic and ordinary differential equations (SDEs/ODEs) to smoothly map tractable base distributions into complex, high-dimensional target data distributions. This continuous-time approach enables both high-fidelity sample generation and exact likelihood estimation.

In this short course, we will explore the theoretical foundations and computational mechanisms underpinning both diffusion models and CNFs, including score-based generative modeling, flow matching, and the application of neural ODEs for invertible transformations. Building on these fundamental principles, we will examine the versatility of this framework across a broad spectrum of modern statistical and machine learning tasks. Specific application areas covered in this course will include:

Conditional Generation: Learning and sampling from conditional distributions using diffusion and flow models.
Semi-Supervised Learning: Learning conditional distributions when unlabeled data are abundant but labeled data are limited.
Conformal Prediction: Performing rigorous, distribution-free uncertainty quantification and constructing valid prediction intervals.

We will illustrate these methodologies using a diverse array of data modalities, including tabular datasets, images, and motion trajectories. By the end of this short course, attendees will have developed a solid understanding of these generative models and their applications.

Prerequisites

Basic knowledge of mathematical statistics, regression, and mathematical analysis.

Instructor

Dr. Jian Huang is a Chair Professor of Data Science and Analytics in the Department of Applied Mathematics at The Hong Kong Polytechnic University. He obtained his Ph.D. degree in Statistics from the University of Washington in Seattle. His current research interests include deep generative models and inference, statistical inference in deep learning, deep neural network approximation theory, representation learning, and statistical analysis leveraging pretrained large models. He has published widely in the fields of Statistics, Biostatistics, Machine Learning, Bioinformatics and Econometrics. He was designated a highly cited researcher in the field of Mathematics from 2015 to 2019 by the Web of Science group at Clarivate and included in the list of top 2% of the world's most cited scientists by Elsevier BV and Stanford University (2019-2024). He serves on the editorial boards of the Journal of the American Statistical Association and Journal of the Royal Statistical Society (Series B). Professor Huang is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics.

Academic Writing in Statistics and Data Science

Outline

Targeting graduate students and early-career researchers in statistics and data science, this short course introduces a principled workflow for academic writing that covers the full lifecycle from idea organization to polished manuscript. The course begins with the structure of statistical papers, emphasizing strong topic sentences, coherent paragraph development, and disciplined use of citations. It then addresses common writing pitfalls specific to quantitative research, including clarity in model description, interpretation of results, and alignment between methods and conclusions. A central component of the course focuses on LaTeX typesetting, where participants will learn best practices for structuring documents, managing references, formatting equations and tables, and maintaining consistency across sections. The course further integrates reproducible writing tools, including Quarto and Git-based workflows, to connect narrative, code, and results in a unified framework. Practical examples drawn from real manuscripts will be used throughout to illustrate both effective and ineffective practices. By the end of the session, participants will have a concrete template and workflow for producing clear, reproducible, and publication-ready research papers.

Instructor

Dr. Jun Yan is a Professor in the Department of Statistics at the University of Connecticut and a Research Fellow at the Center for Population Health at UConn Health. He earned his Ph.D. in Statistics from the University of Wisconsin–Madison in 2003. Prior to joining UConn in 2007, he spent four years at the University of Iowa. Dr. Yan’s methodological research spans networks, spatial extremes, measurement error, survival analysis, clustered data analysis, and statistical computing, often motivated by cross-disciplinary collaborations. His applied work focuses on environmental sciences, public health, and sports, with notable contributions to statistical methods for the detection and attribution of climate change. Committed to open science, he and his collaborators have developed and maintain a suite of open-source R packages. Since 2020, he has served as Editor of the Journal of Data Science. He is a Fellow of both the American Statistical Association and the Institute of Mathematical Statistics.

Prerequisites

Interest in improving writing skills. Prior academic writing experience is a plus.