Sessions


Session 1: Complex Data Analysis

Quantifying Individual Risk for Binary Outcome: Bounds and Inference

Yue Liu, Renmin University of China

Abstract: Understanding treatment heterogeneity is crucial for reliable decision-making in treatment evaluation and selection. While the conditional average treatment effect (CATE) is commonly used to capture treatment heterogeneity induced by covariates and to design individualized treatment policies, it remains an averaging metric within subpopulations. This limitation prevents it from revealing individual-level risks, potentially leading to misleading conclusions. This article addresses this gap by examining individual risk for binary outcomes, focusing specifically on the fraction negatively affected (FNA), a metric giving the percentage of individuals who would experience worse outcomes under treatment than under control. Under the strong ignorability assumption, FNA is not identifiable, and we find that the previously proposed Fréchet-Hoeffding bounds are usually wide and unattainable in practice. By introducing a plausible positive correlation assumption for the potential outcomes, we obtain substantially tighter bounds than previous studies. We show that even with a positive and statistically significant CATE, the lower bound on FNA can be positive, i.e., even in the best-case scenario many units will be harmed if they receive treatment. Additionally, we establish a nonparametric sensitivity analysis framework for FNA using the Pearson correlation coefficient as the sensitivity parameter, thereby exploring the relationships among the correlation coefficient, FNA, and CATE. We also present a practical and tractable method for selecting the range of correlation coefficients. Furthermore, we propose flexible estimators for the refined FNA bounds and prove their consistency and asymptotic normality. Extensive simulations are conducted to evaluate the effectiveness of the proposed estimators. We apply our method to the right heart catheterization (RHC) data to estimate the percentage of patients harmed by RHC.
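To make the bounds concrete at a single covariate value, here is a minimal sketch, not the authors' estimator. It assumes Y = 1 is the favorable outcome (so a unit is harmed when Y(1) = 0 and Y(0) = 1) and that p1 = P(Y(1) = 1 | x) and p0 = P(Y(0) = 1 | x) have already been estimated; the function names are hypothetical. It relies on the identity, valid for any two binary variables with Pearson correlation ρ, that P(Y(1) = 1, Y(0) = 1) = p1·p0 + ρ·sqrt(p1(1 - p1)p0(1 - p0)).

```python
import math


def frechet_hoeffding_bounds(p1: float, p0: float) -> tuple[float, float]:
    """Sharp bounds on FNA = P(Y(1)=0, Y(0)=1) given only the two marginals."""
    return max(0.0, p0 - p1), min(p0, 1.0 - p1)


def fna_given_rho(p1: float, p0: float, rho: float) -> float:
    """FNA implied by a Pearson correlation rho between Y(1) and Y(0).

    Assumes Y = 1 is favorable, p1 = P(Y(1)=1 | x), p0 = P(Y(0)=1 | x).
    """
    sd_prod = math.sqrt(p1 * (1 - p1) * p0 * (1 - p0))
    # FNA = P(Y(0)=1) - P(Y(1)=1, Y(0)=1) = p0 * (1 - p1) - rho * sd_prod
    fna = p0 * (1 - p1) - rho * sd_prod
    lo, hi = frechet_hoeffding_bounds(p1, p0)
    # Not every rho in [-1, 1] is attainable for binary margins; clip to the feasible range.
    return min(max(fna, lo), hi)


def fna_bounds_nonnegative_corr(p1: float, p0: float) -> tuple[float, float]:
    """Bounds when rho >= 0 is assumed: FNA decreases in rho, so rho = 0 gives the
    upper end and the largest attainable rho gives the Frechet-Hoeffding lower end."""
    lo, _ = frechet_hoeffding_bounds(p1, p0)
    return lo, p0 * (1 - p1)
```

For example, with p1 = 0.6 and p0 = 0.5 the Fréchet-Hoeffding interval is [0, 0.4], while assuming ρ ≥ 0 halves its width to [0, 0.2]. Sweeping `rho` in `fna_given_rho` traces the kind of sensitivity curve the abstract describes.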

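Bio: Yue Liu is a lecturer at Renmin University of China and received a Ph.D. from Peking University in 2019. Liu has published multiple papers in machine learning and statistics journals and conferences, including the Journal of Machine Learning Research (JMLR), Artificial Intelligence (AIJ), IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), the International Conference on Machine Learning (ICML), Knowledge Discovery and Data Mining (KDD), and the Conference on Uncertainty in Artificial Intelligence (UAI). Research interests mainly include causal inference, Bayesian networks, and machine learning algorithms based on causal inference.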

MedReader: a query-based multisource AI learner of medical publications

Wenxuan Zhong, University of Georgia

Abstract: As the volume and velocity of medical publications have increased at an unprecedented pace, a computational learning system is essential to avoid expensive and time-consuming human annotation, which generally hinders the deployment of novel therapeutic methods in clinical practice. To achieve this goal, we develop MedReader, a novel multi-channel learning system that can summarize (topic learning), understand (knowledge-graph construction), and generalize (hypothesis generation) knowledge simultaneously from query-related publications. As with human learners, MedReader can assess how faithful a discovered concept is by using data beyond publications and conducting a novel enrichment analysis. We applied MedReader to a COVID-19-related publication set that includes 4,117 abstracts deposited in the MEDLINE database from 1/1/2020 to 4/30/2020. The hypotheses generated from the 4,117 publications significantly overlapped with hypotheses that appeared in subsequent publications. For example, 71% of the predicted gene-gene interactions and 100% of the predicted disease-disease interactions are enriched in subsequent articles. Moreover, the whole learning process takes only 3 minutes, a negligible time frame for clinical practice. Our analysis shows that this system can help us learn from publications at an unprecedented speed and scale. Such a learning system not only helps us promptly summarize knowledge but also affords opportunities for discovery.
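The abstract does not specify how MedReader's enrichment analysis works. As a point of reference only, a standard one-sided hypergeometric enrichment test asks whether predicted items (for example, gene-gene pairs) overlap with items reported in later publications more than chance would explain. The function and argument names below are hypothetical, and all counts are inputs the user must supply.

```python
from math import comb


def enrichment_pvalue(overlap: int, n_predicted: int, n_confirmed: int, n_universe: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    n_universe:  number of candidate items (e.g., all possible gene-gene pairs)
    n_confirmed: items reported in subsequent publications
    n_predicted: items predicted from the earlier publications
    overlap:     predicted items that were later reported
    """
    upper = min(n_predicted, n_confirmed)
    tail = sum(
        comb(n_confirmed, k) * comb(n_universe - n_confirmed, n_predicted - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic until a single division at the end.
    return tail / comb(n_universe, n_predicted)
```

Using exact binomial coefficients avoids floating-point underflow in the tail sum; for very large universes a library routine such as a hypergeometric survival function would be the practical choice.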

Bio: Dr. Zhong is an Athletic Association Professor in the Department of Statistics at the University of Georgia. She holds a B.S. in Statistics from Nankai University, China, and a Ph.D. in Statistics from Purdue University. After completing her Ph.D., Dr. Zhong pursued a postdoctoral fellowship in Statistics and Computational Biology at Harvard University. She served as an Assistant Professor in the Department of Statistics at the University of Illinois at Urbana-Champaign from 2007 to 2013, before joining the University of Georgia in 2013. Dr. Zhong is an ASA Fellow and an elected Fellow of the International Statistical Institute. She is the co-director of the big data analytics lab.

Statistics in Hospital Research and Quality Improvement Projects

Liping Tong, Advocate Aurora Healthcare

Abstract: With the advent of electronic medical records (EMR), hospitals find themselves overwhelmed with vast quantities of patient data with diverse applications. Given the critical nature of medical data storage and utilization, numerous specialized companies such as Epic, Oracle, and Cerner have emerged. Moreover, hospitals typically employ their own cadre of experts, including statisticians, data analysts, and data scientists. Data analysis in hospitals spans a spectrum, ranging from fundamental tasks like summarizing and presenting data in tables and plots to more intricate efforts involving the refinement and creation of statistical methods and models. In this presentation, I will illustrate the necessity of connecting time-dependent survival models with logistic models through a compelling example. Additionally, drawing on a concrete case study, I will underscore the importance of selecting the most suitable analytical tool to maximize the insights gained from data.

Bio: Liping Tong is currently a senior statistician at Advocate Aurora Health, leading a research and analysis team. Liping received her B.A. from the Department of Mathematics, Nankai University, in 1997. She spent two years in graduate school at Nankai before moving to the Department of Statistics, University of Chicago, in 1999. Liping received her Ph.D. in statistics in 2004 and began working as a research associate in the Department of Statistics, University of Washington. In 2007, she became an Assistant Professor in the Department of Mathematics, Loyola University Chicago. In 2010, she moved to the Department of Public Health Sciences, Loyola University Stritch School of Medicine. In 2015, she joined Advocate Aurora Healthcare as a senior statistician. Her main responsibilities are:
1. Leading the development of prediction models based on millions of patients' electronic medical records for questions such as readmission risk or chronic disease management. Statistical and computational methods, such as logistic models, hierarchical models, survival analysis, support vector machines, random forests, and boosting methods, are used to optimize predictions.
2. Leading the evaluation of interventions to reduce adverse events such as emergency department visits and 30-day readmissions after hospitalization. Cox proportional hazards models with time-dependent covariates are applied in the analysis.
3. Mentoring interns, junior statisticians, and data analysts on multiple projects, including the evaluation of the Palliative Care program and the application of deep learning and big data strategies in medical science.
4. Contributing to other team members' projects as a reliable source of expert support.
In addition, Liping has collaborated actively with professors from the Department of Psychiatry, University of Illinois at Chicago, since 2020.
The collaboration focuses on data collected for the Chicago Follow-up Study (CFS), a naturalistic, prospective, longitudinal, multi-follow-up research study designed to investigate the course, outcome, symptomatology, effects of medication, and recovery in participants with serious mental illness. Statistical methods such as logistic generalized estimating equation (GEE) models, latent class analysis (LCA), network analysis, and clustering methods have been applied to a wide range of hypotheses of interest.

On detecting the effect of exposure mixture

Zhezhen Jin, Columbia University

Abstract: To study the effect of an exposure mixture on continuous health outcomes, one can use a linear model with a weighted sum of multiple standardized exposure variables as an index predictor, whose coefficient represents the overall effect. The unknown weights typically range between zero and one, indicating the contributions of individual exposures to the overall effect. Because the weight parameters are present only when the overall-effect parameter is nonzero, testing hypotheses about the overall effect can be challenging, especially when there are more than two exposure variables. This paper presents a working-model-based approach to estimate the overall-effect parameter and to test specific hypotheses, including two tests for detecting the overall effect and one test for detecting unequal weights when the overall effect is evident. The statistics are computationally simple, and existing statistical software can be used to perform the analysis. A simulation study shows that the proposed estimators for the parameters of interest may have better finite-sample performance than some other estimators.
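The identifiability problem can be seen in a few lines. Assuming (as in weighted-quantile-sum-type models, and as an assumption here) that the weights are nonnegative and sum to one, the index model y = b0 + β·Σ w_j z_j + e is an ordinary linear model with coefficients γ_j = β·w_j, so β = Σ γ_j and w_j = γ_j / β, which is undefined when β = 0. This sketch is a plain reparametrization, not the speaker's working-model approach; the function name is hypothetical.

```python
import numpy as np


def overall_effect_via_unconstrained_fit(Z, y):
    """Fit y = b0 + sum_j gamma_j * z_j by least squares on standardized exposures,
    then recover beta = sum(gamma) and w = gamma / beta (assumes weights sum to one)."""
    n = Z.shape[0]
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)        # standardize each exposure
    design = np.column_stack([np.ones(n), Zs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    gamma = coef[1:]
    beta = gamma.sum()
    # The weights are only defined when the overall effect is nonzero.
    w = gamma / beta if abs(beta) > 1e-8 else np.full_like(gamma, np.nan)
    return beta, w
```

Under H0: β = 0 the weights drop out of the model entirely, which is why tests for the overall effect cannot rely on estimated weights.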

Bio: Zhezhen Jin is Professor of Biostatistics in the Department of Biostatistics, Mailman School of Public Health, Columbia University. He received his B.S. and M.S. in probability and statistics from Nankai University in 1989 and 1992, respectively, an M.A. in applied mathematics from the University of Southern California in 1994, and a Ph.D. in statistics from Columbia University in 1998. After two years of postdoctoral study (1998-2000) at the Harvard School of Public Health, he returned to Columbia as a faculty member in the Department of Biostatistics in 2000. He has been conducting statistical and biostatistical methodological research on resampling methods, survival analysis, nonparametric and semiparametric methods, smoothing methods, and statistical computing. He has also been collaborating with clinical investigators to address statistical issues in neurology, cardiology, oncology, transplantation, psychiatry, pathology, and alternative medicine. He was a co-founding editor of Contemporary Clinical Trials Communications. He is Statistical Editor for the Journal of the American College of Cardiology: Cardiovascular Imaging. He has served as an associate editor for several statistical journals, including the Journal of the American Statistical Association, Statistica Sinica, Lifetime Data Analysis, Communications for Statistical Applications and Methods, and the Journal of Statistical Theory and Practice, and he is on the editorial board of Kidney International, the journal of the International Society of Nephrology. He received a CAREER Award from the National Science Foundation in 2002. He is a Fellow of the American Statistical Association, a Fellow of the Institute of Mathematical Statistics, and an elected member of the International Statistical Institute. He served as President of the International Chinese Statistical Association (ICSA) in 2022.

Fitting an Accelerated Failure Time Model with Time-dependent Covariates via Nonparametric Mixture

Ju-Young Park, Yonsei University

Abstract: The accelerated failure time (AFT) model is a popular regression model in survival analysis. It models the relationship between the failure time and a set of covariates via a log link with an additive random error. The model can be either parametric or semiparametric, depending on the degree to which the error distribution is specified. The covariates are usually assumed to be fixed ('time-independent'). In many biomedical studies, however, 'time-dependent' covariates are frequently observed. In this work, we consider a semiparametric time-dependent AFT model. We assume that the distribution of the baseline failure time is an infinite scale mixture of Gaussian densities. Thus, this model is highly flexible compared with models that assume a one-component parametric density. We consider maximum likelihood estimation and propose an algorithm based on the constrained Newton method for estimating the model parameters and the mixing distribution. The finite-sample properties of the proposed methods are assessed via simulation studies, and the methods are illustrated with a real data set.
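A sketch of the likelihood such a model maximizes, under two simplifications that are ours: the covariates are time-fixed, and the infinite scale mixture is approximated by a finite grid of scales with given weights. Estimating the mixing weights (for example, by the constrained Newton method mentioned above) is not shown, and the function names are hypothetical.

```python
import math

import numpy as np


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_aft_loglik(beta, log_t, delta, X, sigmas, weights):
    """Log-likelihood for log T = X @ beta + e, e ~ sum_k weights[k] * N(0, sigmas[k]^2).

    delta[i] = 1 if T_i is observed, 0 if right-censored. Covariates are time-fixed here,
    and the Jacobian term -log T_i is dropped because it does not involve the parameters.
    """
    resid = np.asarray(log_t) - np.asarray(X) @ np.asarray(beta)
    ll = 0.0
    for r, d in zip(resid, delta):
        if d:
            # mixture density of the error at r
            f = sum(w * math.exp(-0.5 * (r / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                    for w, s in zip(weights, sigmas))
            ll += math.log(f)
        else:
            # mixture survival function P(e > r)
            surv = sum(w * (1.0 - norm_cdf(r / s)) for w, s in zip(weights, sigmas))
            ll += math.log(surv)
    return ll
```

Observed failures contribute the mixture density and censored ones the mixture survival function; with time-dependent covariates the residual would instead be defined through the time-transformation integral, which this sketch omits.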

Bio: I am a Ph.D. student majoring in Applied Statistics at Yonsei University in South Korea, conducting research on survival analysis under the guidance of my advisor, Prof. Sangwook Kang. My research focuses on survival models that take time-dependent covariates into account. Thank you for this valuable opportunity.


Session 2: Modern Statistical Methods for Time Series and Functional Data

A Stock Price Trend Prediction Model Based on a Supply Chain Matrix

Wu Wang, Renmin University of China

Abstract: This work explores integrating industry chain network matrices into graph neural network models to enhance the ability of deep learning factors to predict future stock returns. Historically, industry chain analysis has been used predominantly by subjective (discretionary) investors, but data limitations have prevented its full use in quantitative investment. As natural language processing technology has matured, data providers can extract relationships between companies and products from annual reports and combine them with expert knowledge to construct upstream and downstream industry chain relationships. Building on this foundation, we compute a matrix of interrelatedness between listed companies derived from the industry chain. This matrix is then introduced into the graph neural network model as prior information. Experimental results demonstrate that our proposed model outperforms the baseline GRU model in predictive performance on the test set, with a significantly higher mean information coefficient (IC) and a lower IC standard deviation. This finding is consistent with existing research, while the different stock pool and graph structure information selected in this study supplement the field. Additionally, this research extensively explores and explains the model structure, lookback periods, training labels, and other factors through numerous experiments.
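The two mechanical pieces can be sketched as follows, with the caveat that the exact architecture in the talk may differ. `graph_propagate` shows one common way to use a relatedness matrix as a prior in a graph layer (row-normalized neighbour averaging applied to, e.g., GRU hidden states), and `ic_stats` computes the daily IC as a cross-sectional Pearson correlation (some studies use rank IC instead). Both names are hypothetical.

```python
import numpy as np


def graph_propagate(H, A, W):
    """One graph-convolution-style step.

    H: (n_firms, d) features such as GRU hidden states.
    A: (n_firms, n_firms) nonnegative supply-chain relatedness matrix.
    W: (d, d_out) weight matrix (learned in practice; given here).
    """
    A_hat = A + np.eye(A.shape[0])                     # keep each firm's own signal
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalize: neighbour average
    return np.maximum(A_hat @ H @ W, 0.0)              # linear map + ReLU


def ic_stats(pred, ret):
    """Daily IC = cross-sectional Pearson correlation of predicted vs realized returns.

    pred, ret: (n_days, n_stocks). Returns (mean IC, sample std of IC).
    """
    ics = np.array([np.corrcoef(p, r)[0, 1] for p, r in zip(pred, ret)])
    return ics.mean(), ics.std(ddof=1)
```

A higher mean IC with a lower IC standard deviation, as reported in the abstract, means the predictions are both more informative and more stable from day to day.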

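Bio: Wu Wang is a lecturer in the Department of Mathematical Statistics at Renmin University of China, did postdoctoral research at King Abdullah University of Science and Technology, Saudi Arabia, and holds a Ph.D. in mathematical statistics from Fudan University. Main research interests include functional data analysis, spatial data analysis, and applications of machine learning and deep learning methods in the energy and industrial sectors. Research results have been published in Biometrics, the Scandinavian Journal of Statistics, and other journals.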

Testing conditional quantile independence with functional covariate

Jie Li, Renmin University of China

Abstract: We propose a new nonparametric conditional independence test for a scalar response and a functional covariate over a continuum of quantile levels. We build a Cramér–von Mises-type test statistic based on an empirical process indexed by random projections of the functional covariate, effectively avoiding the “curse of dimensionality” under the projected hypothesis, which is almost surely equivalent to the null hypothesis. The asymptotic null distribution of the proposed test statistic is obtained under mild assumptions. The asymptotic global and local power properties of the test statistic are then investigated. Specifically, we demonstrate that the statistic can detect a broad class of local alternatives converging to the null at the parametric rate. Additionally, we recommend a simple multiplier bootstrap approach for estimating the critical values. The finite-sample performance of the statistic is examined through Monte Carlo simulation experiments. Finally, an analysis of an EEG data set demonstrates the utility and versatility of the proposed test statistic.
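A simplified sketch of the statistic's structure, under assumptions that are ours: curves are observed on a common grid, projections are random Gaussian directions, the process is R(t) = n^(-1/2) Σ_i (τ - 1{Y_i ≤ q_τ})·1{⟨X_i, β⟩ ≤ t}, and the Cramér–von Mises integral is approximated by averaging R² over the observed projected points, a finite grid of quantile levels, and several projections. Critical values (the multiplier bootstrap) are not implemented, and the talk's exact process and weighting may differ. The function name is hypothetical.

```python
import numpy as np


def cvm_quantile_independence_stat(Y, X, taus, n_proj=10, seed=0):
    """Simplified CvM-type statistic for H0: Q_tau(Y | X) = Q_tau(Y) for tau in taus.

    X: (n, m) curves on a common grid of m points; Y: (n,) scalar responses.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    stat = 0.0
    for _ in range(n_proj):
        beta = rng.standard_normal(m)                  # random direction (an assumption)
        z = X @ beta / m                               # grid approximation of <X_i, beta>
        ind = (z[:, None] <= z[None, :]).astype(float) # ind[i, j] = 1{z_i <= z_j}
        for tau in taus:
            q = np.quantile(Y, tau)                    # unconditional tau-quantile
            score = tau - (Y <= q).astype(float)       # quantile "residual" indicator
            R = score @ ind / np.sqrt(n)               # process evaluated at t = z_j
            stat += np.mean(R ** 2)                    # empirical CvM integral
    return stat / (n_proj * len(taus))
```

Under the null the scores are uncorrelated with any function of X, so R stays of order one; when the conditional quantile depends on X, R grows like sqrt(n) and the statistic grows like n.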

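Bio: Jie Li is a lecturer at the School of Statistics, Renmin University of China, and received a Ph.D. in statistics from Tsinghua University in 2022. Main research areas are functional data analysis and time series analysis. Li currently leads a Young Scientists Fund project of the National Natural Science Foundation of China and a General Program project of the China Postdoctoral Science Foundation, and has published multiple papers in Biometrics, Statistica Sinica, and other journals.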

Unified Principal Components Analysis of Irregularly Observed Functional Time Series

Zerui Guo, Sun Yat-sen University

Abstract: Irregularly observed functional time series (FTS) are increasingly available in many real-world applications. To analyze FTS, it is crucial to account for both serial dependence and the irregularly observed nature of functional data. However, existing methods for FTS often rely on specific model assumptions to capture serial dependence, or cannot handle the irregular observational scheme of functional data. To address these issues, one can perform dimension reduction on FTS via functional principal component analysis (FPCA) or dynamic FPCA. Nonetheless, these two methods may be either theoretically suboptimal or too redundant to represent serially dependent functional data. In this article, we introduce a novel dimension reduction method for FTS based on the framework of dynamic FPCA. Through a new concept called optimal functional filters, we unify the theories of FPCA and dynamic FPCA, providing a parsimonious and optimal representation of FTS that adapts to its serial dependence structure. This framework is referred to as principal analysis via dependency-adaptivity (PADA). Under a hierarchical Bayesian model, we establish an estimation procedure for dimension reduction via PADA. Our method can be used for both sparsely and densely observed FTS and is capable of predicting future functional data. We investigate the theoretical properties of PADA and demonstrate its effectiveness through extensive simulation studies. Finally, we illustrate our method via dimension reduction and prediction of daily PM2.5 data.
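For reference, classical FPCA on densely observed curves reduces to an eigendecomposition of the discretized covariance operator. This sketch assumes all curves share an equally spaced grid on [0, 1]; it is the baseline that PADA generalizes, not PADA itself, and it handles neither sparse designs nor serial dependence. The function name is hypothetical.

```python
import numpy as np


def fpca_dense(X, n_comp):
    """Classical FPCA for curves X of shape (n, m) on an equally spaced grid over [0, 1].

    Days are treated as independent replicates, i.e., serial dependence is ignored.
    Returns eigenvalues, eigenfunctions on the grid (m, n_comp), and scores (n, n_comp).
    """
    n, m = X.shape
    Xc = X - X.mean(axis=0)                      # center the curves
    cov = Xc.T @ Xc / n                          # sample covariance surface on the grid
    # Dividing by m approximates the integral operator (C f)(s) = int C(s, t) f(t) dt.
    evals, evecs = np.linalg.eigh(cov / m)
    order = np.argsort(evals)[::-1][:n_comp]     # largest eigenvalues first
    evals = evals[order]
    phis = evecs[:, order] * np.sqrt(m)          # eigenfunctions with unit L2 norm
    scores = Xc @ phis / m                       # scores <X_i - mean, phi_k>
    return evals, phis, scores
```

Because these components maximize variance at each time point separately, they can be redundant for serially dependent curves, which is the gap that dynamic FPCA and PADA address.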

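Bio: Zerui Guo is a Ph.D. student at the School of Mathematics, Sun Yat-sen University, whose main research areas are functional data analysis and epidemic modeling. Related work has been published in the European Journal of Epidemiology, 中国预防医学杂志 (a Chinese preventive medicine journal), and other Chinese and international journals.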

Forecasting Interval for Autoregressive Time Series with Trend

Qin Shao, University of Toledo

Abstract: We propose a kernel distribution estimator (KDE) for the cumulative distribution function of autoregressive time series with a trend. We show that, under certain assumptions, this estimator is as efficient as an infeasible KDE that assumes the trend is known. This oracle-efficient KDE is then used to estimate the quantiles on which a forecasting interval is constructed. Simulation studies confirm the asymptotic properties of the KDE. To illustrate the method, we apply it to monthly average hourly wage data.
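A minimal sketch of the pipeline, under assumptions that are ours rather than the speaker's: a linear trend fitted by least squares, AR(1) errors, a Gaussian kernel, and a rule-of-thumb bandwidth. The kernel-smoothed CDF of the AR residuals is inverted by bisection to get quantiles, which are added to the one-step-ahead point forecast. Parameter-estimation uncertainty is ignored, and all function names are hypothetical.

```python
import math

import numpy as np


def kernel_cdf(x, resid, h):
    # Smoothed CDF estimate: F_hat(x) = mean_i Phi((x - e_i) / h) with a Gaussian kernel
    return float(np.mean([0.5 * (1 + math.erf((x - e) / (h * math.sqrt(2)))) for e in resid]))


def kernel_quantile(p, resid, h, tol=1e-8):
    # Invert the monotone smoothed CDF by bisection
    lo, hi = resid.min() - 10 * h, resid.max() + 10 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, resid, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def forecast_interval(y, alpha=0.05):
    """One-step-ahead interval for y_t = m(t) + x_t with x_t = phi * x_{t-1} + e_t.

    Assumptions: linear trend m(t), AR(1) errors, Silverman-type bandwidth.
    """
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    x = y - (intercept + slope * t)                    # detrended series
    phi = float(x[1:] @ x[:-1] / (x[:-1] @ x[:-1]))    # least-squares AR(1) coefficient
    resid = x[1:] - phi * x[:-1]                       # estimated innovations
    h = 1.06 * resid.std() * len(resid) ** (-1 / 5)    # rule-of-thumb bandwidth
    center = intercept + slope * n + phi * x[-1]       # point forecast for time n
    return (center + kernel_quantile(alpha / 2, resid, h),
            center + kernel_quantile(1 - alpha / 2, resid, h))
```

Smoothing the empirical CDF gives continuous quantiles, which is convenient for interval endpoints when the number of residuals is modest.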

Bio: Dr. Qin Shao obtained her bachelor's and master's degrees from Nankai University in 1990 and 1993, respectively. In 1997 she entered the doctoral program in Statistics at the University of Georgia. Upon graduating in 2002, she took up a tenure-track position as Assistant Professor of Statistics at the University of Toledo, and she achieved the rank of Professor in 2013. Her research interests encompass both the methodology and applications of statistics, and one of her major research interests is semiparametric time series modeling. In addition, she has always been interested in using statistics to address important societal issues.

Inference for Quantile Change Points in High-Dimensional Time Series

Mengyu Xu, University of Central Florida

Abstract: Change-point detection methods based on quantiles can effectively detect changes in extreme values. In this study, we propose a novel change-point detection scheme that utilizes fixed quantiles of moving sums from high-dimensional time series data. Our approach employs a moving sum (MOSUM) test statistic that aggregates the component series by the norm. We investigate the asymptotic properties of the proposed test statistic for weakly temporally dependent high-dimensional time series, while also allowing for strong and weak cross-sectional dependence. Our analysis relies on a powerful uniform Bahadur representation result; specifically, we extend the existing uniform Bahadur representation to the high-dimensional setting for dependent data. A simulation study demonstrates the effectiveness of our approach.
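A toy version of a quantile-based MOSUM scan, under assumptions that are ours: each component is converted to the indicator 1{X_tj ≤ q_j(τ)} using its full-sample quantile, windows of length G on either side of each candidate k are compared, and components are aggregated with the max norm. No long-run variance normalization or critical value is included, and the function name is hypothetical.

```python
import numpy as np


def quantile_mosum(X, tau, G):
    """X: (T, p) panel. Returns candidate locations k = G..T-G and the aggregated statistic.

    Left window covers rows k-G..k-1, right window rows k..k+G-1 (0-based).
    """
    T, p = X.shape
    q = np.quantile(X, tau, axis=0)                         # per-component tau-quantile
    I = (X <= q).astype(float)                              # quantile indicators
    cs = np.vstack([np.zeros((1, p)), np.cumsum(I, axis=0)])  # cs[k] = sum of first k rows
    ks = np.arange(G, T - G + 1)
    left = cs[ks] - cs[ks - G]
    right = cs[ks + G] - cs[ks]
    D = (right - left) / np.sqrt(2 * G)                     # per-component MOSUM
    return ks, np.abs(D).max(axis=1)                        # max-norm aggregation (assumed)
```

The estimated change point is the `k` that maximizes the aggregated statistic; a shift in the τ-quantile of even a few components moves their indicator frequencies and shows up in the max norm.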

Bio: Mengyu Xu received her bachelor's degree in statistics from Renmin University of China, Beijing, China, in 2010, and her M.S. and Ph.D. degrees from the Department of Statistics, University of Chicago, Chicago, USA, in 2012 and 2016, respectively. Her research interests include covariance matrix estimation and time-varying network recovery from high-dimensional time series, the distribution theory of quadratic forms, and high-dimensional hypothesis testing.



Session 4: Machine Learning and Data Science

Accelerating Convergence in Bayesian Few-Shot Classification

Feng Zhou, Renmin University of China

Abstract: Bayesian few-shot classification has been a focal point in the field of few-shot learning. This paper integrates mirror descent-based variational inference into Gaussian process-based few-shot classification, addressing the challenge of non-conjugate inference. By leveraging non-Euclidean geometry, mirror descent accelerates convergence by following the steepest descent direction on the corresponding manifold. It is also invariant to the parameterization of the variational distribution. Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared with baseline models. Additionally, we investigate the impact of hyperparameters and model components.
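To illustrate how mirror descent exploits non-Euclidean geometry, the sketch below runs entropic mirror descent (exponentiated gradient) on the probability simplex: with negative entropy as the mirror map, each step is multiplicative and stays on the simplex without a Euclidean projection. This is a generic textbook instance, not the paper's variational algorithm; for exponential-family variational distributions, mirror descent with a KL-type Bregman divergence is closely related to natural-gradient updates. The function name is hypothetical.

```python
import numpy as np


def mirror_descent_simplex(grad_fn, x0, step, n_iter):
    """Entropic mirror descent on the probability simplex.

    Update: x <- x * exp(-step * grad) / normalizer, i.e., a gradient step in the
    dual (log) coordinates followed by mapping back to the simplex.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad_fn(x)
        # Subtracting g.min() only improves numerical stability; it cancels on normalization.
        x = x * np.exp(-step * (g - g.min()))
        x /= x.sum()
    return x
```

Usage: minimizing 0.5·||x - c||² over the simplex for a target c on the simplex, with gradient `lambda x: x - c`, converges to c from the uniform starting point.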

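Bio: Feng Zhou is a lecturer at the School of Statistics, Renmin University of China, and a Renmin University of China Outstanding Young Scholar. Zhou has led a Young Scientists Fund project of the National Natural Science Foundation of China, received Special and General grants from the China Postdoctoral Science Foundation, and was selected for the introduction project of the Postdoctoral International Exchange Program. Main research areas include statistical machine learning, Bayesian methods, stochastic processes, and spatiotemporal data analysis. Zhou has published more than 20 papers in journals and conferences including JMLR, MLJ, STCO, NeurIPS, ICLR, AAAI, and AISTATS.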

A Variable Selection Tree and Its Random Forest

Zhibo Cai, Renmin University of China

Abstract: A novel screening approach is proposed that sequentially partitions the sample into subsets, creating a tree-like structure of subsamples called the SIS-tree. The SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results, including the "sure screening property", are established to support this approach. Additionally, the SIS-tree is extended to a forest with improved performance. Simulations demonstrate that the proposed methods substantially improve on existing sure independence screening (SIS) methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application of the screening, the classification of high-dimensional data is considered, and the ranking and cutoff are found to substantially improve the performance of existing classifiers.
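The SIS-tree builds on sure independence screening. The sketch below shows only plain SIS with Pearson correlation as the dependence measure: predictors are ranked by absolute marginal correlation with the response and the top d are kept. The tree construction, which would apply such screening within sequentially partitioned subsamples, is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def sis(X, y, d):
    """Return indices of the d predictors with the largest |marginal correlation| with y,
    in decreasing order of that correlation."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    ys = (y - y.mean()) / y.std()               # standardize response
    omega = np.abs(Xs.T @ ys) / len(y)          # |Pearson correlation| per predictor
    return np.argsort(omega)[::-1][:d]
```

Swapping `omega` for another dependence measure (for example, distance correlation) is the kind of plug-in flexibility the abstract mentions; the cutoff `d` is the quantity whose selection the talk studies.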

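Bio: Zhibo Cai is a lecturer in the Department of Data Science and Big Data Statistics, School of Statistics, Renmin University of China. Research interests include sufficient dimension reduction, variable selection and their applications in machine learning, and the theory and applications of generative artificial intelligence. Papers have been published in academic journals and conferences such as JASA, NeurIPS, and ICLR.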

U.S.-U.K. PETs Prize Challenge: Anomaly Detection via Privacy-Enhanced Federated Learning

Xinyue Wang, Rutgers University

Abstract: Privacy Enhancing Technologies (PETs) have the potential to enable collaborative analytics without compromising privacy. This matters because collaborative analytics can extract real value from the large amounts of data collected in domains such as healthcare, finance, and national security. To foster innovation and move PETs from research labs to actual deployment, the U.S. and U.K. governments partnered in 2021 to launch the PETs prize challenge, which called for privacy-enhancing solutions to two of the biggest problems facing us today: financial crime prevention and pandemic response. This article presents the Rutgers ScarletPets privacy-preserving federated learning approach to identifying anomalous financial transactions in a payment network system (PNS). The approach uses a two-step anomaly detection methodology. In the first step, features are mined from account-level data and labels, and a privacy-preserving encoding scheme is then used to augment the data held by the PNS with these features. In the second step, the PNS learns a highly accurate classifier from the augmented data. Our proposed approach has two major advantages: 1) there is no noteworthy drop in accuracy between the federated and centralized settings, and 2) the approach is flexible, since the PNS can keep improving its model and features to build a better classifier without imposing any additional computational or privacy burden on the banks. Notably, our solution won first prize in the US for its privacy, utility, efficiency, and flexibility.

Bio: Xinyue Wang received her Ph.D. from Rutgers University in Newark, NJ, USA. Her research interests lie in the interdisciplinary areas of data privacy and security, deep learning, and their applications in various fields such as bioinformatics and finance.

Partition-Insensitive Parallel ADMM Algorithm for High-dimensional Linear Models

Jiancheng Jiang, University of North Carolina

Abstract: Parallel alternating direction method of multipliers (ADMM) algorithms have gained popularity in statistics and machine learning because they handle large-sample problems efficiently. However, the parallel structure of these algorithms, based on the consensus problem, can lead to an excessive number of auxiliary variables when applied to high-dimensional data, resulting in a heavy computational burden. In this paper, we propose a partition-insensitive parallel framework based on the linearized ADMM (LADMM) algorithm and apply it to nonconvex penalized high-dimensional regression problems. Compared with existing parallel ADMM algorithms, our algorithm does not rely on the consensus problem, which significantly reduces the number of variables that need to be updated at each iteration. Notably, the solution of our algorithm remains largely unchanged regardless of how the total sample is divided, a property known as partition insensitivity. Furthermore, under some mild assumptions, we prove the convergence of the iterative sequence generated by our parallel algorithm. Numerical experiments on synthetic and real datasets demonstrate the feasibility and validity of the proposed algorithm. We provide a publicly available R software package to facilitate the implementation of the proposed algorithm.
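A single-machine sketch of the linearized ADMM idea for the convex lasso, which is a simplification of the talk's setting: the nonconvex penalties and the partition-insensitive parallel scheme are not shown. The problem min_b (1/(2n))||y - Xb||² + λ||b||₁ is split as min (1/(2n))||z||² + λ||b||₁ subject to Xb + z = y; linearizing the augmented quadratic in b turns the b-step into one soft-threshold, so no p × p system is solved. The function names and the default ρ = 1/n are our choices.

```python
import numpy as np


def soft_threshold(v, k):
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def ladmm_lasso(X, y, lam, rho=None, n_iter=5000):
    """Linearized ADMM for min_b (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Constraint X b + z = y with multiplier u; augmented Lagrangian
    (1/(2n))||z||^2 + lam*||b||_1 - u'(X b + z - y) + (rho/2)||X b + z - y||^2.
    """
    n, p = X.shape
    rho = 1.0 / n if rho is None else rho
    eta = 1.01 * rho * np.linalg.norm(X, 2) ** 2   # proximal weight; must exceed rho*||X||_2^2
    b, z, u = np.zeros(p), y.astype(float).copy(), np.zeros(n)
    for _ in range(n_iter):
        r = X @ b + z - y - u / rho
        b = soft_threshold(b - (rho / eta) * (X.T @ r), lam / eta)  # linearized b-step
        z = (rho * (y - X @ b) + u) / (1.0 / n + rho)               # exact z-step
        u = u - rho * (X @ b + z - y)                                # dual ascent
    return b
```

With orthogonal columns scaled so that XᵀX = nI, the lasso solution is `soft_threshold(X.T @ y / n, lam)`, which gives an easy correctness check. In this sketch, with ρ = 1/n the iterations also reduce to proximal gradient steps after the first one, another useful sanity check.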

Bio: Dr. Jiancheng Jiang is Professor of Statistics in the Department of Mathematics and Statistics and the School of Data Science, University of North Carolina at Charlotte. His research interests include financial econometrics, theoretical and applied statistics, biostatistics, and data science.

Deep Neural Network-based Accelerated Failure Time Models Using Rank Loss

Sangwook Kang, Yonsei University

Abstract: An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the covariates act directly on failure times, which makes their effects intuitive to interpret. The semiparametric AFT model, which does not specify the error distribution, is flexible and robust to departures from distributional assumptions. Owing to these desirable features, this class of models has been considered a promising alternative to the popular Cox model in the analysis of censored failure time data. However, these AFT models typically assume a linear predictor for the mean, and little research has addressed non-linear predictors when modeling the mean. Deep neural networks (DNNs) have received much attention over the past few decades and have achieved remarkable success in a variety of fields. DNNs have a number of notable advantages and have been shown to be particularly useful in addressing non-linearity. Here, we propose applying a DNN to fit AFT models using a Gehan-type loss combined with a sub-sampling technique. The finite-sample properties of the proposed DNN and rank-based AFT model (DeepR-AFT) were investigated via an extensive simulation study. DeepR-AFT outperformed its parametric and semiparametric counterparts when the predictor was non-linear. For linear predictors, DeepR-AFT performed better when the covariate dimension was large. The superior performance of DeepR-AFT was demonstrated using three real datasets.
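A sketch of a Gehan-type rank loss in the common form L = n⁻² Σ_i Σ_j δ_i·max(0, e_j - e_i), with residuals e_i = log T_i - f(x_i) where f(x_i) is the network output; the n⁻² normalization and the random-subsample helper are our assumptions, not necessarily the choices made in DeepR-AFT. In a deep learning framework the same expression would be written with tensor operations so that gradients flow back into f; the function names are hypothetical.

```python
import numpy as np


def gehan_loss(log_t, delta, pred):
    """Gehan-type loss: (1/n^2) * sum_i sum_j delta_i * max(0, e_j - e_i),
    where e = log_t - pred and delta_i = 1 for observed (uncensored) failures."""
    e = log_t - pred
    diff = e[None, :] - e[:, None]                 # diff[i, j] = e_j - e_i
    return float((delta[:, None] * np.maximum(diff, 0.0)).sum() / len(e) ** 2)


def subsampled_gehan_loss(log_t, delta, pred, m, rng):
    """Evaluate the loss on a random subsample of size m, costing O(m^2) instead of O(n^2)."""
    idx = rng.choice(len(log_t), size=m, replace=False)
    return gehan_loss(log_t[idx], delta[idx], pred[idx])
```

Only pairs anchored at an observed failure contribute, which is how the loss accommodates right censoring without modeling the error distribution.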

Bio: BS in Statistics, Seoul National University, South Korea, 2001
PhD in Biostatistics, University of North Carolina at Chapel Hill, 2007
Assistant Professor, University of Georgia, US (2007-2010), University of Connecticut, US (2010-2013)
Assistant, Associate, and Full Professor, Yonsei University, South Korea (2013-)