CSAS 2025 offers 10 workshops in four tracks on Friday, April 11, 2025.
R is an open-source programming language for statistical computing and graphics. It offers a wide range of graphical and statistical tools, including time-series analysis, classification, clustering, and linear and nonlinear modeling. This workshop introduces R to participants with little or no prior experience. Topics include: 1) an overview of basic R; 2) R data structures; 3) data management in R; and 4) some useful R packages. A real-life sports dataset will be used to illustrate these topics.
Lucy Liu is a second-year undergraduate student majoring in Statistics and Applied Mathematics at the University of Connecticut. She serves as the President of the Joint Statistical Club, where she runs workshops and organizes statistical conferences.
A laptop with R and RStudio installed. Previous experience with R is NOT required; basic programming knowledge is helpful but NOT required.
On GitHub
Python is a popular high-level language with many features that data scientists value: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used across data science work, from the front end to the back end, and good Python programming makes sports data easier to analyze. The workshop will cover the following topics in class: Python data types, methods for moving data, and functions.
Charitarth Chugh is a senior majoring in Computer Science at the University of Connecticut and also the President of UConn Artificial Intelligence Club. This year, he is researching the intersections of cybersecurity and deep learning for attack detection alongside multimodal (audio, text, and video) models.
A laptop with internet access. We will be working with Jupyter Notebooks on Google
On GitHub
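The sketch below makes the Python workshop's topics concrete. It is a minimal example, not the workshop's own material. It stores a small box score in Python's built-in types (a list, a dictionary, and a set) and summarizes it with a function. The player names and point totals are hypothetical placeholders.

```python
# Hypothetical box score held in Python's built-in data types.
players = ["Player A", "Player B", "Player C"]            # list: ordered names
points = {"Player A": 21, "Player B": 14, "Player C": 9}  # dict: name -> points
positions = {"G", "F", "G"}                               # set: duplicates dropped

def points_share(scores: dict[str, int]) -> dict[str, float]:
    """Return each player's share of the team's total points."""
    total = sum(scores.values())
    return {name: pts / total for name, pts in scores.items()}

shares = points_share(points)
for name in players:
    print(f"{name}: {points[name]} pts ({shares[name]:.1%} of team total)")
print(sorted(positions))  # unique positions, alphabetically
```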
Visualizing data, particularly in sports analytics, provides valuable insights for informed decision-making, reveals underlying patterns in the data, and enhances communication among stakeholders. Python’s versatility and its powerful data visualization libraries make it possible to build well-crafted visual narratives across diverse domains. Matplotlib, Python’s most popular visualization package, offers extensive customization options and precise control, making it a preferred tool for detailed and impactful visualizations. This workshop will introduce Matplotlib’s plotting capabilities, showcase practical examples of data visualization in baseball and basketball, and equip participants with techniques applicable to any domain.
Rahul Manna is a junior pursuing a dual degree in Statistical Data Science and Mechanical Engineering. He is currently working as a research assistant in the Laboratory for Advanced Manufacturing Reliability (KKim Lab), where he tests materials for implantable bioelectronics and uses Python and Matplotlib to analyze and visualize data. Outside the classroom and workspace, Rahul enjoys Formula One, where the fusion of cutting-edge engineering, advanced statistics, data science, analytics, and human ingenuity drives both the on-track performances and the strategic decisions behind the scenes.
Familiarity with Python and Jupyter Notebooks. Recommended: familiarity with pandas. Please visit the GitHub repository for more information.
On GitHub
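The snippet below is a minimal sketch of the Matplotlib workflow described in the visualization workshop. It draws a bar chart with a reference line using the object-oriented interface (`plt.subplots`, `Axes.bar`, `Axes.axhline`). The points-per-game values are hypothetical placeholders, not real baseball or basketball data. The non-interactive Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt

# Hypothetical points per game for one player (placeholder values).
games = [1, 2, 3, 4, 5]
points = [18, 24, 15, 30, 22]
season_avg = sum(points) / len(points)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(games, points, color="steelblue", label="Points")
# Dashed horizontal line marking the average across the games shown.
ax.axhline(season_avg, color="darkorange", linestyle="--",
           label=f"Average ({season_avg:.1f})")
ax.set_xlabel("Game")
ax.set_ylabel("Points")
ax.set_title("Points per game (hypothetical data)")
ax.set_xticks(games)
ax.legend()
fig.tight_layout()
fig.savefig("points_per_game.png", dpi=150)
```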
This workshop aims to give participants a solid foundation for directing their own basketball analytics projects in the R programming language. We will review R packages designed for acquiring NBA data, creating visualizations such as shot charts and assist networks, and performing statistical analyses. Additionally, for those whose March Madness bracket didn't perform as well as they had hoped, the workshop will cover some basic strategies for improving predictions in future tournaments.
Addison McGhee is a Preceptor in Data Science at Yale University. He earned his M.S. in biostatistics at the Harvard T.H. Chan School of Public Health, and his B.S. in statistics at UW-Madison. Before coming to Yale, he interned at the pharmaceutical company Repare Therapeutics. Outside of his work, he enjoys working on sports analytics projects for baseball and football.
Familiarity with R and RStudio. Some familiarity with basketball is recommended.
On GitHub
In the modern age of the internet, many websites store a variety of valuable data. In the world of sports, much of this data is publicly available but not always in an easily accessible format. Web scraping is an automated process for gathering data from websites. It lets us access and collect large amounts of data directly from web pages when the information is not available for download. This workshop will 1) introduce the basics of web scraping; 2) show, with examples, how to scrape sports-related websites with Python; and 3) discuss how to ensure you are scraping ethically.
Melanie Desroches is a senior at the University of Connecticut majoring in Statistics and minoring in Computer Science. She is currently working with the UConn Men’s Ice Hockey team to determine how prospects' performance will translate to the NCAA, and more specifically to the Hockey East conference. Melanie is a hockey fan who also loves learning about many different sports and how statistics can be used to answer questions within the sports world.
To run the code we will discuss, you need Python; the beautifulsoup4, selenium, pandas, and requests packages; and a web driver of your choice (preferably ChromeDriver).
On GitHub
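The scraping workflow in the web scraping workshop can be sketched without network access or third-party packages. This is a minimal example, not the workshop's code, and it works in two steps:

1. It checks a made-up robots.txt with the standard library's `urllib.robotparser`, which covers one concrete part of scraping ethically.
2. It extracts table rows from an inline HTML string with `html.parser.HTMLParser`, a stdlib stand-in for the BeautifulSoup parsing used in the workshop.

The URL, the robots.txt rules, the user-agent name, and the table contents are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a made-up stats site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical HTML, standing in for a page fetched with requests.get(url).text
PAGE = """
<table id="standings">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Huskies</td><td>20</td><td>5</td></tr>
  <tr><td>Eagles</td><td>18</td><td>7</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()

# 1) Check robots.txt before fetching anything.
robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
url = "https://stats.example.com/standings"
allowed = robots.can_fetch("workshop-bot", url)
delay = robots.crawl_delay("workshop-bot")  # seconds to wait between requests
print(allowed, delay)

# 2) Parse the (already downloaded) HTML into a header row and data rows.
if allowed:
    parser = TableParser()
    parser.feed(PAGE)
    header, *records = parser.rows
    print(header)
    print(records)
```

In a real scraper, the delay returned by `crawl_delay` would be honored with a pause between successive requests.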
The goal of this presentation is to demonstrate different tennis analyses using Python. We will use several Python packages to explore topics such as serving success, performance under pressure, and offensive/defensive playstyles. The data come from the Association of Tennis Professionals (ATP) Tennis Statistics dataset created by Jeff Sackmann.
Jaden Astle is a fourth-year undergraduate at UConn studying Statistical Data Science & Cognitive Science with a minor in Computer Science. He previously interned as a Data Engineer for The Hartford Insurance, where he focused on developing a model training pipeline that was supported by the AWS Cloud. Jaden serves as the President of UConn's Data Science Club, where he leads technical workshops and helps plan other professional development events.
Familiarity with Python & Jupyter Notebooks
On GitHub
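The snippet below is a minimal sketch of the serving and pressure statistics mentioned in the tennis workshop. It computes several ratios from per-match serve counts:

- first serves in
- first-serve points won
- second-serve points won
- break points saved

It assumes column names in the style of Jeff Sackmann's ATP match files (`w_svpt`, `w_1stIn`, `w_1stWon`, `w_2ndWon`, `w_bpSaved`, `w_bpFaced`); verify these against the actual CSV headers. Some other assumptions also apply:

- The two rows are invented numbers, not real matches.
- Break points saved is used here only as a simple proxy for performance under pressure.
- The standard `csv` module stands in for whatever packages the workshop uses.

```python
import csv
import io

# Two invented match rows using Sackmann-style column names (assumed, not real data).
SAMPLE = """winner_name,w_svpt,w_1stIn,w_1stWon,w_2ndWon,w_bpSaved,w_bpFaced
Player X,80,52,40,16,4,6
Player Y,95,57,42,21,3,3
"""

def serve_summary(row: dict[str, str]) -> dict[str, float]:
    """Serving-success and pressure ratios for the match winner."""
    svpt = int(row["w_svpt"])           # total service points
    first_in = int(row["w_1stIn"])      # first serves in
    first_won = int(row["w_1stWon"])    # points won on first serve
    second_won = int(row["w_2ndWon"])   # points won on second serve
    bp_saved = int(row["w_bpSaved"])
    bp_faced = int(row["w_bpFaced"])
    second_played = svpt - first_in     # points that needed a second serve
    return {
        "first_serve_in": first_in / svpt,
        "first_serve_won": first_won / first_in,
        "second_serve_won": second_won / second_played,
        # Break points saved: a simple "performance under pressure" proxy.
        "bp_saved": bp_saved / bp_faced if bp_faced else float("nan"),
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    summary = serve_summary(row)
    print(row["winner_name"], {k: round(v, 3) for k, v in summary.items()})
```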
Player tracking data offers a great opportunity for creating performance metrics in sports. This workshop will provide a detailed walkthrough of how to build a metric with player tracking data. We will focus on American football and use data provided by the NFL Big Data Bowl competition. The workshop will feature (1) an overview of tracking data and basic visualization and data preprocessing; (2) metric formulation and application to NFL data; and (3) metric validation and statistical properties of sports metrics.
Quang Nguyen is a third-year PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. His current research focuses on statistical analysis of complex data such as player tracking data in sports and network data. Quang previously received his MS in Applied Statistics from Loyola University Chicago and his BS in Mathematics and Data Science from Wittenberg University in Springfield, Ohio. He is a two-time NFL Big Data Bowl finalist and a die-hard supporter of Manchester United.
Familiarity with R and basic data science tasks (data wrangling & data visualization)
On GitHub
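The snippet below makes "building a metric from player tracking data" concrete. It is a minimal sketch, not the workshop's metric. It turns frame-by-frame (x, y) positions into speeds with `math.dist` (Python 3.8+), then summarizes them as a player's top speed and the time spent above a threshold. Several assumptions apply:

- Positions are in yards, sampled at 10 frames per second. This matches how NFL Big Data Bowl tracking data are typically provided, but should be checked against the actual files.
- The coordinates and the 7 yd/s threshold are hypothetical.

The workshop itself uses R, and the same steps carry over directly.

```python
import math

FPS = 10        # assumed sampling rate: 10 frames per second
DT = 1 / FPS    # seconds between consecutive frames
FAST = 7.0      # hypothetical "high speed" threshold in yards per second

# Hypothetical (x, y) positions in yards for one player over consecutive frames.
track = [(10.0, 20.0), (10.4, 20.1), (11.0, 20.3), (11.7, 20.4),
         (12.5, 20.6), (13.3, 20.7), (14.0, 20.9)]

def frame_speeds(positions, dt=DT):
    """Speed (yd/s) between each pair of consecutive frames."""
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]

def speed_metric(positions, threshold=FAST, dt=DT):
    """Toy metric: top speed and seconds spent at or above the threshold."""
    speeds = frame_speeds(positions, dt)
    return {
        "max_speed": max(speeds),
        "time_fast": sum(dt for s in speeds if s >= threshold),
    }

print(speed_metric(track))
```

Validating such a metric, as the workshop's third part discusses, would mean checking properties like its stability across games on real data rather than on a toy track.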
In this workshop, we will introduce the fundamentals of causal inference, including the potential outcomes framework and causal discovery, emphasizing the importance of causality in sports analytics. We will apply these concepts to NBA data to address specific challenges and uncover actionable insights. The workshop will also explore recent advancements in causal machine learning techniques, such as meta-learners and causal discovery with temporal ordering.
Shinpei Nakamura-Sakai is a Ph.D. candidate in Statistics and Data Science at Yale University. His research in sports analytics introduces a framework for analyzing age curves to examine how factors like rest days impact athlete performance across career stages. Shinpei has industry experience as a quantitative analytics associate at JPMorgan Chase and an applied scientist at Amazon, and he has won Best Poster Awards at both UCSAS 2022 and NESSIS 2023.
Familiarity with R and Python.
On GitHub
TBA
Zuri Hunter TBA
TBA
On GitHub
This workshop gives educators exposure to materials created by the SCORE Network and aims to inspire them to use these educational materials with their students. It will consist of an introduction to the SCORE Network; an investigation of the available pedagogical materials, including specific modules; a discussion of how to use these materials with students; and a brainstorming session on potential modules. The workshop will be led by senior personnel from the SCORE Network, and attendees are asked to bring a laptop or tablet.
Dr. Rachel Gidaro is an Assistant Professor in the Department of Mathematical Sciences at the United States Military Academy, West Point, New York. She earned her Bachelor of Science in Mathematics from Colorado Mesa University in 2019. She then attended Baylor University, earning a Master of Science in Statistics in 2020 and a Doctor of Philosophy in Statistics in 2024. Her research interests in statistics focus on discrete-variate time series analysis. In her free time, Rachel enjoys reading, exercising, and working with the cheer team, the Rabble Rousers, at USMA.
TBA
On GitHub