CSAS 2025 offers workshops on Friday, April 10, 2025
R is a programming language favored by data scientists for its statistically minded design. This workshop introduces R to those with little or no prior experience. We will cover basic R syntax, data structures, and some basic practical uses of R, working with locally created data as well as a preexisting basketball dataset.
Tyler Adams is an undecided first-year
undergraduate student at the University of Connecticut.
A computer with R and Positron or RStudio installed (Positron suggested).
On GitHub.
Python is a high-level, object-oriented, cross-platform, and open-source programming language. These phrases may be unfamiliar, but what's most important is that it's easy to use. With Python, you can explore and analyze large amounts of data with many different statistical modeling and machine learning tools available through various Python packages. These models and tools are all very useful in sports analytics. However, we must start with the basics. In this workshop, we will focus on basic syntax, data types, functions, loops, conditionals, and classes, then combine this knowledge to build a simple algorithm.
Matthew
Venzie is currently a junior double majoring in Applied
Mathematics and Statistics at the University of Connecticut. His
current research is focused on numerical studies of how the
Mamba neural network architecture processes time series data.
Outside of academics, he enjoys attending statistics conferences
and has a passion for cooking and reading.
Laptop; no knowledge of Python needed. We will be using Google Colab.
On GitHub.
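The topics listed for the Python workshop (functions, loops, conditionals, and classes, combined into a simple algorithm) can be illustrated with a short sketch. This is not taken from the workshop materials; the `Player` class, the `best_shooter` function, and the roster numbers are hypothetical and made up for illustration.

```python
# A toy example combining a class, a function, a loop, and conditionals.
# All names and numbers here are hypothetical, not real player statistics.

class Player:
    """A player with a count of shots made and shots attempted."""

    def __init__(self, name, made, attempted):
        self.name = name
        self.made = made
        self.attempted = attempted

    def shooting_pct(self):
        # Guard against dividing by zero for players with no attempts.
        if self.attempted == 0:
            return 0.0
        return self.made / self.attempted


def best_shooter(players, min_attempts=1):
    """Return the player with the highest shooting percentage,
    ignoring anyone with fewer than min_attempts attempts.
    Returns None if no player qualifies."""
    best = None
    for player in players:
        if player.attempted < min_attempts:
            continue
        if best is None or player.shooting_pct() > best.shooting_pct():
            best = player
    return best


roster = [
    Player("A", 4, 10),
    Player("B", 6, 9),
    Player("C", 1, 1),
]

top = best_shooter(roster, min_attempts=5)
print(top.name, round(top.shooting_pct(), 3))
```

The `min_attempts` threshold is a design choice for the sketch: it keeps a player with a single made shot from ranking ahead of players with a meaningful number of attempts.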
This workshop will introduce the fundamentals of interactive visualization, a style of data exploration that allows users to engage directly with plots through actions such as hovering, filtering, zooming, and selecting data. We will explore two key tools in R: R Shiny, a framework for building interactive web applications using R, and Plotly, a popular library for creating dynamic, responsive graphics. Participants will learn how to build simple Shiny apps with reactive elements, and incorporate Plotly to create interactive dashboards. Through guided examples, attendees will gain practical experience developing and customizing interactive visualizations.
Lucy Liu is
a third-year undergraduate student majoring in Statistics and
Applied Mathematics at the University of Connecticut. She serves
as the President of the Joint Statistical Club, where she runs
workshops and organizes statistical conferences.
A laptop with R/RStudio installed; basic familiarity with R is recommended, but no prior experience with Shiny or Plotly is required.
On GitHub.
Git is a version control system that tracks changes to files in your project, letting you save snapshots of your work, revert to earlier versions, and collaborate across branches. As the industry standard for version control, Git is essential for any developer workflow. This workshop will teach you the fundamentals you need to start using Git effectively in your projects.
Michael Agostino is a senior
undergraduate majoring in Statistics with a minor in Computer
Science. He is developing expertise in data analysis and
automation, with strong foundational skills in Python.
A laptop that can connect to the internet. Some basic familiarity with the computer's command line is preferred.
On GitHub.
Basketball analytics has changed over the past few decades. In this workshop, we look into how shooting patterns have changed in the past 20 years and how far telemetry analytics has come in the league. We will take a look at player data and see how performance has changed over time!
Ankith Nagabandi
is a CS master's student. He is interested in sports analytics
and was part of the UConn Data Science Club's E-Board throughout
his undergrad.
Laptop with an IDE and Python installed; Jupyter Notebooks. Basic familiarity with basketball recommended.
On GitHub.
NFL analytics has expanded rapidly throughout the league in recent years when it comes to in-game and personnel decisions. This workshop provides a framework for conducting football analysis in R. We will make use of football-specific packages and work out how to leverage them to answer football-related questions. This includes retrieving relevant data, creating visualizations, and fitting statistical models.
Nicholas
Pfeifer is a fourth-year undergraduate student majoring in
Statistics at the University of Connecticut. He has worked with
the UConn Women's Soccer team as a data analyst through the
UConn Sports Statistics Experiential Learning Program for the
past year. He is passionate about many sports including football
and Formula 1.
Familiarity with R and RStudio; familiarity with football recommended.
On GitHub.
Docker is a platform that allows developers to package applications and their dependencies into lightweight, portable containers that run consistently across different environments. By isolating software in these containers, Docker makes it easier to deploy, scale, and manage applications, ensuring that code runs the same on a developer’s laptop as it does on a production server. Attendees will learn how to build their own Dockerized applications from scratch, with one to two hands-on examples using sports data to demonstrate how containerization simplifies reproducibility, deployment, and collaboration in data-driven projects.
Rahul
Manna is a senior at UConn, pursuing a dual degree in
Statistical Data Science and Mechanical Engineering. Currently,
he is collaborating with Dr. Jun Yan to develop machine learning
models that predict utility demand for buildings across the
UConn campus. He is a research assistant in the Laboratory for
Advanced Manufacturing Reliability (KKim Lab), where he
evaluates materials for implantable bioelectronics. His first
research paper was recently published in Royal Society of
Chemistry's Applied Polymers. Beyond academics and
research, Rahul is an avid Formula One enthusiast, drawn to the
sport’s fusion of cutting-edge engineering, advanced statistics,
data science, and human ingenuity—elements that shape both
on-track performance and the strategic decisions behind the
scenes.
Laptop with Docker and Python installed (AWS optional). Please check the GitHub repository for the most up-to-date list of requirements.
On GitHub.
Player tracking data offers a great opportunity for creating performance metrics in sports. In this workshop, we cover the problem of unsupervised labeling in sports with player tracking data. We focus on American football and use data provided by the NFL Big Data Bowl competition. We first discuss basic tools (visualization and preprocessing) for getting started with tracking data, followed by a case study of clustering pre-snap motion in the NFL.
Quang Nguyen is
a PhD student in the Department of Statistics & Data Science
at Carnegie Mellon University. His current research focuses on
statistical analysis of complex data such as player tracking
data in sports and network data. Quang previously received his
MS in Applied Statistics from Loyola University Chicago and BS
in Mathematics and Data Science from Wittenberg University in
Springfield, Ohio. He is a two-time NFL Big Data Bowl finalist,
and a die-hard supporter of Manchester United.
Familiarity with R and basic data science tasks (data wrangling and data visualization).
On GitHub.
In this workshop, we will introduce core concepts of causal inference, including potential outcomes, confounding, and causal diagrams, through the lens of sports analytics. Using the MLB infield shift as a motivating case study, we will illustrate how causal methods can be used to evaluate strategic decisions and quantify their impact on game performance. Participants will learn how techniques such as matching, weighting, and instrumental variables enable rigorous causal effect estimation from observational sports data.
Ying Zhou is
an Assistant Professor of Statistics at the University of
Connecticut. Before joining UConn, she was a Postdoctoral Fellow
at the University of Pennsylvania from 2023 to 2024. She
received her Ph.D. in Statistics from the University of Toronto.
Her research focuses on methodological and applied challenges in
causal inference, particularly those arising from complex data
structures.
Basic R programming and introductory probability.
On GitHub.
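As a rough illustration of the confounding and weighting ideas named in the causal inference abstract, the Python sketch below simulates data in which a binary confounder affects both treatment and outcome, then compares a naive difference in means with an inverse probability weighted estimate. It is not drawn from the workshop materials and does not use infield shift data; the data-generating numbers (a true effect of 1.0, treatment probabilities of 0.8 and 0.2) are assumptions chosen for the example.

```python
import random

# Simulated data only: every number below is an assumption for illustration.
random.seed(0)
N = 20000
TRUE_EFFECT = 1.0

rows = []
for _ in range(N):
    x = 1 if random.random() < 0.5 else 0          # binary confounder
    p_treat = 0.8 if x == 1 else 0.2               # confounder drives treatment
    t = 1 if random.random() < p_treat else 0
    y = TRUE_EFFECT * t + 2.0 * x + random.gauss(0.0, 1.0)  # and the outcome
    rows.append((x, t, y))


def mean(values):
    return sum(values) / len(values)


def weighted_mean(pairs):
    """Mean of y weighted by w, for a list of (w, y) pairs."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * y for w, y in pairs) / total_weight


# Naive comparison: ignores the confounder, so it mixes the treatment
# effect with the effect of x.
naive = (mean([y for x, t, y in rows if t == 1])
         - mean([y for x, t, y in rows if t == 0]))

# Estimate the propensity score P(T = 1 | X = x) within each level of x.
propensity = {
    level: mean([t for x, t, y in rows if x == level]) for level in (0, 1)
}

# Inverse probability weighting with normalized weights: treated units get
# weight 1 / e(x), control units get weight 1 / (1 - e(x)).
treated = [(1.0 / propensity[x], y) for x, t, y in rows if t == 1]
control = [(1.0 / (1.0 - propensity[x]), y) for x, t, y in rows if t == 0]
ipw = weighted_mean(treated) - weighted_mean(control)

print(f"naive difference: {naive:.2f}")
print(f"IPW estimate:     {ipw:.2f}")
print(f"true effect:      {TRUE_EFFECT:.2f}")
```

In this simulation the naive difference overstates the effect because treated units are more likely to have `x = 1`, which raises the outcome on its own; reweighting by the estimated propensity balances `x` across the two groups.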