Training Workshops

CSAS 2025 offers workshops on Friday, April 10, 2025

Track A (Introductory Level)
Track B (Intermediate Level)
Track C (Advanced Level)

Introduction to R

Outline

R is a programming language favored by data scientists for its statistically minded design. This workshop introduces R to those with little or no prior experience. We will cover basic R syntax, data structures, and some basic practical uses of R, working with locally created data as well as a preexisting basketball dataset.

Instructor

Tyler Adams is an undecided first-year undergraduate student at the University of Connecticut.

Prerequisites

A computer with R and either Positron or RStudio installed (Positron recommended).

Training Materials

On GitHub.


Introduction to Python

Outline

Python is a high-level, object-oriented, cross-platform, open-source programming language. These phrases may be unfamiliar, but what matters most is that Python is easy to use. With Python, you can explore and analyze large amounts of data using the many statistical modeling and machine learning tools available through Python packages. These models and tools are all very useful in sports analytics, but we must start with the basics. In this workshop, we will focus on basic syntax, data types, functions, loops, conditionals, and classes, then combine this knowledge to build a simple algorithm.
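As a rough taste of how these pieces fit together, the sketch below uses a class, a function, a loop, and conditionals to find the top scorer per game. It is not taken from the workshop materials: the `Player` class, the `top_scorer` function, and the roster values are all made up for illustration.

```python
class Player:
    """A minimal record for one player (hypothetical example data)."""

    def __init__(self, name, points, games):
        self.name = name
        self.points = points
        self.games = games

    def points_per_game(self):
        # Conditional: avoid dividing by zero for players with no games.
        if self.games == 0:
            return 0.0
        return self.points / self.games


def top_scorer(players):
    """Loop over the players and return the one with the most points per game."""
    best = None
    for player in players:
        if best is None or player.points_per_game() > best.points_per_game():
            best = player
    return best


# Made-up roster, used only to exercise the code above.
roster = [
    Player("Player A", 300, 20),
    Player("Player B", 410, 25),
    Player("Player C", 0, 0),
]
leader = top_scorer(roster)
print(leader.name, round(leader.points_per_game(), 1))
```

Keeping the per-game calculation inside the class means the comparison loop never has to know how the statistic is computed.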

Instructor

Matthew Venzie is currently a junior double majoring in Applied Mathematics and Statistics at the University of Connecticut. His current research focuses on numerical studies of how the Mamba neural network architecture processes time series data. Outside of academics, he enjoys attending statistics conferences and has a passion for cooking and reading.

Prerequisites

A laptop; no prior knowledge of Python is needed. We will be using Google Colab.

Training Materials

On GitHub.


Introduction to Interactive Visualization

Outline

This workshop will introduce the fundamentals of interactive visualization, a style of data exploration that allows users to engage directly with plots through actions such as hovering, filtering, zooming, and selecting data. We will explore two key tools in R: R Shiny, a framework for building interactive web applications using R, and Plotly, a popular library for creating dynamic, responsive graphics. Participants will learn how to build simple Shiny apps with reactive elements, and incorporate Plotly to create interactive dashboards. Through guided examples, attendees will gain practical experience developing and customizing interactive visualizations.

Instructor

Lucy Liu is a third-year undergraduate student majoring in Statistics and Applied Mathematics at the University of Connecticut. She serves as the President of the Joint Statistical Club, where she runs workshops and organizes statistical conferences.

Prerequisites

A laptop with R/RStudio installed; basic familiarity with R is recommended, but no prior experience with Shiny or Plotly is required.

Training Materials

On GitHub.


Introduction to Project Management with Git

Outline

Git is a version control system that tracks changes to files in your project, letting you save snapshots of your work, revert to earlier versions, and collaborate across branches. As the industry standard for version control, Git is essential for any developer workflow. This workshop will teach you the fundamentals you need to start using Git effectively in your projects.

Instructor

Michael Agostino is a senior undergraduate majoring in Statistics with a minor in Computer Science. He is developing expertise in data analysis and automation, with strong foundational skills in Python.

Prerequisites

A laptop that can connect to the internet. Some basic familiarity with the computer's command line is preferred.

Training Materials

On GitHub.


Basketball Analytics

Outline

Basketball analytics has changed over the past few decades. In this workshop, we look at how shooting patterns have changed in the past 20 years and how far telemetry analytics has come in the league. We will also examine player data to see how performance has changed over time.

Instructor

Ankith Nagabandi is a master's student in computer science. He is interested in sports analytics and was part of the UConn Data Science Club's e-board throughout his undergraduate studies.

Prerequisites

A laptop with Python and an IDE installed. Basic familiarity with basketball and Jupyter Notebooks is recommended.

Training Materials

On GitHub.


Football Analytics

Outline

The use of analytics for in-game and personnel decisions has expanded rapidly across the NFL in recent years. This workshop provides a framework for conducting football analysis in R. We will use football-specific packages and learn how to apply them to football-related questions. This includes retrieving relevant data, creating visualizations, and fitting statistical models.

Instructor

Nicholas Pfeifer is a fourth-year undergraduate student majoring in Statistics at the University of Connecticut. He has worked with the UConn Women's Soccer team as a data analyst through the UConn Sports Statistics Experiential Learning Program for the past year. He is passionate about many sports, including football and Formula 1.

Prerequisites

Familiarity with R and RStudio; familiarity with football is recommended.

Training Materials

On GitHub.


Building and Deploying Sports Data Apps with Docker

Outline

Docker is a platform that allows developers to package applications and their dependencies into lightweight, portable containers that run consistently across different environments. By isolating software in these containers, Docker makes it easier to deploy, scale, and manage applications, ensuring that code runs the same on a developer's laptop as it does on a production server. Attendees will learn how to build their own Dockerized applications from scratch, with one to two hands-on examples using sports data to demonstrate how containerization simplifies reproducibility, deployment, and collaboration in data-driven projects.

Instructor

Rahul Manna is a senior at UConn, pursuing a dual degree in Statistical Data Science and Mechanical Engineering. Currently, he is collaborating with Dr. Jun Yan to develop machine learning models that predict utility demand for buildings across the UConn campus. He is a research assistant in the Laboratory for Advanced Manufacturing Reliability (KKim Lab), where he evaluates materials for implantable bioelectronics. His first research paper was recently published in the Royal Society of Chemistry's Applied Polymers. Beyond academics and research, Rahul is an avid Formula One enthusiast, drawn to the sport's fusion of cutting-edge engineering, advanced statistics, data science, and human ingenuity, elements that shape both on-track performance and the strategic decisions behind the scenes.

Prerequisites

Laptop with Docker and Python installed (AWS optional). Please check the GitHub repository for the most up-to-date list of requirements.

Training Materials

On GitHub.


Building a Performance Metric with Player Tracking Data

Outline

Player tracking data offers a great opportunity for creating performance metrics in sports. In this workshop, we cover the problem of unsupervised labeling in sports with player tracking data. We focus on American football and use data provided by the NFL Big Data Bowl competition. We first discuss basic tools (visualization and preprocessing) for getting started with tracking data, followed by a case study of clustering pre-snap motion in the NFL.

Instructor

Quang Nguyen is a PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. His current research focuses on statistical analysis of complex data such as player tracking data in sports and network data. Quang previously received his MS in Applied Statistics from Loyola University Chicago and his BS in Mathematics and Data Science from Wittenberg University in Springfield, Ohio. He is a two-time NFL Big Data Bowl finalist and a die-hard supporter of Manchester United.

Prerequisites

Familiarity with R and basic data science tasks (data wrangling and data visualization).

Training Materials

On GitHub.


Causal Tools for Sports: Evaluating Defensive Strategies in MLB

Outline

In this workshop, we will introduce core concepts of causal inference, including potential outcomes, confounding, and causal diagrams, through the lens of sports analytics. Using the MLB infield shift as a motivating case study, we will illustrate how causal methods can be used to evaluate strategic decisions and quantify their impact on game performance. Participants will learn how techniques such as matching, weighting, and instrumental variables enable rigorous causal effect estimation from observational sports data.
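To make the weighting idea concrete, here is a minimal sketch of inverse probability weighting on a tiny made-up dataset. It is not the workshop's analysis and does not use real MLB data: the `pull_hitter` confounder, the records, and the function names are all hypothetical, and the propensity score is simply the observed shift rate within each stratum. The workshop itself uses R; Python is used here only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (pull_hitter, shifted, hit). Not real MLB data.
records = [
    (1, 1, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1),
    (0, 1, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0),
]


def propensity_by_stratum(rows):
    """Estimate P(shifted = 1 | pull_hitter) as the shift rate in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [shifted count, total]
    for pull_hitter, shifted, _ in rows:
        counts[pull_hitter][0] += shifted
        counts[pull_hitter][1] += 1
    return {stratum: s / n for stratum, (s, n) in counts.items()}


def ipw_effect(rows):
    """Weighted difference in hit rates (shifted minus not shifted)."""
    e = propensity_by_stratum(rows)
    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for pull_hitter, shifted, hit in rows:
        p = e[pull_hitter]
        # Assumes 0 < p < 1 in every stratum (overlap), true for this toy data.
        weight = 1 / p if shifted else 1 / (1 - p)
        sums[shifted][0] += weight * hit
        sums[shifted][1] += weight
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


def naive_effect(rows):
    """Unweighted difference in hit rates, ignoring the confounder."""
    treated = [hit for _, shifted, hit in rows if shifted]
    control = [hit for _, shifted, hit in rows if not shifted]
    return sum(treated) / len(treated) - sum(control) / len(control)


naive = naive_effect(records)
weighted = ipw_effect(records)
print(f"naive: {naive:.2f}, weighted: {weighted:.2f}")
```

In this toy data the shift is applied mostly against pull hitters, so the naive and weighted comparisons disagree; the weights rebalance the two groups so that each resembles the full set of batters with respect to the confounder.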

Instructor

Ying Zhou is an Assistant Professor of Statistics at the University of Connecticut. Before joining UConn, she was a Postdoctoral Fellow at the University of Pennsylvania from 2023 to 2024. She received her Ph.D. in Statistics from the University of Toronto. Her research focuses on methodological and applied challenges in causal inference, particularly those arising from complex data structures.

Prerequisites

Basic R programming and introductory probability.

Training Materials

On GitHub.