CSAS 2025 offers 10 workshops in four tracks on Friday, April 11, 2025
R is an open-source programming language for statistical computing and graphics. R offers a wide range of graphical and statistical tools, including time-series analysis, classification, clustering, and linear and nonlinear modeling. This workshop introduces R to those who have had little to no prior experience. Topics include: 1) an overview of basic R; 2) data structures in R; 3) data management in R; and 4) some useful R packages. A real-life sports dataset will be used to provide a better understanding of R.
Lucy Liu is
a second-year undergraduate student majoring in Statistics and
Applied Mathematics at the University of Connecticut. She serves
as the President of the Joint Statistical Club, where she runs
workshops and organizes statistical conferences.
A laptop with R/RStudio installed; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.
On GitHub
As a popular high-level language, Python has many excellent features that data scientists like: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in many data science challenges from the front end to the back end. Good Python programming makes it easier to analyze sports data. The workshop will cover the following topics in class: Python data types, methods for moving data, and functions.
Charitarth
Chugh is a senior majoring in Computer Science at the
University of Connecticut and also the President of the UConn
Artificial Intelligence Club. This year, he is researching the
intersections of cybersecurity and deep learning for attack
detection alongside multimodal (audio, text, and video)
models.
A laptop with internet access. We will be working with Jupyter Notebooks on Google
On GitHub
Visualizing data, particularly in sports analytics, provides valuable insights for informed decision-making, reveals underlying patterns in the data, and enhances communication among stakeholders. Leveraging Python’s versatility and powerful data visualization libraries enables the creation of well-crafted visual narratives across diverse domains. Matplotlib, Python's most popular visualization package, offers extensive customization options and precise control, making it a preferred tool for crafting detailed and impactful visualizations. This workshop will introduce Matplotlib’s robust plotting capabilities, showcase practical examples of data visualizations in baseball and basketball, and equip participants with versatile techniques applicable across any domain.
Rahul
Manna is a junior pursuing a dual degree in Statistical Data
Science and Mechanical Engineering. He is currently working as a
research assistant in the Laboratory for Advanced Manufacturing
Reliability (KKim Lab), where he tests materials for implantable
bioelectronics and uses Python and Matplotlib to analyze and
visualize data. Outside the classroom and workspace, Rahul
enjoys Formula One, where the fusion of cutting-edge
engineering, advanced statistics, data science, analytics, and
human ingenuity drives both the on-track performances and the
strategic decisions behind the scenes.
Familiarity with Python and Jupyter Notebooks. Recommended: familiarity with Pandas. Please visit the GitHub repository for more information.
On GitHub
This workshop aims to provide participants with a solid base for directing their own basketball analytics projects using the R programming language. We will review some of the R packages that are designed for acquiring NBA data, creating visualizations like shot charts and assist networks, and performing statistical analyses. Additionally, for those whose March Madness bracket didn't perform as well as they had hoped, this workshop will go over some basic strategies to improve predictions for future tournaments.
Addison McGhee is a
Preceptor in Data Science at Yale University. He earned his M.S.
in biostatistics at the Harvard T.H. Chan School of Public
Health, and his B.S. in statistics at UW-Madison. Before coming
to Yale, he interned at the pharmaceutical company Repare
Therapeutics. Outside of his work, he enjoys working on sports
analytics projects for baseball and football.
Familiarity with R and RStudio. Recommended: some familiarity with basketball.
On GitHub
In the modern age of the internet, many websites store a variety of valuable data. In the world of sports, much of this data is publicly available but not always in an easily accessible format. Web scraping is an automated process for gathering data from websites; it allows us to collect large amounts of data directly from web pages when the information is not available for download. This workshop will 1) introduce the basics of web scraping; 2) show how to scrape sports-related websites with Python, including examples; and 3) discuss how to ensure you are scraping ethically.
Melanie
Desroches is a senior at the University of Connecticut
majoring in Statistics and minoring in Computer Science. She is
currently working with the UConn Men’s Ice Hockey team to
determine how prospects' performance will translate to the NCAA
and more specifically, the Hockey East conference. Melanie is a
hockey fan but also loves learning about a variety of different
sports and how statistics can be used to answer questions within
the sports world.
To run the code we will be discussing, you need to have installed Python; the beautifulsoup4, selenium, pandas, and requests packages; and a web driver of your choosing (preferably ChromeDriver).
On GitHub
The goal of this presentation is to demonstrate different tennis analyses using Python. We will use several Python packages to explore topics such as serving success, performance under pressure, and offensive/defensive playstyles. The data will be collected from the Association of Tennis Professionals (ATP) Tennis Statistics dataset, created by Jeff Sackmann.
Jaden
Astle is a fourth-year undergraduate at UConn studying
Statistical Data Science & Cognitive Science with a minor in
Computer Science. He previously interned as a Data Engineer for
The Hartford Insurance, where he focused on developing a model
training pipeline that was supported by the AWS Cloud. Jaden
serves as the President of UConn's Data Science Club, where he
leads technical workshops and helps plan other professional
development events.
Familiarity with Python & Jupyter Notebooks
On GitHub
Player tracking data offers a great opportunity for creating performance metrics in sports. This workshop will provide a detailed walkthrough of how to build a metric with player tracking data. We will focus on American football and use data provided by the NFL Big Data Bowl competition. The workshop will feature (1) an overview of tracking data and basic visualization and data preprocessing; (2) metric formulation and application to NFL data; and (3) metric validation and statistical properties of sports metrics.
Quang Nguyen is
a third-year PhD student in the Department of Statistics &
Data Science at Carnegie Mellon University. His current research
focuses on statistical analysis of complex data such as player
tracking data in sports and network data. Quang previously
received his MS in Applied Statistics from Loyola University
Chicago and BS in Mathematics and Data Science from Wittenberg
University in Springfield, Ohio. He is a two-time NFL Big Data
Bowl finalist, and a die-hard supporter of Manchester
United.
Familiarity with R and basic data science tasks (data wrangling & data visualization)
On GitHub
In this workshop, we will introduce the fundamentals of causal inference, including the potential outcomes framework and causal discovery, emphasizing the importance of causality in sports analytics. We will apply these concepts to NBA data to address specific challenges and uncover actionable insights. The workshop will also explore recent advancements in causal machine learning techniques, such as meta-learners and causal discovery with temporal ordering.
Shinpei Nakamura-Sakai is a Ph.D. candidate
in Statistics and Data Science at Yale University. His research
in sports analytics introduces a framework for analyzing age
curves to examine how factors like rest days impact athlete
performance across career stages. Shinpei has industry
experience as a quantitative analytics associate at JPMorgan
Chase and an applied scientist at Amazon, and he has won Best
Poster Awards at both UCSAS 2022 and NESSIS 2023.
Familiarity with R and Python.
On GitHub
Visualize this! You’re on the field with less than 30 seconds left in the 4th quarter. You’re juggling dependencies, battling inconsistent environments, and scrambling to get your model into the end zone. You need a way to streamline your pipeline to win. This is when you decide to pull out the “Docker Playbook”, the ultimate game plan! In this seminar, we are going to take a deep dive into turning our Xs and Os into efficient Dockerfiles, containerizing our models, and distributing them to our peers and the cloud. By the final whistle, you will have the skills to build scalable, portable, and efficient pipelines that are built to win.
Zuri
Hunter, a data engineer at the National Football League,
merges her technical skills with a strong commitment to
community service. A Computer Information Systems graduate from
Howard University, she taught herself Ruby on Rails and
showcased her talents in hackathons, notably at the United
States Presidential Innovation Fellows Hack the Pay Gap event.
Outside of her engineering role, Zuri also volunteered as a
Technical Lead for Black Girls Code and organized Howard
University’s hackathon “Bison Hacks.” Her work earned her a
2018 DCA Live Power in Tech nomination. In her downtime, she
enjoys arts and crafts, ice skating and competing in fighting
game tournaments nationwide.
AWS Account, Docker, Laptop, Terraform, Python, R
On GitHub
We conduct a workshop for educators to gain exposure to materials generated by the SCORE Network and to inspire them to use these educational materials with their students. The workshop will consist of an introduction to the SCORE Network, an investigation into pedagogical materials available including specific modules, a discussion of how to utilize these materials with students and a period of brainstorming about potential modules. This workshop will be led by senior personnel from the SCORE Network and attendees will be asked to bring a laptop or tablet for the workshop.
Dr.
Rachel Gidaro is an Assistant Professor in the Department of
Mathematical Sciences at the United States Military Academy,
West Point, New York. She earned her Bachelor of Science in
Mathematics in 2019 from Colorado Mesa University. Beginning in
2019, she attended Baylor University and earned a Master of
Science in Statistics in 2020 before completing her Doctor of
Philosophy in Statistics in 2024. Her research interests in
statistics are focused on discrete variate time series analysis.
In her free time, Rachel enjoys reading, exercising, and working
with the cheer team, the Rabble Rousers, at USMA.
TBA
On GitHub