Training Workshops

UCSAS 2024 offers 7 workshops on Friday, April 11, 2024.

Session A (Introductory Level)
Session B (Intermediate Level)
Session C (Advanced Level)
Session D (Professional Level)

Introduction to R

Outline

R is an open-source programming language for statistical computing and graphics. It offers a wide range of graphical and statistical tools, including time-series analysis, classification, clustering, and linear and nonlinear modeling. This workshop introduces R to participants with little or no prior experience. Topics include: 1) an overview of basic R; 2) data structures in R; 3) data management in R; and 4) some useful R packages. A real-life sports dataset will be used to illustrate these topics.

Instructor

Fusheng Yang is a fifth-year Ph.D. student in Statistics at UConn. Her research interests include time series analysis and extreme value analysis. She is a research assistant at the Computational Climate Change Lab at UConn, where she works on detecting past heatwaves and predicting future heatwave events. She is also a teaching assistant for various undergraduate statistics courses in the Department of Statistics.

Prerequisites

A laptop with R/RStudio installed; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.

Training Materials

On GitHub.


Introduction to Python

Outline

Python is a popular high-level language with many features that data scientists value: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in data science projects, from the front end to the back end, and good Python programming makes it easier to analyze sports data. The workshop will cover Python data types, methods for moving data, and functions.
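As a rough illustration of the three topics named above, the sketch below uses a small hypothetical box score to show built-in types (str, list, dict), a function, and one assumed interpretation of "moving data": writing records out as CSV text and reading them back with the standard `csv` module. The player, team values, and function name are illustrative; the workshop's own notebooks may cover these topics differently.

```python
import csv
import io

# Built-in data types, using a small hypothetical box score.
team = "UConn"                                   # str
points = [18, 22, 27]                            # list of ints (points in three games)
player = {"name": "Player A", "points": points}  # dict mapping keys to values


def points_per_game(pts):
    """Return the average of a list of point totals."""
    return sum(pts) / len(pts)


# "Moving data" (one interpretation): write the records to CSV text, then read them back.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["game", "points"])
for game, pts in enumerate(player["points"], start=1):
    writer.writerow([game, pts])

buffer.seek(0)  # rewind so the reader starts at the header row
loaded = [int(row["points"]) for row in csv.DictReader(buffer)]

ppg = round(points_per_game(loaded), 1)
print(f"{player['name']} ({team}): {ppg} points per game")
```

Running it prints `Player A (UConn): 22.3 points per game`; the same pattern works with a real file opened via `open(path, newline="")` in place of the in-memory buffer.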

Instructor

Charitarth Chugh is a third-year undergraduate student majoring in Computer Science at the University of Connecticut and also the President of the UConn Artificial Intelligence Club. This year, he is researching the intersections of cybersecurity and deep learning for attack detection alongside multimodal (audio, text, and video) models.

Prerequisites

A laptop with internet access. We will be working with Jupyter Notebooks on Google Colab.

Training Materials

On GitHub.


Basketball Analytics

Outline

This workshop will give you a solid base for doing your own basketball analytics projects using the R programming language. We will review some of the R packages that can be used to analyze shot charts, assist networks, player similarities, and game flow / win probability charts for both NBA and NCAA Division I basketball. Additionally, if your March Madness bracket didn't perform as well as you'd hoped this year, this workshop will go over some basic strategies for improving your predictions next year and how to implement them in R.

Instructor

Mathew Chandy is a senior at the University of Connecticut, where he is double majoring in Statistics and Statistical Data Science, and minoring in Computer Science, Economics, and Mathematics. He previously worked as a sports analytics research intern at Carnegie Mellon University, where he analyzed determinants of NBA player salary and complementary playstyles. He is also a co-founder and current vice president of the UConn Joint Statistical Club.

Prerequisites

Familiarity with R and RStudio; some familiarity with basketball is recommended.

Training Materials

On GitHub.


Baseball Analytics with Python

Outline

Over the past decade, baseball has evolved to adapt to the digital age. The influx of data in the 21st century has given fans of America's pastime the ability to analyze and understand the game in ways previously thought impossible. Yet understanding how to obtain and use baseball data is just as complex as the sport itself. During this workshop, you will use Python to learn essential baseball analytics tools, the concept of Sabermetrics, data visualization for baseball analytics, and more.
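Many Sabermetric statistics are simple formulas over counting stats. As one example (not necessarily one the workshop uses), on-base percentage is OBP = (H + BB + HBP) / (AB + BB + HBP + SF), which, unlike batting average (H / AB), credits a hitter for walks and hit-by-pitches. The sketch below computes both for a hypothetical season line; returning 0.0 when the denominator is zero is an assumption made for convenience.

```python
def batting_average(h, ab):
    """AVG = H / AB (returns 0.0 when there are no at-bats, by assumption)."""
    return h / ab if ab else 0.0


def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF) (0.0 if the denominator is 0, by assumption)."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


# Hypothetical season line: 150 hits, 60 walks, 5 HBP, 500 at-bats, 5 sacrifice flies.
avg = batting_average(150, 500)
obp = on_base_percentage(h=150, bb=60, hbp=5, ab=500, sf=5)
print(f"AVG {avg:.3f}  OBP {obp:.3f}")
```

This prints `AVG 0.300  OBP 0.377`: the 65 times this hypothetical hitter reached base without a hit raise his OBP well above his batting average.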

Instructor

Patrick Cummins is currently a third-year undergraduate student at UConn studying data science with a concentration in statistics. This past summer, he was a data analytics intern for Major League Baseball, where he worked with several analytics teams to enhance the fan experience with MLB products.

Prerequisites

Familiarity with Python; a general understanding of baseball; access to Jupyter Notebooks or Google Colab.

Training Materials

On GitHub.


Web Scraping for Sports Data

Outline

If you want to start analyzing sports, you need data. Nowadays, there are many sources of pre-built datasets, but at times you may need to build a custom dataset from data found online. Web scraping is an effective solution to this problem: you can write automated scripts that quickly and efficiently gather data from webpages, and in doing so create datasets tailored to the questions you want to answer. During this workshop you will learn 1) what web scraping is, 2) how static web scraping works using the Python packages pandas, requests, and BeautifulSoup, and 3) how dynamic web scraping works using the Python package Selenium in conjunction with the previously learned packages.
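Static scraping boils down to fetching a page's HTML and pulling structured values out of it. The sketch below swaps in the standard library's `html.parser` in place of BeautifulSoup so that it runs with no installs or network access, and it parses a hypothetical inline HTML table instead of a page fetched with requests; `TableParser` and `scrape_table` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page you would fetch with requests.get(url).text
PAGE = """
<table id="standings">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>10</td><td>2</td></tr>
  <tr><td>Team B</td><td>7</td><td>5</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Ignore whitespace between tags; keep only text inside cells.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(PAGE)
print(records)
```

This prints `[{'Team': 'Team A', 'W': '10', 'L': '2'}, {'Team': 'Team B', 'W': '7', 'L': '5'}]`. With BeautifulSoup the same loop would typically iterate over `soup.find_all("tr")` and read each cell's `get_text()`, and `pandas.read_html` can turn simple HTML tables directly into DataFrames.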

Instructor

Tyler Hinrichs is a fourth-year undergraduate student at UConn studying computer science with a concentration in Software Design and Development. He was previously a Software Engineering Intern for both Travelers and Wellinks. At Travelers, he led development of a Python and Selenium-based web scraping application to automate audit data collection for the Digital Enablement team.

Prerequisites

Familiarity with Python; a laptop with access to Jupyter Notebooks or Google Colab.

Training Materials

On GitHub.


TensorFlow in Sports Analytics

Outline

This workshop will cover important concepts behind neural networks and TensorFlow and how they can be applied to soccer player data from FIFA. More specifically, we will 1) collect soccer player data, 2) pre-process and clean the data, 3) create an embedding model for these players using TensorFlow, and 4) build a full-stack application hosted on AWS to deploy our model to potential customers. Get excited to learn common software development and deep learning techniques used in modern sports analytics!
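Once an embedding model has been trained, each player is represented by a vector, and players whose vectors point in similar directions can be treated as similar. The sketch below shows only that last step, using cosine similarity in plain Python on hypothetical 4-dimensional vectors; it does not use TensorFlow, and in the workshop's pipeline the vectors would instead come from the trained model.

```python
import math

# Hypothetical 4-dimensional player embeddings (a trained model would produce these).
EMBEDDINGS = {
    "Player A": [0.9, 0.1, 0.3, 0.0],
    "Player B": [0.8, 0.2, 0.4, 0.1],
    "Player C": [0.0, 0.9, 0.1, 0.7],
}


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(name, embeddings):
    """Return the other player whose embedding is closest to `name`'s by cosine similarity."""
    target = embeddings[name]
    others = (p for p in embeddings if p != name)
    return max(others, key=lambda p: cosine_similarity(target, embeddings[p]))


print(most_similar("Player A", EMBEDDINGS))
```

With these made-up vectors it prints `Player B` (similarity about 0.98, versus about 0.11 for Player C).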

Instructor

Hari Patchigolla is a fourth-year undergraduate student majoring in Computer Science with a concentration in Computational Data Analytics. He has interned as a Data Scientist at companies such as Optum and Autodesk, where his work spanned a range of techniques, from anomaly detection to Generative AI tech stacks. Hari is also the President of UConn’s Data Science Club, where he hosts various workshops and industry events.

Prerequisites

Familiarity with Python; access to AWS (we will stay within the free tier); Jupyter Notebooks.

Training Materials

On GitHub.


Causal Inference in Sports Analytics

Outline
  1. Introduction to Causal Inference

    1. Definition and importance
    2. Distinction between correlation and causation
    3. Real-world examples illustrating the need for causal inference
  2. Fundamental Concepts in Causal Inference

    1. Potential outcomes framework
    2. Treatment, control, and counterfactuals
    3. Confounding variables
    4. Identifiability and the backdoor criterion
  3. Methods for Causal Inference

    1. Randomized Controlled Trials (RCTs)
    2. Matching methods
    3. Propensity score matching
  4. Applications and Case Studies
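The sketch below illustrates confounding and the simplest matching method on hypothetical data. Suppose starters are both more likely to receive some treatment (say, a new training program) and likely to score more regardless of it. Comparing all treated players with all controls then mixes the treatment effect with the starter effect, while exact matching on starter status estimates the average treatment effect on the treated (ATT) by comparing each treated player only with controls who share the same starter status. This is exact matching on one binary covariate, not propensity score matching, and it removes the bias only under the assumption that starter status is the sole confounder.

```python
from statistics import mean

# Hypothetical data: (treated, starter, points). "starter" is the confounder:
# starters are more likely to be treated AND score more regardless of treatment.
DATA = [
    (1, 1, 20), (1, 1, 22), (1, 1, 24), (1, 0, 8),
    (0, 1, 18), (0, 1, 20), (0, 0, 5), (0, 0, 6), (0, 0, 7),
]


def naive_difference(data):
    """Difference in mean outcome between all treated and all control units."""
    treated = [y for t, _, y in data if t == 1]
    control = [y for t, _, y in data if t == 0]
    return mean(treated) - mean(control)


def exact_matching_att(data):
    """ATT: compare each treated unit with the mean of controls sharing its confounder value.

    Assumes every treated unit has at least one control with the same confounder value;
    otherwise statistics.mean raises StatisticsError on an empty list.
    """
    effects = []
    for t, x, y in data:
        if t != 1:
            continue
        matches = [yc for tc, xc, yc in data if tc == 0 and xc == x]
        effects.append(y - mean(matches))
    return mean(effects)


print("naive:", round(naive_difference(DATA), 2))
print("matched ATT:", round(exact_matching_att(DATA), 2))
```

Here the naive difference (7.3 points) is inflated by the confounder, while the matched estimate is 2.75 points.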

Instructor

Dr. Kevin Cummiskey is an Associate Professor in the Department of Mathematical Sciences at West Point and Director of the Operations Research Program. His research is in applied statistics, causal inference, and statistics education. He was recently awarded the American Statistical Association's Causality in Statistics Education Award. Previously, he was an Engineer officer and operations analyst with service in South Korea, Iraq, Qatar, and Afghanistan. Dr. Cummiskey received his Ph.D. in Biostatistics from Harvard University, M.S. in Computational Operations Research from the College of William and Mary, and B.S. in Mathematical Sciences from West Point. He chairs the Nutrition for Precision Health's Data Subcommittee and is a co-investigator of its artificial intelligence and computational modeling center for precision nutrition and health. In addition, he chairs West Point's Academic Freedom Advisory Committee.

Prerequisites

Some background in statistics; experience with R or Python for the examples.

Training Materials

On GitHub.