Training Workshops

UCSAS 2024 offers 7 workshops on Friday, April 12, 2024.

Session A (Introductory Level)
Session B (Intermediate Level)
Session C (Advanced Level)
Session D (Professional Level)

Introduction to R

Outline

R is an open-source programming language for statistical computing and graphics. R offers a wide range of graphical and statistical tools, including time-series analysis, classification, clustering, and linear and nonlinear modeling. This workshop introduces R to those with little to no prior experience. Topics include: 1) an overview of basic R; 2) data structures in R; 3) data management in R; and 4) some useful R packages. A real-life sports dataset will be used throughout to build a better understanding of R.

Instructor

Fusheng Yang is a fifth-year Ph.D. student in Statistics at UConn. Her research interests include time series analysis and extreme value analysis. She is a research assistant at the Computational Climate Change Lab, UConn, where she works on detecting past heatwaves and predicting possible future heatwave events. She is also a teaching assistant for various undergraduate statistics courses in the Department of Statistics.

Prerequisites

A laptop with R/RStudio installed; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.

Training Materials

On GitHub.


Introduction to Python

Outline

As a popular high-level language, Python has many features that data scientists value: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in data science work, from the front end to the back end, and good Python programming makes sports data easier to analyze. The workshop will cover Python data types, methods for moving data, and functions.
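As a rough illustration of the kinds of topics listed above, the sketch below uses built-in Python data types (a list of dictionaries and tuples) and a small function to summarize box-score style records. The player names and point totals are hypothetical placeholders, not workshop data, and the actual in-class examples may differ.

```python
# Hypothetical box-score records (placeholders, not real data):
# a list of dictionaries, one dictionary per player-game.
games = [
    {"player": "Player A", "points": 21},
    {"player": "Player B", "points": 15},
    {"player": "Player A", "points": 27},
]


def points_per_game(records):
    """Return a dict mapping each player to their average points per game."""
    totals = {}  # player -> (total points, games played), stored as a tuple
    for rec in records:
        pts, n = totals.get(rec["player"], (0, 0))
        totals[rec["player"]] = (pts + rec["points"], n + 1)
    # Dictionary comprehension that unpacks each (points, games) tuple.
    return {player: pts / n for player, (pts, n) in totals.items()}


ppg = points_per_game(games)
print(ppg)  # {'Player A': 24.0, 'Player B': 15.0}
```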

Instructor

Charitarth Chugh is a third-year undergraduate student majoring in Computer Science at the University of Connecticut and is also the President of the UConn Artificial Intelligence Club. This year, he is researching the intersection of cybersecurity and deep learning for attack detection, alongside multimodal (audio, text, and video) models.

Prerequisites

A laptop with internet access. We will be working with Jupyter Notebooks on Google Colab.

Training Materials

On GitHub.


Basketball Analytics

Outline

This workshop aims to give participants a solid foundation for leading their own basketball analytics projects in the R programming language. We will review R packages designed for acquiring NBA data, creating visualizations such as shot charts and assist networks, and performing statistical analyses. Additionally, for those whose March Madness bracket didn't perform as well as they had hoped, this workshop will go over some basic strategies for improving predictions in future tournaments.

Instructor

Mathew Chandy is a third-year undergraduate student at the University of Connecticut, where he is double majoring in Statistics and Statistical Data Science, and minoring in Computer Science, Economics, and Mathematics. He previously worked as a sports analytics research intern at Carnegie Mellon University, where he analyzed determinants of NBA player salary and complementary playstyles. He is also a co-founder and current vice president of the UConn Joint Statistical Club.

Prerequisites

Familiarity with R and RStudio; some familiarity with basketball is recommended.

Training Materials

On GitHub.


Analysis of Formula 1 Data with Python

Outline

Formula 1 is a rapidly evolving sport in which data collection is essential to each team’s progress. The ability to analyze and visualize data is crucial for understanding a team’s performance and conveying that information clearly to others. In this workshop, we will use Formula 1 data to walk through data processing, analysis, and visualization, three essential pieces of the data science pipeline. We will use Python to conduct this analysis.

Instructor

Abhiram Gunti is a fourth-year undergraduate student at UConn studying Computer Science with a concentration in Computational Data Analytics. He has previously worked as a software engineer at startups such as Noteworthy AI and Relativity Space. At Noteworthy AI, Abhiram worked on benchmarking and improving the company’s object detection and tracking software.

Prerequisites

Familiarity with Python; access to Jupyter Notebooks or Google Colab.

Training Materials

On GitHub.


Web Scraping for Sports Data

Outline

If you want to start analyzing sports, you need data. Nowadays, there are many sources of pre-built datasets, but at times you might want to build a custom dataset from an online source. Web scraping is an effective solution to this problem: you can automate the process of gathering data from webpages and, in doing so, create datasets tailored to the questions you want to answer. During this workshop, you will learn 1) what web scraping is; 2) how static web scraping works using the Python packages pandas, requests, and BeautifulSoup; and 3) how dynamic web scraping works using the Python package Selenium in conjunction with the previously learned packages.
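Static scraping boils down to fetching a page's HTML and pulling structured values out of its tags. As a rough sketch, assuming a page that contains a simple HTML table, the example below uses Python's built-in `html.parser` module as a dependency-free stand-in for BeautifulSoup (the package the workshop actually uses). The table, team names, and numbers are hypothetical, and the HTML is hard-coded so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical, hard-coded HTML so the example runs offline. In a real static
# scrape, this string would typically come from requests.get(url).text.
HTML = """
<table id="standings">
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>10</td><td>2</td></tr>
  <tr><td>Team B</td><td>7</td><td>5</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Whitespace between tags arrives here too; keep text only inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(HTML)
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
# [{'Team': 'Team A', 'W': '10', 'L': '2'}, {'Team': 'Team B', 'W': '7', 'L': '5'}]
```

With the workshop's toolkit, the same table could be parsed with BeautifulSoup or loaded directly into a DataFrame with `pandas.read_html`; writing the parser by hand just makes the tag-by-tag logic visible. Pages that build their tables with JavaScript will not expose this HTML to a plain HTTP request, which is where dynamic scraping with Selenium comes in.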

Instructor

Tyler Hinrichs is a fourth-year undergraduate student at UConn studying Computer Science with a concentration in Software Design and Development. He was previously a Software Engineering Intern at both Travelers and Wellinks. At Travelers, he led development of a Python and Selenium-based web scraping application to automate audit data collection for the Digital Enablement team.

Prerequisites

Familiarity with Python; a laptop with access to Jupyter Notebooks or Google Colab.

Training Materials

On GitHub.


TensorFlow in Sports Analytics

Outline

This workshop will cover important concepts behind neural networks and TensorFlow and how they can be applied to soccer player data from FIFA. More specifically, we will 1) collect soccer player data; 2) pre-process and clean the data; 3) create an embedding model for these players using TensorFlow; and 4) create a full-stack application hosted on AWS to deploy our model to potential customers. Get excited to learn common software development and deep learning techniques used in modern sports analytics!

Instructor

Hari Patchigolla is a fourth-year undergraduate student majoring in Computer Science with a concentration in Computational Data Analytics. He has interned as a Data Scientist at companies such as Optum and Autodesk, where his work spanned a range of techniques, from anomaly detection to leveraging Generative AI tech stacks. Hari is also the President of UConn’s Data Science Club, where he hosts various workshops and industry events.

Prerequisites

Familiarity with Python; access to AWS (we will stay within the free tier); access to Jupyter Notebooks.

Training Materials

On GitHub.


Causal Inference in Sports Analytics

Outline
  1. Introduction to Causal Inference

    1. Definition and importance
    2. Distinction between correlation and causation
    3. Real-world examples illustrating the need for causal inference
  2. Fundamental Concepts in Causal Inference

    1. Potential outcomes framework
    2. Treatment, control, and counterfactuals
    3. Confounding variables
    4. Identifiability and the backdoor criterion
  3. Methods for Causal Inference

    1. Randomized Controlled Trials (RCTs)
    2. Matching methods
    3. Propensity score matching
  4. Applications and Case Studies
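To make the distinction between correlation and causation concrete, here is a toy sketch of backdoor adjustment (stratifying on a single observed confounder), written in Python since the workshop examples may use either R or Python. The data are made up purely for illustration: Z is a hypothetical "strong team" indicator, T a hypothetical training program, and Y a performance score. The sketch assumes Z is the only confounder, which is what makes the adjustment E[Y | do(T=t)] = sum over z of E[Y | T=t, Z=z] P(Z=z) valid here.

```python
# Toy, made-up records: (z, t, y)
#   z = 1 if the team is "strong" (hypothetical confounder), else 0
#   t = 1 if the team adopted a hypothetical training program, else 0
#   y = a hypothetical performance score
data = [
    (1, 1, 10), (1, 1, 10), (1, 1, 10), (1, 0, 8),
    (0, 1, 4), (0, 0, 2), (0, 0, 2), (0, 0, 2),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcome between treated and untreated units (ignores Z)."""
    treated = mean(y for z, t, y in rows if t == 1)
    control = mean(y for z, t, y in rows if t == 0)
    return treated - control


def backdoor_adjusted_effect(rows):
    """Average effect via sum_z (E[Y|T=1,Z=z] - E[Y|T=0,Z=z]) * P(Z=z).

    Assumes every stratum of Z contains both treated and untreated units.
    """
    n = len(rows)
    effect = 0.0
    for stratum in {z for z, _, _ in rows}:
        in_stratum = [(t, y) for z, t, y in rows if z == stratum]
        p_z = len(in_stratum) / n
        treated = mean(y for t, y in in_stratum if t == 1)
        control = mean(y for t, y in in_stratum if t == 0)
        effect += (treated - control) * p_z
    return effect


print(naive_difference(data))          # 5.0
print(backdoor_adjusted_effect(data))  # 2.0
```

The naive comparison (5.0) overstates the effect because, in this toy data, strong teams are both more likely to adopt the program and more likely to score higher; within each stratum the difference is 2, so the adjusted estimate is 2.0. Matching and propensity score methods, also listed in the outline, extend the same idea to settings with many confounders.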

Instructor

Dr. Kevin Cummiskey is an Associate Professor in the Department of Mathematical Sciences at West Point and Director of the Operations Research Program. His research is in applied statistics, causal inference, and statistics education. He was recently awarded the American Statistical Association's Causality in Statistics Education Award. Previously, he was an Engineer officer and operations analyst with service in South Korea, Iraq, Qatar, and Afghanistan. Dr. Cummiskey received his Ph.D. in Biostatistics from Harvard University, M.S. in Computational Operations Research from the College of William and Mary, and B.S. in Mathematical Sciences from West Point. He chairs Nutrition for Precision Health's Data Subcommittee and is a co-investigator of its artificial intelligence and computational modeling center for precision nutrition and health. In addition, he chairs West Point's Academic Freedom Advisory Committee.

Prerequisites

Some background in statistics; experience with R or Python for the examples.

Training Materials

On GitHub.