UCSAS 2024 offers 7 workshops on Friday, April 12, 2024.
R is an open-source programming language for statistical computing and graphics. R offers a wide range of graphical and statistical tools, including time-series analysis, classification, clustering, and linear and nonlinear modeling. This workshop introduces R to those who have had little to no prior experience. Topics include: 1) an overview of basic R; 2) data structures in R; 3) data management in R; and 4) some useful R packages. A real-life sports dataset will be used to provide a better understanding of R.
Fusheng Yang is a fifth-year Ph.D. student in Statistics at UConn. Her research interests include time series analysis and extreme value analysis. She is a research assistant at the Computational Climate Change Lab, UConn, working on detecting heatwaves in the past and predicting possible heatwave events in the future. She is also a teaching assistant for various undergraduate statistics courses at the Department of Statistics.
A laptop with R/RStudio installed; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.
On GitHub.
As a popular high-level language, Python has many features that data scientists like: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in many data science challenges from the front end to the back end. Good Python programming makes it easier to analyze sports data. The workshop will cover the following topics in class: Python data types, methods for moving data, and functions.
Charitarth Chugh is a third-year undergraduate student majoring in Computer Science at the University of Connecticut and also the President of the UConn Artificial Intelligence Club. This year, he is researching the intersections of cybersecurity and deep learning for attack detection alongside multimodal (audio, text, and video) models.
A laptop with internet access. We will be working with Jupyter Notebooks on Google Colab.
On GitHub.
This workshop aims to provide participants with a solid base for directing their own basketball analytics projects using the R programming language. We will review some of the R packages that are designed for acquiring NBA data, creating visualizations like shot charts and assist networks, and performing statistical analyses. Additionally, for those whose March Madness bracket didn't perform as well as they had hoped, this workshop will go over some basic strategies to improve predictions for future tournaments.
Mathew Chandy is a third-year undergraduate student at the University of Connecticut, where he is double majoring in Statistics and Statistical Data Science, and minoring in Computer Science, Economics, and Mathematics. He previously worked as a sports analytics research intern at Carnegie Mellon University, where he analyzed determinants of NBA player salary and complementary playstyles. He is also a co-founder and current vice president of the UConn Joint Statistical Club.
Familiarity with R and RStudio; some familiarity with basketball is recommended.
On GitHub.
Formula 1 is a rapidly evolving sport where data collection is essential to the progress made by each team. The ability to analyze and visualize data is crucial in order to understand a team’s performance and convey that information to others clearly. In this workshop, we will use Formula 1 data to walk through data processing, analysis, and visualization, three essential pieces of the data science pipeline. We will use Python to conduct this analysis.
Abhiram Gunti is a fourth-year undergraduate student at UConn studying Computer Science with a concentration in Computational Data Analytics. He has worked as a software engineer for a couple of startups, including Noteworthy AI and Relativity Space. At Noteworthy AI, Abhiram worked on benchmarking and improving the company’s object detection and tracking software.
Familiarity with Python; access to Jupyter Notebooks or Google Colab.
On GitHub.
If you want to start analyzing sports, you need data. Nowadays, there are many sources of pre-built datasets, but at times, you might want to make a custom dataset from an online source. Web scraping is the most effective solution to this problem. You can automate the process of gathering data from webpages, and in doing so, you can create datasets specific to the questions that you want to be answered. During this workshop you will learn 1) what web scraping is, 2) how static web scraping works using Python packages pandas, requests, and BeautifulSoup, then 3) how dynamic web scraping works using Python package Selenium in conjunction with the previously learned packages.
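To make the idea of static web scraping concrete, here is a minimal sketch of turning an HTML table into a list of records. It is not the workshop's material: it assumes the page has already been downloaded into a string, and it uses Python's standard-library html.parser as a stand-in for BeautifulSoup so that it runs without extra installs. The sample HTML, team names, and the names TableParser and scrape_table are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content with made-up data; in a real scraper this
# string would be the body of an HTTP response.
SAMPLE_HTML = """
<table id="standings">
  <tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Team A</td><td>12</td><td>4</td></tr>
  <tr><td>Team B</td><td>9</td><td>7</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed rows, each a list of cell strings
        self._row = None       # row currently being read, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


def scrape_table(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

A list of dictionaries like records can be passed directly to pandas.DataFrame to get a tabular dataset. Dynamic pages that build their content with JavaScript need a browser-driving tool instead, because the table is not present in the raw HTML.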
Tyler Hinrichs is a fourth-year undergraduate student at UConn studying computer science with a concentration in Software Design and Development. He was previously a Software Engineering Intern for both Travelers and Wellinks. At Travelers, he led development of a Python and Selenium-based web scraping application to automate audit data collection for the Digital Enablement team.
Familiarity with Python; a laptop with access to Jupyter Notebooks or Google Colab.
On GitHub.
This workshop will cover important concepts behind neural networks and TensorFlow and how they can be applied to soccer player data from FIFA. More specifically, we will 1) collect soccer player data, 2) pre-process and clean the data, 3) create an embedding model for these players using TensorFlow, and 4) create a full-stack application hosted on AWS to deploy our model to potential customers. Get excited to learn common software development and deep learning techniques used in modern sports analytics!
Hari Patchigolla is a fourth-year undergraduate student majoring in Computer Science with a concentration in Computational Data Analytics. He has interned as a Data Scientist for companies like Optum and Autodesk, where his work focused on a myriad of techniques from anomaly detection to leveraging Generative AI tech stacks. Hari is also the President of UConn’s Data Science Club, where he hosts various workshops and industry events.
Familiarity with Python; access to AWS (we will stay within the free tier); Jupyter Notebooks.
On GitHub.
Introduction to Causal Inference
Fundamental Concepts in Causal Inference
Methods for Causal Inference
Applications and Case Studies
Dr. Kevin Cummiskey is an Associate Professor in the Department of Mathematical Sciences at West Point and Director of the Operations Research Program. His research is in applied statistics, causal inference, and statistics education. He was recently awarded the American Statistical Association's Causality in Statistics Education Award. Previously, he was an Engineer officer and operations analyst with service in South Korea, Iraq, Qatar, and Afghanistan. Dr. Cummiskey received his Ph.D. in Biostatistics from Harvard University, M.S. in Computational Operations Research from the College of William and Mary, and B.S. in Mathematical Sciences from West Point. He chairs the Nutrition for Precision Health's Data Subcommittee and is a co-investigator of its artificial intelligence and computational modeling center for precision nutrition and health. In addition, he chairs West Point's Academic Freedom Advisory Committee.
Some background in statistics; experience with R or Python for the examples.
On GitHub.