The 1-hour workshops are scheduled in two parallel sessions from 1 to 3 pm on Saturday, Oct. 10, 2020.
"R is a language and environment for statistical computing and graphics". It is one of the most popular languages in data science thanks to its flexibility, extensibility, strong community support, open-source license, and closeness to cutting-edge methodological developments. This workshop offers a jumpstart with R, assuming no prior exposure. The topics include: 1) an overview of R basics; 2) R data structures; 3) hands-on practice with R data management; 4) a brief introduction to some useful R packages (e.g., dplyr). The workshop will be interactive, and participants will work with a real-life sports dataset.
Tuhin Sheikh is a Ph.D. student and a teaching assistant in the Department of Statistics, University of Connecticut, and a lecturer of Applied Statistics (on leave) at the University of Dhaka, Dhaka, Bangladesh. His current research focuses on survival data analysis with an informative dropout process. He has also been a research assistant with the UConn School of Nursing on microbiome data analysis. He is an instructor for an undergraduate-level Mathematical Statistics course. Mr. Sheikh is a proficient R user. He is enthusiastic about developing statistical methodologies for emerging research problems in machine learning and deep learning. In the past, he led research projects and published research articles on public health problems. He has received awards for academic excellence in teaching and research. A full biography of Mr. Sheikh can be found at https://www.isrt.ac.bd/people/tsheikh/.
A laptop with R/RStudio preinstalled; previous experience using R is NOT required; basic programming knowledge would be helpful but is NOT required.
As a popular high-level language, Python has many features that data scientists like: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in data science challenges from the front end to the back end. Good Python programming makes it easier to analyze sports data. After a brief introduction to Python (more details about installation and introduction are attached in “Starting Guidance”), the workshop will cover four steps in a full sports data analysis project: 1) Data acquisition; 2) Modeling (with modules); 3) Algorithm optimization (with modules); 4) Result validation. Two case studies will be demonstrated, through which participants can quickly pick up frequently used Python code.
Jun Jin is a Ph.D. student in the Department of Statistics, University of Connecticut, working as a research assistant. He has four years of programming experience in R and Python. His research interests focus on machine learning, web crawling, text mining, distributed computing, and PySpark. He previously worked at the Digital Experience Center of PricewaterhouseCoopers, where he created machine-learning solutions for consulting projects.
A laptop with Python and certain modules (pandas, matplotlib, numpy, sklearn, requests, tensorflow, keras, redis, json, logging, basketball_reference_web_scraper) preinstalled.
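One way to confirm the setup before the workshop is sketched below. It is a minimal, unofficial check that assumes the names in the list above are the import names; the helper `missing_modules` is hypothetical and not part of the workshop materials. It reports which listed modules cannot be found in the current Python environment.

```python
import importlib.util

# Import names copied from the prerequisite list above (an assumption:
# the list gives import names, not necessarily installer package names).
# json and logging ship with Python's standard library.
REQUIRED_MODULES = [
    "pandas", "matplotlib", "numpy", "sklearn", "requests", "tensorflow",
    "keras", "redis", "json", "logging", "basketball_reference_web_scraper",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

Note that an import name can differ from the name used to install the package; for example, `sklearn` is installed as `scikit-learn`.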
Who is the fastest pitcher in baseball currently? What is the difference between the most pitcher-friendly and the most hitter-friendly umpires? In which months are home runs more likely to occur? To answer these questions in a more efficient and scientific way, people weave statistical models and machine learning into sabermetrics. The most common purposes of sabermetrics are evaluating past performance and predicting future performance to determine a player's contributions to his team. In this 2-hour workshop, we will: 1) access baseball data; 2) create traditional graphs in R; 3) explore the relation between runs and wins; 4) model a player's career trajectory. Example code will be provided.
Dr. Zhe Wang is an Assistant Professor of Data Analytics at Denison University. She received her PhD in Statistics in 2020 from the University of Connecticut. She is an experienced instructor of Mathematical Statistics and Statistical Methods. Her current research interests include sequential analysis, change point detection, and sample size determination. Working on large population-based public health datasets, she conducts secondary analyses to estimate the prevalence of and associations between risk factors, behaviors, disease states, and other health-related outcomes.
Passion for statistics and/or baseball; experience reading and writing data in R; a laptop with R and RStudio preinstalled.
The use of analytics in basketball has exploded over the last ten years. Even the casual fan has some grasp of concepts like ‘player efficiency’ and ‘expected field goal percentage’. The purpose of this workshop is to demonstrate how R and its related applications may be used to reproduce, create, and, most importantly, analyze basketball data. I will introduce foundational basketball statistics and the comprehensive R package ‘BasketballAnalyzeR’. We’ll then explore how various statistical techniques may be used to examine both player and team data. We’ll reference the 2017-2018 NBA season. All relevant code and datasets will be made available.
Jackson Lautier is a second-year Ph.D. student and National Science Foundation Graduate Research Fellow in the Department of Statistics, University of Connecticut. Before pursuing his doctorate, Jackson worked for eight years as an actuary focusing on risk analysis and quantitative finance. His research concentrates on applications of statistics to economics, finance, risk management, sports, and actuarial science. He is particularly interested in using statistics to improve disadvantaged groups' access to the capitalist financial system. Jackson is a Fellow of the Society of Actuaries, a Chartered Enterprise Risk Analyst, and a Member of the American Academy of Actuaries. He holds a bachelor’s degree in Mathematics/Actuarial Science (2011) from the University of Connecticut.
Familiarity with R/RStudio; familiarity with basketball; prior exposure to undergraduate coursework in statistics helpful but not required.
Various sports websites produce tons of data every day. Collecting these data is often the first difficulty people encounter when they start doing sports analytics. The data could be collected by copying and pasting manually, but that approach is tedious and time-consuming. Web scraping techniques provide an elegant solution, capturing information from websites efficiently and reproducibly. The topics in this workshop include: 1) an overview of several web scraping methods using R; 2) popular R packages for static and dynamic web pages, respectively. Examples are provided to illustrate how web scraping techniques are applied in the real world.
Yaqiong Yao is a fourth-year Ph.D. student in the Department of Statistics, University of Connecticut. Her research interests include optimal subsampling for big data, statistical computing, and statistical learning. She is a research assistant at the UConn Health Center, working on the Connecticut All-Payer Claims Databases with R to help assess the health care performance of different organizations. She is also a teaching assistant for both introductory and advanced statistical courses.
Previous experience with R; a laptop with R and RStudio preinstalled.
Data visualization plays an essential role in sports analytics, with a wide range of applications from data exploration to result presentation. An accurate and attractive graph or animation can describe complex data in an easy and understandable way, while a poorly designed one may deliver wrong information and confuse readers. This workshop will give a clear picture of how to generate an excellent statistical plot in data analysis and introduce a powerful graphics package in R, ggplot2. The contents include: 1) a general introduction to ggplot2; 2) detailed instructions for generating advanced figures with ggplot2; 3) a case study with data from basketball games; 4) some useful extensions of ggplot2.
Yiming Zhang is a Ph.D. student in the Department of Statistics, University of Connecticut. Working as a research assistant in the School of Nursing for more than two years, he leads the data analysis for several research projects using healthcare and randomized trial data. He also has one year of consulting experience with the Statistical Consulting Service in the Department of Statistics, providing statistical support for projects from a variety of research fields. With a great passion for data analysis, Yiming always tries to implement and develop statistical methodologies to solve real-world problems.
Interest in statistical graphs and visualization; basic programming experience with R; a laptop with R/RStudio preinstalled.