See Venue for WebEx Links.
The 1-hour workshops are scheduled in two parallel sessions from 1 to 3 pm on Saturday, Oct. 10, 2020.
"R is a language and environment for statistical computing and graphics". It is one of the most popular language in Data Science for its flexibility, extensibility, great community support, open-source, and closeness to cutting edge methodological developments. This workshop offers a jumpstart with R assuming no prior exposure. The substantive topics include: 1) an overview of R basics; 2) R data structure; 3) learn and practice R data management; 4) a brief introduction to some useful R packages (e.g. dplyr). The workshop will be interactive and the participants will work with real-life sports dataset.
Tuhin Sheikh is a fourth year Ph.D. student in Statistics at the University of Connecticut (UConn). He worked as a Biostatistics and Data Science Intern at the Boehringer Ingelheim in Summer 2020. His research areas of interest include competing risks survival data, Bayesian computation, joint modelling of longitudinal and survival data, model assessment, interim analysis, deep neural network, etc. He has experience as a research assistant with the UConn School of Nursing and, UConn Utility Operations and Energy Management. He has also experience in teaching undergraduate level Statistics courses and conducting workshops at international conferences. He has been awarded for his academic excellence in teaching.
Laptop with R/RStudio preinstalled; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.
On GitHub.
As a popular high-level language, Python has many features that data scientists like: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used across data science tasks, from the front end to the back end. Good Python programming makes it easier to analyze sports data. The workshop will cover the following in class: Python data types, methods for moving data, and functions. Visualization and two case studies with Python will be left as supplementary material for after-class practice, through which participants can quickly pick up frequently used Python code.
Jun Jin is a Ph.D. student in the Department of Statistics, University of Connecticut, working as a research assistant. He has five years of programming experience in R and Python. His research interests focus on machine learning, web crawling, text mining, distributed computing, and PySpark. He previously worked at the Digital Experience Center of PricewaterhouseCoopers, where he created machine-learning solutions for consulting projects.
A laptop with Anaconda installed. Anaconda can be downloaded for Windows users here and for Mac users here.
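As a rough illustration of the in-class topics (data types and functions), here is a minimal sketch. It is not taken from the workshop materials: the records, team name, and function names are hypothetical and made up for illustration.

```python
# Hypothetical game records (made-up numbers, not real data).
# Each game is a dict; the season is a list of dicts.
games = [
    {"team": "A", "points": 102, "opponent_points": 98},
    {"team": "A", "points": 95, "opponent_points": 101},
    {"team": "A", "points": 110, "opponent_points": 104},
]


def point_margin(game):
    """Return points scored minus points allowed for one game."""
    return game["points"] - game["opponent_points"]


def win_fraction(season):
    """Return the share of games with a positive point margin."""
    wins = sum(1 for game in season if point_margin(game) > 0)
    return wins / len(season)


# A list comprehension moves data from one structure into another.
margins = [point_margin(game) for game in games]
print(margins)
print(round(win_fraction(games), 3))
```

Lists, dictionaries, and small functions like these are usually enough to start summarizing game-level data before moving on to dedicated libraries.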
On GitHub.
Who is the fastest pitcher in baseball currently? What is the difference between the most pitcher-friendly and the most hitter-friendly umpires? Can we predict the winning percentage for a team? To answer these questions in a more efficient and scientific way, people weave statistical models and machine learning into sabermetrics. The most common purposes of sabermetrics are evaluating past performance and predicting future performance to determine a player's contributions to his team. In this workshop, we will: 1) access baseball data and use "dplyr" for data manipulation; 2) visualize data with "ggplot2"; 3) explore the relationship between runs and wins. Example code will be provided.
Dr. Zhe Wang is an Assistant Professor of Data Analytics at Denison University. She received her PhD in Statistics in 2020 from the University of Connecticut. She is an experienced instructor of Mathematical Statistics and Statistical Methods. Her current research interests include sequential analysis, change point detection, and sample size determination. Working on large population-based public health datasets, she conducts secondary analyses to estimate the prevalence of and associations between risk factors, behaviors, disease states, and other health-related outcomes.
Passion for statistics and/or baseball; experience reading and writing data in R; a laptop with R and RStudio preinstalled.
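One commonly cited way to relate runs and wins is Bill James's Pythagorean expectation, which estimates winning percentage from runs scored and runs allowed. The sketch below is in Python purely for illustration. The workshop itself uses R, may cover a different model, and the function name and run totals here are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage as RS^k / (RS^k + RA^k).

    The exponent k = 2 is the classic choice; other values are
    sometimes used, so it is left as a parameter here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season totals (made-up numbers, not real data).
expected_pct = pythagorean_win_pct(800, 700)
print(round(expected_pct, 3))
```

Comparing such an estimate with a team's actual winning percentage is one simple way to start exploring how run differential translates into wins.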
On GitHub.
The use of analytics in basketball has exploded within the last ten years. Even the casual fan has some grasp of concepts like ‘player efficiency’ and ‘expected field goal percentage’, for example. The purpose of this workshop is to demonstrate how R and its related applications may be used to reproduce, create, and, most importantly, analyze basketball data. I will introduce foundational basketball statistics and the comprehensive R package ‘BasketballAnalyzeR’. After that, we’ll explore how various statistical techniques may be used to examine both player and team data. We’ll reference the 2017-2018 NBA season. All relevant code and datasets will be made available.
Jackson Lautier is a second-year Ph.D. student and National Science Foundation Graduate Research Fellow in the Department of Statistics, University of Connecticut. Before pursuing his doctoral degree, Jackson worked for eight years as an actuary focusing on risk analysis and quantitative finance. His research concentrates on applications of statistics to economics, finance, risk management, sports, and actuarial science. He is particularly interested in using statistics to improve access of disadvantaged groups to the capitalist financial system. Jackson is a Fellow of the Society of Actuaries, a Chartered Enterprise Risk Analyst, and a Member of the American Academy of Actuaries. He holds a bachelor’s degree in Mathematics/Actuarial Science (2011) from the University of Connecticut.
Familiarity with R/RStudio; familiarity with basketball; prior exposure to undergraduate coursework in statistics helpful but not required.
On GitHub.
Various sports websites produce tons of data every day. Collecting these data may be the first difficulty that people encounter when they start doing sports analytics. We could collect the data by copying and pasting manually, but this approach is tedious and time-consuming. Web scraping techniques provide an elegant solution to this problem by capturing information from websites efficiently and reproducibly. In this workshop, topics include: 1) an overview of several web scraping methods using R; 2) popular R packages for static and dynamic web pages, respectively. Examples are provided to illustrate how web scraping techniques are applied in the real world.
Yaqiong Yao is a fourth-year Ph.D. student in the Department of Statistics, University of Connecticut. Her research interests include optimal subsampling for big data, statistical computing, and statistical learning. She is a research assistant at the UConn Health Center, working on the Connecticut All-Payer Claims Databases with R to help assess the health care performance of different organizations. She is also a teaching assistant for both introductory and advanced statistics courses.
Previous experience with R; a laptop with R and RStudio preinstalled.
On GitHub.
Data visualization plays an essential role in sports analytics, with a wide range of applications from data exploration to result presentation. An accurate and attractive graph or animation can describe complex data in an easy and understandable way, while a poorly designed one may deliver wrong information and confuse readers. This workshop will give a clear picture of how to generate an excellent statistical plot in data analysis and introduce a powerful graphics package in R, ggplot2. The contents include: 1) a general introduction to ggplot2; 2) detailed instructions for generating advanced figures with ggplot2; 3) a case study with data from basketball games; 4) some useful extensions of ggplot2.
Yiming Zhang is a Ph.D. student in the Department of Statistics, University of Connecticut. Working as a research assistant in the School of Nursing for more than two years, he leads the data analysis for several research projects using healthcare and randomized trial data. He also has one year of consulting experience with the Statistical Consulting Service in the Department of Statistics, providing statistical support for projects from a variety of research fields. With a great passion for data analysis, Yiming always tries to implement and develop statistical methodologies to solve real-world problems.
Interest in statistical graphs and visualization; basic programming experience with R; a laptop with R/RStudio preinstalled.
On GitHub.