Training Workshops

The 1-hour workshops are scheduled in three parallel sessions from 1 to 3 pm on Saturday, October 9, 2021.

Session A (Introductory Level)
Session B (Intermediate Level)
Session C (Advanced Level)

Introduction to R

Outline

"R is a language and environment for statistical computing and graphics". It is one of the most popular language in Data Science for its flexibility, extensibility, great community support, open-source, and closeness to cutting edge methodological developments. This workshop offers a jumpstart with R assuming no prior exposure. The substantive topics include: 1) an overview of R basics; 2) R data structure; 3) learn and practice R data management; 4) a brief introduction to some useful R packages (e.g. dplyr). The workshop will be interactive and the participants will work with real-life sports dataset.

Instructor

Tuhin Sheikh is a fifth-year Ph.D. student in Statistics at the University of Connecticut (UConn). He worked as a Biostatistics and Data Science Intern at Boehringer Ingelheim in Summer 2020. His research interests include competing risks survival data, Bayesian computation, joint modelling of longitudinal and survival data, model assessment, interim analysis, and deep neural networks. He has experience as a research assistant with the UConn School of Nursing and with UConn Utility Operations and Energy Management. He also has experience teaching undergraduate-level Statistics courses and conducting workshops at international conferences. He has been recognized for his academic excellence in teaching.

Prerequisites

Laptop with R/RStudio preinstalled; previous experience using R is NOT required; basic programming knowledge would be helpful but NOT required.

Training Materials

On GitHub.


Introduction to Python

Outline

As a popular high-level language, Python has many features that data scientists like: it is easy to learn, object-oriented, cross-platform, and open source, and it has many extensions for machine learning. It is widely used in data science tasks from the front end to the back end. Good Python programming makes it easier to analyze sports data. The in-class portion of the workshop will cover Python data types, methods for moving data, and functions. Visualization and two case studies with Python are left as supplementary material for after-class practice, through which participants can quickly pick up frequently used Python code.
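As a rough taste of the in-class topics (data types, moving data between structures, and functions), here is a minimal sketch. It is not taken from the workshop materials: the game records and the function names `point_margin` and `summarize` are invented for illustration.

```python
# Illustrative only: these game records are made up, not real results.
games = [
    {"opponent": "Team A", "home": True, "points_for": 102, "points_against": 98},
    {"opponent": "Team B", "home": False, "points_for": 95, "points_against": 110},
    {"opponent": "Team C", "home": True, "points_for": 120, "points_against": 101},
]


def point_margin(game):
    """Return points scored minus points allowed for one game (a dict)."""
    return game["points_for"] - game["points_against"]


def summarize(game_list):
    """Move data from a list of dicts into a single summary dict."""
    margins = [point_margin(g) for g in game_list]  # list comprehension
    return {
        "games": len(game_list),                    # int
        "wins": sum(1 for m in margins if m > 0),   # count positive margins
        "mean_margin": sum(margins) / len(margins), # float
        "home_opponents": [g["opponent"] for g in game_list if g["home"]],
    }


summary = summarize(games)
print(summary)
```

The sketch touches the built-in types a beginner meets first (strings, integers, floats, booleans, lists, and dictionaries) and shows how a small function can be reused inside a larger one.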

Instructor

Surya Teja Eada is a third-year Ph.D. student in Statistics at the University of Connecticut. Before pursuing his doctorate, he worked for 3 years as a Statistical Analyst, focusing on model validation and risk analysis as a consultant for a commercial bank. His research interests include applications of statistics to finance and plant sciences, in particular the study of stochastic processes, dynamic modeling approaches, and hidden Markov models. He has 4 years of experience as a teaching assistant and has interned at eClerx and Deloitte, and as an IMSI intern at USDA's RIPE lab. He also holds bachelor's and master's degrees in Statistics from the Indian Statistical Institute, Kolkata, and a master's degree in Applied Financial Mathematics from UConn.

Prerequisites

A laptop with Anaconda installed. Anaconda installers are available for both Windows and Mac users.

Training Materials

On GitHub.


Hockey Analytics with R

Outline

This workshop will discuss why you should be using R for your hockey analysis and how to do it effectively. We will review the public data sources that are available, discuss the foundational concepts of hockey analytics, explore data manipulation techniques that are useful when working with hockey data, and touch on various data visualization methods for displaying your results.

Instructor

Meghan Hall is a data manager in higher education as well as a data scientist at Zelus Analytics. She contributes to the public sports analytics community by helping beginners learn R and by writing and presenting on various aspects of hockey analysis as a member of Hockey-Graphs. She frequently delivers talks and workshops on R (for useR groups, R-Ladies groups, and conferences) and recently taught an undergraduate course on data visualization at Carnegie Mellon University.

Prerequisites

Familiarity with R/RStudio.

Training Materials

https://meghan.rbind.io/talk/ucsas


Basketball Analytics with R

Outline

The use of analytics in basketball has exploded over the last ten years. Even the casual fan has some grasp of concepts like ‘player efficiency’ and ‘expected field goal percentage’. The purpose of this workshop is to demonstrate how R and its related applications may be used to reproduce, create, and, most importantly, analyze basketball data. I will introduce foundational basketball statistics and the comprehensive R package ‘BasketballAnalyzeR’, and give brief tips on collecting valuable NBA data. After that, we will use case studies to explore how various statistical techniques (ranging from basic to advanced) may be used to examine both player and team data for the recent 2020-2021 NBA season.

Instructor

Jackson Lautier is a third-year Ph.D. student and National Science Foundation Graduate Research Fellow in the Department of Statistics, University of Connecticut. Currently, he is also an Oak Ridge Institute for Science and Education (ORISE) fellow at the Center for Drug Evaluation and Research (CDER) of the Food and Drug Administration (FDA). Prior to his graduate studies, Jackson worked for eight years as an actuary focusing on risk analysis and quantitative finance. His research concentrates on applications of statistics to economics, finance, risk management, sports, and actuarial science. Jackson is a Fellow of the Society of Actuaries, a Chartered Enterprise Risk Analyst, and a Member of the American Academy of Actuaries. He also holds a bachelor’s degree in Mathematics/Actuarial Science (2011) from the University of Connecticut.

Prerequisites

Familiarity with R/RStudio; familiarity with basketball; prior exposure to undergraduate coursework in statistics helpful but not required.

Training Materials

On GitHub.


Web Scraping for Sports Data with R

Outline

Sports websites produce large amounts of data every day. Collecting these data is often the first difficulty people encounter when they start doing sports analytics. The data could be collected by copying and pasting manually, but that approach is tedious and time-consuming. Web scraping techniques offer an elegant solution, capturing information from websites efficiently and reproducibly. Topics in this workshop include: 1) an overview of several web scraping methods using R; 2) popular R packages for static and dynamic web pages, respectively. Examples are provided to illustrate how web scraping techniques are applied in the real world.

Instructor

Lucas Godoy is a third-year Ph.D. student in Statistics at the University of Connecticut (UConn). Before starting his Ph.D., he worked as a Statistician analyzing web data at a consulting company. He also has experience as a Data Scientist working for a Brazilian bank. His research interests include, but are not limited to, Spatial Statistics and Statistical Computing. Currently, he works part-time teaching undergraduate-level courses and part-time as a Research Assistant at the Biostatistics Center of the Connecticut Convergence Institute for Translation in Regenerative Engineering.

Prerequisites

Previous experience with R; a laptop with R and RStudio preinstalled.

Training Materials

On GitHub.


TensorFlow with Applications in Sports Analytics

Outline

TensorFlow is an open-source machine learning framework developed by the Google Brain team, with a particular focus on training and inference of deep neural networks. It is fast, flexible, and suitable for large-scale applications, taking advantage of multiple CPUs and GPUs. It supports artificial intelligence work across a diverse range of problems. We will explain the basic concepts and get a taste of the framework. Recent developments will be briefly discussed, in particular the convenience brought by the "eager mode". As exercises, a linear regression and a neural network will be fitted to football data.

Instructor

Jun Jin is a Ph.D. student in the Department of Statistics, University of Connecticut, working as a research assistant. He has five years of programming experience in R and Python. His research interests focus on machine learning, web crawling, text mining, distributed computing, and PySpark. He previously worked at the Digital Experience Center of PricewaterhouseCoopers, where he created machine-learning solutions for consulting projects.

Prerequisites

A computer with Anaconda (or Miniconda) installed. TensorFlow 2 should also be installed. See the videos for Windows users and Mac users.

Training Materials

On GitHub.