Abstract: Estimating schedule difficulty in the National Football League is tricky given the limited number of games and the number of factors that impact game outcomes, including time-varying team strengths, the home advantage, and changes in rest, travel, and time zones. From the league’s perspective, quantifying each of these factors gives a clearer picture of scheduling equity and competitive balance. We extend the Bayesian state-space model of Lopez, Matthews, and Baumer (2018) to estimate varying levels of rest and travel advantages using betting market data. The model accounts for team strength that varies by week and season. We estimate that coming off a bye is worth about three-quarters of a point to a team, while a shorter rest advantage is worth about half of that. In addition, we find that the benefit of playing at home has dropped by approximately a point in the last decade, and we explore whether and how the game looks different on the field as a result.
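As a rough illustration of how the rest estimates quoted above might enter an expected point margin, here is a minimal Python sketch. It assumes a simple additive structure, which is a simplification and not the authors' state-space model. The function name, the rest labels, and the team strength and home advantage values in the usage lines are all hypothetical; only the 0.75-point bye figure and the "about half of that" figure come from the abstract.

```python
# Hypothetical illustration only: an additive expected-margin calculation.
# The two rest values below are the approximate estimates from the abstract;
# every other name and number is an assumption made for this sketch.
BYE_ADVANTAGE = 0.75                       # points, "about three-quarters of a point"
MINOR_REST_ADVANTAGE = BYE_ADVANTAGE / 2   # points, "about half of that"

REST_POINTS = {
    "none": 0.0,
    "minor": MINOR_REST_ADVANTAGE,
    "bye": BYE_ADVANTAGE,
}


def expected_margin(home_strength, away_strength, home_advantage,
                    home_rest="none", away_rest="none"):
    """Expected home-minus-away point margin under an additive assumption."""
    return (home_strength - away_strength + home_advantage
            + REST_POINTS[home_rest] - REST_POINTS[away_rest])


# Made-up inputs: the home team is 1 point stronger, home advantage is 2 points,
# and the home team is coming off a bye.
margin_with_bye = expected_margin(1.0, 0.0, 2.0, home_rest="bye")
margin_without_bye = expected_margin(1.0, 0.0, 2.0)
print(margin_with_bye - margin_without_bye)  # the bye's contribution in points
```

In this sketch a rest advantage simply shifts the expected margin; the talk's model additionally lets team strength vary by week and season.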
Thompson Bliss is a Data Scientist for the National Football League. He completed his master’s degree in Data Science at Columbia University in the City of New York in December 2019. At Columbia, he worked as a graduate assistant for a Sports Analytics course taught by Professor Mark Broadie. He received a Bachelor of Science in Physics and Astronomy from the University of Wisconsin-Madison in 2018.
Abstract: Continuous-time assessments of game outcomes in sports have become increasingly common in the last decade. In American football, only discrete-time estimates of play value were possible, since the most advanced public football datasets were recorded at the play-by-play level. While measures such as expected points and win probability are useful for evaluating football plays and game situations, there has been no research into how these values change throughout the course of a play. In this work, we make two main contributions. First, we introduce a general framework for continuous-time within-play valuation in the National Football League using player-tracking data. The framework is built from several modular sub-models, making it easy to incorporate recent work involving player-tracking data in football. Second, we use a long short-term memory recurrent neural network to construct a ball-carrier model that estimates how many yards the ball-carrier is expected to gain from their current position, conditional on the locations and trajectories of the ball-carrier, their teammates, and their opponents. Additionally, we demonstrate an extension with conditional density estimation so that the expectation of any measure of play value can be calculated in continuous time, which was never before possible at such a granular level.
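The last point, that a conditional density over yards gained allows the expectation of any play-value measure to be computed, can be sketched in a few lines of Python. This is a toy example under stated assumptions: the discrete distribution below is made up, and the helper name expected_value is hypothetical. In the talk's setting the distribution would come from the ball-carrier model at a given instant within the play.

```python
def expected_value(yard_probs, value_fn):
    """Expectation of value_fn(yards) under a discrete distribution.

    yard_probs maps yards gained to a probability; the weights are
    renormalised in case they do not sum exactly to one.
    """
    total = sum(yard_probs.values())
    return sum(p * value_fn(y) for y, p in yard_probs.items()) / total


# Made-up predictive distribution of yards gained at one moment in a play.
yard_probs = {0: 0.25, 2: 0.5, 10: 0.25}

# Two different "measures of play value" evaluated on the same distribution.
expected_yards = expected_value(yard_probs, lambda y: y)
prob_gain_5_plus = expected_value(yard_probs, lambda y: 1.0 if y >= 5 else 0.0)

print(expected_yards, prob_gain_5_plus)
```

Because the density is estimated once and the value function is swapped in afterwards, the same model output can feed expected yards, a first-down probability, or any other function of yards gained.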
Ronald Yurko is a PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. In addition to his statistical methodology research oriented towards applications in genetics and genomics, he is actively involved in statistics in sports research. He is a co-organizer of the annual Carnegie Mellon Sports Analytics Conference and Reproducible Research Competition, and has developed multiple R packages to enable easy access to publicly available data, such as nflscrapR (see paper on arXiv), which has been featured in popular media such as The Athletic, Wall Street Journal, and FiveThirtyEight. He previously worked as a quantitative analyst in finance as well as a baseball operations data and analytics intern for the Pittsburgh Pirates. He received a Bachelor of Science in Statistics from Carnegie Mellon in 2015.
Abstract: Many of the most interesting questions in sport center on “intangibles”: aspects of performance that cannot be directly observed. Mixture models represent a broad class of tools for modelling latent variables. In this talk, I will give an overview of three types of mixture models (finite mixtures, mixed membership, and hidden Markov models) and discuss broad areas of their application in sport. Specific case studies will be presented for tennis including the use of mixture models for the categorisation of playing style, for building a generative model for shots, and for studying injury and recovery. I will show how each model type can be implemented in a fully Bayesian framework using the Stan language and discuss Stan's strengths and limitations for mixture model development.
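To make the idea of a latent class concrete, the following Python sketch computes the posterior membership probabilities for a single observation under a two-component Gaussian finite mixture with known parameters. It is not the speaker's code and not a Bayesian fit in Stan; it only shows the marginalisation over the unobserved component that a finite mixture likelihood relies on. All function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, sds):
    """Posterior probability that x came from each mixture component.

    Each entry is weight_k * density_k(x), divided by the sum over all
    components (the mixture density at x).
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Made-up parameters: two equally weighted components centred at -1 and +1.
weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]

print(responsibilities(0.0, weights, means, sds))  # midway between the means
print(responsibilities(3.0, weights, means, sds))  # much closer to the second mean
```

In a full Bayesian treatment the weights, means, and standard deviations would themselves be unknown and given priors; here they are fixed so that only the latent-class step is visible.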
Dr. Stephanie Kovalchik is currently a Senior Data Scientist at Zelus Analytics, where she and her colleagues are developing world-leading intelligence platforms for multiple sports. In her previous role, she led data science innovation for the Game Insight Group of Tennis Australia, building first-of-a-kind metrics and real-time applications with tracking data. An expert in tennis analytics and causal inference with 40+ published papers, she also developed novel statistical methods in her previous roles at the NCI and RAND Corporation. She runs a tennis analytics blog at on-the-t.com and tweets @StatsOnTheT.
Abstract: We will give an example of the entire data science pipeline put into practice at ESPN by discussing the redesign of NBA Basketball Power Index (BPI), ESPN’s team performance metric for the NBA. Representatives from ESPN’s Stats and Information Group, including several members of ESPN’s Sports Analytics team, will present, along with a few key users from other groups at ESPN. Each will describe their role in a stage of the development process of NBA BPI and its related metrics, from formulating what the metrics will incorporate and how we would like them to be used by writers and on-air personalities, to describing the technical development of NBA BPI and related metrics, to communicating results to key users and training them on how the metrics can be used to enhance storytelling for the NBA.
ESPN’s Sports Analytics team is in its 10th year of developing state-of-the-art metrics and providing in-depth statistical analysis as part of ESPN’s Statistics and Information Group. The developers and analysts on the team create advanced player and team metrics, build predictive models, conduct simulations, and analyze data for players, teams, games, and seasons that the Statistics and Information Group and others at ESPN use for storytelling. The team disseminates the results of the work using automated reports, data visualizations, and interactive web applications, and partners with production teams to integrate the content into ESPN’s sports coverage across multiple platforms.