Poster Abstracts

Information about poster size, the poster award, and the evaluation criteria is available here.

Basketball Network Visualization and Analysis

Presenter: Calvin Finley, Florida Gulf Coast University

To measure the effectiveness of player combinations and how players contribute to scoring plays, we create and analyze weighted graphs for National Basketball Association (NBA) teams in the 2021-22 season using play-by-play and other advanced data. Each node in a team's graph represents a player on that team, and an edge between two nodes means that the two players were on the court together at some point in the season. The weight of an edge equals the points scored per minute by either of the edge's players while the two were on the court at the same time, and we analyze these weights for every two-person lineup to determine which pairs of players work best together. Lastly, we use existing algorithms to generate metrics for teams' scoring graphs as they changed over the course of the season, which lets us create moving-average visualizations and compare teams' abilities.
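As a rough illustration of the edge weights described above, the following Python sketch builds a pair-level scoring graph from lineup stints. The stint format, the function name `pair_scoring_graph`, and the sample numbers are assumptions made for the example, not the presenter's data or code.

```python
from collections import defaultdict
from itertools import combinations


def pair_scoring_graph(stints):
    """Return {(player_a, player_b): points per minute} for every pair.

    `stints` is a hypothetical format: an iterable of
    (players_on_court, minutes_played, points_by_player) tuples.
    """
    minutes = defaultdict(float)
    points = defaultdict(float)
    for players, mins, pts in stints:
        for a, b in combinations(sorted(players), 2):
            minutes[(a, b)] += mins
            # Points scored by either player of the pair during the stint.
            points[(a, b)] += pts.get(a, 0) + pts.get(b, 0)
    return {pair: points[pair] / minutes[pair]
            for pair in minutes if minutes[pair] > 0}


# Made-up stints for three players A, B, and C.
stints = [
    ({"A", "B", "C"}, 4.0, {"A": 6, "B": 2}),
    ({"A", "B"}, 2.0, {"A": 2, "B": 2}),
]
graph = pair_scoring_graph(stints)
best_pair = max(graph, key=graph.get)
```

Sorting each pair keeps the edge keys canonical, so (A, B) and (B, A) accumulate into the same edge.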

Should I Stay or Should I Go

Presenter: Billy Fryer, North Carolina State University

In baseball, every run counts and every decision matters. Even a decision such as attempting to advance two bases rather than one could decide a game and, ultimately, whether a team makes the playoffs. For a ball hit to the outfield with at least one runner on first or second, I wanted to create a model to evaluate these decisions: whether the runner should stay after advancing one base, or go and attempt to advance two bases (first to third or second to home). The model is based on four factors: distance from base runner to target base, distance from ball to target base, base runner speed, and game situation information. I then combined the model's probabilities with run expectancies to evaluate whether the advantage of being in a better future game state is worth the risk of being thrown out.
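The stay-or-go comparison can be sketched as an expected-value calculation. The run expectancy numbers below are placeholders chosen for the example, not values from the poster, and the function names are hypothetical.

```python
def expected_runs_if_go(p_safe, re_safe, re_out):
    """Expected runs when the runner attempts the extra base."""
    return p_safe * re_safe + (1 - p_safe) * re_out


def break_even_probability(re_stay, re_safe, re_out):
    """Success probability at which going and staying are worth the same."""
    return (re_stay - re_out) / (re_safe - re_out)


# Hypothetical run expectancies for the three possible game states.
RE_STAY, RE_SAFE, RE_OUT = 1.0, 1.2, 0.4

p_safe = 0.9  # assumed model output for one play
ev_go = expected_runs_if_go(p_safe, RE_SAFE, RE_OUT)
should_go = ev_go > RE_STAY
threshold = break_even_probability(RE_STAY, RE_SAFE, RE_OUT)
```

With these placeholder values the runner should go whenever the estimated probability of being safe exceeds the break-even threshold.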

To Shift or not to Shift? Using Player & Ball Motion Data to Build a Highly Flexible Defensive Positioning Algorithm in Baseball

Presenter: Cameron Grove, Durham University (UK)

In this project, I have measured many aspects of baseball player defense. These include catch probability, infield range, arm strength, & transfer speed. Combining models for these actions with batted ball distributions, I have built a general model for the likelihood of a ball in play resulting in an out.

This model can be used to assess the efficiency of hypothetical fielding alignments, and a fitting method can be used to produce an estimate for the ideal defensive shift. The variation of these fits with fielder ability, batter spray distribution, and batter speed is presented. A website for testing any custom configuration is available online.

English Premier League Player Advanced Statistics Visualizer

Presenter: Shane Hauck, St Lawrence University

The purpose of this app is to provide factual and comparable statistical data that supports existing information used to analyze individuals competing in the English Premier League. The app delivers a more precise statistical understanding of players' overall abilities, based on data and consistent with existing knowledge of players' skill. It compares players in various ways while analyzing statistical and visual data, which helps give a more comprehensive understanding of each player. The tool can benefit coaches, players, scouts, general managers, agents, and others by providing accurate depictions of players for various areas of decision making. It offers additional statistical information about players to complement the “eye test”, and it displays what type of player each player is with regard to the system that each team plays. The app can be used for selecting players in both team management and recruitment (finding players of good value). One outcome of the app is an understanding that certain statistics are a result of a player's style of play. By clearly identifying player styles, the best fits can be determined that provide the most value for team and player. The compiled data were split into five main categories (shooting, passing, dribbling, defending, and possession) to display radar charts and tables based on a scaled value. The shape of the radar chart is meant to give an accurate visualization of the type of player being featured, as a result of various factors.
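The abstract does not say how the radar chart values are scaled. As one possible approach, this sketch applies min-max scaling within each category so that every axis runs from 0 to 1; the player names and numbers are invented.

```python
def scale_categories(players):
    """Min-max scale each category across players to the range [0, 1].

    `players` maps a player name to {category: raw value}; every player is
    assumed to have a value for every category.
    """
    categories = sorted({c for stats in players.values() for c in stats})
    scaled = {name: {} for name in players}
    for cat in categories:
        values = [stats[cat] for stats in players.values()]
        lo, hi = min(values), max(values)
        for name, stats in players.items():
            # If all players are equal, place everyone at 0.
            scaled[name][cat] = 0.0 if hi == lo else (stats[cat] - lo) / (hi - lo)
    return scaled


# Invented raw values for two of the five categories.
raw = {
    "Player 1": {"shooting": 10.0, "passing": 40.0},
    "Player 2": {"shooting": 20.0, "passing": 80.0},
    "Player 3": {"shooting": 15.0, "passing": 50.0},
}
radar_values = scale_categories(raw)
```

Scaling per category keeps a high-volume statistic such as passes from dominating the shape of the chart.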

Clustering of UFC Fighters using Fight Data

Presenter: Richard Hauser, Loyola University Chicago

The purpose of this project was to identify and analyze key variables such as strikes landed per minute, significant strikes landed per minute, takedown success rate, strike success rate, ground strike success rate, ground strikes attempted per minute, clinches landed per minute, submission attempts per minute, the percentage of strikes that are significant, and average fight length. We want to see what makes UFC champions. We used hierarchical clustering with a Euclidean distance metric and visualized the results as dendrograms.
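For readers unfamiliar with the technique, here is a minimal pure-Python sketch of agglomerative hierarchical clustering with Euclidean distance. The abstract does not state the linkage rule, so single linkage is assumed here, and the two-feature fighter profiles are invented.

```python
import math


def hierarchical_clusters(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Uses Euclidean distance and single linkage (an assumption: the distance
    between clusters is the distance between their closest members).
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Invented profiles: (strikes landed per minute, takedowns per 15 minutes).
fighters = [(5.0, 0.5), (5.2, 0.6), (2.0, 3.0), (2.1, 3.2)]
groups = hierarchical_clusters(fighters, k=2)
```

In practice the features would normally be standardized first, because Euclidean distance is sensitive to the scale of each variable.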

Quantitative Analysis of Fantasy Football Drafting

Presenter: Sam Hughes, University of Connecticut

Fantasy football experts have offered various (often conflicting) suggestions regarding the value of each position in fantasy football. Using publicly available drafting data on the FantasyPros website (2012-2021), we can objectively analyze and suggest an optimal strategy of when to choose each position in a fantasy football draft. Specifically, we create a generalized additive model to predict a player’s weekly scoring output based on their NFL position and the average draft pick (ADP) used to select the player in fantasy football drafts. From there, we can approximate the slope of the model to estimate the marginal value for each position at a given draft pick. Among numerous recommendations, the model advises fantasy managers to pick running backs in the early rounds of the draft, target wide receivers in the middle rounds, and wait until the final rounds to select a defense and kicker. These results, along with the supplemental web application, offer a systematic approach that informs better decision making during a fantasy football draft.
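The slope approximation mentioned above can be illustrated with a central finite difference. The curve `predicted_points` below is a made-up stand-in for a fitted model, not the generalized additive model from the poster.

```python
import math


def predicted_points(pick):
    """Hypothetical smooth curve standing in for a fitted model's output."""
    return 20.0 * math.exp(-pick / 50.0)


def marginal_value(model, pick, step=1.0):
    """Central-difference estimate of the model's slope at a draft pick."""
    return (model(pick + step) - model(pick - step)) / (2.0 * step)


early = marginal_value(predicted_points, 10)   # slope near the top of the draft
late = marginal_value(predicted_points, 100)   # slope late in the draft
```

A steeper (more negative) slope means that waiting one more pick costs more expected points, so comparing slopes across positions at the same pick suggests which position to take first.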

Life’s Too Short to Bet the Under: Predicting Totals in the NBA

Presenter: Spencer Kerch, Belmont University

The total, or over/under, is a very popular option in sports gambling where the bettor picks whether the total points scored in a game will be more or less than the set “total”. In this project I use available historical game and gambling statistics to build and compare two models that attempt to predict the outcome of the total. Because sportsbooks do a very good job using predictive models to set the total near the final total, a model that is even 50% accurate would be a success.

Predicting the NHL Draft with Rank-Ordered Logit Models

Presenter: Brendan Kumagai, Simon Fraser University

The National Hockey League Entry Draft has been an active area of research in hockey analytics over the past decade. Prior research has explored predictive modelling for draft results using player information and statistics as well as ranking data from draft experts. In this paper, we develop a new modelling framework for this problem using a Bayesian rank-ordered logit model based on draft ranking data obtained from scouting sites and media outlets. Rank-ordered logit models are designed to model multicompetitor contests such as triathlons, sprints, or golf through a sequence of conditionally dependent multinomial logit models. We apply this model to a set of draft ranking data from the 2019-2022 NHL drafts and use it to provide a consolidated ranking for the draft and estimate the probability that any given player will be selected at any given pick.
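To make the sequence of multinomial logit choices concrete, the sketch below computes the probability of a full ordering under a rank-ordered logit model and, by brute-force enumeration, the probability that a player goes at a given pick. The strength values are invented, and the Bayesian estimation step from the abstract is not shown.

```python
import math
from itertools import permutations


def ranking_probability(order, strength):
    """Probability of one complete ordering under a rank-ordered logit.

    Each slot is a multinomial logit choice among the players still available.
    """
    remaining = list(order)
    prob = 1.0
    for player in order:
        denom = sum(math.exp(strength[p]) for p in remaining)
        prob *= math.exp(strength[player]) / denom
        remaining.remove(player)
    return prob


def pick_probability(player, pick, strength):
    """P(player is taken at the 1-indexed pick); enumerates all orderings."""
    return sum(
        ranking_probability(order, strength)
        for order in permutations(strength)
        if order[pick - 1] == player
    )


# Invented strengths for three prospects.
strength = {"X": 1.0, "Y": 0.0, "Z": -1.0}
p_x_first = pick_probability("X", 1, strength)
```

Enumeration is only feasible for a handful of players; a real draft class would need simulation or another approximation.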

Estimating Conditional Average Treatment Effects for Player Performance over Time: Assessing Load-Management in Sports

Presenter: Shinpei Nakamura Sakai, Yale University

Athletes’ performances improve, peak, and eventually decline. This curve is called the “age curve”, and we expect it to vary with the characteristics of the players. In this work, we focus on estimating the effect of rest between games on performance at each age. This is helpful for making decisions about resting a player and so-called “load management”. We make three main contributions. First, we construct a Conditional Expectation Function (CEF) to compare the age curve for different covariates and treatments. Second, using a causal inference approach, we propose a methodology to construct an age-conditioned treatment effect (ACTE) for a given treatment. The ACTE can test causal hypotheses for each age on the treatment and outcomes of interest. Third, we apply this method to assess the effect of days between games on multiple performance metrics conditional on age.

Analyzing Home-Ice Efficacy in the National Hockey League

Presenter: Stephen Parziale, Yale University

As predictive analysis becomes more widely used in sports, understanding what factors help predict outcomes has never been more important. A common hypothesis in sports is that home teams inherently have greater odds of winning. This is often reflected in betting odds, predictions by analysts, and even the postseason reward for teams that have better regular season records than their opponents. But in the NHL, does home-ice advantage even exist? And if so, what are the factors that affect the odds of home teams winning?

Who are the Sports Media Darlings?

Presenter: Emmanuel Rayappa, Creighton University Department of Mathematics

Sports news is one of the most popular types of news. It not only gives us the latest game scores but also includes professional reviews, predictions, evaluations, and even off-field tidbits. The sport we focused on for this research project was football, as it seems to be the most popular sport in America. The goal of the project was to conduct text analysis on a sample of sports articles and draw conclusions about what and whom the sports media are talking about, and how. To achieve these goals, we created an algorithmic pipeline that analyzes articles through sentiment analysis and named entity recognition, and we propose ways to rank the people mentioned in sports news.

Does Speed Kill…Base Hits? (Exploring Speed as Proxy for Defensive Ability)

Presenter: Ethan Rendon, New York University

A key to being a baseball analyst is testing traditions against numbers to better understand the game. One of these traditions, scouting grades, uses a 20 to 80 scale (where 50 is average) to evaluate five main skills: Hit, Power, Speed, Arm, and Field. Most of these are easily quantifiable: Hit (bat path), Power (bat speed), Speed (running), and Arm (throwing speed). Evaluating Field is complicated, so scouts typically rely on the "eye test" to judge ability. When Field is quantified, ability is approximated using a player's quick-burst acceleration, broad jump, and counter-jump movement. Gathering this data requires a specific facility, the right technology, and trained personnel, so sample sizes remain small. However, a fielder’s speed can easily be tracked during games and aggregated over time. Does speed correlate with ability? And can speed be used as a proxy for overall defensive ability? The data is anonymous MILB tracking data from SMT, filtered to remove tracking errors, which leaves 588 plays by 76 fielders. The scope is limited to centerfielders, as they make 40% of outfield plays. Python is used to compare four factors of defense and speed.
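One simple way to ask whether speed tracks defensive ability is a Pearson correlation. The sketch below is a generic implementation with invented numbers; it is not the presenter's analysis, and the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented per-fielder values: top speed (ft/s) and share of plays made.
speed = [26.0, 27.5, 28.0, 29.5]
plays_made = [0.60, 0.70, 0.68, 0.80]
r = pearson(speed, plays_made)
```

A value of r near 1 would support using speed as a proxy, while a value near 0 would not; with only 76 fielders, the uncertainty around any estimate would also need to be reported.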

Applications of Tracking Data in the Evaluation of Baserunner Performance

Presenter: Jack Rogers, University of Minnesota

New MLB player tracking has allowed for the investigation of new pieces of the game, including ball flight, fielding, and base running. This project applies ball and player tracking data to baserunning, using the two to estimate whether or not a play at the plate will be close. This required substantial data cleaning. The distances of the ball, fielder, and base runner from home plate all had to be calculated. Another important factor in whether the play at the plate will be close is the momentum of the outfielder running in to make the throw; typically, a flat-footed player has a tougher throw to make. We also had to decide what counts as a close play, and we determined, arbitrarily, that a “close” play was one in which the ball and the base runner were within 12 feet of each other before the runner scored. We found that only 6% of plays were deemed “close”, which is a very low number. This is significant because it means that in 94% of instances, either the runner was not sent home or the play was not close. This could be investigated further to assess the efficiency of base running decisions. In addition, we found many situations in which the third base coach held up the runner even though runners had easily scored in identical situations. These results suggest that close plays at the plate occur less often than many may have thought, which could lead to future work on the efficiency of the decisions base runners and coaches make while rounding third.
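The distance calculations and the 12-foot rule can be sketched as follows. The coordinate system (feet, home plate at the origin), the single snapshot per play, and the reading of the rule as "12 feet or less" are assumptions made for the example.

```python
import math

HOME_PLATE = (0.0, 0.0)  # assumed origin of the tracking coordinates, in feet


def distance_from_home(position):
    """Straight-line distance from a tracked position to home plate."""
    return math.dist(position, HOME_PLATE)


def is_close_play(ball_xy, runner_xy, threshold_ft=12.0):
    """True when ball and runner are within the threshold of each other."""
    return math.dist(ball_xy, runner_xy) <= threshold_ft


def close_share(plays):
    """Fraction of (ball_xy, runner_xy) snapshots that count as close."""
    flags = [is_close_play(ball, runner) for ball, runner in plays]
    return sum(flags) / len(flags)


# Invented snapshots of (ball position, runner position).
plays = [
    ((3.0, 4.0), (0.0, 0.0)),
    ((60.0, 80.0), (0.0, 5.0)),
    ((0.0, 10.0), (0.0, 0.0)),
    ((30.0, 40.0), (10.0, 0.0)),
]
share = close_share(plays)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the share of close plays is to the arbitrary cutoff.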

The Causal Effect of the Two-for-One Strategy in the NBA

Presenter: Daryl Swartzentruber, The Ohio State University

The two-for-one (TFO) strategy refers to a basketball team trying to gain two possessions at the end of the period while limiting its opponent to only one possession. We define TFO attempts and non-attempts and apply these definitions to play-by-play data from the National Basketball Association. We use inverse propensity score weighting with covariate balancing propensity scores to estimate the overall average effect of attempting a TFO on the score margin from the time of the TFO to the end of the period. We also use causal forests to estimate heterogeneous team-specific effects.
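As a simplified illustration of inverse propensity score weighting, the sketch below computes a normalized weighted difference in mean outcomes. The propensity scores are taken as given; estimating them with covariate balancing, as the abstract describes, is not shown, and the data are invented.

```python
def ipw_effect(outcomes, treated, propensity):
    """Normalized inverse-propensity-weighted difference in mean outcomes.

    outcomes:   score margin change for each end-of-period situation
    treated:    1 if a two-for-one was attempted, else 0
    propensity: estimated probability of attempting, strictly between 0 and 1
    """
    sum_y1 = sum_w1 = sum_y0 = sum_w0 = 0.0
    for y, t, e in zip(outcomes, treated, propensity):
        if t:
            weight = 1.0 / e
            sum_y1 += weight * y
            sum_w1 += weight
        else:
            weight = 1.0 / (1.0 - e)
            sum_y0 += weight * y
            sum_w0 += weight
    return sum_y1 / sum_w1 - sum_y0 / sum_w0


# Invented data: with equal propensities this reduces to a difference in means.
outcomes = [3.0, 1.0, 0.0, 2.0]
treated = [1, 1, 0, 0]
propensity = [0.5, 0.5, 0.5, 0.5]
effect = ipw_effect(outcomes, treated, propensity)
```

The weights give more influence to attempts that were unlikely given the game situation, and to non-attempts that were likely to be attempts, which balances the two groups on the modeled covariates.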

Evaluating Minor League Outfielder Fly Ball Success using Player Tracking Data

Presenter: Jack Weyer, University of Southern California

With the advent of player and ball tracking data to the game of baseball, we are able to answer, in a statistical and precise fashion, questions that have been left to human feel for centuries. One facet of the game, outfielder coverage, is the focus of this analysis. Using 61 games of minor league data over the course of three seasons, this study builds a 97% effective classifier to assign a catch probability to each outfielder for a given batted ball hit to the outfield. The analysis found that distance from the outfielder at the pitch to the eventual landing spot, the ball’s hang time, and whether the outfielder must travel away from home plate are all important features that drive catch success. By aggregating outfielders’ opportunities over an entire season, we found the most successful players and teams at fly ball coverage.
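The abstract does not specify how opportunities are aggregated. One common choice, assumed here, is to sum actual catches minus predicted catch probability for each player; the opportunity records below are invented.

```python
from collections import defaultdict


def catches_above_expected(opportunities):
    """Sum (caught - catch probability) per player over all opportunities.

    `opportunities` is an iterable of (player, catch_probability, caught).
    """
    totals = defaultdict(float)
    for player, probability, caught in opportunities:
        totals[player] += (1.0 if caught else 0.0) - probability
    return dict(totals)


# Invented opportunities with classifier-style catch probabilities.
season = [
    ("OF-1", 0.5, True),
    ("OF-1", 0.25, True),
    ("OF-2", 0.75, False),
]
ranking = catches_above_expected(season)
```

A positive total means a player caught more balls than the classifier expected given the difficulty of the chances.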

Using Routine Plays to Assess Infielder Arm Ability

Presenter: Cale Williams, Georgia Institute of Technology

The purpose of this study is to assess infielder arm strength and accuracy using tracking data during routine plays. The dataset is reduced to include only batted balls thrown by an infielder to the first baseman. Then, throw speed by each infield position is analyzed for close plays and non-close plays. Throw accuracy is evaluated for each infield position, and accuracy is represented from the perspective of the first baseman via a combination of rotation matrices. Finally, a summary and comparison of the most common players in the dataset are presented.
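The change of perspective can be illustrated in two dimensions. This sketch translates a point so that the receiver is at the origin and then rotates it so that the receiver's facing direction becomes the x-axis; the 2D simplification, coordinates, and function names are assumptions for the example.

```python
import math


def rotation(theta):
    """2x2 counterclockwise rotation matrix for an angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def compose(a, b):
    """Matrix product a @ b, which applies b first and then a."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(matrix, point):
    """Multiply a 2x2 matrix by a 2D point."""
    return (
        matrix[0][0] * point[0] + matrix[0][1] * point[1],
        matrix[1][0] * point[0] + matrix[1][1] * point[1],
    )


def to_receiver_frame(point, receiver, facing):
    """Express a point relative to the receiver's position and facing angle."""
    shifted = (point[0] - receiver[0], point[1] - receiver[1])
    return apply(rotation(-facing), shifted)


# A ball 5 ft directly in front of a receiver who faces the +y direction.
ball_in_frame = to_receiver_frame((10.0, 5.0), (10.0, 0.0), math.pi / 2)
quarter_turn = compose(rotation(math.pi / 4), rotation(math.pi / 4))
```

Composing rotations multiplies their matrices, so two 45-degree rotations are equivalent to one 90-degree rotation.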