Abstract: As sports leagues continue to
experiment with rules changes to improve safety, increase
scoring or excitement, and bolster revenues, evaluating the
effects of these changes becomes more important than ever. Lab
leagues, modeling, and pilot studies are one way to address this
beforehand. After the fact, quasi-experimental designs provide
flexible analyses that can evaluate impacts on a variety of
outcomes, at the league, team, and player levels. I will discuss
the possibilities and limitations of these approaches, using
MLB's extra-innings zombie runner and infield shift ban as
examples.
Session 1: Sports Economics
Lee Kennedy-Shaffer, Assistant Professor of Biostatistics,
Yale University
Title:
Evaluating Rule Changes Using Quasi-Experimental Designs
Lee Kennedy-Shaffer is an assistant
professor of biostatistics at the Yale School of Public Health.
His primary research interests are in study designs to evaluate
vaccines, infectious disease control, and health policies. A
lifelong Mets fan, he is also interested in using these methods
to understand baseball and sports more generally, identifying
what causal inference methods work in sports settings, and using
sports to broaden interest in statistics to students and the
wider public.
Abstract: The guaranteed salaries of players in top professional leagues have grown substantially in recent years. Thus, the need for insurance policies that financially protect both teams and players in the event of injuries has become more commonplace. Despite the growing prevalence of such insurance products, there is a paucity of actuarial analysis regarding those products. This study presents the first known actuarial analysis of player injuries and resulting claims severity using a compound risk model that separately analyzes injury frequency and claim severity. Our model incorporates a covariate analysis using a comprehensive dataset that includes player biographies, player contracts, team travel, player performance, and player injury data. This method identifies key risk factors influencing claim frequency and severity. Subsequently, it suggests premium models and potential risk mitigants. We first summarize this novel dataset of 508 players from the 2022-2023 NBA regular season with 640 unique injury details. We then discuss significant risk factors related to player injuries in the NBA. Consequently, we utilize the proposed premium pricing model to establish fair actuarial insurance rates for NBA players' sports injury insurance. Finally, we discuss the implications of our findings, including potential risk mitigation strategies and the long-term economic impact on player careers and league-wide financial stability.
Hashan Peiris is a PhD candidate in the
Department of Statistics and Actuarial Science of Simon Fraser
University, supervised by Dr. Himchan Jeong and Dr. Tim Swartz.
His research interests are risk analysis with telematics data in
automobile insurance and sports analytics. In 2021, Hashan
started pursuing the actuarial pathway through the Society of
Actuaries and completed a master's in actuarial science from SFU
in 2023. His recent research work with Dr. Jeong and Dr. Swartz
has been published in the ASTIN Bulletin and the International
Journal of Sports Science & Coaching, respectively. Further,
he has collaborated with Dr. Tim Swartz to conduct sports
analytics workshops in Canada and Sri Lanka. Hashan is one of
the recipients of the James C. Hickman Scholarship from the
Society of Actuaries in 2024.
Abstract: For penalty kicks in soccer, the classical and statistical game-theoretic models developed herein predict a non-negative relationship between goal area partition-conditional shot-volume and conversion-rate for a goal-optimizing penalty kick taker. However, we find a strong negative relationship between goal area partition subsets of the study data, which considers 536 penalty kicks from the 2020 UEFA Champions and Europa Leagues. The estimated indifference sets and underlying inferential statistics of linear, polynomial, and ML-regularized Lasso regression models indicate that penalty-takers (significantly) value both conversion-rate and on-target rate when locating PK-shots. While partial optimizers, PK-takers in soccer deviate from optimal PK-locating strategies in a manner consistent with the behavioral valuation of keeping up appearances of highly-skilled play, in this case by limiting the likelihood of missing the goal entirely, as might a novice player. That is, players are revealed to value the optics of performance rather than strictly performance optimization. The result is consistent with stated and revealed aversion to underhanded free-throw shooting among large-handed NBA and EuroLeague Centers despite evidence of performance benefits for this group. The result is also consistent with evidence that corner three-point shooters are twice as likely to hit the front of the rim as compared to the side of the backboard. The optics of the latter cause players to bias their shot and decrease the overall conversion rate. Optimization utility is estimated to be 3.12 times as important to the representative PK-taker on the margin as is behavioral utility. The models are highly explanatory, indicating that optimization and behavioral factors explain approximately 85.2% of PK-shot locating variation for top professional players.
The negative estimated slope of the indifference sets provides visual evidence that players are revealed to value both conversion-rate and on-target rate. They trade off between these interests when selecting penalty-shot locations. We find strong statistical evidence that penalty-takers are hybrid decision-makers: part rational-optimizers and part behavioral-agents seeking to keep up appearances of highly-skilled play. Our polynomial regression also suggests that PK-takers are risk averse, as their partition-dependent shot-volume increases at a decreasing rate in conversion-rate. As the purpose of a PK-attempt is to maximize expected goal (likelihood) directly, the presence of risk-aversion here represents another behavioral factor on the part of the PK-taker.
Shane Sanders is Professor of Economics
& Sport Analytics at Syracuse University. He conducts
research in the areas of player performance analytics with
emphasis on team and player (sub-)optimization. He also studies
issues of player valuation and league design. Sanders has
consulted on basketball roster construction for teams in the
EuroLeague and NCAA and has advised NBA teams on cross-league
player projections. Sanders has published 88 academic journal articles, many
in top journals of economics, statistics, finance, and sport (J
of Business & Economic Statistics, J of Behavioral &
Experimental Finance, Economics Letters, J of Sport Management,
J of Sports Economics, Social Indicators Research, and J of
Quantitative Analysis in Sport among them). His research has
been supported by research grants from FIFA, PARCC, and the
Mercatus Center Policy Analytics Program. Sanders’ research has
been cited in a U.S. Supreme Court sport antitrust case
(American Needle, Inc. v NFL), as well as in leading media
outlets such as USA Today, NPR Here and Now, MSNBC, Globe and
Mail, Fox Sports, TrueHoop, and The Late Show with Stephen
Colbert. Last year, Sanders was a Research Finalist at MIT SSAC
for his joint work (with Justin Ehrlich) on NBA advanced shot
charts and the increasing 3PA dispremium in the League. He has
also presented his work at the Carnegie Mellon Sport Analytics
Conference, the Harvard New England Symposium on Statistics in
Sport, and the SABR Analytics Conference. Institution page: https://falk.syr.edu/people/sandersshane/
Ava Uribe is a current senior Sport
Management and Sport Analytics student at Syracuse University.
Uribe recently completed her second season with the Syracuse
University Women’s Soccer team, and in her youth career has
represented the U.S. Youth National Team from Under-14 to
Under-18. With the U.S. National Team, she competed
internationally in tournaments such as the UEFA competition in
England as well as international friendlies across Europe. She
aspires to play professionally post-graduation and has focused
her academic and athletic experiences on performance analysis in
a variety of sports, especially soccer. As a primary penalty
kick taker for the Syracuse Orange, Uribe developed a research
interest in penalty kick trends at the professional level,
particularly as major international tournaments increasingly see
matches decided by shootouts. Her analysis has provided valuable
insights into goal conversion strategies, which she has shared
with her coaches and teammates to enhance performance in the
highly competitive ACC. Syracuse University player page: https://cuse.com/sports/womens-soccer/roster/Ava-Uribe/24409
Dr. Justin Ehrlich is an Associate
Professor in Sports Management at Syracuse University,
specializing in sport analytics, machine learning, and computer
science. His diverse research portfolio spans virtual reality,
3D human pose estimation, advanced visualization, sports rating
and ranking, the business of sport, risk analysis for CTE in
football players, and biomechanical assessment. As a faculty
member in Syracuse University's Big Data Cluster, Dr. Ehrlich
focuses on big data applications, performance analytics, and
advanced visualization tools such as shot charts. His innovative
work has been showcased at the MIT Sloan Sports Analytics
Conference and published in journals including the Journal of
Behavioral & Experimental Finance, JAMA, Public Choice, and
PLOS ONE. Dr. Ehrlich has also conducted extensive golf research
in collaboration with the University of Nevada, Las Vegas,
exploring topics like the effects of weather on performance,
optimizations in swing sequencing, and the impact of swing
consistency on course outcomes. Institution page: https://falk.syr.edu/people/ehrlich-justin/
Abstract: Outcomes are a function of luck and skill. Differentiating between them is challenging, however, despite their distinction being important for performance evaluation, compensation, incentives, and resource allocation. In sports, this distinction matters critically and has implications for entertainment, rule changes, and who wins and why. We investigate, identify, and measure the role of skill versus luck in tennis using a parsimonious hierarchical structural model that can answer a rich set of questions. Because skill accumulates with scale while luck partially cancels out, three-set matches are more prone to luck than five-set matches – raising the question: are Serena Williams’ 23 grand slams more impressive than Novak Djokovic’s 24? There are also second and third order effects – such as having to face tougher opponents later in five-set tournaments and seeding, which are also functions of luck – that differ for men and women. We attempt to answer questions like these and conduct counterfactual analysis, such as how many slams would Serena have won if matches were best of five? How many would Venus Williams have won if Serena wasn’t present? Which men’s and women’s players “exceeded” (lucky) or “underperformed” (unlucky) their skill level? Given the precision of the model, we also identify multiple dimensions of skill (serving, returning, surface, stamina) and multiple sources of luck (point, match draw, ranking/seeding). The model matches data and betting odds, and can be used to infer distributions in winning, earnings, rankings, and career dynamics. Our framework can potentially explore other aspects of performance such as momentum, clutch, and choking, as well as the expected impact of rule changes in tennis and how each of these might differ across the men’s and women’s game.
Tobias "Toby" Moskowitz currently
holds the Dean Takahashi ’80 B.A., ’83 M.P.P.M. Chaired
Professorship in Finance at Yale University at the Yale School
of Management, for which he is the inaugural chair holder (in
2016). He was previously the Fama Family Professor of Finance at
the University of Chicago Booth School of Business, where he
taught from 1998 to 2016. Professor Moskowitz was recognized by
the American Finance Association with its 2007 Fischer Black
Prize, which is awarded biennially to the top finance scholar in
the world under the age of 40 in years when one is deemed
deserving.
The award cited his "ingenious and careful use of newly available data to address fundamental questions in finance." Moskowitz also won the Ewing Marion Kauffman Medal in 2012 for the top research in entrepreneurship for scholars under the age of 40. His research papers have received numerous awards, including the Smith-Breeden Prize for the best paper published in the Journal of Finance, the Brattle Prize for the best non-asset pricing paper published in the Journal of Finance, two Fama-DFA Prizes for the best paper published in the Journal of Financial Economics, two Michael Brennan Awards for the best paper published in the Review of Financial Studies, the Swiss Finance Institute Outstanding Paper Award, and two Bernstein-Fabozzi and Jacobs-Levy Awards for the best paper in the Journal of Portfolio Management (most recently in 2024), two Harry Markowitz prizes for the best paper published in the Journal of Investment Management, the Whitebox Prize for best financial research, and multiple Q-Group Best Paper Awards.
His work has been cited in the Wall Street Journal, the New York Times, Financial Times, US News and World Report, Money magazine, and a 2005 speech by then Federal Reserve Chairman Alan Greenspan. He has also appeared on CNBC's Closing Bell and Squawk Box, CNN, FOX, and Bloomberg, as well as ESPN, Sports Illustrated, and HBO's Real Sports with Bryant Gumbel.
Professor Moskowitz serves as a research associate for the National Bureau of Economic Research and is a former editor of the Review of Financial Studies, an associate editor of the Journal of Finance, and an associate editor at the Journal of Financial Economics. His research studies financial markets and investments, including the behavior of prices and investors. He has explored topics as diverse as momentum in stock returns, biases in investment portfolios, the return to private business ownership, financial contracting in commercial real estate, mutual and hedge fund performance, the political economy of financial regulation, and the economics of sports. He has presented his research at many academic, corporate, and government institutions worldwide. Professor Moskowitz spent the 2007-2008 and 2014-2015 academic years on leave at AQR Capital Management, LLC, a hedge fund in Greenwich, CT, with which he has an ongoing consulting relationship. He was made a principal at the firm in 2015, where he contributes to research on asset pricing and investments for AQR’s domestic and international investment strategies.
In collaboration with David F. Swensen, the CIO of the Yale Investments Office, Professor Moskowitz founded the Yale School of Management’s Master’s program in Asset Management, for which he serves as the Academic Director and whose inaugural class will graduate in 2022. The one-year program seeks to train Master’s students in the theory and practice of asset management, with a curriculum that is tailored for that aim. In 2011, he wrote the New York Times best-seller "Scorecasting: The Hidden Influences Behind How Sports are Played and Games are Won," (Crown Archetype, Random House) co-authored with L. Jon Wertheim of Sports Illustrated, that uses economic principles to explain the hidden side of sports. He and Wertheim are also authors of the children’s book “The Rookie Bookie” in the Fall of 2015 (Little Brown). Professor Moskowitz currently serves on the board of trustees of the Commonfund, where he is the chair of the audit and risk committee. He previously served as a board member for Ariel Capital (2012-2015), the Center for Research in Security Prices at the University of Chicago (2005-2015), and Mercer Global Advisors.
Born in West Lafayette, IN, Moskowitz earned a bachelor's degree in industrial management and industrial engineering (with distinction) in 1993 from Purdue University, a master's degree in management from Purdue in 1994, and a Ph.D. in finance from UCLA in 1998.
Abstract: One active thread of sports
analytics research proposes mechanisms for awarding points,
dollars, or other rewards that are more "fair". However, we
don't always articulate a shared understanding of what
constitutes fairness. In recent years, analysis of various
statistical criteria for non-discrimination in the machine
learning community has yielded some fascinating incompatibility
results. Namely, Kleinberg et al. (2016) prove that in all but
the most trivial of cases, an algorithm cannot be unbiased
across all reasonable definitions (of fairness). We apply this
lens to sports, and explore how various existing reward systems
embody different notions of fairness.
Session 2: Advancements in Sports Analytics and
Education
Ben Baumer, Professor of Statistical & Data Sciences,
Smith College
Title:
What Does 'Fairness' Mean in Sports Analytics?
Benjamin S. Baumer is a professor in the
Statistical & Data Sciences program at Smith College. Ben is
a co-author of The Sabermetric Revolution, Modern Data Science
with R, and the second and third editions of Analyzing Baseball
Data with R. Ben has received the Waller Education Award from
the ASA Section on Statistics and Data Science Education, the
Significant Contributor Award from the ASA Section on Statistics
in Sports, and the Contemporary Baseball Analysis Award from the
Society for American Baseball Research. His research interests
include sports analytics, data science, statistics and data
science education, statistical computing, and network
science.
Abstract: Devon Allen’s disqualification at the men's 110-meter hurdles final at the 2022 World Track and Field Championships, due to a reaction time (RT) of 0.099 seconds, just 0.001 seconds below the allowable threshold, sparked widespread debate over the fairness and validity of RT rules. This study investigates two key issues: variations in timing systems and the justification for the 0.1-second disqualification threshold. We pooled RT data from men’s 110-meter hurdles and 100-meter dash, as well as women’s 100-meter hurdles and 100-meter dash, spanning national and international competitions. Using a rank-sum test for clustered data, we compared RTs across multiple competitions, while a generalized Gamma model with random effects for venue and heat was applied to evaluate the threshold. Our analyses reveal significant differences in RTs between the 2022 World Championships and other competitions, pointing to systematic variations in timing systems. Additionally, the model shows that RTs below 0.1 seconds, though rare, are physiologically plausible. These findings highlight the need for standardized timing protocols and a re-evaluation of the 0.1-second disqualification threshold to promote fairness in elite competition.
Owen Fiore is a recent graduate of the
University of Connecticut's master's in data science program
after previously graduating from UConn in 2023. His senior
thesis was an analysis of track and field reaction times that
evolved into a comprehensive study of track and field
disqualification standards. He collaborated with Dr. Jun Yan
and Dr. Elizabeth Schifano, and their work is expected to be
published in The American Statistician. In his free time, he
enjoys running and watching UConn basketball.
Abstract: Player tracking data in American football have made it possible to create new measures of defensive ability at halting the ball-carrier's forward progress, addressing the shortcomings of subjective tackle statistics. In this work, we introduce a Bayesian hierarchical framework for modeling the change in ball-carrier momentum (mass times velocity) during contact windows (i.e., tackle opportunities defined by a distance threshold) within a play. Our modeling framework accounts for ball-carrier and defender random effects, with separate distributions for each position. This approach enables us to quantify differences in tackling ability between defensive positions and allows us to compare positional sources of variation. Furthermore, we model the rate at which the ball-carrier's momentum is being reduced by defenders and demonstrate that linebackers excel at quickly suppressing the ball-carrier's forward progress. Our results from the first nine weeks of the 2022 NFL season reveal the top NFL defenders in terms of tackling ability, as well as the best (Derrick Henry) and worst (Alvin Kamara) ball-carriers at maintaining momentum through contact.
Ron Yurko is an Assistant Teaching Professor
in the Department of Statistics & Data Science at Carnegie
Mellon University, and is the Director of the Carnegie Mellon
Sports Analytics Center. His research focuses on developing
methods at the interface of inference and machine learning,
oriented towards problems in sports analytics and natural
language processing. His work has been featured in popular media
outlets such as The Athletic, FiveThirtyEight, The Wall Street
Journal, and The Washington Post. He is a three-time degree
holder from Carnegie Mellon, with a bachelor's, master's, and PhD
in Statistics. He also has industry experience in both finance
and professional sports.
Abstract: In team sports, standard ranking metrics do not consider how smaller groups of players interact. Individual performance measures rarely reflect the synergy or friction among pairs or trios, while full-lineup assessments can’t pinpoint the exact contributions of these lower-order combinations. To bridge this gap, we propose a novel adjusted plus-minus (APM) framework that simultaneously evaluates individuals, smaller groups, and entire lineups. Underlying our approach is a link between APM and the hypergraph representation of a team, which captures these overlapping interactions. In this talk, we’ll demonstrate how this perspective applies to NBA data from 2012–2022 and highlight the insights gained from viewing a team as a network.
Elizabeth Upton is an Assistant
Professor in the Mathematics and Statistics Department at
Williams College, where she has been since completing her Ph.D.
in Statistics at Boston University in 2019. Her research focuses
on applied statistics, with a particular interest in regression
models for network data. Before graduate school, she taught high
school math and worked in finance analyzing currency markets.
Liz loves teaching statistics, collaborating with researchers
across disciplines, and talking sports data whenever she gets
the chance. Outside of work, she’s usually chasing around her
three kids or cheering on the New England Patriots (#54
forever).
Abstract: The NBA & WNBA are dynamic, high-intensity leagues where game demands are constantly evolving. Understanding what drives these demands is critical for teams, coaches, and performance staff aiming to optimize player readiness, health, and performance. This talk will explore the key factors that influence game demands, including scheduling density and player workload management. By examining these dynamics, we can gain insights into how the demands of the game shift over a season and the factors that affect them. I will also discuss the external influences, such as travel, return to play, and recovery strategies, that impact how players perform and adapt to these challenges. This session is designed to provide a broader perspective on what truly drives game demands, offering practical takeaways for those looking to better understand the realities of elite basketball and the strategic considerations that come with it.
Dr. Keith D’Amelio is a leader in the
realm of human performance. With a PhD in applied sports science
and an extensive background in elite sports, including roles
with such organizations as the Boston Celtics, Toronto Raptors
and Stanford University as well as Nike, Keith stands squarely
at the intersection of performance optimization, technological
innovation, and the evolving landscape of human performance.
A pioneer in leveraging technology and data-informed methodologies, Keith has catalyzed industry-wide adoption of cutting-edge approaches. His influence extends beyond one sport, working with athletes, teams, and organizations spanning the NBA, MLB, NBA Players Association, WNBA, NWSL, NFL, EPL, AFL, as well as within the business and startup world, showcasing his ability to drive innovation and progress across diverse sectors.
As the former Director of Athlete Performance at Nike, Keith led transformative initiatives to support Nike’s elite athletes and teams and drive innovation across the organization. His leadership in this role underscored his commitment to pushing the boundaries of human performance through focused exploration. Now as an investor, advisor, and consultant for both sporting organizations and companies alike, Keith empowers teams and enterprises with his unique approach and understanding of organizational dynamics.
Abstract: Can college basketball stats predict a player's NBA success if adjusted for competition level? By evaluating competition level and statistical translation, we aim to develop an adjustment model to better predict NBA performance. Our methodology includes calculating opponent strength using historical data, adjusting college stats based on competition level, and comparing adjusted stats to first-year NBA performance. To accomplish this, we developed a predictive model using regression/machine learning.
Abstract: Schedules across college and professional sports display substantial heterogeneity. In this work, we introduce a novel metric to quantify schedule balance in a manner that is agnostic to team strength. This metric, which we title schedule path ratios, compares graph-based distances in the observed schedule network to those in a network which represents an optimally balanced schedule. We illustrate this metric both globally and locally (within conferences and divisions) across major North American professional sports and NCAA football. Finally, using schedule path ratios, we conduct a simulation-based analysis which demonstrates that a decline in schedule balance over the past 20 years has made it more difficult to identify the best teams, a salient point given the new college football playoff.
Abstract: We run several types of tournament structures to compare teams' true ranks to their simulation ranks. To determine numerically which structure more accurately recovers the correct ordering of teams, we plot the information (currently measured as mutual information) obtained across many simulations.
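As a rough illustration of the measure named in this abstract, the sketch below estimates mutual information (in bits) between a team's true rank and its simulated rank from paired samples. The function name and the toy data are hypothetical and are not taken from the authors' work; higher values mean the tournament result carries more information about the true ordering.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Empirical mutual information, in bits, between the two
    coordinates of a list of (true_rank, simulated_rank) pairs."""
    n = len(pairs)
    joint = Counter(pairs)                    # counts of (x, y)
    left = Counter(x for x, _ in pairs)       # counts of x
    right = Counter(y for _, y in pairs)      # counts of y
    total = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        total += (count / n) * log2(count * n / (left[x] * right[y]))
    return total


# Hypothetical data: a structure that always recovers the true rank,
# and one whose output is unrelated to the true rank.
perfect = [(1, 1), (2, 2), (3, 3), (4, 4)] * 25
unrelated = [(1, 1), (1, 2), (2, 1), (2, 2)] * 25

mi_perfect = mutual_information(perfect)
mi_unrelated = mutual_information(unrelated)
```

With four equally likely ranks, the perfectly accurate structure attains the maximum of 2 bits, while the unrelated one scores 0.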
Abstract: Regularized Adjusted Plus Minus has become a cornerstone of modern basketball analytics, creating a measure of player impact by combining On-Off court differentials with regularized regression techniques. While RAPM and its successors such as RPM, EPM and LEBRON have shaped player evaluation in the NBA, no such model has been systematically applied and made public for the WNBA. This project presents the development of a custom RAPM model for the WNBA using publicly available play-by-play data spanning multiple seasons. To address the shorter season, RAPM is stabilized using box-score statistics as empirical priors, allowing the model to borrow from known performance indicators and stabilize estimates for players with sparse minutes. Furthermore, a time decay factor is applied, weighting recent seasons more heavily to reflect current player ability and trends. The result is a robust and transparent RAPM framework tailored to the WNBA, allowing reliable insights into player impact and laying the groundwork for advanced analytics in women's basketball.
Abstract: The rise of sports betting has led to increased interest in developing data-driven strategies for identifying market inefficiencies. While predictive modeling for sports betting is well studied in team-based sports, individual combat sports such as mixed martial arts (MMA) present unique challenges due to limited data availability, outcome volatility, and the sport’s dynamic nature. Existing public research has relied primarily on data from a single source, UFC Stats, which provides detailed fight metrics but lacks broader contextual information. This study explores the use of alternative data sources—including fight history databases, betting odds aggregators, and community-driven metadata—to construct a more comprehensive dataset for modeling fights in the Ultimate Fighting Championship (UFC). By leveraging diverse data streams, this approach aims to uncover hidden signals that may provide an edge. Additionally, we experiment with ideas from conformal prediction and robust optimization to optimize bet sizing under uncertainty. To evaluate performance, our methods are tested on out-of-sample UFC fights from 2017 to 2024, with a focus on profitability and calibration. The results provide insight into the potential for systematic edges in MMA betting and a starting point for incorporating more diverse data sources into the modeling process.
Abstract: This study examines the relationship between bat speed, swing length, and baseball batting performance using k-means clustering. The goal is to determine whether distinct swing profiles are associated with more favorable offensive outcomes and how batters adjust their swings in different pitch contexts. Data from the 2024 MLB season were wrangled to address missing values and reduce multicollinearity. Composite swing metrics, including swing efficiency (bat speed/swing length) and swing combined (bat speed × swing length), were introduced. K-means clustering grouped swings based on bat speed and swing length, and batter performance was assessed using a batter score metric that incorporates both event value (e.g., single, home run) and changes in run expectancy. Statistical tests, including permutation tests and Kruskal-Wallis tests, were conducted to compare clusters. Three swing clusters were identified. The cluster with the highest bat speed and swing length exhibited the most favorable batting outcomes, while the cluster with the lowest values performed poorest. Swing adjustments were observed in high-pressure situations, with batters reducing bat speed and swing length when facing two-strike counts. The findings suggest that both high bat speed and longer swing length contribute to better offensive performance, with situational adjustments playing a role in approach at the plate.
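The two composite metrics defined in this abstract can be written down directly. The sketch below is illustrative only: the function names, the units (miles per hour and feet), and the sample values are assumptions, not details taken from the study.

```python
def swing_efficiency(bat_speed, swing_length):
    """Bat speed divided by swing length, as defined in the abstract.
    Units are assumed here to be mph and feet."""
    if swing_length <= 0:
        raise ValueError("swing_length must be positive")
    return bat_speed / swing_length


def swing_combined(bat_speed, swing_length):
    """Bat speed multiplied by swing length, as defined in the abstract."""
    return bat_speed * swing_length


# Hypothetical swing: 70 mph bat speed, 7 ft swing length.
efficiency = swing_efficiency(70.0, 7.0)
combined = swing_combined(70.0, 7.0)
```

A shorter swing at the same bat speed raises the efficiency ratio but lowers the product, so the two metrics pull in opposite directions as swing length changes.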
Abstract: This study examines the impact of corporate sponsorship by Coca-Cola and Pepsi on college basketball teams' performance in the 2024 NCAA March Madness tournament. Given the prominence of corporate sponsorship in collegiate athletics, understanding its influence on competitive outcomes is valuable for athletic programs and corporate partners.
Using data from the 2024 season, we analyzed whether a college's beverage sponsor correlated with its tournament performance. Sponsorship data was sourced from open web sources, and team data was obtained from Kaggle. Fisher's exact test assessed the association between sponsorship and the round reached, while random forest and logistic regression analyses evaluated sponsorship's predictive power.
Results indicate no significant relationship between sponsorship and tournament participation (p > 0.05). However, a statistically significant dependence was found between sponsorship and advancement (p = 0.01171). Notably, only Coca-Cola-sponsored schools reached the Elite Eight and beyond. Despite this, neither the random forest nor logistic regression identified sponsorship as an influential predictor of performance.
These findings suggest a potential, yet unexplained, relationship between corporate sponsorship and team success in the NCAA tournament. Future research should explore this association across multiple years and sports to determine its broader significance.
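For readers unfamiliar with Fisher's exact test, the sketch below computes a two-sided p-value for a 2x2 table from the hypergeometric distribution. It is a simplification: the study relates sponsorship to the round reached, which may involve more than two categories, and the counts used here are made up for illustration rather than taken from the study.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table
    [[a, b], [c, d]], summing the probabilities of all tables with
    the same margins that are no more likely than the observed one."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts (rows: sponsor A, sponsor B;
# columns: advanced, did not advance).
p_value = fisher_exact_2x2(3, 1, 1, 3)  # 34/70, about 0.486
```

Because the test enumerates every table with the observed margins, it remains valid for the small cell counts typical of a single tournament.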
Abstract: How do extenuating life circumstances affect player performance in the NBA? This project investigates the effect of such events on performance using BPM (Box Plus/Minus) and PER (Player Efficiency Rating). A random sample of 100 current NBA players was researched using publicly available information on the internet. For each player, average BPM and PER were calculated in two separate categories: seasons with extenuating life circumstances and normal seasons. Wilcoxon signed-rank tests and paired t-tests were conducted for the differences in BPM and PER between normal seasons and seasons with extenuating circumstances, which produced p-values of 0.000011 and 0.000035 for BPM and 0.000037 and 0.0012 for PER. Overall, when dealing with extenuating life circumstances, 65% of players had a higher BPM, 30% had a lower BPM, and 5% experienced no change. For PER, 69% of players had a higher PER, 27% had a lower PER, and 4% experienced no change. These results suggest that extenuating life circumstances do have an effect on player performance, and that players can be categorized into three groups: players who thrive under stress and perform better, players who perform consistently under both normal and extenuating circumstances, and players who crumble under stress and underperform. For this sample, a majority of players actually performed better when dealing with extenuating life circumstances. This suggests that the mental aspect of how players perform under stress could be another helpful metric in player evaluations.
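The paired comparison described above can be illustrated with a short sketch that computes per-player differences, the paired t statistic, and the share of players who rose, fell, or stayed level. The BPM values and the function name `paired_summary` are hypothetical; turning the statistic into a p-value would additionally need a t distribution, for example via `scipy.stats.ttest_rel`, with `scipy.stats.wilcoxon` for the signed-rank test.

```python
from math import sqrt
from statistics import mean, stdev

def paired_summary(normal, stressed):
    """Summarize paired differences (stressed minus normal) for one metric."""
    diffs = [s - n for n, s in zip(normal, stressed)]
    n = len(diffs)
    # Paired t statistic: mean difference over its standard error.
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return {
        "t": t_stat,
        "higher": sum(d > 0 for d in diffs) / n,
        "lower": sum(d < 0 for d in diffs) / n,
        "same": sum(d == 0 for d in diffs) / n,
    }

# Hypothetical average BPM per player; not data from the study.
normal_bpm = [1.0, 2.0, 0.5, -1.0, 3.0]
stressed_bpm = [1.5, 2.0, 1.5, -2.0, 4.0]
summary = paired_summary(normal_bpm, stressed_bpm)
```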
Abstract: NASCAR is the premier American motorsport, with millions of fans across the United States and beyond eagerly tuning in each week to see who comes out on top in a test of skill, endurance, and most of all, speed. As with any sport, winning is of paramount importance, and prior research has attempted to model important driver and team characteristics, as well as methods for optimizing driver performance towards this end. However, no prior work has investigated ways to improve driver qualifying performance. This problem is significant since qualifying well will naturally improve the likelihood of achieving a strong finishing position in the race that follows. To address this gap, we analyze qualifying lap data from the NASCAR AdventHealth 400 using functional principal components analysis and agglomerative hierarchical clustering. This allows us to generate distinct groups of drivers from which typical behavior patterns can be extracted. In particular, we observe multiple patterns of braking, throttle application, and steering which are differentially associated with qualifying performance. By identifying the highest performing clusters, we identify actionable insights which can be used to enhance the qualifying performance of other drivers and ultimately achieve stronger race results.
Abstract: Can analyzing a batter's swing type provide deeper insights into their overall performance and potential? If so, how can teams leverage this insight for player evaluation and lineup optimization? Our analysis explores whether Swing Length and Bat Speed, newly introduced by Major League Baseball through Statcast, reflect intrinsic batter characteristics. By integrating these physical swing metrics with plate discipline indicators, we classified batters into distinct swing profiles and examined their correlation with key performance metrics such as OPS and wRC+.
To assess the strategic implications of swing analysis, we employed Monte Carlo simulations to estimate expected run production across various lineup configurations. Unlike traditional models that primarily focus on batted ball data, our approach incorporates both swing mechanics and decision-making tendencies. This framework enables teams to optimize their lineup by testing multiple configurations and selecting the one that maximizes run production.
In addition to lineup construction, we developed scenario-based simulations to guide roster decisions. One scenario evaluates the optimal placement of a newly acquired player within an existing lineup to maximize offensive output, providing valuable insight for free-agent acquisitions. Another scenario identifies the ideal swing profile for specific lineup spots, helping teams address underperforming positions with the most suitable player types.
Our findings introduce a novel approach to batter evaluation and lineup optimization, integrating swing analysis with probabilistic modeling. By combining swing mechanics with strategic decision-making, this research offers practical insights for player development, scouting, and in-game management.
Abstract: This study examines the hypothesis that the Rating Percentage Index (RPI) metric systematically overestimates teams from stronger conferences in NCAA Division I Men’s Soccer. Alternative RPI weightings are proposed and tested, first by using Bradley-Terry-Davidson simulations and then using real data from the 2024 NCAA season, to determine whether they reduce conference bias while improving predictive accuracy. Simulations show that the standard RPI formula is less accurate and tends to favor teams from stronger conferences when compared to alternative weights which emphasize performance over context. However, these findings do not hold in the 2024 season data, where context-heavy RPI formulas beat selected alternatives from an accuracy standpoint despite their preference for teams from stronger conferences. These results suggest that alternative RPI weights hold potential, but further multi-season analysis is warranted.
Abstract: The concept of “clutchness” in professional baseball has long been debated, yet remains difficult to quantify. This study aims to investigate whether high-pressure situations influence player performance and whether distinct “clutch” archetypes exist. Using regular season and postseason data from Baseball Reference, we analyze three key performance metrics, On-Base Plus Slugging (OPS), Total Bases (TB), and Championship Win Probability Added (cWPA), to assess how players respond under varying levels of pressure. We employ hypothesis testing to determine statistical significance in performance differences, K-means clustering for classifying players into specific archetypes, and logistic regression models to forecast probability of success in high-leverage situations. By integrating statistical analysis with sabermetrics, this study seeks to provide a data-driven perspective on the existence of a “clutch gene” in baseball. Our findings could have implications for player evaluation, recruitment strategies, and in-game decision-making, offering a more objective framework for understanding performance under pressure.
Abstract: This study explores the relationship between swing speed trends and injury-related absences among Major League Baseball (MLB) players. It was inspired by Rafael Devers’ injury-impacted 2024 season. Devers’ swing speed significantly declined following his midseason injury, motivating a broader investigation into whether similar trends could be identified across other MLB players. The analysis examined swing speed and length using rolling averages, highlighting potential injury and performance decline indicators.
Abstract: Despite advancements in sports analytics, there is limited research applying computer vision techniques to squash. This study presents a novel, low-cost AI system that uses a single-camera setup to detect players, the ball, and court boundaries using deep learning models. The system extracts spatial and temporal data from match footage, allowing for detailed performance analysis without the need for expensive multi-camera setups. Top-down heatmaps are generated to illustrate player positioning, movement trends, and time spent in different court areas. In addition, we can determine when the ball is hit, from where, and the location it lands with respect to the opposing player. These insights can help players and coaches refine strategies, optimize training, and improve decision-making. In addition, in our work, we analyze patterns of movement and shot selection by shared characteristics that may influence tactics or behavior such as gender, nationality, and skill level. By reducing hardware costs while maintaining accuracy, this method makes data-driven squash analytics more accessible. The project addresses a key research gap in squash-related AI applications, showcasing the potential of Computer Vision for match evaluation.
Abstract: For some time, there has been a disagreement among fans about the ideal swing type for batters. Some argue that batters are too selfish, citing low batting averages, rising strikeout rates, and increasing home run swings. Others argue that low batting averages and rising strikeout rates have more to do with the advancement of pitching and that hitting for power, rather than contact, typically yields better results in the long run. There are also those who argue that batters should change their swings in different situations. For example, they can swing for the fences when the bases are empty, but should look to shorten their swing and emphasize contact when there are runners in scoring position. For this project, we wanted to evaluate whether different swing types are needed to maximize run production or whether there's one swing type that provides a perfect balance between power and contact. We utilized k-means clustering to classify seven different swing types and then created two RBI matrices. One evaluates the expected number of RBIs on a given swing based on swing type, base situation, and number of outs, while the other finds the probability of at least one RBI on a given swing based on swing type, base situation, and number of outs. We also created an XGBoost model that predicted the change in win probability based on swing type, base situation, number of outs, count, score, inning, and batting team score differential.
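The two RBI matrices described above amount to grouped averages over swings. A minimal sketch follows, assuming each swing record carries a cluster label, a base-state string, the number of outs, and the RBIs produced; the record layout, the sample values, and the function name `rbi_matrices` are hypothetical.

```python
from collections import defaultdict

def rbi_matrices(swings):
    """Build expected-RBI and P(at least one RBI) lookups keyed by situation."""
    # Per cell: [number of swings, total RBIs, swings with one or more RBIs].
    totals = defaultdict(lambda: [0, 0, 0])
    for swing_type, bases, outs, rbi in swings:
        cell = totals[(swing_type, bases, outs)]
        cell[0] += 1
        cell[1] += rbi
        cell[2] += int(rbi >= 1)
    expected = {key: total / n for key, (n, total, _) in totals.items()}
    prob_any = {key: hits / n for key, (n, _, hits) in totals.items()}
    return expected, prob_any

# Made-up records: (swing type, runners on base, outs, RBIs on the swing).
records = [
    (3, "_2_", 1, 0), (3, "_2_", 1, 1), (3, "_2_", 1, 0), (3, "_2_", 1, 2),
    (5, "___", 0, 0), (5, "___", 0, 1),
]
expected_rbi, prob_rbi = rbi_matrices(records)
```

Keying both lookups on the same (swing type, bases, outs) tuple keeps the two matrices aligned, so a cell can be compared across them directly.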
Abstract: Introduction: Fouling off 2-strike pitches is a strategic skill in baseball that extends at-bats, tires pitchers, and improves offensive outcomes. However, how batter mechanics (bat speed, swing length) influence this skill remains unclear. This gap prevents players, coaches, and analysts from developing true cause-and-effect strategies beyond simple correlations, limiting strategic adjustments in high-pressure situations. This study examines how these mechanics influence 2-strike fouling probability using Bayesian causal inference, providing actionable insights for player development and decision-making.
Methods: Pitch-level data from Baseball Savant was used to analyze bat speed, swing length, and pitch movement in 2-strike fouling behavior. Although limited to the 2024 MLB season, this methodology provides a foundation for future research. A Bayesian logistic regression model estimated fouling probability while accounting for uncertainty and confounders. Contrast means and Highest Density Intervals (HDI) were used to assess statistical significance, while the Gelman-Rubin statistic and effective sample size confirmed model robustness.
Results: Shorter swings significantly increased fouling probability, with batters using shorter swings fouling off 48% of 2-strike pitches, compared to 37% for longer swings. Higher bat speed had a smaller effect, with batters swinging faster fouling off 44% of such pitches, compared to 41% for slower swings. Vertical pitch movement amplified these effects, making short swings more effective against high pfx_z pitches.
Conclusion: This study integrates batter mechanics and Bayesian causal inference to provide a fuller picture of 2-strike fouling behavior. The findings offer practical insights for refining hitting strategies and pitching tactics, and support the growing use of causal inference to enhance player evaluation and decision-making.
Abstract: Data analytics is playing a beneficial role in track and field. Wearable devices can easily generate personally actionable data ranging from biomechanical measurements to weather conditions. Students in the Computer and Data Science Department at Goucher College are collaborating with the college's Track and Field teams to develop personalized machine learning and analytics tools. We present initial steps in our collaboration with coaches to optimize training, help tailor athlete schedules, and ultimately predict race outcomes. This bridge between academic learning and athletic excellence is providing athletes with advanced analytical tools while giving students hands-on experience and generating real-world case studies.
Abstract: Ulnar Collateral Ligament (UCL) reconstruction, commonly referred to as Tommy John Surgery, has seen a significant rise among Major League Baseball (MLB) pitchers in recent decades. While previous studies have explored correlations between pitching mechanics and UCL injuries using frequentist statistical approaches, the present study employs a Bayesian hierarchical model to better quantify the relationship between pitch characteristics and the likelihood of requiring UCL surgery. By incorporating prior knowledge and leveraging Bayesian ANOVA techniques, our approach offers a more structured and probabilistic assessment of the factors contributing to UCL injuries. Our findings aim to refine injury prediction models and provide a more comprehensive statistical framework for understanding the mechanics underlying UCL injuries in professional baseball.
Abstract: My project focuses on helping the Yale Women’s Soccer team by presenting their data in a way that is accessible to coaches. By developing an interactive user interface, the tool will enable coaches and staff to compare their players’ statistics within the team or with those of other Ivy League players, utilizing data visualizations like bar charts, radar maps, and scatter plots. This initiative addresses the need for data-driven insights for the team, which are currently limited to post-match reports and static spreadsheets.
Abstract: This study utilizes a unique shared-course environment to compare the average speeds of the world's two fastest auto racing formats, Formula One (F1) and IndyCar. These two racing circuits are distinct in terms of their places of origin, where in the world they typically source drivers and hold races, and the locations of their respective fan bases. Though F1 and IndyCar cars share several common physical characteristics, their engines, chassis, tires, steering, and aerodynamics are distinct. While these two formats are commonly recognized as the two fastest forms of full-course auto racing, the relative average speed capabilities of the two formats (e.g., over a neutral course) are unclear, as they have historically held official events on different sets of courses. In 2019, however, at the Circuit of the Americas in Austin, TX, F1 and IndyCar both held races over the same course, allowing for a natural experimental comparison in which year (state of technology), course, and even general weather conditions were common across the two races. By modeling qualifying times conditional on driver, race format, and other car characteristics, we statistically compare the relative average speeds of the two formats over the same course. In a set of fixed and mixed effects linear and polynomial regression models, we find that F1 was significantly and substantially faster than IndyCar. The average speed difference between the two formats was approximately 16.5 miles per hour over the course. These results inform not only race speed supremacy but also returns to technological investment in the private sector.
Keywords: natural experiment, auto racing performance, F1, IndyCar, fixed effects, mixed effects
Abstract: We participated in the competition and were selected to present our poster! "In professional baseball, every swing reflects a decision influenced by pitch characteristics, game context, and a batter's intent. However, deviations from expected swing mechanics may indicate a breakdown in consistency throughout a game. This study investigates the presence and progression of deviations from expected swing mechanics throughout a game, focusing on two key metrics: swing length residuals and the relationship between bat speed and launch speed. Using pitch-level data from Major League Baseball, we apply regression models and residual analysis to quantify these deviations. A logistic regression indicated a 26.4% increase in unintended swing likelihood per additional swing in an inning (p < 0.001). Swing length analysis revealed increasing mechanical variability, with mean squared error increasing significantly with swing count (Estimate: 0.00176, p = 0.00467). These trends suggest a pattern of mechanical inconsistency, potentially linked to fatigue or cognitive strain. This research provides a foundation for further exploration into batter performance trends and offers insights for coaching strategies aimed at injury prevention and fatigue management."
Abstract: Coaching staff cohesion has been one of the most overlooked aspects of team success. A group of coaches who have experience working with each other should theoretically foster team success. But is this true in reality? To answer this question, we created a first-of-its-kind coaching network that maps connections between every head coach, coordinator, position coach, and quality control coach from 2010 to 2024. Our analysis revealed a statistically significant positive association between the closeness of coaching communities and key team performance metrics, including win percentage, playoff appearances, and division titles. These findings highlight the value of long-term coaching relationships, shedding light on the potential benefits of front office patience and continuity. In addition to these findings, we employed modularity-maximizing clustering algorithms to identify distinct “coaching communities,” which represent the different coaching philosophies in the NFL today. The algorithm divided our coaching network into 19 different communities, 13 of which were found to be significant. Lastly, inspired by separation theory, we measured the closeness of every coach to Kyle Shanahan. Shanahan has been praised for his influence and success, with several of his disciples earning head coaching jobs. By analyzing the proximity of other coaches to Shanahan, we aimed to understand the extent of his influence on the broader coaching landscape. This could perhaps inform how connections to an influential figure impact team success and offer some power in predicting future head coaches. By examining the structure of coaching networks like never before, we uncover valuable insights that could influence decisions across the NFL.
Abstract: Predicting match outcomes in tennis poses a significant challenge due to the sport's unpredictable nature and the influence of numerous factors on player performance. This study seeks to forecast the winning probabilities of the top 10 ranked athletes in the 2024 US Open Men's Championship using logistic regression. The research analyzes data from US Opens from 2016 to 2023. The primary variables selected for the analysis are the winner's rank and the opponent's rank, applied in a logistic regression model using an 80/20 train-test split. The test accuracy was 68%. The probability of winning the US Open was also calculated for the top 10 ranked players, finding that the No. 1 ranked player's probability of winning the US Open was 3.1%. As No. 1 seeds have won 28.9% of the men's US Open singles tournaments, this suggests that the player's and opponent's ranks alone are insufficient to determine the probability of an individual winning the US Open.
Abstract: Regularized adjusted plus-minus (RAPM) models are a foundational tool in hockey analytics for evaluating individual players in terms of their offensive or defensive contribution. We examine the performance of RAPM models with respect to matrix formats (dense, compressed sparse row, compressed sparse column, coordinate list) as well as collapsing portions of the design matrix into a value relative to the league average. We also investigate the use of box score statistics to predict RAPM coefficients as a step toward a fully Bayesian RAPM model and explore whether expected goals-based alternative metrics improve the results.
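To make the matrix formats named above concrete, the sketch below converts a coordinate-list (COO) design matrix to compressed sparse row (CSR) form and multiplies it by a coefficient vector. The toy design matrix, its +1/-1 encoding, the coefficient values, and the function names `coo_to_csr` and `csr_matvec` are assumptions made for illustration; production work would typically rely on a library such as `scipy.sparse`.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate-list triplets to CSR arrays (indptr, indices, data)."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    # indptr[i]:indptr[i + 1] is the slice of stored entries for row i.
    indptr = [0]
    for count in counts:
        indptr.append(indptr[-1] + count)
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1]  # slicing copies, so indptr itself is not mutated
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k], data[k] = c, v
        next_slot[r] += 1
    return indptr, indices, data

def csr_matvec(indptr, indices, data, x):
    """Multiply a CSR matrix by a dense vector, touching only stored entries."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

# Toy design matrix: 3 shifts by 4 players, +1 for one side and -1 for the other.
rows = [2, 0, 1, 0, 2, 1]
cols = [0, 0, 1, 2, 3, 2]
vals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
indptr, indices, data = coo_to_csr(3, rows, cols, vals)

coefficients = [0.5, 0.2, 0.1, -0.3]  # hypothetical player ratings
fitted = csr_matvec(indptr, indices, data, coefficients)
```

COO is convenient for assembling the matrix one shift at a time, while CSR makes row-wise products like the one above cheap, which is one reason the choice of format can affect model fitting time.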
Abstract: In baseball, one way to make pitchers more effective (without improving their mechanics) is to select the right pitches. I wanted to know which pitch sequences were the most unpredictable and difficult to bat against; specifically, I searched for those which generated the most whiffs. I implemented a generalized additive model to predict a pitch’s whiff probability given its pitch type - fastball, changeup, slider, or curve - and the hitter’s bat speed and swing length. Then, I looked at sequences of pitches to see which ones generated more swings-and-misses than expected. Some sequences emerged as more effective than others, demonstrating that pitch selection is not just a superficial task for the catcher but a strategic medium for the pitcher to gain an advantage. Results include the best overall sequence, the best starting pitch and follow-up, and specific combinations of pitches to throw or avoid.
Abstract: The strategic interaction between pitchers and batters has always been one of the highlights of MLB games. While it is well established that batters’ behavior is more passive, adjusting their actions in response to the pitcher’s movements, in this study we explore a more counterintuitive scenario: how pitchers, in turn, are influenced by batters and other on-field contexts. Using models such as Random Forest and BART (Bayesian Additive Regression Trees), we predicted pitch types and examined the causal effects of swing characteristics on pitching strategies. Our findings revealed that Random Forest achieved the highest accuracy (42.02%) among the predictive models. Additionally, we found that when the last bat speed they faced was outside the 50-75 mph range, pitchers appear highly inconsistent in their decision-making regarding fastball choice, indicating that extreme cases have a significant impact on their strategy.