UCSAS 2024 USOPC Data Challenge

For this data challenge, your goal is to identify the group of 5 athletes who will enable the Team USA Olympic Men’s and Women’s Artistic Gymnastics teams to optimize success in Paris 2024. You are tasked with developing an analytics model that can be used to identify and compare the expected medal count in the 8 medal events for men (team all-around, individual all-around, floor exercise, pommel horse, still rings, vault, parallel bars, and high bar) and 6 medal events for the women (team all-around, individual all-around, vault, uneven bars, balance beam, and floor exercise).

In total, 192 artistic gymnasts will compete at Paris 2024: 96 men and 96 women. Team events will feature 12 teams of 5 athletes each in the men’s and women’s events.* For countries that do not win a full team entry for Paris 2024, a maximum of 3 individuals per country will be able to qualify. Those remaining 36 entries for each gender will be determined by results of the 2023 World Championships, the 2024 World Cup Series, and the 2024 Continental Championships.

For each gender, 3 teams qualified at the 2022 World Championships in Liverpool, England. In the men’s competition, teams from China, Japan, and Great Britain qualified. In the women’s competition, teams from the United States, Great Britain, and Canada qualified. For each gender, 9 other countries will qualify teams based on their placements at the 2023 World Championships in Antwerp, Belgium, which will take place from September 30 to October 8, 2023.

*Note that at Tokyo 2020, men’s and women’s teams were composed of 4 athletes. Countries with full teams could also qualify up to 2 additional athletes to compete in Tokyo as individuals. The Team USA men had a 4-person team of Brody Malone, Sam Mikulak, Yul Moldauer, and Shane Wiskus. The USA men had one individual athlete, Alec Yoder, who could compete in the qualifying round at the Olympics to potentially qualify for the individual all-around and the apparatus finals. Similarly, the Team USA women had a 4-person team of Simone Biles, Jordan Chiles, Sunisa Lee, and Grace McCallum. The USA women had two individual athletes, Jade Carey and MyKayla Skinner, who could compete in the qualifying round at the Olympics to potentially qualify for the individual all-around and the apparatus finals.

According to recent news, gymnasts from Russia and Belarus will be allowed to take part in sanctioned competitions as “individual neutral athletes” from the start of 2024. Based on that information, Russia and Belarus will not be able to qualify for team artistic gymnastics events in Paris, but they would each be able to qualify up to 3 athletes to compete as individuals, as described above.

The problem of choosing a team is complicated by the structure of the Olympic competition. For both the men and women, the athletes first compete in a qualifying round in which their scores are used to determine advancement of teams (i.e., countries) to the team all-around final and individuals to the individual all-around final and apparatus finals. In qualifying, 4 of the 5 athletes on each team compete on each apparatus, so not every athlete will compete on every apparatus. The athletes representing countries that did not qualify a full team may participate on all apparatus in the qualifying round.

The top 8 teams in qualifying advance to the team final based on the sum of the top 3 out of 4 scores on each apparatus (“4 up, 3 count”, for a total of 18 scores for men and 12 scores for women). Athletes must compete on all apparatus in qualifying to be eligible for the individual all-around final. The top 24 athletes qualify for the individual all-around final, with a maximum of two gymnasts per country. The top 8 athletes on each apparatus qualify for the final in that apparatus, again with a maximum of 2 gymnasts per country.
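The qualifying rules above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring implementation: the function names, the tuple layout, and all scores are hypothetical, and tie-breaking rules are ignored.

```python
from collections import defaultdict


def team_qualifying_score(scores_by_apparatus):
    """Sum the top 3 of up to 4 scores on each apparatus ("4 up, 3 count")."""
    total = 0.0
    for apparatus, scores in scores_by_apparatus.items():
        if len(scores) > 4:
            raise ValueError(f"at most 4 athletes may compete on {apparatus}")
        total += sum(sorted(scores, reverse=True)[:3])
    return total


def select_finalists(results, spots, max_per_country=2):
    """Take the top `spots` athletes by score, with a per-country cap.

    `results` is a list of (athlete, country, score) tuples (assumed layout).
    Ties are left in input order; real tie-break rules are not modeled.
    """
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    taken = defaultdict(int)
    finalists = []
    for athlete, country, _score in ranked:
        if len(finalists) == spots:
            break
        if taken[country] < max_per_country:
            finalists.append(athlete)
            taken[country] += 1
    return finalists


# Hypothetical scores for two apparatus; the lowest score on each is dropped.
qualifying = {
    "vault": [14.5, 14.2, 13.9, 13.1],
    "uneven bars": [14.8, 14.0, 13.7, 12.9],
}
print(round(team_qualifying_score(qualifying), 3))

# Hypothetical apparatus ranking: the third-best USA athlete is skipped.
ranking = [
    ("A", "USA", 15.0), ("B", "USA", 14.9), ("C", "USA", 14.8),
    ("D", "JPN", 14.7), ("E", "CHN", 14.6),
]
print(select_finalists(ranking, spots=4))
```

The per-country cap matters for team selection: a third strong athlete on the same apparatus adds depth for the team score but cannot advance to that final.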

In the team all-around, individual all-around, and individual apparatus finals, all athletes’ scores from qualifying are thrown out. In the team all-around final, the medalists are determined by the sum of the 3 scores on each apparatus (“3 up, 3 count”, for a total of 18 scores for men and 12 scores for women). An athlete’s scores in the team all-around final have no effect on their scores in the individual all-around final, and in the individual apparatus finals, all previous scores from qualifying, the team all-around final, and the individual all-around final are thrown out.
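As a rough sketch of the “3 up, 3 count” format, the following Python picks the 3 athletes with the highest expected score on each apparatus and sums all of their scores. The athlete labels, expected scores, and function names are hypothetical, and ranking by expected score alone is a simplifying assumption that ignores score variability and falls.

```python
def best_expected_lineup(expected, apparatus_list, n_up=3):
    """For each apparatus, choose the n_up athletes with the highest expected score.

    `expected` maps athlete -> {apparatus: expected score}; an athlete who
    does not compete on an apparatus simply has no entry for it.
    """
    lineup = {}
    for apparatus in apparatus_list:
        candidates = [a for a in expected if apparatus in expected[a]]
        if len(candidates) < n_up:
            raise ValueError(f"fewer than {n_up} athletes available on {apparatus}")
        candidates.sort(key=lambda a: expected[a][apparatus], reverse=True)
        lineup[apparatus] = candidates[:n_up]
    return lineup


def team_final_score(expected, lineup):
    """Sum every score in the lineup: in the final, no score is dropped."""
    return sum(
        expected[athlete][apparatus]
        for apparatus, athletes in lineup.items()
        for athlete in athletes
    )


# Hypothetical 5-member women's team; "E" is an uneven bars specialist.
WOMEN = ["vault", "uneven bars", "balance beam", "floor exercise"]
expected = {
    "A": {"vault": 14.6, "uneven bars": 14.0, "balance beam": 13.9, "floor exercise": 14.2},
    "B": {"vault": 14.2, "uneven bars": 14.5, "balance beam": 13.6, "floor exercise": 13.8},
    "C": {"vault": 14.0, "uneven bars": 13.4, "balance beam": 14.1, "floor exercise": 13.9},
    "D": {"vault": 13.8, "uneven bars": 13.2, "balance beam": 13.0, "floor exercise": 13.5},
    "E": {"uneven bars": 14.9},
}
lineup = best_expected_lineup(expected, WOMEN)
print(lineup["uneven bars"])
print(round(team_final_score(expected, lineup), 3))
```

Because every score counts in the final, a model that also estimates the spread of each athlete’s scores may rank lineups differently from this expected-value version.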

The goal of this data challenge is to put together the best possible men’s and women’s Team USA Olympic Artistic Gymnastics teams; however, the meaning of “best” is left open to the interpretation of the entrant.

For example, a few questions that students could consider for their entry include:

  • How would your recommended team differ if you are trying to maximize total medal count, gold medals, or a weighted medal count (e.g., 3 for gold, 2 for silver, 1 for bronze)?
  • How would your recommended team change if you consider a team all-around medal to be more valuable than the individual all-around medals and/or if you consider the individual all-around medals to be more valuable than the individual apparatus medals?
  • Can Team USA maximize its total medal count by selecting a team of 5 gymnasts who are all-around gymnasts, event specialists (gymnasts who focus on 1 or more apparatus but not all apparatus), or a combination of those?
  • Under what circumstances can Team USA maximize its total medal count by selecting a gymnast who only competes on 1 apparatus (e.g., Stephen Nedoroscik, 2021 pommel horse World Champion)?
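To make the different meanings of “best” concrete, here is a small Python sketch that scores a predicted set of medals under three objectives: total medals, gold medals, and a weighted count. The 3/2/1 points come from the example in the question above; the event weights, the two teams, and their medals are invented for illustration only.

```python
MEDAL_POINTS = {"gold": 3, "silver": 2, "bronze": 1}


def weighted_medal_count(medals, medal_points=MEDAL_POINTS, event_weights=None):
    """Score a list of (event, medal) pairs.

    `event_weights` optionally scales events, e.g. to value a team
    all-around medal more than an apparatus medal (weights are assumptions).
    """
    event_weights = event_weights or {}
    return sum(
        medal_points[medal] * event_weights.get(event, 1.0)
        for event, medal in medals
    )


def summarize(medals, event_weights=None):
    """Return the three objectives discussed above for one predicted outcome."""
    return {
        "total": len(medals),
        "gold": sum(1 for _event, medal in medals if medal == "gold"),
        "weighted": weighted_medal_count(medals, event_weights=event_weights),
    }


# Two hypothetical predicted outcomes for two candidate teams.
team_a = [
    ("team all-around", "bronze"), ("individual all-around", "silver"),
    ("floor exercise", "gold"), ("vault", "bronze"), ("balance beam", "bronze"),
]
team_b = [
    ("team all-around", "gold"), ("uneven bars", "gold"),
    ("vault", "silver"), ("floor exercise", "bronze"),
]
print(summarize(team_a), summarize(team_b))
print(summarize(team_b, event_weights={"team all-around": 2.0}))
```

In this made-up example, team A wins on total medals while team B wins on gold medals and on the weighted count, so the recommended team depends on the objective chosen. In a full model the same functions could be applied to each simulated outcome and averaged.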

Data

Data from major domestic and international gymnastics competitions from the seasons leading up to the 2020 Tokyo and 2024 Paris Olympics will be provided to entrants. Because the Code of Points scoring system is changed each Olympic cycle, data from the years (2017-2021) leading up to the 2020 Olympics, which actually took place in 2021, should be used as a separate data set from the data from competitions in the 2022 and 2023 seasons leading up to the 2024 Olympics.
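One simple way to keep the two scoring cycles apart is to label each record by competition year before any modeling. The sketch below assumes each record carries a competition date under a hypothetical `"Date"` key; the actual column names in the provided data may differ.

```python
from datetime import date


def olympic_cycle(competition_date):
    """Map a competition date to its cycle, using the year ranges given above."""
    year = competition_date.year
    if 2017 <= year <= 2021:
        return "Tokyo 2020"
    if 2022 <= year <= 2023:
        return "Paris 2024"
    return None  # outside the seasons covered by the provided data


def split_by_cycle(records, date_key="Date"):
    """Group records (dicts) by cycle; `date_key` is a hypothetical field name."""
    groups = {"Tokyo 2020": [], "Paris 2024": []}
    for record in records:
        cycle = olympic_cycle(record[date_key])
        if cycle is not None:
            groups[cycle].append(record)
    return groups


# Hypothetical records for illustration only.
records = [
    {"Date": date(2021, 10, 18), "Apparatus": "vault", "Score": 14.2},
    {"Date": date(2022, 11, 1), "Apparatus": "vault", "Score": 14.4},
    {"Date": date(2023, 10, 3), "Apparatus": "vault", "Score": 14.6},
]
groups = split_by_cycle(records)
print({cycle: len(rows) for cycle, rows in groups.items()})
```

Note that competitions held in late 2021, after the Tokyo Games, still fall in the earlier group under this year-based rule.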

The cleaned data is on GitHub and is being actively updated. The last update of data will be the 2023 World Artistic Gymnastics Championships in Belgium, which ends in early October 2023.

Entrants can use any additional data that they choose as long as that data is publicly available, but the additional data must be included in the final submission for reproducibility.

Submission

Students must submit a zip file containing:

  • A pdf report describing their results (max 3000 words)
  • A folder with
    • Documented code files
    • A README file describing what each file does.

Note: Students can include other files including any app code or supporting documents. Any code or apps included need to be self-contained and able to run on reviewers’ computers without modification.

Eligibility

The UCSAS 2024 Data Challenge is open to students only. You must be enrolled as a high school, undergraduate, or graduate student at some point during the 2023-24 academic year. Participants must register using their school email address and participants must be at least 18 years old.

Teams must enter one of the following two tracks:

  • High School / Undergraduate Track
  • Graduate Track

To be eligible for the High School / Undergraduate Track, ALL members of the team must be high school and/or undergraduate students. Each team can have up to 3 members.

Judging Criteria

A panel of judges from across academia and the sports industry will judge your submissions based on the following:

  • How original is the analysis?
  • How applicable is the analysis?
  • How appropriate were the methods used?
  • How well did you communicate your findings? This includes both written text and visualizations. How did the use of facts, data-supported narratives, anecdotes etc. buttress your storytelling?

Terms of Participation

United States Olympic & Paralympic Committee IP (logos, terminology, etc.) may not be used in any manner, without prior written consent of the USOPC. USOPC terminology examples include but are not limited to: Olympic, Paralympic, Olympian, Paralympian, United States Olympic & Paralympic Committee (USOPC), Team USA.

UCSAS 2024 Data Challenge participants may not imply or publicize any relationship or association between themselves and the USOPC, Team USA and/or the Olympic and Paralympic Movements, without prior written consent of the USOPC.

For subsequent use of the project in personal portfolios and presentations other than those related to the UCSAS 2024 Data Challenge and the U.S. Olympic & Paralympic Data, Analytics & Technology Summit (for the Data Challenge winners), participants may anonymize the work product (for example, replace “USOPC” with “U.S. Sport Organization”). Federal law gives the USOPC extensive rights to control the use of USOPC IP in the United States and allows the USOPC to file a lawsuit against any entity using such IP without consent.

Note: The USOPC does not control the trademarks or logos owned by the National Governing Body (NGB) for a specific sport. For instruction on the proper use of those marks, please contact the relevant NGB directly.

Prizes

Finalists (Six teams: three high school / undergraduate and three graduate) will be invited to present their work at UCSAS 2024 in Storrs, CT. Winning teams (one high school/undergraduate team and one graduate team) will receive some travel support and have their registration fees waived. The winning teams will receive a cash prize (UCONN) and a plaque (UCONN). Additionally, the winning teams will have the opportunity to showcase their work, with all travel, registrations, and expenses paid, at the 2024 U.S. Olympic & Paralympic Data, Analytics & Technology Summit in Colorado Springs, CO, providing them with increased exposure and potential opportunities for future collaborations. The runners-up will receive a certificate of achievement (UCONN) and recognition for their outstanding performance in the Data Challenge.

Webinar intro to the Challenge and Data

Another webinar may be arranged depending on demand

Important Dates

  • Data challenge release: August 4, 2023
  • Webinar introduction: September 14, 2023
  • Submission deadline: January 15, 2024
  • Finalists notified: February 15, 2024
  • UCSAS 2024: April 12-13, 2024

Q&A Summary

  1. Question: Where can we find and refer back to this PowerPoint after tonight's seminar finishes?
    Answer: The slides and video recording will be made available on the website.

  2. Question: What does the "all around" mean?
    Answer: In the team all around qualification, 4 of the 5 members of a team compete on each apparatus and the top 3 scores on each apparatus count. In the team all around final, 3 of the 5 members compete on each apparatus and all 3 scores on each apparatus count. In the individual all around, each of the 24 qualified athletes performs on each apparatus once.

  3. Question: Is the qualification to qualify for the Olympics or to qualify for the finals of the Olympics?
    Answer: The qualification is to qualify for the team all around final, individual all around, and event finals within the Olympic competition itself, not to qualify for the Olympics.

  4. Question: Will all data be posted on the website?
    Answer: All data for the competitions provided, the last of which are the 2023 World Championships and the 2023 Asian Games in early October, will be posted on GitHub. Additional publicly available data can also be used, but it must be included with the project submission.

  5. Question: If athletes have similar execution scores, difficulty scores, and performance history, does Team USA consider defending Olympians?
    Answer: For this analysis, probably not, but it’s up to participants if they want to try to incorporate it. There is nothing mentioned in the USA Olympic Selection Procedures for Artistic Gymnastics about consideration based on participation in previous Olympic Games.

  6. Question: In events with two vaults, how much time is allowed between competitors' vaults?
    Answer: There may be rules around timing, but it's generally dictated by judges being ready. Athletes are typically ready before judges give the signal to compete.

  7. Question: Should we develop a model to predict medal counts for combinations, or pick 5 specific athletes?
    Answer: Develop a model for combinations first, but also provide a rank order of combinations and the best team of 5.

  8. Question: Can the final submission content just be the model, or also the 5 athletes the model selects?
    Answer: Include all components - report, code, data. Highlight the optimal teams of 5 in the report.

  9. Question: Can teammates be from different schools?
    Answer: Yes.

  10. Question: Can there be multiple teams from the same university?
    Answer: Yes, no limit.

  11. Question: For missing scores, is it appropriate to replace with zeros?
    Answer: Missing scores likely mean the gymnast didn’t compete. If the gymnast didn’t compete, delete those records rather than averaging in zeros. However, there are rare cases in which the gymnast competed and earned a score of zero, such as with Donnell Whittenburg on his Vault during the team all around qualification at the 2022 World Championships.

  12. Question: How are collegiate and elite scoring related?
    Answer: Very little relationship. Men's college uses elite scoring, but women's college is very different.

  13. Question: Should there be separate models for men and women?
    Answer: Same modeling approach, but analyze men and women separately.

  14. Question: Do all countries submit team members at the same time?
    Answer: No, each country determines their team at different times after their trials/nationals.

  15. Question: Should results include alternates?
    Answer: An element of originality if you can do it and it provides additional insight to your analysis.

  16. Question: Will the scores of the Asian Games be included in the dataset?
    Answer: Yes, the Asian Games taking place Sept 23 - Oct 8 will also be included.

  17. Question: What is D score and E score in the dataset?
    Answer: D score is the difficulty score, determined by the skills performed and their difficulty value. E score is the execution score, starting from 10 and with deductions for errors in form, landing, etc.

  18. Question: How is the originality of the analysis judged?
    Answer: What you have done that is different from the norm and demonstrates additional insight.

  19. Question: How do we communicate our results?
    Answer: Visualizations in code can be included; select key graphs for the written report.

  20. Question: I observed missing values in the dataset - how to handle?
    Answer: Missing scores likely mean no competition on that apparatus. Delete records rather than imputing zeros.

  21. Question: Do the extra 36 gymnasts qualify for specific apparatus and/or the all-around, and then only compete in those events for which they qualified?
    Answer: The extra 36 Men and the extra 36 Women, who will qualify for Paris through a variety of competitions and both all-around and individual apparatus events, may participate on all apparatus in the Qualifications in Paris. All 96 Men (60 on full teams + 36 extra) in Paris will compete in Qualifications on the first day of competition for Men’s Artistic Gymnastics (July 27, 2024). Likewise, all 96 Women (60 on full teams + 36 extra) in Paris will compete in Qualifications on the first day of competition for Women’s Artistic Gymnastics (July 28, 2024).
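The advice in questions 11 and 20 on missing scores can be illustrated with a short Python sketch: blank entries are dropped, while a recorded score of zero is kept as a real result. It assumes scores arrive as raw strings (as they would from a CSV file); the function names and sample values are hypothetical.

```python
def parse_score(raw):
    """Return a float score, or None when the entry is missing or blank."""
    if raw is None or str(raw).strip() == "":
        return None
    return float(raw)


def mean_score(raw_scores):
    """Average only the scores that exist; missing entries are dropped, not zero-filled."""
    scores = [s for s in (parse_score(r) for r in raw_scores) if s is not None]
    if not scores:
        return None  # the gymnast never competed on this apparatus
    return sum(scores) / len(scores)


# Hypothetical vault history: one missing entry and one genuine zero.
raw = ["14.2", "", "0", "13.8"]
print(round(mean_score(raw), 3))  # the blank is ignored, the earned zero is kept

# Zero-filling the blank would instead pull the average down further.
zero_filled = [parse_score(r) or 0.0 for r in raw]
print(sum(zero_filled) / len(zero_filled))
```

Whether a recorded zero should be treated as a typical outcome or as a rare event is a separate modeling choice left to the entrant.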