A Century of Olympic History Through Data Analysis

The 2020 Tokyo Olympics are over, and I have only just remembered to write a data analysis article about them. I was too busy following the Games these past few days.

After some searching, I found a public dataset covering every Games from the first modern Olympics in 1896 to the Rio Olympics in 2016. If you are interested, you can add the 2020 data yourself. Let's use this data to revisit more than 100 years of Olympic history!

01 Ask a question

The Olympic Games originated in ancient Greece more than 2,000 years ago and are named after Olympia. Held every four years, they are the most influential sporting event in the world.

In 1896, after a break of about 1,500 years, the Games were revived as the first modern Olympic Games. As of Tokyo 2020, the Summer Games have reached their 32nd edition and the Winter Games their 23rd.

We can approach the data with the following three questions:

  1. By geography: which countries/regions have hosted the most Games, sent the most athletes, and won the most medals?
  2. By individual: how have male and female athletes performed over the years?
  3. By sport: are there sports in which certain countries/regions are especially strong?

02 Data Exploration

Data source:
https://www.heywhale.com/mw/dataset/5b62ca77a711e60010ab1154

There are two files. athlete_events.csv contains basic biographical data and medal results for the participating athletes.

noc_regions.csv maps each National Olympic Committee's 3-letter code to its country/region.

Analysis tools: Power BI + Excel

1.1 Field introduction

The athlete data covers every athlete who took part in the Olympic Games from 1896 to 2016: 271,116 rows and 15 fields. Each row corresponds to one athlete's entry in one event at one Games.

  • ID: unique number for each athlete; there are 135,571 distinct IDs
  • Name: Athlete's name
  • Sex: Athlete's gender, F is female, M is male
  • Age: Athlete's age
  • Height: Athlete's height, in cm
  • Weight: Athlete's weight, in kg
  • Team: Team the athlete represents, such as China
  • NOC: National Olympic Committee three-letter code
  • Games: which Olympic Games the athlete participated in
  • Year: year
  • Season: season
  • City: host city, such as Beijing
  • Sport: sports, such as basketball
  • Event: A specific event, such as men's basketball
  • Medal: Medal, such as gold, silver, bronze or none

Olympic Committee data:

  • NOC: 3-letter code for the National Olympic Committee
  • Region: Country
  • Notes: Remarks

A few things to note:

  • There are fewer IDs than rows because one athlete can enter several events. Each athlete has one ID, so an ID does not correspond to a single row.
  • NOC is the three-letter code of the National Olympic Committee; for example, China's code is CHN.
  • Games is named as year + season; for example, 2016 Summer is the 2016 Summer Olympic Games. The Olympic movement includes the Summer Olympics, Winter Olympics, Paralympics, and more, with the Summer Olympics attracting the most attention. This dataset covers the Summer and Winter Olympics.
  • Team is the team the athlete represents, usually the athlete's country. A piece of trivia: athletes take part as delegations of National Olympic Committees rather than of countries as such, which is why the phrase "country or region" is always used. This is also why Taiwan can take part under the name Chinese Taipei: the Chinese Taipei Olympic Committee is a member of the International Olympic Committee. Historically it competed under the name "Republic of China", which the People's Republic of China boycotted.

Joining the two tables on their common field, NOC, gives the country/region for each athlete.

After the data is imported into Power BI, the relationship between the two tables is set up automatically.
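The article relies on Power BI to relate the two tables. For readers who prefer code, the same lookup can be sketched in plain Python. This is only an illustration: the sample rows are made up, the column names follow the field lists above, and attach_region is a hypothetical helper rather than part of any library.

```python
import csv
import io

# Made-up sample rows standing in for athlete_events.csv and noc_regions.csv.
ATHLETES_CSV = """ID,Name,NOC,Games
1,Athlete A,CHN,1992 Summer
2,Athlete B,USA,2016 Summer
3,Athlete C,XXX,1900 Summer
"""
REGIONS_CSV = """NOC,Region,Notes
CHN,China,
USA,USA,
"""

def attach_region(athlete_rows, region_rows):
    """Left-join athlete rows to regions on NOC; unmatched codes get None."""
    region_by_noc = {row["NOC"]: row["Region"] for row in region_rows}
    return [{**row, "Region": region_by_noc.get(row["NOC"])} for row in athlete_rows]

athletes = list(csv.DictReader(io.StringIO(ATHLETES_CSV)))
regions = list(csv.DictReader(io.StringIO(REGIONS_CSV)))
joined = attach_region(athletes, regions)

# NOC codes without a matching region are worth listing before any analysis.
unmatched = sorted({row["NOC"] for row in joined if row["Region"] is None})
print(unmatched)  # ['XXX']
```

Keeping unmatched codes as None, instead of dropping those rows, makes it easy to see which NOC codes need a manual fix.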

1.2 Data processing

1.2.1 Missing values

This data has missing values in the Age, Height, Weight, and Medal columns:

  • The missing value of Medal indicates that the athlete has not won a medal in this event, so it does not need to be processed
  • The Age column has 9474 missing values, accounting for 3.5%
  • The Height column has 60171 missing values, accounting for 22%
  • The Weight column has 62875 missing values, accounting for 23%

Age, height, and weight are personal attributes. I originally wanted to fill in the gaps, but many of the empty values are missing in large blocks for a particular country or a particular sport, so they cannot be inferred from known information. I therefore left them empty and kept the original data.
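The percentages above follow directly from the row counts. As a rough sketch (the article itself used Power BI and Excel), the snippet below shows one way to count missing values and re-derives the shares from the reported counts. missing_share is a hypothetical helper, and treating an empty string or "NA" as missing is an assumption about how the file encodes gaps.

```python
def missing_share(rows, column):
    """Return (missing_count, share) for one column.

    Assumption: a gap is encoded as None, an empty string, or the text 'NA'.
    """
    missing = sum(1 for row in rows if row.get(column) in (None, "", "NA"))
    return missing, missing / len(rows)

# Made-up sample rows to exercise the helper.
sample_rows = [{"Age": "24"}, {"Age": "NA"}, {"Age": ""}, {"Age": "30"}]
sample_result = missing_share(sample_rows, "Age")
print(sample_result)  # (2, 0.5)

# Re-derive the percentages from the counts reported in the article.
TOTAL_ROWS = 271_116
reported_missing = {"Age": 9_474, "Height": 60_171, "Weight": 62_875}
shares = {col: round(100 * n / TOTAL_ROWS, 1) for col, n in reported_missing.items()}
print(shares)  # {'Age': 3.5, 'Height': 22.2, 'Weight': 23.2}
```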

1.2.2 Outliers

Team names are not standardized: some teams are followed by a number. Fortunately, the NOC code for these teams is the same, so the country/region can be matched directly from the NOC and no cleanup is done here. The other fields are fairly regular, and I found no outliers.

1.2.3 Duplicate values

As noted in the field introduction, an ID identifies an athlete and each row is one event entry, so repeated IDs are normal: an athlete may enter more than one event.
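A repeated ID is different from a fully repeated row. The sketch below, using made-up rows, separates the two cases: it counts entries per athlete ID and lists rows that are identical in every field. The three-field row layout is a simplification chosen for illustration.

```python
from collections import Counter

# Made-up rows: (athlete ID, Games, Event).
rows = [
    (1, "2008 Summer", "Swimming Men's 100 metres Freestyle"),
    (1, "2008 Summer", "Swimming Men's 200 metres Freestyle"),
    (2, "2008 Summer", "Judo Women's Lightweight"),
    (2, "2008 Summer", "Judo Women's Lightweight"),  # exact duplicate of the row above
]

# Repeated IDs are expected: one athlete, several entries.
entries_per_id = Counter(athlete_id for athlete_id, _, _ in rows)

# Rows that repeat in every field are the ones that deserve a closer look.
exact_duplicates = [row for row, n in Counter(rows).items() if n > 1]

print(len(rows), len(entries_per_id))  # 4 2
print(exact_duplicates)
```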

03 Data analysis

The Summer Olympics have been held every four years since 1896. In this data, which runs up to 2016, there are 29 Summer Games and 22 Winter Games.

Readers with a head for numbers may have spotted a discrepancy: the beginning of the article said the 2020 Tokyo Olympics were the 32nd, so why does the data show only 29 Summer Games? Which ones are missing?

A look at the years gives the answer. Because of the two world wars, the three Games planned for 1916, 1940, and 1944 were never held, although they are still counted in the numbering. (Strictly speaking, that leaves 28 numbered Summer Games held through 2016; the 29th in this data appears to be the 1906 Intercalated Games in Athens, which is not part of the official numbering.)
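The counts can be checked with a little arithmetic. The sketch below assumes that the dataset lists the 1906 Intercalated Games in Athens as a Summer Games; that is an assumption about the data, not something the article states.

```python
CANCELLED = {1916, 1940, 1944}  # Games that were planned but never held

# One numbered slot every four years from 1896 to 2016.
olympiad_years = list(range(1896, 2017, 4))
held = [year for year in olympiad_years if year not in CANCELLED]

# Assumption: the dataset also contains the 1906 Intercalated Games.
in_dataset = sorted(held + [1906])

# Tokyo 2020 occupies the next numbered slot after 2016.
tokyo_number = (2020 - 1896) // 4 + 1

print(len(olympiad_years), len(held), len(in_dataset), tokyo_number)  # 31 28 29 32
```

Under this assumption, the 29 Summer Games in the data are the 28 numbered Games actually held through 2016 plus the 1906 Games.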

The number of participants is increasing year by year

From 176 athletes from 12 countries/regions at the first modern Olympic Games in 1896 to 11,179 athletes from 206 countries/regions at the 2016 Rio Olympics, participation has grown steadily (the 2020 Tokyo Olympics had 11,669 athletes from 204 countries/regions). The figure below shows the number of athletes and the number of participating countries/regions at each Summer Olympics.

The red circles in the figure mark 3 obvious low points that are worth a closer look:

  • At the 1932 Los Angeles Olympics, the number of participants dropped sharply because of the cost of taking part. This was also the first Games to which China sent a delegation, its first Olympic appearance.
  • The 1956 Melbourne Olympics were the only Games in history held at different times and in different places, and they took place against the background of the Cold War between the United States and the Soviet Union. Many countries withdrew, so the number of participants was understandably small. The Chinese delegation also refused to take part: the International Olympic Committee recognized the People's Republic of China but at the same time allowed Taiwan to take part under the name "Republic of China", and China boycotted the Games in opposition to this attempt to split China.
  • The 1980 Moscow Olympics, held before the dissolution of the Soviet Union, were the first Games hosted by a socialist country. To protest the Soviet invasion of Afghanistan, the United States and other countries boycotted them. In the end only 80 countries took part, the fewest since 1956.

The number of female athletes is gradually increasing

In 1900, women took part in the Olympic Games for the first time: 23 athletes, or 1.87% of the total. Since 1980 the number of female athletes has grown markedly, and by 2016 there were 5,034 female athletes, or 45%.

Historical ratio of male to female athletes.
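Because one athlete can appear in several rows, a share like this has to be computed over distinct athletes rather than rows. The following is a minimal sketch with made-up rows; female_share_by_games is a hypothetical helper, and the last lines simply re-check the 2016 figure from the numbers quoted in this article.

```python
from collections import defaultdict

def female_share_by_games(rows):
    """rows: iterable of (athlete_id, sex, games).

    Returns {games: share of distinct athletes whose sex is 'F'}.
    """
    sex_by_athlete = defaultdict(dict)
    for athlete_id, sex, games in rows:
        sex_by_athlete[games][athlete_id] = sex  # repeated entries collapse to one
    return {
        games: sum(1 for sex in athletes.values() if sex == "F") / len(athletes)
        for games, athletes in sex_by_athlete.items()
    }

# Made-up rows: athlete 1 enters two events but is counted once.
sample = [
    (1, "M", "1900 Summer"),
    (1, "M", "1900 Summer"),
    (2, "F", "1900 Summer"),
    (3, "M", "1900 Summer"),
    (4, "M", "1900 Summer"),
]
sample_share = female_share_by_games(sample)
print(sample_share)  # {'1900 Summer': 0.25}

# Re-check the 2016 share from the figures quoted in the article.
share_2016 = round(100 * 5034 / 11179, 1)
print(share_2016)  # 45.0
```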

The types of competitions are gradually enriched

The range of sports has also grown over time. There were only 9 sports at the Summer Olympics in 1896, compared with 36 in 2016.

Sport with the most participation

The sport with the most participants in history is track and field, followed by swimming, rowing, and football.

The ratio of male to female athletes in each sport is shown in the figure below. No female athletes have taken part in 13 sports, including baseball, Nordic combined (a Winter Olympic sport), tug-of-war, rugby, polo, and lacrosse. Conversely, there are no male athletes in rhythmic gymnastics, synchronized swimming, or softball.

Country records

No.1 Which country has participated in the Olympic Games the most times

A total of 208 countries/regions have taken part in the Olympic Games. Australia, France, Greece, Italy, and Sweden have taken part in all 29 Summer Olympics in the data, and China has taken part 19 times.

No.2 Which country sends the most athletes

The United States has sent the most athletes to the Olympic Games, followed by Germany. China ranks 11th, which reflects its absence from the early Games.

No.3 Which city has hosted the most Olympic games

A total of 42 cities have hosted the Olympic Games. Athens and London have each hosted three times; Innsbruck, Lake Placid, Los Angeles, Paris, St. Moritz, and Stockholm have each hosted twice; and the remaining cities have hosted once.

Plotted on a map, the data shows that European countries lead both in the number of host cities and in the number of Games hosted by a single city. Beijing, however, will soon become a city that has hosted twice (the 2008 Summer Olympics and the upcoming 2022 Winter Olympics).
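A tally like the one above has to count distinct Games per city, since every athlete row repeats the host city. Here is a small sketch with placeholder city names and years; games_per_city is a hypothetical helper and the rows are not taken from the dataset.

```python
from collections import defaultdict

def games_per_city(rows):
    """rows: iterable of (games, city). Returns {city: number of distinct Games}."""
    hosted = defaultdict(set)
    for games, city in rows:
        hosted[city].add(games)  # a set ignores the per-athlete repetition
    return {city: len(games) for city, games in hosted.items()}

# Placeholder rows: each athlete entry repeats the (Games, City) pair.
sample = [
    ("1900 Summer", "City A"),
    ("1900 Summer", "City A"),
    ("1924 Summer", "City A"),
    ("1936 Winter", "City B"),
    ("1936 Winter", "City B"),
]
host_counts = games_per_city(sample)
print(host_counts)  # {'City A': 2, 'City B': 1}
```

One caveat worth keeping in mind: if a single Games lists more than one city in the data, each of those cities is credited with that Games.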

No.4 Which country has won the most awards?

The country with the most medals in history is the United States, followed by Russia, Germany, and the United Kingdom. At this year's Tokyo Olympics, China won 38 gold medals and 88 medals in total.

No.5 Which country has the most gold medals in which sport?

By sport, the gold medals the United States has won in swimming and track and field account for almost half of its total.

Individual records

The age distribution of athletes shows that athletes aged 21 to 24 are the most numerous, and the pattern is similar for men and women.

The age distribution of medalists likewise shows that athletes aged 22 to 23 have won the most medals.
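An age distribution of this kind is a simple frequency count once missing ages are skipped. The sketch below uses made-up values; age_histogram is a hypothetical helper, and reading ages as text with "NA" or an empty string for gaps is an assumption about the file format.

```python
from collections import Counter

def age_histogram(raw_ages):
    """Count athletes per whole-number age, skipping missing values ('' or 'NA')."""
    return Counter(int(float(age)) for age in raw_ages if age not in ("", "NA"))

# Made-up ages, including two missing values and one value written as '22.0'.
sample_ages = ["23", "22", "NA", "23", "31", "", "22.0", "23"]
hist = age_histogram(sample_ages)
peak_age, peak_count = hist.most_common(1)[0]
print(peak_age, peak_count)  # 23 3
```

Running the same count on medal-winning rows only would give the distribution for medalists.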

No.1 youngest player

The youngest age in the data is 10. I checked, and it is genuine: Dimitrios Loundras, a 10-year-old boy, won a bronze medal in the men's team gymnastics at the 1896 Athens Olympics. He is the youngest medalist in Olympic history.

No.2 oldest player

The 97-year-old in the data is not an outlier either, as far as I can tell. John Quincy Adams Ward took part in the 1928 Amsterdam Olympics. He did not win a medal, but at 97 he is the oldest Olympic athlete on record.

This brings up another piece of trivia, about the Olympic art competitions.

The 7 Olympic Games from 1912 to 1948 included art competitions in architecture, literature, music, painting, and sculpture. From 1952 the art competitions were dropped and replaced by non-competitive art exhibitions.

No.3 The shortest player

There are two shortest contestants, both 127cm, one male and one female.

One was Rosario Briones, a female all-around gymnast from Mexico who competed in the 1968 Mexico Olympics.

The other was Lyton Levison Mphande, a male boxer from Malawi who competed in the 1988 Seoul Olympics.

No.4 tallest player

The tallest is China's Yao Ming, at 226cm, who played Olympic basketball in 2000, 2004, and 2008.

No.5 lightest player

The lightest athlete is a female all-around gymnast from North Korea who weighed only 25kg, truly as light as a swallow. She took part in the 1980 Moscow Olympics.

No.6 Heaviest player

The heaviest player is the male judoka from Guam, who competed in the 2008 and 2012 Olympics.

Athlete with the most Olympic appearances

An equestrian named Ian Millar has taken part in 10 Olympic Games. He first represented Canada in 1972, and 2012 was his 10th Summer Olympics. He won his first Olympic medal, a team silver in equestrian, at the 2008 Beijing Olympics, which makes for a very inspiring story.

The player with the most gold medals

The athlete who has won the most gold medals in history is the "Flying Fish" Michael Phelps of the United States, whose 23 gold medals far surpass the 13 of the athlete in second place.

China at the Olympics

In 1932 China sent its first delegation to the Olympic Games, and the name Liu Changchun is still well remembered. After 1984 China began sending large delegations. At the 2008 Beijing Olympics 633 athletes took part, the highest number in its history (China sent 431 athletes to the 2020 Tokyo Olympics).

The proportion of male and female athletes in China's Olympic history is shown in the figure below. Compared with the international data, the share of female athletes is much higher. In 1994 it reached its peak of 72%.

In China the sport with the most participants is track and field, followed by swimming, basketball, shooting, and weightlifting. The five sports with the most female athletes are track and field, swimming, weightlifting, volleyball, and basketball; for men they are track and field, swimming, shooting, basketball, and fencing.


The sport in which China has won the most medals is gymnastics (60 medals), followed by volleyball, weightlifting, swimming, badminton, diving, and table tennis.

The sport with the most gold medals is volleyball (probably because every member of a winning volleyball team counts as a gold medalist in this data), and among individual sports the most gold medals have come in diving and weightlifting.

Three Chinese athletes share the record for most gold medals, with 5 each: divers Chen Ruolin and Wu Minxia, and gymnast Zou Kai.
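The effect of team size on medal counts can be made concrete. The sketch below counts gold medals two ways: once per athlete row, and once per event. The rows, the NOC code "AAA", and the helper gold_counts are all made up for illustration, and collapsing on (NOC, Games, Event) assumes a team wins at most one gold per event at a given Games.

```python
from collections import Counter

def gold_counts(rows, dedupe_team_events):
    """rows: iterable of (noc, games, sport, event, medal).

    Counts gold medals per sport, either once per athlete row or,
    when dedupe_team_events is True, once per (noc, games, event).
    """
    seen = set()
    counts = Counter()
    for noc, games, sport, event, medal in rows:
        if medal != "Gold":
            continue
        key = (noc, games, event)
        if dedupe_team_events and key in seen:
            continue  # another member of the same winning team
        seen.add(key)
        counts[sport] += 1
    return counts

# Made-up rows: one team gold shared by three players, one individual gold.
sample = [
    ("AAA", "Games 1", "Volleyball", "Volleyball Women's Volleyball", "Gold"),
    ("AAA", "Games 1", "Volleyball", "Volleyball Women's Volleyball", "Gold"),
    ("AAA", "Games 1", "Volleyball", "Volleyball Women's Volleyball", "Gold"),
    ("AAA", "Games 1", "Diving", "Diving Women's Platform", "Gold"),
    ("AAA", "Games 1", "Diving", "Diving Men's Springboard", "Silver"),
]
per_row = gold_counts(sample, dedupe_team_events=False)
per_event = gold_counts(sample, dedupe_team_events=True)
print(dict(per_row))    # {'Volleyball': 3, 'Diving': 1}
print(dict(per_event))  # {'Volleyball': 1, 'Diving': 1}
```

Which of the two counts is appropriate depends on the question: per-row counts answer "how many medalists", per-event counts answer "how many titles".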

04 Closing thoughts

To sum up:

  • The number of participating athletes grew from 176 at the first Games to 11,669 at the 32nd, so the Olympics reach more and more people
  • Female athletes have gone from less than 2% at the start to 45% today
  • The number of sports has grown from 9 to 36
  • The sport with the most participants in history is track and field, followed by swimming, rowing, and football
  • Australia, France, Greece, Italy, and Sweden took part in all 29 Summer Olympics in the data; China took part in 19
  • The U.S. has sent the most athletes to the Olympics, followed by Germany; China ranks 11th
  • A total of 42 cities have hosted the Olympic Games, with Athens and London each hosting three times
  • The country with the most medals is the United States, followed by Russia, Germany, and the United Kingdom; China ranks 12th
    ...

While writing this article I looked up a lot of Olympic history and picked up plenty of trivia along the way. This analysis of more than a century of Olympic data is really just a starting point; if you have ideas of your own, the dataset is good practice material.

I hope the Olympic Games always stay true to their original spirit, so that "faster, higher, stronger" is more than just a slogan. See you at the next Olympics.

The data source is attached below for download:
https://www.heywhale.com/mw/dataset/5b62ca77a711e60010ab1154

Origin blog.csdn.net/data_cola/article/details/119335091