data visualization
Joyful Pandas
Datawhale Community, Joyful Pandas
basic drawing
one-dimensional data
- Numeric
  - Histogram plt.hist()
  - Box plot plt.boxplot()
  - Line chart plt.plot() # for ordered numeric data
- Categorical
  - Bar chart plt.bar()
  - Pie chart plt.pie()
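The plot types listed above can be sketched on toy data as follows. The values and the names ages, classes, and class_counts are made up for illustration; they are not taken from the Titanic data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up one-dimensional data, for illustration only
rng = np.random.default_rng(0)
ages = pd.Series(rng.normal(30, 10, size=200))       # numeric values
classes = pd.Series(list("AABBBCCCCC"))              # categorical values
class_counts = classes.value_counts().sort_index()   # count per category

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Numeric data: histogram and box plot
axes[0, 0].hist(ages, bins=20)
axes[0, 0].set_title("plt.hist")
axes[0, 1].boxplot(ages)
axes[0, 1].set_title("plt.boxplot")

# Categorical data: bar chart and pie chart of the category counts
axes[1, 0].bar(list(class_counts.index), class_counts.values)
axes[1, 0].set_title("plt.bar")
axes[1, 1].pie(class_counts.values, labels=list(class_counts.index))
axes[1, 1].set_title("plt.pie")

fig.tight_layout()
```

In a notebook the figure is displayed automatically; in a plain script, call plt.show() at the end.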
Hands-on Data Analysis
Datawhale community hands-on data analysis
2 Chapter 2: Data Visualization
Before starting, import numpy, pandas, and matplotlib packages and data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Load the result.csv data
df = pd.read_csv('./result.csv')
df.head()
 | Unnamed: 0 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1.0 | 0.0 | A/5 21171 | 7.2500 | NaN | S |
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1.0 | 0.0 | PC 17599 | 71.2833 | C85 | C |
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0.0 | 0.0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1.0 | 0.0 | 113803 | 53.1000 | C123 | S |
4 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0.0 | 0.0 | 373450 | 8.0500 | NaN | S |
2.7 How to let people understand your data at a glance?
"Python for Data Analysis" Chapter 9
2.7.1 Task 1
Follow the ninth chapter of the book to understand matplotlib, create a data item by yourself, and perform basic visualization on it
[Thinking] What are the most basic visualization patterns, and which scenarios is each suited to? (For example, a line chart is suitable for visualizing the trend of an attribute value over time.)
This part of the reference content comes from datawhale open source content fantastic-matplotlib
matplotlib provides two commonly used drawing interfaces:
- Explicitly create figures and axes, and call drawing methods on them, also known as OO mode (object-oriented style)
- Rely on pyplot to automatically create figures and axes, and draw
fig, ax = plt.subplots()
ax.plot([1,2,3,4], [1,4,2,3])
plt.show()
plt.plot([1,2,3,4], [1,4,2,3])
<matplotlib.lines.Line2D at 0x23155916dc0>
When using matplotlib in a Jupyter notebook, you will find that a line like the one above is printed after the code runs. This happens because the notebook displays the return value of the last expression in a cell, and plt.plot returns the line objects it created. If you don't want this line to be displayed, there are three methods:
- Add a semicolon (;) at the end of the last line of the code block
- Add plt.show() as the last line of the code block
- When drawing, explicitly assign the returned object to a variable, for example change plt.plot([1, 2, 3, 4]) to line = plt.plot([1, 2, 3, 4])
2.7.2 Task 2
Visualize the distribution of survivors among men and women in the Titanic dataset (try it with a bar chart).
# Write your code
sex_survived = df.groupby('Sex')['Survived'].sum()
_ = plt.bar(sex_survived.index, sex_survived.values)
[Thinking] Count the deaths among men and women in the Titanic dataset and visualize them. How could this be combined with the bar chart of survivors by sex? Look at your visualization and describe your first impressions (for example: you can see at a glance that more men survived, so sex may affect the survival rate).
# Thinking question answer
Far more women survived than men.
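One possible way to approach the question above is to compute the deaths as total passengers minus survivors and draw both counts side by side. This is only a sketch: the toy DataFrame below is made up and stands in for the Sex and Survived columns of df, and the names toy, died, and counts are chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the Sex and Survived columns (not the real data)
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

survived = toy.groupby("Sex")["Survived"].sum()    # number of 1s per sex
total = toy.groupby("Sex")["Survived"].count()     # number of passengers per sex
died = total - survived                            # number of 0s per sex

# One row per sex, one column per outcome
counts = pd.DataFrame({"died": died, "survived": survived})

# Grouped bar chart: a pair of bars (died, survived) for each sex
ax = counts.plot(kind="bar")
ax.set_ylabel("number of passengers")
```

Passing stacked=True to plot would stack the two bars instead of placing them next to each other.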
2.7.3 Task 3
Visualize the proportion of survivors and deaths among men and women in the Titanic dataset (try it with a bar chart).
# Write your code
# Hint: count the survivors and deaths among men and women; 1 means survived, 0 means died
radio_ss = df.groupby(['Sex','Survived'])['Survived'].count().unstack()
radio_ss
Survived | 0 | 1 |
---|---|---|
Sex | ||
female | 81 | 233 |
male | 468 | 109 |
Index pivoting converts a row index into a column index.
unstack(): by default, the innermost row index level is moved to the innermost column index level.
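The effect of unstack() is easier to see on a small example. The sketch below uses a made-up five-row DataFrame (toy) rather than the Titanic data; stacked and wide are illustrative names.

```python
import pandas as pd

# Made-up data, for illustration only
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

# A Series with a two-level row index: (Sex, Survived)
stacked = toy.groupby(["Sex", "Survived"])["Survived"].count()
print(stacked)

# unstack() moves the innermost row level (Survived) to the columns,
# giving a DataFrame with Sex as rows and 0 / 1 as columns
wide = stacked.unstack()
print(wide)
```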
radio_ss.plot(kind = 'bar', stacked = True);
[Tip] With men and women on the horizontal axis, show the numbers of survivors and deaths as a stacked bar chart so that their proportions can be compared.
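A stacked bar chart of raw counts shows proportions only indirectly, because the bars have different heights. One option, sketched below, is to divide each row by its row total before plotting. The counts are copied from the table above; the names counts and proportions are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Counts copied from the table above (rows: Sex, columns: Survived)
counts = pd.DataFrame(
    {0: [81, 468], 1: [233, 109]},
    index=pd.Index(["female", "male"], name="Sex"),
)
counts.columns.name = "Survived"

# Divide every row by its row total so that each bar sums to 1
proportions = counts.div(counts.sum(axis=1), axis=0)
print(proportions)

ax = proportions.plot(kind="bar", stacked=True)
ax.set_ylabel("proportion of passengers")
```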
2.7.4 Task 4
Visualize the distribution of the number of survivors and deaths for different fares in the Titanic dataset. (Try it with a line chart; the horizontal axis is the fare and the vertical axis is the number of people.)
[Tip] For count data like this drawn as a line, try plotting it both sorted and unsorted, and see what you can find.
# Write your code
# Count survivors and deaths for each fare; 1 means survived, 0 means died
df.groupby(['Fare','Survived'])['Survived'].count().unstack().plot(kind = 'line');
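The tip about sorting can be tried as follows. groupby already orders the result by fare, so one alternative view is to reorder the rows by how many passengers paid each fare. This is a sketch on made-up fares; toy, by_fare, and by_total are illustrative names, and filling missing combinations with 0 is an assumption about how gaps should be drawn.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up fares, for illustration only
toy = pd.DataFrame({
    "Fare": [7.25, 7.25, 8.05, 8.05, 8.05, 8.05, 26.0, 26.0, 26.0, 71.28],
    "Survived": [0, 0, 0, 0, 1, 0, 1, 0, 1, 1],
})

# Rows ordered by fare (groupby sorts the group keys in ascending order);
# fare/outcome combinations that never occur are filled with 0
by_fare = (
    toy.groupby(["Fare", "Survived"])["Survived"].count().unstack().fillna(0)
)

# Same rows, reordered by the total number of passengers per fare
order = by_fare.sum(axis=1).sort_values(ascending=False).index
by_total = by_fare.loc[order]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
by_fare.plot(kind="line", ax=ax1, title="ordered by fare")
# Drop the fare index so the x-axis is simply the rank by frequency
by_total.reset_index(drop=True).plot(kind="line", ax=ax2, title="ordered by count")
```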
2.7.5 Task 5
Visualize the distribution of survivors and deaths across the different cabin classes (Pclass) in the Titanic dataset. (Try it with a bar chart)
# Write your code
# 1 means survived, 0 means died
df.groupby(['Pclass', 'Survived'])['Survived'].count().unstack().plot(kind = 'bar', stacked = True);
[Thinking] After seeing the previous few data visualizations, talk about your first impression and your summary
#Thinking question answer
- Passengers who paid higher fares or travelled first class were more likely to survive
- Females are more likely to survive
2.7.6 Task 6
Visualize the distribution of the number of survivors and deaths across different ages in the Titanic dataset. (Any chart type)
# Write your code
df.groupby(['Age','Survived'])['Survived'].count().unstack().plot(kind = 'line');
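Grouping by exact age gives one point per distinct age, which can make the line noisy. Since the task leaves the chart type open, one alternative is to group ages into bands first. The sketch below uses made-up ages; the 20-year bin edges and the names toy, AgeBand, and by_band are assumptions chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up ages, for illustration only
toy = pd.DataFrame({
    "Age": [2, 8, 19, 22, 25, 31, 35, 38, 47, 54, 63, 71],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0],
})

# Put each age into a 20-year band: (0, 20], (20, 40], (40, 60], (60, 80]
toy["AgeBand"] = pd.cut(toy["Age"], bins=[0, 20, 40, 60, 80])

# Count survivors (1) and deaths (0) in each band;
# observed=False keeps every band in the result, even an empty one
by_band = (
    toy.groupby(["AgeBand", "Survived"], observed=False)["Survived"]
    .count()
    .unstack()
)

ax = by_band.plot(kind="bar")
ax.set_ylabel("number of passengers")
```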
2.7.7 Task 7
Visualize the age distribution of people in different cabin classes in the Titanic dataset. (Try it with a line chart)
# Write your code
df.groupby(['Age','Pclass'])['Age'].count().unstack().plot(kind = 'line');
[Thinking] Do an overall analysis of all the visualization examples above, and see what you can find out by yourself
#Thinking question answer
- Younger people have a higher chance of survival
- Middle-aged people also have a certain survival rate. Combined with the cabin class-age relationship, this part of middle-aged people may be richer, have a certain social status, and have a higher survival rate
- Among these people, the probability of survival of women is generally higher than that of men
[Summary]
You should now have a basic understanding of data visualization and know how to draw basic charts.