Seaborn data visualization case analysis - shared bicycle

Table of contents

 1. Introduction to Seaborn

2. Data introduction

3. Data Analysis

Statistics on the number of shared bicycle rentals in each quarter of 2011 and 2012


 1. Introduction to Seaborn

Seaborn is a high-level Python data visualization library built on top of matplotlib. Its drawing logic is largely the same as matplotlib's, but its default styling produces cleaner, more polished figures. Common chart types include scatter plots, line charts, and bar charts.

Using shared bicycle data as an example, this article introduces Seaborn's bar chart and scatter plot, along with the pandas methods used to prepare the data, and then uses both chart types for a visual analysis of the shared bicycle data.

The shared bicycle data used in this article is available as a CSDN points resource. Download link: https://download.csdn.net/download/m0_52051577/87794022?spm=1001.2014.3001.5503


2. Data introduction

The data used in this article covers shared bicycle rentals from 2011 to 2012. Its attributes are datetime (date and time), season (quarter), holiday (holiday), workingday (working day), weather (weather), temp (temperature), atemp (feels-like temperature), humidity (humidity), windspeed (wind speed), casual (number of rentals by non-registered users), registered (number of rentals by registered users), and count (total number of rentals).

3. Data Analysis

Statistics on the number of shared bicycle rentals in each quarter of 2011 and 2012

(1) The implementation process is as follows:

(2) Seaborn related functions involved:

sns.barplot(x,y,data,hue,palette) function: x and y are column names in the DataFrame, and data is the data table or array to plot. hue names the column used to split the bars into groups, and palette sets the colors used for those groups.
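As a minimal sketch of how these parameters fit together, the snippet below draws a grouped bar chart from a small made-up table. The toy values are assumptions for illustration, not figures from the bike data, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up totals, for illustration only (not taken from the bike data).
toy = pd.DataFrame({
    "season": [1, 2, 1, 2],
    "year": ["2011", "2011", "2012", "2012"],
    "count": [10, 20, 15, 30],
})

# One group of bars per season, one bar per year inside each group.
ax = sns.barplot(data=toy, x="season", y="count", hue="year", palette="Set2")

# The legend lists the distinct values of the hue column.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
plt.close(ax.figure)
```

Here x and y pick the axes, hue decides which bars sit side by side within a group, and palette only changes their colors.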

(3) Implementation steps

Import Data:

## Import the data and preview it
import pandas as pd
data=pd.read_csv(r"D://jupyter_data/bike_train.csv")
data

Extract the target columns (season, count, datetime) and split the datetime column.

## Split the datetime column on '-' into year, month, and day
data1=data[['season','count','datetime']].copy()
parts=data1['datetime'].str.split('-',expand=True)
year=parts[0]
month=parts[1]
day1=parts[2]  # still carries the time, e.g. '01 00:00:00'
data1.insert(2,'year',year)
data1.insert(3,'month',month)
data1.insert(4,'day',day1)
data1
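Splitting on '-' leaves the time attached to the day field. As an alternative sketch (not the approach used in this article), the same parts can be taken from a parsed datetime with pandas' .dt accessor. The two sample rows below are made up and only assume the "YYYY-MM-DD HH:MM:SS" layout of the datetime column.

```python
import pandas as pd

# Two made-up rows in the same "YYYY-MM-DD HH:MM:SS" layout as the datetime column.
sample = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2012-07-15 13:00:00"],
    "count": [16, 250],
})

ts = pd.to_datetime(sample["datetime"])
# Zero-padded strings, so string filters such as year == '2011' still work.
sample["year"] = ts.dt.strftime("%Y")
sample["month"] = ts.dt.strftime("%m")
sample["day"] = ts.dt.strftime("%d")
sample["hour"] = ts.dt.strftime("%H")
print(sample)
```

Keeping the parts as strings matches the string comparisons used in the rest of the article; ts.dt.year and similar attributes would give integers instead.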

Extract the 2011 and 2012 data separately, keep the first three columns (season, count, year), and total the rentals for each quarter.

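data1_2011=data1[data1['year']=='2011'].iloc[:,0:3]
data1_2012=data1[data1['year']=='2012'].iloc[:,0:3]
group1=data1_2011.groupby('season').agg({'count':'sum'})
group2=data1_2012.groupby('season').agg({'count':'sum'})
# Stack the table into long form, reset the index, and rename the columns
group1.columns=['2011']
s_count_st=pd.DataFrame(group1.stack(),columns=['count'])
s_count_st=s_count_st.reset_index()
s_count_st.columns=['season','year','count']
# Move the year column to the end
mid=s_count_st['year']
s_count_st.pop('year')
s_count_st.insert(2,'year',mid)
s_count_st
# Same steps for 2012; s_count_st2 is needed when the two tables are merged
group2.columns=['2012']
s_count_st2=pd.DataFrame(group2.stack(),columns=['count'])
s_count_st2=s_count_st2.reset_index()
s_count_st2.columns=['season','year','count']
s_count_st2=s_count_st2[['season','count','year']]
s_count_st2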

Merge the 2011 and 2012 tables:

s_count1=pd.concat([s_count_st,s_count_st2],axis=0)
s_count1
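The per-year filtering, stacking, and concatenation can also be collapsed into a single grouping on both keys. The sketch below is an alternative under that assumption, run on made-up rows shaped like data1 (season, count, year); it is not the article's original code.

```python
import pandas as pd

# Made-up rows shaped like data1 (season, count, year), for illustration only.
toy = pd.DataFrame({
    "season": [1, 1, 2, 1, 2, 2],
    "count": [10, 5, 20, 15, 30, 1],
    "year": ["2011", "2011", "2011", "2012", "2012", "2012"],
})

# Group by both keys at once; as_index=False keeps them as ordinary columns.
totals = toy.groupby(["year", "season"], as_index=False)["count"].sum()
totals = totals[["season", "count", "year"]]  # same column order as s_count1
print(totals)
```

The result has one row per (year, season) pair, which is the long format that sns.barplot expects when hue is set to the year column.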

 

Statistics on the number of bike rentals in each month of 2011 and 2012

month1_2011=data1[data1['year']=='2011'].iloc[:,[0,1,3]]
# Keep three columns: season, rental count, and month
m_count2011=month1_2011.groupby('month').agg({'count':'sum'})
# Group by month and total the rentals for each month
m_count2011.columns=['2011']
m_count_st=pd.DataFrame(m_count2011.stack(),columns=['count'])
# Reset the index
m_count_st=m_count_st.reset_index()
m_count_st.columns=['month','year','count']
midpro=m_count_st['year']
m_count_st.pop('year')
m_count_st.insert(2,'year',midpro)
m_count_st
month1_2012=data1[data1['year']=='2012'].iloc[:,[0,1,3]]
m_count2012=month1_2012.groupby('month').agg({'count':'sum'})
m_count2012.columns=['2012']
m_count_st1=pd.DataFrame(m_count2012.stack(),columns=['count'])
m_count_st1=m_count_st1.reset_index()
m_count_st1.columns=['month','year','count']
midpro1=m_count_st1['year']
m_count_st1.pop('year')
m_count_st1.insert(2,'year',midpro1)
m_count_st1
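To compare the two monthly tables side by side, one option is to concatenate them and pivot on the year. The sketch below assumes tables shaped like m_count_st and m_count_st1 (month, count, year); the numbers and the names m_2011, m_2012, m_count, and wide are made up for illustration.

```python
import pandas as pd

# Made-up monthly totals shaped like m_count_st and m_count_st1.
m_2011 = pd.DataFrame({"month": ["01", "02"], "count": [100, 120], "year": "2011"})
m_2012 = pd.DataFrame({"month": ["01", "02"], "count": [180, 200], "year": "2012"})

# Stack the two tables, then spread the years into columns.
m_count = pd.concat([m_2011, m_2012], ignore_index=True)
wide = m_count.pivot(index="month", columns="year", values="count")
print(wide)
```

The long table m_count suits sns.barplot with hue='year', while the wide table is easier to read as a printed comparison.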

Next, the datetime column of the original table is split more finely: first into year, month, and day, then the day field is separated from the time of day, and finally the hour is taken from the time.

# Extract the year
year=data['datetime'].str.split('-',expand=True)[0]
# Extract the month
month=data['datetime'].str.split('-',expand=True)[1]
# Extract the day plus the time
day1=data['datetime'].str.split('-',expand=True)[2]
# Extract the day
day=day1.str.split(' ',expand=True)[0]
# Extract the full date (year-month-day)
date=data['datetime'].str.split(' ',expand=True)[0]
# Extract the time
timestamp=data['datetime'].str.split(' ',expand=True)[1]
# Extract the hour
hour=timestamp.str.split(':',expand=True)[0]
data2=data.drop(['datetime'],axis=1)
data2.insert(11,'date',date)
data2.insert(12,'timestamp',timestamp)
data2.insert(13,'year',year)
data2.insert(14,'month',month)
data2.insert(15,'day',day)
data2.insert(16,'hour',hour)
data2

Filter the 2011 and 2012 data:

# Select the 2011 and 2012 rows separately
data_2011=data2.loc[data2['year']=='2011']
data_2012=data2.loc[data2['year']=='2012']

Graphic display (grouped bar chart):

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=[10,6])
sns.barplot(data=s_count1,x='season',y='count',hue='year',palette='Set2')
plt.xticks(ticks=range(4),labels=['spring','summer','autumn','winter'])
plt.tight_layout()
plt.show()

The bar chart shows that in every quarter there were more bike rentals in 2012 than in 2011, and that summer and autumn saw more rentals than spring and winter.

Plot a scatter plot of temperature versus wind speed:

sns.scatterplot(data=data_2011,x='temp',y='windspeed',hue='count')
plt.tight_layout()
plt.show()

The scatter plot shows that rentals are concentrated where the temperature is between 20 and 35 and the wind speed is between 10 and 40.

Draw a scatter plot of temperature versus humidity:

sns.scatterplot(data=data_2011,x='temp',y='humidity',hue='count')
plt.tight_layout()
plt.show()

The scatter plot shows that rentals are concentrated where the temperature is between 20 and 35 and the humidity is between 20 and 80.
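A reading taken from a scatter plot can be cross-checked numerically by binning one variable and summing the rentals in each bin. The sketch below shows that idea with pd.cut on made-up rows; the bin edges and values are assumptions for illustration and say nothing about the real data.

```python
import pandas as pd

# Made-up hourly rows, for illustration only.
toy = pd.DataFrame({
    "temp": [5.0, 12.0, 22.0, 28.0, 33.0, 38.0],
    "count": [10, 40, 200, 260, 180, 30],
})

# Bin temperature into fixed ranges and compute each bin's share of all rentals.
bins = [0, 20, 35, 45]
toy["temp_range"] = pd.cut(toy["temp"], bins=bins)
share = toy.groupby("temp_range", observed=True)["count"].sum() / toy["count"].sum()
print(share)
```

Applied to data_2011, the same grouping on temp, humidity, or windspeed would show what fraction of rentals falls inside each range.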

Origin blog.csdn.net/m0_52051577/article/details/130735989