Data analysis Seaborn Draw Normal mode summary | drawing a variety of methods to find trends, find a relationship, looking distribution | 20 mins Express

Here Insert Picture Description

Trends

- A trend is defined as a pattern of change.
- sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.
Relationship

- There are many different chart types that you can use to understand relationships between variables in your data.
- sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
- sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
- sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
- sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
- sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
- sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
Distribution

- We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.
- sns.distplot - Histograms show the distribution of a single numerical variable.
- sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
- sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

1. Line Chart

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib,pyplot as plt
%matplotlib inline
import seaborn as sns

# Path of the file to read
spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data
spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)

spotify_data.tail()

	Shape of You	Slowly	Something Just Like This	HUMBLE.	Unforgettable
Date
2018-01-05	4492978	3450315.0	2408365.0	2685857.0	2869783.0
2018-01-06	4416476	3394284.0	2188035.0	2559044.0	2743748.0
2018-01-07	4009104	3020789.0	1908129.0	2350985.0	2441045.0
2018-01-08	4135505	2755266.0	2023251.0	2523265.0	2622693.0
2018-01-09	4168506	2791601.0	2058016.0	2727678.0	2627334.0

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2bb6f98>

Here Insert Picture Description

# Set the width and height of the figure
# sets the size of the figure to 14 inches (in width) by 6 inches (in height)
plt.figure(figsize=(14, 6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7fc8b2a74780>

Here Insert Picture Description

Changing styles

# Seaborn has five different themes:(1)"darkgrid", (2)"whitegrid", (3)"dark", (4)"white", and (5)"ticks"
# Change the style of the figure to the "dark" theme
sns.set_style("dark")

# Line chart 
plt.figure(figsize=(12,6))
sns.lineplot(data=spotify_data)

<matplotlib.axes._subplots.AxesSubplot at 0x7f5faa4bc828>

Here Insert Picture Description

Plot a subset of the data

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")

Here Insert Picture Description

2.Bar Charts

# Print the data
flight_data

	AA	AS	B6	DL	EV	F9	HE HAS	MQ	NK	AND	UA	US	VX	WN
Month
1	6.955843	-0.320888	7.347281	-2.043847	8.537497	18.357238	3.512640	18.164974	11.398054	10.889894	6.352729	3.107457	1.420702	3.389466
2	7.530204	-0.782923	18.657673	5.614745	10.417236	27.424179	6.029967	21.301627	16.474466	9.588895	7.260662	7.114455	7.784410	3.501363
3	6.693587	-0.544731	10.741317	2.077965	6.730101	20.074855	3.468383	11.018418	10.039118	3.181693	4.892212	3.330787	5.348207	3.263341
4	4.931778	-3.009003	2.780105	0.083343	4.821253	12.640440	0.011022	5.131228	8.766224	3.223796	4.376092	2.660290	0.995507	2.996399
5	5.173878	-1.716398	-0.709019	0.149333	7.724290	13.007554	0.826426	5.466790	22.397347	4.141162	6.827695	0.681605	7.102021	5.680777
6	8.191017	-0.220621	5.047155	4.419594	13.952793	19.712951	0.882786	9.639323	35.561501	8.338477	16.932663	5.766296	5.779415	10.743462
7	3.870440	0.377408	5.841454	1.204862	6.926421	14.464543	2.001586	3.980289	14.352382	6.790333	10.262551	NaN	7.135773	10.504942
8	3.193907	2.503899	9.280950	0.653114	5.154422	9.175737	7.448029	1.896565	20.519018	5.606689	5.014041	NaN	5.106221	5.532108
9	-1.432732	-1.813800	3.539154	-3.703377	0.851062	0.978460	3.696915	-2.167268	8.000101	1.530896	-1.794265	NaN	0.070998	-1.336260
10	-0.580930	-2.993617	3.676787	-5.011516	2.303760	0.082127	0.467074	-3.735054	6.810736	1.750897	-2.456542	NaN	2.254278	-0.688851
11	0.772630	-1.916516	1.418299	-3.175414	4.415930	11.164527	-2.719894	0.220061	7.543881	4.925548	0.281064	NaN	0.116370	0.995684
12	4.149684	-1.846681	13.839290	2.504595	6.685176	9.346221	-1.706475	0.662486	12.733123	10.947612	7.012079	NaN	13.498720	6.720893

# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")

Text(0, 0.5, 'Arrival delay (in minutes)')

Here Insert Picture Description

3.Heatmap

# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

Text(0.5, 42.0, 'Airline')

Here Insert Picture Description

4.Scatter Plots

insurance_data.head()

	age	sex	bmi	children	smoker	region	charges
0	19	female	27.900	0	yes	southwest	16884.92400
1	18	male	33.770	1	no	southeast	1725.55230
2	28	male	33.000	3	no	southeast	4449.46200
3	33	male	22.705	0	no	northwest	21984.47061
4	32	male	28.880	0	no	northwest	3866.85520

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f2300048>

Here Insert Picture Description

# Add a regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f222c588>

Here Insert Picture Description

Color-coded scatter plots

# color-code the points by 'smoker', plot the other two columns('bmi', 'charges') on the axes 
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])

<matplotlib.axes._subplots.AxesSubplot at 0x7f44f19b49e8>

Here Insert Picture Description

# add two regression lines, corresponding to smokers and nonsmokers
# Instead of setting x=insurance_data['bmi'] to select the 'bmi' column in insurance_data, we set x="bmi" to specify the name of the column only.
# Similarly, y="charges" and hue="smoker" also contain the names of columns.
# We specify the dataset with data=insurance_data.

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

<seaborn.axisgrid.FacetGrid at 0x7f44f192d668>

Here Insert Picture Description

sns.swarmplot(x=insurance_data['smoker'],
							y=insurance_data['charges'])

Here Insert Picture Description

5.Histograms

	Sepal Length (cm)	Sepal Width (cm)	Petal Length (cm)	Petal Width (cm)	Species
Id
1	5.1	3.5	1.4	0.2	Iris-silky
2	4.9	3.0	1.4	0.2	Iris-silky
3	4.7	3.2	1.3	0.2	Iris-silky
4	4.6	3.1	1.5	0.2	Iris-silky
5	5.0	3.6	1.4	0.2	Iris-silky

# 'a' chooses the columns of the data
# kde=False is something we'll always provide when creating a histogram, as leaving it out will create a slightly different plot.
sns.displot(a=iris_data['Petal Length(cm)'], kde=False)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5b1da20>

Here Insert Picture Description

Color-coded plots

# Histograms for each species
sns.distplot(a=iris_set_data['Petal Length (cm)'], label="Iris-setosa", kde=False)
sns.distplot(a=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", kde=False)
sns.distplot(a=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", kde=False)

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# Force legend to appear
plt.legend()

<matplotlib.legend.Legend at 0x7f96c5849470>

Here Insert Picture Description

6. Density plots

# Kernel density estimate(KDE) plot is like as a smoothed histogram
# 'shade=True' colors the area below the curve
sns.kdeplot(data=iris_data['Petal Length (cm)'], shade=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7f96c5a664e0>

Here Insert Picture Description

# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

<seaborn.axisgrid.JointGrid at 0x7f96c59cbef0>

The color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.
Here Insert Picture Description

the curve at the top of the figure is a KDE plot for the data on the x-axis (in this case, iris_data['Petal Length (cm)']), and
the curve on the right of the figure is a KDE plot for the data on the y-axis (in this case, iris_data['Sepal Width (cm)']).

Color-coded plots

# KDE plots for each species
sns.kdeplot(data=iris_set_data['Petal Length (cm)'], label="Iris-setosa", shade=True)
sns.kdeplot(data=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor", shade=True)
sns.kdeplot(data=iris_vir_data['Petal Length (cm)'], label="Iris-virginica", shade=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")

Text(0.5, 1.0, 'Distribution of Petal Lengths, by Species')

Here Insert Picture Description

Sany Ho Chan

Published 37 original articles · won praise 0 · Views 810

Private letter concerns