Using Python for data analysis and visualization

Python is well suited to data analysis, largely because of its data-centric libraries. Pandas is one of them, and it makes importing and analyzing data much easier. In this article, I use Pandas to analyze the data in the Country Data.csv file from a public data set on the Stanford website.

Installation
Install Pandas:

pip install pandas

To create a DataFrame in Pandas, build several Series objects with pd.Series and pass them, as a list, to the pd.DataFrame class. When two Series objects s1 and s2 are passed in this way, s1 becomes the first row and s2 the second row.
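A minimal sketch of that construction. The values and the index labels a, b and c are hypothetical, chosen only to make the result easy to read.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])

# A list of Series becomes the rows of the DataFrame;
# the shared index labels become the column names
df = pd.DataFrame([s1, s2])
print(df)
```

This prints a 2 x 3 table with columns a, b and c and rows 0 and 1. Because each Series becomes a row, Series with different indexes would be aligned on their labels, with NaN filling any gaps.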


Import data with Pandas

The first step is to read the data. The data is stored as a comma-separated values (CSV) file, in which each row is on its own line and the columns are separated by commas (,). To work with the data in Python, the CSV file needs to be read into a Pandas DataFrame, the structure Pandas uses to represent and process tabular data.

example:

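import pandas as pd

# read the CSV file into a DataFrame
df = pd.read_csv("IND_data.csv")

# first five rows
print(df.head())

# (number of rows, number of columns)
print(df.shape)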

Output of df.shape, a (rows, columns) tuple:

(29, 10)

That is, the data set has 29 rows and 10 columns.
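If IND_data.csv is not on your machine, you can still try read_csv on a small inline CSV string. The rows and values below are made up for illustration; only the column names "Time period" and "Observation Value" come from this article.

```python
import io
import pandas as pd

# A small hypothetical CSV: one header line, then one record per line,
# with fields separated by commas
csv_text = """Time period,Observation Value
2015,10.5
2016,11.2
2017,12.8
"""

# read_csv accepts any file-like object, so io.StringIO stands in for a file
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows (all three here)
print(df.shape)    # (rows, columns) -> (3, 2)
print(df.dtypes)   # column types inferred by pandas
```

The header line supplies the column names, and each following line becomes one row of the DataFrame.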

Index DataFrames with Pandas

You can use the pandas.DataFrame.iloc indexer to select data by position. iloc retrieves rows and columns by their integer location, and its slices exclude the end position, just like Python list slices.

example:

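df.iloc[0:5,:]   # rows at positions 0-4, all columns
df.iloc[:,:]     # all rows, all columns
df.iloc[5:,:5]   # rows from position 5 onward, first five columns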
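To see what each of these iloc slices returns, here is a sketch on a small made-up 8 x 6 frame. The column names col0 to col5 are hypothetical; the point is that the resulting shapes are easy to check by hand.

```python
import pandas as pd

# Hypothetical 8-row, 6-column frame; the value at (r, c) is r*10 + c
df = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(8)],
    columns=[f"col{c}" for c in range(6)],
)

first_five = df.iloc[0:5, :]   # positions 0-4; the end position 5 is excluded
everything = df.iloc[:, :]     # all rows, all columns
tail_block = df.iloc[5:, :5]   # rows from position 5 on, first five columns
single_cell = df.iloc[2, 3]    # one scalar: row position 2, column position 3

print(first_five.shape, everything.shape, tail_block.shape, single_cell)
# (5, 6) (8, 6) (3, 5) 23
```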



Indexing by label in Pandas

The pandas.DataFrame.loc indexer selects rows and columns by label rather than by position.
example:

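df.loc[0:5,:]   # rows with labels 0 through 5 (loc includes the end label)
df.loc[5:,:]    # rows from label 5 to the end; df itself is left unchanged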

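The result of df.loc[0:5,:] looks almost the same as df.iloc[0:5,:]. This is because, although row labels can take any value, the default row labels here (0, 1, 2, ...) match the positions exactly. The one visible difference is that loc slices include the end label, so df.loc[0:5,:] returns six rows while df.iloc[0:5,:] returns five. Column labels, however, make the data much easier to work with, because you can select columns by name. Example: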

df.loc[:5,"Time period"]   # "Time period" column for row labels up to and including 5
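When the row labels are not 0, 1, 2, ..., loc and iloc clearly diverge. A sketch with hypothetical string row labels w to z and made-up values:

```python
import pandas as pd

# Hypothetical frame whose row labels are strings, not positions
df = pd.DataFrame(
    {"Time period": [2015, 2016, 2017, 2018],
     "Observation Value": [1.0, 2.0, 3.0, 4.0]},
    index=["w", "x", "y", "z"],
)

# Label slice: the end label "y" is included
by_label = df.loc["w":"y", "Observation Value"]

# Position slice: the end position 2 is excluded
by_position = df.iloc[0:2, 1]

print(len(by_label), len(by_position))  # 3 2
```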

 

DataFrame math with Pandas

Pandas' built-in statistical methods let you compute summaries of a DataFrame directly.
example:

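df.describe()                # count, mean, std, min, quartiles and max of numeric columns
df.corr(numeric_only=True)   # pairwise correlation between numeric columns
df.rank()                    # rank of each value within its column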
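A sketch of these three methods on a tiny made-up DataFrame, assuming pandas 1.5 or later (the version that added numeric_only to DataFrame.corr; from pandas 2.0 on, corr raises an error on text columns without it). The "Country" column and all values are hypothetical; the text column shows that describe() and corr(numeric_only=True) skip non-numeric data.

```python
import pandas as pd

# Hypothetical data: one text column plus two numeric columns
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Time period": [2015, 2016, 2017, 2018],
    "Observation Value": [10.0, 30.0, 20.0, 40.0],
})

summary = df.describe()                 # statistics of the numeric columns only
corr = df.corr(numeric_only=True)       # Pearson correlation of numeric columns
ranks = df["Observation Value"].rank()  # 1 = smallest; ties would share an average rank

print(summary)
print(corr)
print(ranks.tolist())  # [1.0, 3.0, 2.0, 4.0]
```

Here the correlation between "Time period" and "Observation Value" works out to 0.8: the values rise over time, but not monotonically.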

 

 

 

Plotting with Pandas

The plots in these examples follow the standard convention of importing the matplotlib API as plt. Pandas' plotting methods are built on top of matplotlib, which makes it easy to create good-looking charts.
example:

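# import the required module
import matplotlib.pyplot as plt

# histogram of the "Observation Value" column, split into 10 bins
df['Observation Value'].hist(bins=10)
plt.show()

# one box of "Observation Value" per value of "Time period"
df.boxplot(column='Observation Value', by='Time period')
plt.show()

# scatter plot of Observation Value against Time period
x = df["Observation Value"]
y = df["Time period"]
plt.scatter(x, y, label="stars", color="m", marker="*", s=30)
plt.xlabel('Observation Value')
plt.ylabel('Time period')
plt.legend()
plt.show()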
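A self-contained sketch of the same three plots on made-up data, assuming matplotlib is installed. It uses the non-interactive Agg backend and saves PNG files (the file names are hypothetical) instead of opening windows, which is handy on a server or in scripts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, not windows
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data using the same column names as the article
df = pd.DataFrame({
    "Time period": [2015, 2015, 2016, 2016, 2017, 2017],
    "Observation Value": [10.0, 12.0, 15.0, 14.0, 18.0, 21.0],
})

# Histogram: pandas draws it with matplotlib on the current figure
df["Observation Value"].hist(bins=5)
plt.savefig("hist.png")
plt.close("all")

# Box plot with one box per "Time period" value
df.boxplot(column="Observation Value", by="Time period")
plt.savefig("boxplot.png")
plt.close("all")

# Scatter plot drawn directly with matplotlib's object-oriented API
fig, ax = plt.subplots()
ax.scatter(df["Observation Value"], df["Time period"],
           color="m", marker="*", s=30, label="stars")
ax.set_xlabel("Observation Value")
ax.set_ylabel("Time period")
ax.legend()
fig.savefig("scatter.png")
plt.close(fig)
```

Closing each figure after saving keeps the plots from drawing on top of one another.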

 

 

 


 


Origin blog.csdn.net/pyjishu/article/details/114580500