Getting started with pandas in 30 minutes

pandas is a data analysis tool built on NumPy. Skilled use of pandas can greatly reduce our workload.
To import the pandas package:

import numpy as np
import pandas as pd

pandas Data Types

pandas has two main data types: Series and DataFrame.
A Series is a one-dimensional data structure in which every element has an index, similar to a one-dimensional array. The index can consist of strings or numbers.

A DataFrame is a two-dimensional data structure, similar to an Excel table, with labeled rows and columns.

Create a Series object

Example 1
#We can create the object directly with the Series constructor
import numpy as np
import pandas as pd
a=pd.Series([1,2,3,4,5])
print(a)
'''
Output:
0    1
1    2
2    3
3    4
4    5
dtype: int64
#The index is generated automatically, numbered from 0
'''
Example 2
#You can also specify the index; a Series can be created from an existing list or other elements, or from an ndarray
import numpy as np
import pandas as pd
a=np.array([1,2,3,4,5,6])
b=pd.Series(a,index=['a','b','c','d','e','f'])
print(b)
'''
Output:
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int32
'''
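As a small sketch of what the index is good for once a Series exists, the snippet below reads elements back by label. The variable names and values are made up for illustration.

```python
import pandas as pd

# A Series with string labels as its index (illustrative values)
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# A single label returns a scalar
first = s['x']

# A list of labels returns a smaller Series
subset = s[['x', 'z']]

print(first)
print(subset)
```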
Example 3
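#pd.date_range creates a DatetimeIndex that can serve as the index of a time series; specify at least two of start, end and periods (freq then defaults to 'D')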
import numpy as np
import pandas as pd
b=pd.date_range('20200101','20200106')
print(b)
'''
Output:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06'],
              dtype='datetime64[ns]', freq='D')
'''
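To connect this back to Series, here is a minimal sketch that uses `start` plus `periods` instead of `start` plus `end`, and then uses the resulting dates as the index of a Series. The values stored in the Series are arbitrary.

```python
import numpy as np
import pandas as pd

# Six consecutive days starting on 2020-01-01 (freq defaults to daily)
dates = pd.date_range('2020-01-01', periods=6)

# Use the dates as the index of a Series to get a simple time series
ts = pd.Series(np.arange(6), index=dates)

# Look up one day by its timestamp label
value = ts.loc[pd.Timestamp('2020-01-03')]

print(ts)
print(value)
```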

Create a DataFrame object

A DataFrame has row names (the index) and column names, and we can define both. As I understand it, the constructor is used as follows:

pd.DataFrame(data, index=..., columns=...)
Example 1
import numpy as np
import pandas as pd
s=np.arange(1,7)
s=s.reshape(2,3)
a=pd.DataFrame(s,index=['a','b'],columns=['A','B','C'])
print(a)
'''
Output:
   A  B  C
a  1  2  3
b  4  5  6
'''
Example 2
#A DataFrame can also be created from a dictionary; the keys become the column names
import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':pd.Categorical(['18','19']),
               'score':pd.Categorical(['99','98'])})
print(a)
'''
Output:
   name age score
0    dn  18    99
1  muss  19    98
'''
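The following sketch makes the role of the dictionary keys explicit: they become column names, while the row labels come from the separate `index` argument. The row labels `r1` and `r2` are invented for the example.

```python
import pandas as pd

# Dictionary keys become column names; plain lists work as values
df = pd.DataFrame({'name': ['dn', 'muss'], 'age': [18, 19]},
                  index=['r1', 'r2'])  # row labels are set separately

print(df)
print(list(df.columns))
print(list(df.index))
```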

DataFrame properties

View Data Types

We can use the DataFrame.dtypes attribute to view the data type of each column

import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.dtypes)
'''
Output:
name     category
age         int64
score       int32
dtype: object
'''
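Checking dtypes matters most when numbers arrive as text, as with the string ages and scores in Example 2 above. A minimal sketch of converting such columns with `astype`, using made-up data:

```python
import pandas as pd

# Numbers stored as strings, as often happens after loading raw data
df = pd.DataFrame({'age': ['18', '19'], 'score': ['99', '98']})

# astype accepts a mapping from column name to target dtype
converted = df.astype({'age': 'int64', 'score': 'float64'})

print(converted.dtypes)
print(converted['score'].sum())
```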

View index and column names

Use the DataFrame.index attribute to see the index
import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.index)
'''
Output: RangeIndex(start=0, stop=2, step=1)
'''
Use the DataFrame.columns attribute to see the column names
import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.columns)
#Output: Index(['name', 'age', 'score'], dtype='object')

View data and statistics

Use the DataFrame.values attribute to view the data
import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.values)
'''
Output:
[['dn' 18 98]
 ['muss' 18 99]]
'''
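The same data can also be obtained with the `to_numpy()` method, which the pandas documentation recommends over `.values`. A short sketch with the same frame as above:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'name': pd.Categorical(['dn', 'muss']),
                  'age': 18,
                  'score': pd.Series(np.arange(98, 100))})

# to_numpy() returns the data as a NumPy array;
# with mixed column types the array holds generic Python objects
arr = a.to_numpy()

print(arr)
print(arr.shape)
```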
Use the DataFrame.describe() method to view summary statistics
import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.describe())
'''
Output:
        age      score
count   2.0   2.000000
mean   18.0  98.500000
std     0.0   0.707107
min    18.0  98.000000
25%    18.0  98.250000
50%    18.0  98.500000
75%    18.0  98.750000
max    18.0  99.000000
'''
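The result of `describe()` is itself a DataFrame, so individual statistics can be read out by label. The sketch below also passes `include='all'`, which adds the non-numeric `name` column that the default output leaves out.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'name': pd.Categorical(['dn', 'muss']),
                  'age': 18,
                  'score': pd.Series(np.arange(98, 100))})

# describe() returns a DataFrame indexed by statistic name
stats = a.describe()
mean_score = stats.loc['mean', 'score']

# include='all' also summarizes non-numeric columns
full = a.describe(include='all')

print(mean_score)
print(full)
```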

Common DataFrame operations

Transpose

The transpose operation needs little explanation, since it was already covered in the NumPy part: rows and columns are swapped.

import numpy as np
import pandas as pd
a=pd.DataFrame({'name':pd.Categorical(['dn','muss']),
                'age':18,
               'score':pd.Series(np.arange(98,100))})
print(a.T)
'''
Output:
        0     1
name   dn  muss
age    18    18
score  98    99
'''

Sorting

Sorting is divided into sorting by index and sorting by value

Sort by index
import numpy as np
import pandas as pd
s=np.array([6,5,4,3,2,1]).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a.sort_index(axis=1))
'''
Output:
   a  b
0  6  5
1  4  3
2  2  1
'''
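In the example above the column labels are already in order, so the output looks unchanged. Here is a sketch with deliberately unsorted labels (all invented) that shows what `sort_index` does along each axis:

```python
import numpy as np
import pandas as pd

# Row and column labels are deliberately out of order
df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['b', 'a'], columns=['z', 'x', 'y'])

by_row = df.sort_index()          # axis=0 (default): sort by row labels
by_col = df.sort_index(axis=1)    # axis=1: sort by column labels

print(by_row)
print(by_col)
```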
Sort by value
import numpy as np
import pandas as pd
s=np.array([6,5,4,3,2,1]).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a.sort_values(by='b'))
'''
Output:
   a  b
2  2  1
1  4  3
0  6  5
'''
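`sort_values` also accepts several columns and a sort direction per column. A minimal sketch with made-up data, sorting by `a` ascending and breaking ties by `b` descending:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [3, 5, 4]})

# Sort by 'a' ascending, then by 'b' descending within equal 'a'
out = df.sort_values(by=['a', 'b'], ascending=[True, False])

print(out)
```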

Data Selection

Select a column

This returns a single column as a Series object

import numpy as np
import pandas as pd
s=np.arange(1,7).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a['a'])
'''
Output:
0    1
1    3
2    5
Name: a, dtype: int32
'''

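Select rows

By slice

#Indexing with a row label directly, as in a[0], raises a KeyError because a[...] looks up column labels; only a slice selects rows. To get the first row, use 0:1.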
import numpy as np
import pandas as pd
s=np.arange(1,7).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a[0:1])
'''
Output:
   a  b
0  1  2
'''
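The behavior described above can be checked directly. This sketch shows that `a[0]` is treated as a column lookup and fails, while a slice or a boolean mask selects rows:

```python
import numpy as np
import pandas as pd

s = np.arange(1, 7).reshape(3, 2)
a = pd.DataFrame(s, index=[0, 1, 2], columns=['a', 'b'])

# a[0] looks for a column labeled 0, which does not exist here
try:
    a[0]
    raised = False
except KeyError:
    raised = True

first_row = a[0:1]        # a slice selects rows by position
big_b = a[a['b'] > 2]     # a boolean mask also selects rows

print(raised)
print(first_row)
print(big_b)
```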

Label selection with loc

loc selects a cross-section of rows and columns by label. For example, to get the rows labeled 1 and 2 in columns a and b:

import numpy as np
import pandas as pd
s=np.arange(1,7).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a)
print('.')
print(a.loc[[1,2],['a','b']])
'''
Output:
   a  b
0  1  2
1  3  4
2  5  6
.
   a  b
1  3  4
2  5  6
'''

Of course, loc can also be used to get a single row on its own.

import numpy as np
import pandas as pd
s=np.arange(1,7).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a.loc[1])
'''
Output:
a    3
b    4
Name: 1, dtype: int32
'''
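Two further uses of `loc`, sketched on the same frame: reading one cell with a row label and a column label, and selecting rows with a boolean condition.

```python
import numpy as np
import pandas as pd

s = np.arange(1, 7).reshape(3, 2)
a = pd.DataFrame(s, index=[0, 1, 2], columns=['a', 'b'])

# Row label plus column label returns a single value
cell = a.loc[1, 'b']

# A boolean condition on the rows, combined with a column label
filtered = a.loc[a['a'] > 1, 'b']

print(cell)
print(filtered)
```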

Position selection with iloc

loc selects rows and columns by label, while iloc selects them by integer position. Positions start at 0, so to get the second column you write position 1.

import numpy as np
import pandas as pd
s=np.arange(1,7).reshape(3,2)
a=pd.DataFrame(s,index=[0,1,2],columns=['a','b'])
print(a)
print('.')
print(a.iloc[[0,1],0:1])
'''
Output:
   a  b
0  1  2
1  3  4
2  5  6
.
   a
0  1
1  3
'''
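When the index is 0, 1, 2, labels and positions coincide, which hides the difference between loc and iloc. The sketch below uses the row labels 10, 20, 30 (chosen only for illustration) to make it visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 7).reshape(3, 2),
                  index=[10, 20, 30], columns=['a', 'b'])

by_position = df.iloc[1]      # second row, counted from 0
by_label = df.loc[20]         # row whose label is 20 (the same row)
second_col = df.iloc[:, 1]    # every row, second column

print(by_position)
print(second_col)
```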

Reading and exporting files

Import function    Export function
read_csv           to_csv
read_excel         to_excel
read_sql           to_sql
read_json          to_json
read_msgpack       to_msgpack
read_html          to_html
read_gbq           to_gbq
read_stata         to_stata
read_sas           (none)
read_clipboard     to_clipboard
read_pickle        to_pickle

Note that pandas has no to_sas function, and that read_msgpack and to_msgpack were removed in pandas 1.0.
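As a rough sketch of how an import/export pair fits together, the snippet below writes a small frame to CSV and reads it back. It uses an in-memory `io.StringIO` buffer in place of a file on disk so that it runs anywhere; with a real file you would pass a path instead. The data is made up.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['dn', 'muss'], 'age': [18, 19]})

# Write CSV into an in-memory buffer; index=False skips the row labels
buf = io.StringIO()
df.to_csv(buf, index=False)

# Rewind the buffer and read the CSV back into a new DataFrame
buf.seek(0)
restored = pd.read_csv(buf)

print(restored)
```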

Common statistical methods

Function          Description
count             Number of non-NA values
describe          Summary statistics for a Series or for each DataFrame column
min, max          Minimum and maximum
argmin, argmax    Integer position of the minimum and maximum (Series only)
idxmin, idxmax    Index label of the minimum and maximum
quantile          Sample quantile (between 0 and 1)
sum               Sum
mean              Mean
median            Median
mad               Mean absolute deviation from the mean (removed in pandas 2.0)
var               Variance
std               Standard deviation
skew              Sample skewness (third moment)
kurt              Sample kurtosis (fourth moment)
cumsum            Cumulative sum
cummin, cummax    Cumulative minimum and cumulative maximum
cumprod           Cumulative product
diff              First difference (useful for time series)
pct_change        Percentage change
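A few of these methods applied to a tiny Series, as a sketch. The values are arbitrary, and the last line computes the mean absolute deviation by hand, since `mad` is no longer available in pandas 2.0 and later.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

total = s.sum()              # sum of all values
variance = s.var()           # sample variance (divides by n - 1)
running = s.cumsum()         # cumulative sum
steps = s.diff()             # first difference; first element is NaN
growth = s.pct_change()      # relative change from the previous element
top_label = s.idxmax()       # index label of the maximum

# Manual replacement for the removed mad method
mean_abs_dev = (s - s.mean()).abs().mean()

print(total, variance, top_label, mean_abs_dev)
print(running.tolist())
```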

Origin blog.csdn.net/weixin_45939019/article/details/104196337