Ten Minutes to Master Pandas (Part 1): the API from the Official Website

In fact it takes far more than ten minutes. With this much material, allow at least a day.

I. numpy and pandas

numpy is a matrix computation library and pandas is a data analysis library. Baidu Encyclopedia introduces pandas as follows.

pandas is a tool built on NumPy, created to solve data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to operate on large data sets efficiently. pandas provides many functions and methods that let us handle data quickly and conveniently. You will soon find that it is one of the key factors that make Python a powerful and efficient data analysis environment.

 

II. Data types

numpy | pandas
ndarray: an n-dimensional array (matrix) | Series (similar to a one-dimensional array, or a key-value mapping)
ndarray is the only container type in numpy, although it supports many data types | DataFrame (data read from a CSV file is a DataFrame)
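
To make the comparison concrete, here is a minimal sketch of the three containers. The variable names and sample values are illustrative assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd

# numpy: a single container type, the n-dimensional array
arr = np.array([[1, 2], [3, 4]])

# pandas Series: one-dimensional, each value paired with an index label
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# pandas DataFrame: two-dimensional, each column is a Series
frame = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})

print(arr.ndim)          # number of dimensions of the array
print(ser["b"])          # key-value style lookup by label
print(type(frame["x"]))  # a single column comes back as a Series
```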

 

III. The API from the official website

1. Before the API, we should learn to import data

A CSV data set can be imported with pandas:

import pandas as pd

food_info = pd.read_csv("xxx.csv")
print(type(food_info))
print(food_info)
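
Since "xxx.csv" is only a placeholder, the following sketch may help for trying this out without a file on disk. It assumes a small, made-up CSV string (the column names and values are hypothetical) and passes it to pd.read_csv through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "xxx.csv"
csv_text = "name,calories\napple,52\nbanana,89\n"

# read_csv accepts a file path or any file-like object
food_info = pd.read_csv(io.StringIO(csv_text))

print(type(food_info))   # read_csv returns a DataFrame
print(food_info.shape)   # (number of rows, number of columns)
```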
2. pd.Series: create a Series
import numpy as np
import pandas as pd
# np.nan is used here, which shows that pandas is built on top of numpy
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
Output:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
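
As a small follow-up sketch (not part of the original post), the Series above can be inspected to see how the missing value is treated and that the data is backed by a numpy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.isna().sum())   # how many values are missing
print(s.mean())         # NaN is skipped by default when aggregating
print(s.to_numpy())     # the values as a numpy array
```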
3. Get a range of dates

Starting from the given date, get six consecutive days:

dates = pd.date_range('20130101',periods=6)
dates
Output:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
4. First use of the DataFrame type

np.random.randn(6, 4) generates a data set with 6 rows and 4 columns.

index sets the row labels.

columns sets the column labels.

df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df
Output:
                   A         B         C         D
2013-01-01  0.284681  1.881328  0.310425 -2.527329
2013-01-02 -0.209723 -1.410186  0.865336  0.893260
2013-01-03 -0.095578  0.576282 -1.347052 -0.055370
2013-01-04 -1.216527  0.423745 -1.110668 -1.682405
2013-01-05  0.275501 -0.844457 -0.954631  2.312578
2013-01-06 -1.384552  1.539255 -1.499076 -0.916121
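
Because the values are random, every run prints different numbers. One possible way to get a repeatable table is a seeded generator, sketched below. The seed value 0 and the use of np.random.default_rng are choices made for this example, not something the original post does.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# A seeded generator (the seed 0 is arbitrary) gives the same numbers on every run
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)      # 6 rows, 4 columns
print(df.index[0])   # first row label
```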
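5. View the data types
(df2 here is the mixed-type DataFrame from the official tutorial, not the copy created in step 23. Its construction is not shown in this post.)
df2.dtypes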
A           float64
B   datetime64[ns]
C           float32
D             int32
E         category
F           object
dtype: object
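
The post does not show how this df2 was built. The dtype listing matches the mixed-type example in the official "10 minutes to pandas" tutorial, so the sketch below reproduces that construction as an assumption. The exact dtype strings printed for columns B and F can vary with the pandas version.

```python
import numpy as np
import pandas as pd

# Assumed construction, following the official tutorial's mixed-type example
df2 = pd.DataFrame(
    {
        "A": 1.0,                                                  # scalar, repeated on every row
        "B": pd.Timestamp("20130102"),                             # a single timestamp
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),  # sets the row index 0..3
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",                                                # a string, repeated on every row
    }
)
print(df2.dtypes)
```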
6. View the first two rows
df.head(2)
A   B   C   D
2013-01-01 0.284681 1.881328 0.310425 -2.527329
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
7. View the last three rows
df.tail(3)
A   B   C   D
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
2013-01-05 0.275501 -0.844457 -0.954631 2.312578
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121
8. View the index (row labels)
df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
              '2013-01-05', '2013-01-06'],
            dtype='datetime64[ns]', freq='D')
9. View the column names
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
10. Convert to a numpy array
df.to_numpy()
array([[ 0.28468077,  1.8813276 ,  0.31042514, -2.52732926],
      [-0.20972286, -1.41018638, 0.86533622, 0.89325968],
      [-0.09557804, 0.57628249, -1.34705203, -0.05537029],
      [-1.21652719, 0.42374501, -1.11066844, -1.6824053 ],
      [ 0.27550116, -0.84445705, -0.95463056, 2.31257796],
      [-1.38455249, 1.53925476, -1.49907627, -0.91612064]])
11. View summary statistics
df.describe()
A   B   C   D
count 6.000000 6.000000 6.000000 6.000000
mean -0.391033 0.360994 -0.622611 -0.329231
std 0.733423 1.291526 0.972213 1.763848
min -1.384552 -1.410186 -1.499076 -2.527329
25% -0.964826 -0.527407 -1.287956 -1.490834
50% -0.152650 0.500014 -1.032650 -0.485745
75% 0.182731 1.298512 -0.005839 0.656102
max 0.284681 1.881328 0.865336 2.312578

The rows are, in order: count, mean, standard deviation, minimum, 25th percentile, 50th percentile (median), 75th percentile, and maximum.
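
To check what each row of describe() means, the sketch below compares a few of them with the corresponding direct computations. The four sample values are made up for illustration.

```python
import pandas as pd

col = pd.Series([1.0, 2.0, 3.0, 4.0])   # illustrative values
summary = col.describe()

print(summary["count"])  # number of non-missing values
print(summary["mean"])   # same as col.mean()
print(summary["std"])    # sample standard deviation, same as col.std()
print(summary["50%"])    # same as col.median()
```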

12. Transpose
df.T
    2013-01-01   2013-01-02  2013-01-03  2013-01-04 2013-01-05 2013-01-06 
A 0.284681 -0.209723 -0.095578 -1.216527 0.275501 -1.384552
B 1.881328 -1.410186 0.576282 0.423745 -0.844457 1.539255
C 0.310425 0.865336 -1.347052 -1.110668 -0.954631 -1.499076
D -2.527329 0.893260 -0.055370 -1.682405 2.312578 -0.916121
13. Sort by index along axis 1 (the column labels), with ascending=False (descending order)
df.sort_index(axis=1,ascending=False)

D C B A
2013-01-01 -2.527329 0.310425 1.881328 0.284681
2013-01-02 0.893260 0.865336 -1.410186 -0.209723
2013-01-03 -0.055370 -1.347052 0.576282 -0.095578
2013-01-04 -1.682405 -1.110668 0.423745 -1.216527
2013-01-05 2.312578 -0.954631 -0.844457 0.275501
2013-01-06 -0.916121 -1.499076 1.539255 -1.384552
14. Sort the rows by the values in column B
df.sort_values(by='B')
A   B   C   D
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-05 0.275501 -0.844457 -0.954631 2.312578
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121
2013-01-01 0.284681 1.881328 0.310425 -2.527329
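
The difference between the two sorts in steps 13 and 14 is easier to see on a tiny table. The sketch below uses made-up values: sort_index(axis=1) reorders columns by their labels, while sort_values(by=...) reorders rows by the data in one column.

```python
import pandas as pd

# Illustrative data, not from the original post
df = pd.DataFrame({"A": [3, 1, 2], "B": [0.5, -1.0, 2.0]}, index=["x", "y", "z"])

by_labels = df.sort_index(axis=1, ascending=False)  # columns in descending label order
by_values = df.sort_values(by="B")                  # rows in ascending order of column B

print(list(by_labels.columns))
print(list(by_values.index))
```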
15. Get the column labeled 'A'
df['A']
2013-01-01    0.284681
2013-01-02   -0.209723
2013-01-03   -0.095578
2013-01-04   -1.216527
2013-01-05   0.275501
2013-01-06   -1.384552
Freq: D, Name: A, dtype: float64
16. Slice rows by position (the end position is excluded)
df[0:3]
A   B   C   D
2013-01-01 0.284681 1.881328 0.310425 -2.527329
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
17. Slice rows by index label (both endpoints are included)
df['20130102':'20130104']
A   B   C   D
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
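
Steps 16 and 17 differ in how the end of the slice is treated. A minimal sketch, using np.arange instead of random numbers so the contents are predictable (that substitution is an assumption of this example):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

by_position = df[0:3]                  # rows at positions 0, 1, 2 (end excluded)
by_label = df["20130102":"20130104"]   # Jan 2 through Jan 4 (end included)

print(by_position.index[-1])
print(by_label.index[-1])
```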
18. Get the first row by label
df.loc[dates[0]]
A    0.284681
B   1.881328
C   0.310425
D   -2.527329
Name: 2013-01-01 00:00:00, dtype: float64
19. Select rows and columns by label
df.loc[:,['A','B']]
A   B
2013-01-01 0.284681 1.881328
2013-01-02 -0.209723 -1.410186
2013-01-03 -0.095578 0.576282
2013-01-04 -1.216527 0.423745
2013-01-05 0.275501 -0.844457
2013-01-06 -1.384552 1.539255
df.loc['20130102':'20130104',['A','B']]
df.loc['20130102',['A','B']]
20. Get a single element (scalar)
df.loc[dates[0], 'A']  # a single element
df.at[dates[0], 'A']   # same result; the official website describes at as the way to get fast access to a scalar
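
A short sketch confirming that loc and at return the same scalar. The table is filled with np.arange so the value is known in advance, which is an assumption of this example rather than the random data used in the post.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

via_loc = df.loc[dates[0], "A"]   # general label-based selection
via_at = df.at[dates[0], "A"]     # scalar-only accessor

print(via_loc, via_at)
```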
21. Get a row by integer position (loc, used above, selects by label). The output below comes from a different random run, so its values differ from the table in step 4.
df.iloc[3]
A    1.191786
B   -1.384943
C   -1.463160
D   0.527332
Name: 2013-01-04 00:00:00, dtype: float64
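22. Slice by integer position
df.iloc[3:5, 0:2]            # rows 3-4, columns 0-1
df.iloc[[1, 2, 4], [0, 2]]   # explicit lists of row and column positions
df.iloc[1:3, :]              # rows 1-2, all columns
df.iloc[1, 1]                # a single element
df.iat[1, 1]                 # same element, fast scalar access
df[df.A > 0]                 # boolean selection: rows where column A is positive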
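
The positional selections above can be checked on a table with known contents. This sketch fills the DataFrame with np.arange(-8, 16), an arbitrary choice made so that column A contains both negative and positive values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(-8, 16).reshape(6, 4), index=dates, columns=list("ABCD"))

block = df.iloc[3:5, 0:2]            # rows 3-4, columns A-B
picked = df.iloc[[1, 2, 4], [0, 2]]  # rows 1, 2, 4 and columns A, C
cell = df.iat[1, 1]                  # row 1, column B
positive = df[df.A > 0]              # rows where A is greater than zero

print(block)
print(cell)
print(len(positive))
```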
23. Copy the DataFrame, then add a column
df2 = df.copy()
df2['E'] = ['one','two','three','four','five','six']
df2

The result has one more column, E.

24. Filter rows by whether a value is in a list
df2[df2['E'].isin(['two','four'])]
Output: 
A B C D E
2013-01-02 0.847134 -0.003377 0.353925 0.438065 two
2013-01-04 1.191786 -1.384943 -1.463160 0.527332 four
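
isin returns a boolean Series, and indexing with it keeps only the rows marked True. A minimal sketch with a made-up numeric column A:

```python
import pandas as pd

df2 = pd.DataFrame(
    {"A": range(6), "E": ["one", "two", "three", "four", "five", "six"]}
)

mask = df2["E"].isin(["two", "four"])   # True where E is one of the listed values
selected = df2[mask]

print(mask.tolist())
print(selected)
```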
25. Create a Series of six values indexed by date
s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130102',periods=6))
s1
26. Assign the Series as a new column (values are aligned by index)
df['F']=s1
df
A   B   C   D   F
2013-01-01 0.284681 1.881328 0.310425 -2.527329 NaN
2013-01-02 -0.209723 -1.410186 0.865336 0.893260 1.0
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370 2.0
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405 3.0
2013-01-05 0.275501 -0.844457 -0.954631 2.312578 4.0
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121 5.0
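
The NaN in the first row comes from index alignment: s1 starts on 2013-01-02, so there is no value for 2013-01-01, and the value for 2013-01-07 is dropped because df has no such row. A sketch with a single made-up column A:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame({"A": range(6)}, index=dates)

# s1 is shifted by one day relative to df's index
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))

df["F"] = s1   # assignment matches on index labels, not on position
print(df)
```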

 

Origin www.cnblogs.com/littlepage/p/11964838.html