First, numpy and pandas
numpy is a matrix computation library and pandas is a data analysis library. Baidu Encyclopedia introduces pandas as follows.
pandas is a tool built on NumPy, created to solve data analysis tasks. pandas incorporates a large number of libraries and some standard data models, and provides the tools needed to operate on large data sets efficiently. It offers many functions and methods that let us handle data quickly and easily. You will soon find that it is one of the important factors that make Python a powerful and efficient data analysis environment.
Second, data types
numpy | pandas |
---|---|
ndarray, an n-dimensional array (matrix) | Series (similar to a one-dimensional array, or to key-value pairs) |
ndarray is the only array type, but numpy offers many data types for its elements | DataFrame (data read from a csv file is returned as a DataFrame) |
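As a small illustration of the table above, the sketch below builds one object of each type. The sample values and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# ndarray: a single array type, but the element dtype can vary
arr_2d = np.array([[1, 2], [3, 4]])      # 2-dimensional, integer elements
arr_float = np.array([1.5, 2.5])         # 1-dimensional, float64 elements

# Series: one-dimensional values with an index (key -> value)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a two-dimensional table; each column is a Series
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]})

print(arr_2d.ndim, arr_float.dtype)   # 2 float64
print(s["b"])                         # 20
print(type(df["x"]).__name__)         # Series
```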
Third, the API (following the official website)
1. Before the API, we should learn how to import data
A csv data set can be imported with pandas
import pandas as pd
food_info = pd.read_csv("xxx.csv")
print(type(food_info))
print(food_info)
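Since "xxx.csv" is only a placeholder, here is a minimal sketch that can run without a file on disk. It assumes a made-up two-column CSV held in a string; read_csv accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "xxx.csv"
csv_text = "name,calories\napple,52\nbanana,89\n"

# read_csv accepts a path or any file-like object
food_info = pd.read_csv(io.StringIO(csv_text))

print(type(food_info).__name__)   # DataFrame
print(food_info.shape)            # (2, 2): two rows, two columns
print(list(food_info.columns))    # ['name', 'calories']
```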
2. pd.Series creates a Series
import numpy as np
import pandas as pd
# numpy can be used directly here, which shows that pandas is built on numpy
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
Output
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
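The output above shows dtype float64 even though most of the inputs are integers. A short sketch of why, and of how the missing value can be detected:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# NaN is a float value, so the whole Series is stored as float64
print(s.dtype)          # float64

# isna() marks the missing entry; count() ignores it
print(s.isna().sum())   # 1
print(s.count())        # 5
```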
3. Generate dates
Starting from the given date, generate six consecutive days
dates = pd.date_range('20130101',periods=6)
dates
Output
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
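As a quick check of what periods means, the sketch below compares the call above with a range given by explicit start and end dates. The name same is introduced only for this example.

```python
import pandas as pd

# Six consecutive days starting from 2013-01-01
dates = pd.date_range('20130101', periods=6)

# The same range written with an explicit end date (both endpoints are included)
same = pd.date_range('2013-01-01', '2013-01-06')

print(len(dates))           # 6
print(dates[-1])            # 2013-01-06 00:00:00
print(dates.equals(same))   # True
```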
4. The first way to create a DataFrame
np.random.randn(6,4) generates a data set with 6 rows and 4 columns
index gives the row labels
columns gives the column labels
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df
Output
A B C D
2013-01-01 0.284681 1.881328 0.310425 -2.527329
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
2013-01-05 0.275501 -0.844457 -0.954631 2.312578
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121
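Because randn returns random numbers, the values differ on every run. A minimal sketch, assuming a fixed seed is acceptable, that makes the frame reproducible and confirms which argument labels which axis:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: a fixed seed, used only to make runs repeatable

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df.shape)           # (6, 4): 6 rows, 4 columns
print(df.index[0])        # 2013-01-01 00:00:00 (index labels the rows)
print(list(df.columns))   # ['A', 'B', 'C', 'D'] (columns labels the columns)
```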
5. View the data types (df2 here is a DataFrame whose columns have different types)
df2.dtypes
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
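The construction of df2 is not shown in this section. The following is a sketch only, assuming a dict-based constructor similar to the one in the official pandas tutorial. The column values are illustrative, and the exact datetime unit and the dtype shown for the string column may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Assumed construction: one dict entry per column, each with a different type
df2 = pd.DataFrame({
    'A': 1.0,                                                  # float scalar, broadcast to every row
    'B': pd.Timestamp('20130102'),                             # datetime scalar
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),  # float32 Series
    'D': np.array([3] * 4, dtype='int32'),                     # int32 array
    'E': pd.Categorical(['test', 'train', 'test', 'train']),   # categorical values
    'F': 'foo',                                                # string scalar
})

print(df2.dtypes)
```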
6. View the first two rows
df.head(2)
A B C D
2013-01-01 0.284681 1.881328 0.310425 -2.527329
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
7. View the last three rows
df.tail(3)
A B C D
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
2013-01-05 0.275501 -0.844457 -0.954631 2.312578
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121
8. View the index (row labels)
df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
9. View the column names
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
10. Convert to a numpy array
df.to_numpy()
array([[ 0.28468077, 1.8813276 , 0.31042514, -2.52732926],
[-0.20972286, -1.41018638, 0.86533622, 0.89325968],
[-0.09557804, 0.57628249, -1.34705203, -0.05537029],
[-1.21652719, 0.42374501, -1.11066844, -1.6824053 ],
[ 0.27550116, -0.84445705, -0.95463056, 2.31257796],
[-1.38455249, 1.53925476, -1.49907627, -0.91612064]])
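A small sketch of what to_numpy returns, using a made-up two-column frame: the result is a plain ndarray, and the index and column labels are dropped.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})

arr = df.to_numpy()

print(type(arr).__name__)   # ndarray
print(arr.shape)            # (2, 2)
print(arr[1, 0])            # 2.0 (second row, column A)
```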
11. View summary statistics
df.describe()
A B C D
count 6.000000 6.000000 6.000000 6.000000
mean -0.391033 0.360994 -0.622611 -0.329231
std 0.733423 1.291526 0.972213 1.763848
min -1.384552 -1.410186 -1.499076 -2.527329
25% -0.964826 -0.527407 -1.287956 -1.490834
50% -0.152650 0.500014 -1.032650 -0.485745
75% 0.182731 1.298512 -0.005839 0.656102
max 0.284681 1.881328 0.865336 2.312578
The rows are, in order: count, mean, standard deviation, minimum, 25th percentile, 50th percentile (median), 75th percentile, and maximum.
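To make the rows of describe() concrete, here is a sketch on a made-up column small enough to check by hand.

```python
import pandas as pd

# Made-up data: four values in a single column
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})

summary = df.describe()

print(list(summary.index))        # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc['count', 'A'])  # 4.0
print(summary.loc['mean', 'A'])   # 2.5
print(summary.loc['50%', 'A'])    # 2.5 (the median)

# std in describe() matches Series.std(), the sample standard deviation
print(abs(summary.loc['std', 'A'] - df['A'].std()) < 1e-12)   # True
```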
12. Transpose
df.T
2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06
A 0.284681 -0.209723 -0.095578 -1.216527 0.275501 -1.384552
B 1.881328 -1.410186 0.576282 0.423745 -0.844457 1.539255
C 0.310425 0.865336 -1.347052 -1.110668 -0.954631 -1.499076
D -2.527329 0.893260 -0.055370 -1.682405 2.312578 -0.916121
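A sketch of the transpose on a made-up frame, showing that rows and columns swap roles while each value stays attached to the same pair of labels:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

t = df.T

print(t.shape)           # (2, 3)
print(list(t.index))     # ['A', 'B']
print(list(t.columns))   # ['x', 'y', 'z']
print(t.loc['B', 'y'] == df.loc['y', 'B'])   # True
```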
13. Sort by index along axis 1 (i.e. by column labels), with ascending set to False (descending order)
df.sort_index(axis=1,ascending=False)
D C B A
2013-01-01 -2.527329 0.310425 1.881328 0.284681
2013-01-02 0.893260 0.865336 -1.410186 -0.209723
2013-01-03 -0.055370 -1.347052 0.576282 -0.095578
2013-01-04 -1.682405 -1.110668 0.423745 -1.216527
2013-01-05 2.312578 -0.954631 -0.844457 0.275501
2013-01-06 -0.916121 -1.499076 1.539255 -1.384552
14. Sort the rows by the values in column B
df.sort_values(by='B')
A B C D
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-05 0.275501 -0.844457 -0.954631 2.312578
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121
2013-01-01 0.284681 1.881328 0.310425 -2.527329
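The two sorts above can be compared side by side on a made-up frame. Both return a new DataFrame and leave the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [30, 10, 20]}, index=['x', 'y', 'z'])

# Sort the column labels in descending order
by_cols = df.sort_index(axis=1, ascending=False)

# Sort the rows by the values in column B (ascending by default)
by_b = df.sort_values(by='B')

print(list(by_cols.columns))   # ['B', 'A']
print(list(by_b.index))        # ['y', 'z', 'x']
print(list(df.index))          # ['x', 'y', 'z'] (original is unchanged)
```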
15. Get the column labeled 'A'
df['A']
2013-01-01 0.284681
2013-01-02 -0.209723
2013-01-03 -0.095578
2013-01-04 -1.216527
2013-01-05 0.275501
2013-01-06 -1.384552
Freq: D, Name: A, dtype: float64
16. Slice rows by position
df[0:3]
A B C D
2013-01-01 0.284681 1.881328 0.310425 -2.527329
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
17. Slice rows by index value (both endpoints are included)
df['20130102':'20130104']
A B C D
2013-01-02 -0.209723 -1.410186 0.865336 0.893260
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405
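The two kinds of row slice treat the end point differently. A sketch on a made-up single-column frame:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': range(6)}, index=dates)

# Positional slice: the end position is excluded
by_pos = df[0:3]

# Label slice: both end labels are included
by_label = df['20130102':'20130104']

print(by_pos['A'].tolist())     # [0, 1, 2]
print(by_label['A'].tolist())   # [1, 2, 3]
```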
18. Get the first row by label
df.loc[dates[0]]
A 0.284681
B 1.881328
C 0.310425
D -2.527329
Name: 2013-01-01 00:00:00, dtype: float64
19. Select columns (and row ranges) by label
df.loc[:,['A','B']]
A B
2013-01-01 0.284681 1.881328
2013-01-02 -0.209723 -1.410186
2013-01-03 -0.095578 0.576282
2013-01-04 -1.216527 0.423745
2013-01-05 0.275501 -0.844457
2013-01-06 -1.384552 1.539255
df.loc['20130102':'20130104',['A','B']]
df.loc['20130102',['A','B']]
20. Get a single element (a scalar)
df.loc[dates[0], 'A']  # gets a single element
df.at[dates[0], 'A']  # same result; the official website describes at as fast access to a scalar
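A sketch showing that loc and at return the same scalar when given one row label and one column label, while loc with only a row label returns a whole row. The frame is made up for the example.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=2)
df = pd.DataFrame({'A': [1.5, 2.5], 'B': [3.5, 4.5]}, index=dates)

v_loc = df.loc[dates[0], 'A']   # general label-based selection
v_at = df.at[dates[0], 'A']     # scalar-only accessor
row = df.loc[dates[0]]          # a whole row, returned as a Series

print(v_loc, v_at)              # 1.5 1.5
print(type(row).__name__)       # Series
```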
21. Get a row by integer position (loc, above, selects by label)
df.iloc[3]
A -1.216527
B 0.423745
C -1.110668
D -1.682405
Name: 2013-01-04 00:00:00, dtype: float64
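22. Slice by integer position
df.iloc[3:5, 0:2]  # rows 3-4, columns 0-1
df.iloc[[1, 2, 4], [0, 2]]  # rows and columns picked by lists of positions
df.iloc[1:3, :]  # rows 1-2, all columns
df.iloc[1, 1]  # a single value
df.iat[1, 1]  # the same value, using the fast scalar accessor
df[df.A > 0]  # boolean indexing: rows where column A is positive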
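The calls above can be tried on a made-up frame whose values are easy to verify by eye:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, -4], 'B': [10, 20, 30, 40]})

print(df.iloc[1:3, 0:1].shape)               # (2, 1): rows 1-2, column 0
print(df.iloc[[0, 3], [1]].values.tolist())  # [[10], [40]]
print(df.iat[2, 1])                          # 30
print(df[df.A > 0].index.tolist())           # [0, 2]: rows where A is positive
```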
23. Copy the DataFrame, then add a column
df2 = df.copy()
df2['E'] = ['one','two','three','four','five','six']
df2
The result has one additional column, E
24. Filter rows by whether a value is in a given list
df2[df2['E'].isin(['two','four'])]
Output:
A B C D E
2013-01-02 -0.209723 -1.410186 0.865336 0.893260 two
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405 four
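isin returns a boolean Series, which is then used to filter the rows. A sketch with made-up data that shows the intermediate mask:

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2, 3, 4], 'E': ['one', 'two', 'three', 'four']})

# True where the value of E is one of the listed values
mask = df2['E'].isin(['two', 'four'])

print(mask.tolist())              # [False, True, False, True]
print(df2[mask]['A'].tolist())    # [2, 4]
print(df2[~mask]['E'].tolist())   # ['one', 'three'] (~ inverts the mask)
```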
25. Create a Series of 6 values indexed by dates
s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range('20130102',periods=6))
s1
26. Assign the Series as a new column (values are aligned by index)
df['F']=s1
df
A B C D F
2013-01-01 0.284681 1.881328 0.310425 -2.527329 NaN
2013-01-02 -0.209723 -1.410186 0.865336 0.893260 1.0
2013-01-03 -0.095578 0.576282 -1.347052 -0.055370 2.0
2013-01-04 -1.216527 0.423745 -1.110668 -1.682405 3.0
2013-01-05 0.275501 -0.844457 -0.954631 2.312578 4.0
2013-01-06 -1.384552 1.539255 -1.499076 -0.916121 5.0
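The NaN in the first row comes from index alignment: s1 starts one day later than df, so 2013-01-01 has no matching value, and the last value of s1 (for 2013-01-07) has no matching row. A sketch with a made-up column A:

```python
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': range(6)}, index=dates)

# s1 covers 2013-01-02 .. 2013-01-07, shifted one day from df
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))

df['F'] = s1   # values are matched on the date index, not on position

print(df['F'].isna().tolist())    # [True, False, False, False, False, False]
print(df['F'].tolist()[1:])       # [1.0, 2.0, 3.0, 4.0, 5.0]
print(len(df))                    # 6 (no row is added for 2013-01-07)
```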