Ten minutes to understand pandas

Ten minutes to master pandas (Part 1): the API from the official website

First, NumPy and pandas

NumPy is a matrix computation library, and pandas is a data analysis library. Baidu Encyclopedia introduces pandas as follows.

pandas is a tool built on NumPy, created to solve data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to operate on large data sets efficiently. pandas offers many functions and methods that let us handle data quickly and easily. You will soon find that it is one of the important factors that make Python a powerful and efficient data analysis environment.

Second, data types

numpy | pandas
ndarray (corresponds to an n-dimensional matrix) | Series (similar to a one-dimensional array, or to key-value pairs)
ndarray is the only container type, although it supports many element data types | DataFrame (data read from a CSV file becomes a DataFrame)
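
The comparison above can be made concrete with a small sketch. The labels "a", "b", "c" and the column names "x" and "y" are made up for illustration; only np.array, pd.Series and pd.DataFrame come from the libraries themselves.

```python
import numpy as np
import pandas as pd

# ndarray: values only, accessed by position
arr = np.array([1.0, 2.0, 3.0])

# Series: the same values plus a label for each one
ser = pd.Series(arr, index=["a", "b", "c"])

value_by_position = arr[1]   # positional lookup in the ndarray
value_by_label = ser["b"]    # key-value style lookup in the Series

# DataFrame: a table whose columns are Series sharing one index
frame = pd.DataFrame({"x": ser, "y": ser * 10})
column_type = type(frame["x"])

print(value_by_position, value_by_label)
print(column_type)
```

Selecting one column of the DataFrame hands back a Series, which is why the two pandas types are usually learned together.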

Second, the API from the official website

2.1.Object creation

Because pandas is developed on top of NumPy, we import NumPy whenever we import pandas

import numpy as np
import pandas as pd

Create a Series; pandas gives it a default integer index

s = pd.Series([1,3,5,np.nan,6,8])
print(s)
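
As a quick check of what that Series looks like inside, the sketch below inspects its dtype, index and non-missing count. It only assumes the same list of values used above.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# np.nan is a float, so the integers are upcast and the dtype is float64
print(s.dtype)

# With no index argument, the positions 0..5 are used as labels
print(list(s.index))

# count() ignores the missing value
print(s.count())
```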

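Create a DataFrame from a NumPy array; index supplies the row labels (here a range of dates) and columns supplies the column labels

# dates was not defined above; a six-day range is assumed here, as in the official guide
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
print(df)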

Here is another way to create a DataFrame: pass a dict that maps each column name to its values

df2 = pd.DataFrame({
    'A':1.,
    'B':pd.Timestamp('20130102'),
    'C':pd.Series(1,index=list(range(4)),dtype='float32'),
    'D':np.array([3]*4,dtype='int32'),
    'E':pd.Categorical(["test","train","test","train"]),
    'F':'foo'
})
print(df2)
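
To see what the dict form produces, the sketch below rebuilds the same frame and prints the dtype of each column. Scalars such as 1. and 'foo' are broadcast to every row, and each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1.0,                                               # scalar, repeated on every row
    "B": pd.Timestamp("20130102"),                          # scalar timestamp
    "C": pd.Series(1, index=list(range(4)), dtype="float32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["test", "train", "test", "train"]),
    "F": "foo",                                             # scalar string
})

# One dtype per column; the exact names for B and F depend on the pandas version
print(df2.dtypes)
print(df2.shape)
```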

View the data from the beginning, or from the tail

df.head()
df.tail(3)

Use to_numpy() to convert the data to an ndarray; for df, whose columns are all floating point, this is fast

df.to_numpy()

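A DataFrame with mixed column types, such as df2, can also be converted with to_numpy(), but the result is an ndarray of dtype object and the conversion is more expensive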

df2.to_numpy()
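
The difference between the two conversions shows up in the dtype of the result. This is a minimal sketch on two small frames invented for the purpose (floats and mixed are illustrative names, not part of the original example).

```python
import numpy as np
import pandas as pd

# All columns are floats, so the ndarray keeps a numeric dtype
floats = pd.DataFrame(np.random.randn(3, 2), columns=["A", "B"])

# Float and string columns have no common numeric dtype
mixed = pd.DataFrame({"A": [1.0, 2.0], "F": ["foo", "bar"]})

print(floats.to_numpy().dtype)  # float64
print(mixed.to_numpy().dtype)   # object
```

Note that to_numpy() returns only the values; the index and column labels are not part of the ndarray.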

Use df.describe() to view a quick statistical summary of the DataFrame

df.describe()

The T attribute gives the transpose of the DataFrame

df.T

sort_index sorts by labels rather than by values: axis=0 sorts by the row labels, and axis=1 sorts by the column labels. Set ascending=False for descending order, or ascending=True (the default) for ascending order

df.sort_index(axis=1,ascending=False)

Sort by the values in a column

df.sort_values(by='B')
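
Because random data makes the effect of sorting hard to see, here is a small sketch with fixed values. The row labels "r0" to "r2" and the numbers are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [3, 1, 2], "B": [0.5, 2.5, 1.5]},
    index=["r2", "r0", "r1"],
)

by_rows = df.sort_index(axis=0)                        # order rows by their labels
by_cols_desc = df.sort_index(axis=1, ascending=False)  # order columns by name, descending
by_value = df.sort_values(by="B")                      # order rows by the values in column B

print(list(by_rows.index))         # ['r0', 'r1', 'r2']
print(list(by_cols_desc.columns))  # ['B', 'A']
print(list(by_value.index))        # ['r2', 'r1', 'r0']
```

All three calls return a new DataFrame and leave df unchanged.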
2.2.Selection

Note: the standard Python / NumPy expressions for selecting data are intuitive, but when working with large amounts of data we use the optimized pandas access methods .at, .iat, .loc and .iloc instead

Getting

Select a single column, which returns a Series

df['A']

Select rows with a slice

df[0:3]

Selection by label

Get the row that matches a label

df.loc[dates[0]]
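
A slightly fuller sketch of label-based selection follows. It assumes a six-day date index and uses np.arange instead of random numbers so the selected values are predictable.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# One row by label, returned as a Series indexed by the column names
row = df.loc[dates[0]]

# Label slices include BOTH endpoints, unlike positional slices
block = df.loc["20130102":"20130104", ["A", "B"]]

# .at is the fast path for a single scalar
cell = df.at[dates[0], "A"]

print(list(row))    # [0, 1, 2, 3]
print(block.shape)  # (3, 2)
print(cell)         # 0
```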
2.3.Select by position

Select a row by its integer position

df.iloc[3]

Positions can also be given as row and column slices
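
No code accompanies that last point, so here is a minimal sketch of positional slicing with iloc. The frame is filled with np.arange so that each selected value can be checked by hand.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Rows 3-4 and columns 0-1; the end of a positional slice is excluded
rows_and_cols = df.iloc[3:5, 0:2]

# Explicit lists of positions work too
picked = df.iloc[[1, 2, 4], [0, 2]]

# A bare colon keeps every column
whole_rows = df.iloc[1:3, :]

# .iat is the fast path for a single scalar by position
cell = df.iat[1, 1]

print(rows_and_cols.to_numpy().tolist())  # [[12, 13], [16, 17]]
print(cell)                               # 5
```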

2.4.Boolean indexing

Select rows using a boolean condition on a column

df[df.A>0]

We can also filter a table with the isin() method

df2=df.copy()  # make a copy
df2['E']=['one','two','three','four','five','six']  # insert a new column, one value per row
df2[df2['E'].isin(['two','three'])]  # keep the rows whose E value is in the list
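
The two kinds of filter can be combined. The sketch below uses a small hand-written frame (the values are invented) so the surviving rows are easy to verify.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [-1.0, 0.5, 2.0, -0.3],
    "E": ["one", "two", "three", "four"],
})

positive = df[df["A"] > 0]                   # boolean condition on a column
chosen = df[df["E"].isin(["two", "three"])]  # isin takes ONE list-like argument

# Combine masks with & (and) or | (or); each comparison needs its own parentheses
both = df[(df["A"] > 0) & df["E"].isin(["two", "four"])]

print(list(positive.index))  # [1, 2]
print(list(chosen.index))    # [1, 2]
print(list(both.index))      # [1]
```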
2.5.Setting

Setting a new column

Each column is a Series, so a DataFrame can be seen as a two-dimensional wrapper around Series. When a Series is assigned as a new column, its values are aligned to the DataFrame's index

s1 = pd.Series([1,2,3,4],index=pd.date_range('20130102',periods=4))
df['F']=s1
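
The alignment is easiest to see when the new Series covers only part of the index. This sketch assumes a six-day index starting on 2013-01-01, while s1 starts one day later and has four values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame({"A": range(6)}, index=dates)

s1 = pd.Series([1, 2, 3, 4], index=pd.date_range("20130102", periods=4))

# Values are matched by label, not by position; unmatched rows become NaN
df["F"] = s1

print(df)
```

The first and last rows have no matching date in s1, so their F values are NaN.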

Set a value by label

df.at[dates[0],'A']=0

Set a value by position

df.iat[0,1]=0
2.6.Missing data

pandas primarily uses np.nan to represent missing values. When a value cannot be computed, NaN stands in for it

reindex changes, adds, or removes labels on the given axis and returns a copy of the data

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])

Drop every row that has missing data

df1.dropna(how='any')

Fill in missing data

df1.fillna(value=5)

Get a boolean mask that marks which values are missing

pd.isna(df1)
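
The three missing-data tools can be compared side by side on a tiny frame. The values below are invented for illustration; the point is that each call returns a new object and leaves df1 untouched.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

dropped = df1.dropna(how="any")  # keeps only rows with no missing value
filled = df1.fillna(value=5)     # replaces every NaN with 5
mask = pd.isna(df1)              # True wherever a value is missing

print(len(dropped))                  # 1
print(int(mask.sum().sum()))         # 2
print(int(df1.isna().sum().sum()))   # 2, the original is unchanged
```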
2.7.Operations

Compute the mean. With axis 0 the mean is taken down each column, giving one value per column; with axis 1 it is taken across each row, giving one value per row

df.mean(0)
df.mean(1)
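
A sketch with fixed numbers makes the two axes easy to tell apart. The column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

col_means = df.mean(0)  # one mean per column
row_means = df.mean(1)  # one mean per row

print(col_means.to_dict())  # {'A': 2.0, 'B': 20.0}
print(list(row_means))      # [5.5, 11.0, 16.5]
```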

shift(2) moves every value down by two positions: the first two slots become NaN and the last two values are dropped. Applying it twice in a row shifts by four positions in total

s=pd.Series([1,3,5,np.nan,6,8],index=dates)
s=s.shift(2)
s=s.shift(2)
s
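
To check the effect step by step, the sketch below keeps the intermediate result instead of overwriting s. It assumes the same six-day date index as earlier.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates)

once = s.shift(2)            # NaN, NaN, 1, 3, 5, NaN
twice = s.shift(2).shift(2)  # NaN, NaN, NaN, NaN, 1, 3

print(once.tolist())
print(twice.tolist())

# Two shifts of 2 give the same result as one shift of 4
print(twice.equals(s.shift(4)))  # True
```

The index itself does not move; only the values slide along it.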

Origin www.cnblogs.com/littlepage/p/11976815.html