pandas basics: Series and DataFrame

DataFrame

A DataFrame is the tabular data structure in pandas. It holds an ordered set of columns, and each column may have a different value type (numbers, strings, Booleans, and so on). A DataFrame has both a row index and a column index, and can be viewed as a dictionary of Series that share the same index.
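
The "dictionary of Series" view can be checked directly. The following is a minimal sketch; the data and the names `year`, `income`, `df`, and `col` are made up for illustration.

```python
import pandas as pd

# Two Series that share the same index labels (illustrative data).
year = pd.Series([2015, 2016, 2017], index=["a", "b", "c"])
income = pd.Series([1000.5, 2000.0, 3000.0], index=["a", "b", "c"])

# A DataFrame built from a dict of Series: the keys become column labels.
df = pd.DataFrame({"year": year, "income": income})

# Selecting one column returns a Series that carries the row index.
col = df["income"]
print(type(col).__name__)  # Series
print(df.dtypes)           # each column keeps its own dtype
```

Here `year` stays an integer column and `income` a float column, which is what "each column may have a different value type" means in practice.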

Series

A Series is an object similar to a one-dimensional array. It consists of a set of data (of any NumPy data type) and an associated set of labels (the index). A Series can also be created from the data alone, in which case a default integer index is generated.

Exercise

import pandas as pd
import numpy as np

In [5]:

Creating a Series object

s1 = pd.Series([4,6,-5,3])

print(s1)

0    4
1    6
2   -5
3    3
dtype: int64

In [8]:

Get the values of the Series

s1.values  # get the values

Out[8]:

array([ 4,  6, -5,  3], dtype=int64)

In [9]:

Get the index of the Series

s1.index  # get the index

Out[9]:

RangeIndex(start=0, stop=4, step=1)

In [10]:

Create a Series with a specified index

s2 = pd.Series([4.0,6.5,212,2.6],index=['a','b','c','d'])  # specify the index

In [11]:

print(s2)

a      4.0
b      6.5
c    212.0
d      2.6
dtype: float64

In [12]:

Select a value from the Series by its index label

s2["a"]  # look up a value by index label

Out[12]:

4.0

In [15]:

s2[['c','d']]  # select several index labels at once

Out[15]:

c    212.0
d      2.6
dtype: float64
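
Square brackets work here because the index holds string labels. As a hedged sketch, the same lookups can be written with the explicit `.loc` (label-based) and `.iloc` (position-based) accessors; the variable names `by_label`, `by_position`, and `subset` are illustrative.

```python
import pandas as pd

s2 = pd.Series([4.0, 6.5, 212, 2.6], index=["a", "b", "c", "d"])

by_label = s2.loc["c"]        # label-based lookup
by_position = s2.iloc[2]      # position-based lookup (third element)
subset = s2.loc[["c", "d"]]   # several labels at once, returns a Series

print(by_label, by_position)
print(subset)
```

Spelling out `.loc` or `.iloc` avoids ambiguity when the index labels are themselves integers.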

In [16]:

Check whether an index label is in the Series

'c' in s2  # check whether the label is in the Series index

Out[16]:

True

In [17]:

'e' in s2

Out[17]:

False
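
Note that `in` tests the index labels, not the stored values. A short sketch of the difference, using the same data as `s2` (the result variable names are illustrative):

```python
import pandas as pd

s2 = pd.Series([4.0, 6.5, 212, 2.6], index=["a", "b", "c", "d"])

label_hit = "c" in s2            # True: "c" is an index label
value_as_label = 212.0 in s2     # False: 212.0 is a value, not a label
value_hit = 212.0 in s2.values   # True: checks the underlying NumPy array

print(label_hit, value_as_label, value_hit)
```

To test for a value, check against `s2.values` as above, or use `s2.isin([...])` for an element-wise result.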

In [18]:

A Series can be viewed as a fixed-length, ordered dictionary

# A Series can be viewed as a fixed-length, ordered dictionary
dic1 = {"apple":5,"pen":'3',"applenpen":10}
s3 = pd.Series(dic1)
print(s3)  # the order of the entries is fixed once the Series is built

apple         5
pen           3
applenpen    10
dtype: object
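
The dtype is `object` because the value for `"pen"` is the string `'3'`, so the Series mixes integers and a string. As a sketch of one way to get a numeric Series, `pd.to_numeric` can parse the string; the name `s3_num` is illustrative.

```python
import pandas as pd

dic1 = {"apple": 5, "pen": "3", "applenpen": 10}
s3 = pd.Series(dic1)
print(s3.dtype)       # object, because of the string "3"

# pd.to_numeric parses the string "3" and returns a numeric Series.
s3_num = pd.to_numeric(s3)
print(s3_num.dtype)
print(s3_num.sum())   # 18
```

With a numeric dtype, arithmetic such as `sum()` behaves as expected instead of failing on mixed types.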

In [20]:

Constructing a DataFrame

# Construct a DataFrame from a dictionary of lists
data = {'year':[2015,2016,2017,2018],
       'income':[1000,2000,3000,4000],
       'pay':[100,200,300,400]}
df1 = pd.DataFrame(data)
df1

Out[20]:

	year	income	pay
0	2015	1000	100
1	2016	2000	200
2	2017	3000	300
3	2018	4000	400

In [22]:

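Build a DataFrame from a NumPy array

# Build a DataFrame from a NumPy array.
# shape reports how many rows and columns the data has;
# reshape() is an array method that reorganizes the data into a new shape.
df2 = pd.DataFrame(np.arange(12).reshape(3,4))
df2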

Out[22]:

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
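
The difference between `shape` and `reshape()` can be seen step by step. This is a small sketch; the names `arr` and `grid` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.arange(12)        # 1-D array holding 0..11
print(arr.shape)           # (12,)

grid = arr.reshape(3, 4)   # same data arranged as 3 rows x 4 columns
print(grid.shape)          # (3, 4)

df2 = pd.DataFrame(grid)   # default integer row and column labels
print(df2.shape)           # (3, 4)
```

`shape` is an attribute that only reports dimensions, while `reshape()` returns the data in a new arrangement without changing the element count.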

In [24]:

Build a DataFrame with a specified index and column headers

# Specify the row index and the column headers
# The column labels 金, 木, 水, 火 mean gold, wood, water, fire
df3 = pd.DataFrame(np.arange(12).reshape(3,4),index=['a','b','c'],columns=["金","木","水","火"])
df3

Out[24]:

	金	木	水	火
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11

In [27]:

DataFrame attributes

# DataFrame attributes
df3.columns  # column labels

Out[35]:

Index(['金', '木', '水', '火'], dtype='object')

In [28]:

df3.index  # row labels

Out[28]:

Index(['a', 'b', 'c'], dtype='object')

In [29]:

df3.values  # the values, as a two-dimensional array

Out[29]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [30]:

df3.describe  # without parentheses this only shows the bound method; call df3.describe() for summary statistics

Out[30]:

<bound method NDFrame.describe of    金  木   水   火
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11>
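
The output above is the method object itself, because `describe` was referenced without being called. A minimal sketch of the call with parentheses, rebuilding `df3` so the snippet is self-contained (the name `summary` is illustrative):

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["a", "b", "c"],
    columns=["金", "木", "水", "火"],
)

# Calling describe() returns per-column summary statistics as a DataFrame.
summary = df3.describe()
print(summary.index.tolist())  # count, mean, std, min, quartiles, max
print(summary.loc["mean"])     # column means
```

Each column of `summary` describes the matching column of `df3`; for example, the mean of column `金` (values 0, 4, 8) is 4.0.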

In [31]:

Transpose

# Transpose: swap rows and columns
df3.T

Out[31]:

	a	b	c
金	0	4	8
木	1	5	9
水	2	6	10
火	3	7	11

In [32]:

Sorting

# Sorting
df3.sort_index(axis=1)  # axis=1 sorts by the column labels

Out[32]:

	木	水	火	金
a	1	2	3	0
b	5	6	7	4
c	9	10	11	8

In [33]:

df3.sort_index(axis=0)  # axis=0 sorts by the row labels

Out[33]:

	金	木	水	火
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11

In [34]:

# Sort by the values of one column
# The by argument of sort_index is deprecated; use sort_values(by=...) instead
df3.sort_values(by="金")

Out[34]:

	金	木	水	火
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11
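
The rows come back unchanged because column `金` is already in ascending order. As a sketch, passing `ascending=False` makes the effect of sorting visible; `df3` is rebuilt here so the snippet is self-contained, and the names `desc` and `cols_desc` are illustrative.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["a", "b", "c"],
    columns=["金", "木", "水", "火"],
)

# Sort rows by the values in column 金, largest first.
desc = df3.sort_values(by="金", ascending=False)
print(desc.index.tolist())  # ['c', 'b', 'a']

# Sort the column labels in reverse order.
cols_desc = df3.sort_index(axis=1, ascending=False)
print(cols_desc.columns.tolist())
```

`sort_values` orders by the data in a column, while `sort_index` orders by the row or column labels; neither modifies `df3` in place by default.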