Data analysis using Python: Pandas (1)

Series

List-like properties of Series

A pandas Series is a one-dimensional array-like object that has many of the properties of a list, such as slicing. Each value also has an associated label, and together these labels form the index.

input:
from pandas import Series
obj = Series([11, 22, 33, 44])
input:
obj = Series([11, 22, 33, 44], index=[1, 2, 3, 4])
obj
output:
1    11
2    22
3    33
4    44
dtype: int64

The index is shown on the left and the values on the right. When no index is specified, a default integer index starting from zero is created. The index and the underlying array of values are available as attributes, for example obj.index and obj.values.
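As a small sketch of the default index and the two attributes mentioned above (the names `default_obj` and `labeled_obj` are only illustrative, not from the original article):

```python
import pandas as pd

# No index given: pandas creates a default integer index starting at 0
default_obj = pd.Series([11, 22, 33, 44])
print(default_obj.index.tolist())   # [0, 1, 2, 3]

# Explicit index: labels on the left, values on the right
labeled_obj = pd.Series([11, 22, 33, 44], index=[1, 2, 3, 4])
print(labeled_obj.index.tolist())   # [1, 2, 3, 4]
print(labeled_obj.values.tolist())  # [11, 22, 33, 44]
```

Here `.values` returns the underlying NumPy array; `tolist()` is used only to make the printed result easy to read.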

input: 
obj[1]
output:
11
input:
li = [1,2,3]
obj[li]
output:
1 11
2 22
3 33
input:
obj[[True,True,False,True]]
output:
1 11
2 22
4 44
input:
obj>23
output:
1 False
2 False
3 True
4 True

A single value or a group of values in a Series can be selected by index: you can pass a single index value, a list of index values, or a boolean array whose length equals that of the Series.

NumPy array operations are supported, and they preserve the link between each index label and its value.
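A minimal sketch of both points, reusing the same values as the transcript above (the names `large`, `doubled` and `roots` are illustrative):

```python
import numpy as np
import pandas as pd

obj = pd.Series([11, 22, 33, 44], index=[1, 2, 3, 4])

# A boolean Series produced by a comparison can be used directly as a filter
large = obj[obj > 23]
print(large.index.tolist())    # [3, 4]
print(large.tolist())          # [33, 44]

# Arithmetic and NumPy functions act on the values but keep the index labels
doubled = obj * 2
roots = np.sqrt(obj)
print(doubled.tolist())        # [22, 44, 66, 88]
print(roots.index.tolist())    # [1, 2, 3, 4]
```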

Dict-like properties of Series

input:
# `in` tests the index labels (like dict keys), not the values
2 in obj
output:
True
input:
di = {
    1: 101,
    2: 202,
    3: 303
}
Series(di)
output:
1    101
2    202
3    303
dtype: int64

A Series can also be viewed as a fixed-length, ordered dictionary. A Series can be created directly from a dictionary, in which case the dictionary keys become the index of the Series.
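To illustrate the dictionary view, here is a short sketch using string keys so that labels and values cannot be confused (the name `prices` and its data are made up for the example):

```python
import pandas as pd

prices = pd.Series({'a': 101, 'b': 202, 'c': 303})   # dict keys become the index

# `in` tests the index labels (like dict keys), not the values
print('a' in prices)              # True
print(101 in prices)              # False
print(101 in prices.values)       # True

# Unlike a plain dict, a Series also supports positional slicing
print(prices.iloc[:2].tolist())   # [101, 202]
```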

input:
obj = Series([4,7,9,3], index=['b','d','c','a'])
obj2 = Series([3,3,3,3], index=['a','c','d','z'])
obj+obj2
output:
a 6.0
b NaN
c 12.0
d 10.0
z NaN
dtype: float64

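When an operation combines two Series, the result for an index label is NaN if that label is missing from either one. The pandas functions isnull and notnull can be used to detect missing data.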
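The following sketch repeats the addition above and then checks the result with isnull and notnull. The last step, `Series.add` with `fill_value`, is not part of the original article; it is shown only as one possible way to avoid the NaN values.

```python
import pandas as pd

obj = pd.Series([4, 7, 9, 3], index=['b', 'd', 'c', 'a'])
obj2 = pd.Series([3, 3, 3, 3], index=['a', 'c', 'd', 'z'])

total = obj + obj2                 # labels present in only one Series give NaN
missing = pd.isnull(total)         # True where the value is NaN
print(total[missing].index.tolist())        # ['b', 'z']
print(total[pd.notnull(total)].tolist())    # [6.0, 12.0, 10.0]

# Optional alternative: treat a missing label as 0 instead of producing NaN
filled = obj.add(obj2, fill_value=0)
print(filled.tolist())             # [6.0, 4.0, 12.0, 10.0, 3.0]
```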

A Series has the attributes name and index.name; once they are assigned, they appear when the Series is displayed.
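A brief sketch, assuming the two attributes meant here are `name` and `index.name` (the strings 'score' and 'label' are made-up example names):

```python
import pandas as pd

obj = pd.Series([4, 7, 9, 3], index=['b', 'd', 'c', 'a'])
obj.name = 'score'          # hypothetical name for the values
obj.index.name = 'label'    # hypothetical name for the index

print(obj.name)             # score
print(obj.index.name)       # label
print(obj)                  # the printed Series now shows both names
```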

DataFrame

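A DataFrame is a tabular data structure that contains an ordered set of columns, each of which can have a different type. My understanding is that each column is a Series, and that the DataFrame's index is shared by all columns and lines up with each of them one to one.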

A DataFrame has both a row index and a column index, and can be viewed as a dictionary of Series.
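The "dictionary of Series" view can be sketched directly by building a DataFrame from two Series that share the same row labels (the variable names and data are illustrative):

```python
import pandas as pd

# Two Series that share the same row labels
city = pd.Series(['xm', 'xm', 'fz'], index=[0, 1, 2])
year = pd.Series([2000, 2001, 2000], index=[0, 1, 2])

frame = pd.DataFrame({'city': city, 'year': year})
print(frame.columns.tolist())     # ['city', 'year']
print(frame.index.tolist())       # [0, 1, 2]

# Each column comes back as a Series that carries the DataFrame's row index
col = frame['year']
print(type(col).__name__)               # Series
print(col.index.equals(frame.index))    # True
```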

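from pandas import DataFrame

data = {
    'city': ['xm', 'xm', 'fz'],
    'year': [2000, 2001, 2000],
    'pop': [15, 16, 20]
}
# When a DataFrame is created from a nested dict, the outer keys become the
# columns and the inner keys become the row index; transpose to swap them
data2 = {
    'city': {0: 'xm', 1: 'xm'},
    'year': {0: 2000, 1: 2001},
    'pop': {0: 15, 1: 16}
}
frame = DataFrame(data)
frame
  city  year  pop
0   xm  2000   15
1   xm  2001   16
2   fz  2000   20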
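The nested dict defined above is never passed to DataFrame in the original. As a sketch of what the comment describes, it could be used like this (the name `transposed` is illustrative):

```python
import pandas as pd

data2 = {
    'city': {0: 'xm', 1: 'xm'},
    'year': {0: 2000, 1: 2001},
    'pop': {0: 15, 1: 16},
}
frame = pd.DataFrame(data2)
print(frame.columns.tolist())         # ['city', 'year', 'pop']  (outer keys)
print(frame.index.tolist())           # [0, 1]                   (inner keys)

transposed = frame.T                  # rows and columns swap places
print(transposed.index.tolist())      # ['city', 'year', 'pop']
print(transposed.columns.tolist())    # [0, 1]
```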
# Specify the column order explicitly; a column name that is not in the data
# (debt below) is filled entirely with missing values
frame2 = DataFrame(data, columns=['year', 'city', 'pop', 'debt'], index=[1, 2, 3])
frame2
   year city  pop debt
1  2000   xm   15  NaN
2  2001   xm   16  NaN
3  2000   fz   20  NaN
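# A column of a DataFrame corresponds to a Series
# The returned Series has the same index as the original DataFrame
# It refers to the DataFrame's data rather than being a copy
# (with pandas Copy-on-Write enabled, writes to it do not propagate back)
city = frame2['city']
city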
1    xm
2    xm
3    fz
Name: city, dtype: object
# Column assignment: pass either a single value or an array with the same length as the DataFrame
frame['new'] = 100
frame
# frame['new'] = [100, 200, 300] would assign the values in order
  city  year  pop  new
0   xm  2000   15  100
1   xm  2001   16  100
2   fz  2000   20  100
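Both forms of column assignment can be tried in a self-contained sketch. The final step, which shows a list of the wrong length being rejected, is an addition for illustration and is not in the original article:

```python
import pandas as pd

frame = pd.DataFrame({
    'city': ['xm', 'xm', 'fz'],
    'year': [2000, 2001, 2000],
    'pop': [15, 16, 20],
})

frame['new'] = 100                      # a scalar is broadcast to every row
print(frame['new'].tolist())            # [100, 100, 100]

frame['new'] = [100, 200, 300]          # a list is assigned row by row, in order
print(frame['new'].tolist())            # [100, 200, 300]

# A list whose length differs from the number of rows is rejected
try:
    frame['new'] = [1, 2]
except ValueError as exc:
    print('ValueError:', exc)
```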

PS:

Python has a garbage collection mechanism, so del removes a reference to an object, not the object's memory itself.
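A small sketch of what this means in practice. The name `alias` is hypothetical, and the second half (using del on a DataFrame column) is an assumption about the context the note was written for:

```python
import pandas as pd

frame = pd.DataFrame({'city': ['xm', 'fz'], 'pop': [15, 20]})
alias = frame                   # a second name bound to the same object

del frame                       # removes the name `frame`, not the DataFrame itself
print(alias.shape)              # (2, 2): still reachable through `alias`

del alias['pop']                # on a column, del drops that column from the DataFrame
print(alias.columns.tolist())   # ['city']
```

The object is only reclaimed by the garbage collector once no references to it remain.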

Blog article link (cnblogs): www.cnblogs.com/shinyruouo/articles/pandas1




Origin www.cnblogs.com/chinatrump/p/11424083.html