[Data Analysis and Visualization] Map and Replace

Adding a new column to a DataFrame, which leads to map

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Create a DataFrame from a dictionary
df1 = DataFrame({'city':['Beijing','Shanghai','Guangzhou'],'population':[1000,2000,3000]})
df1
        city  population
0    Beijing        1000
1   Shanghai        2000
2  Guangzhou        3000
# Add a column to the DataFrame (direct assignment)
# Drawback: the values must be listed in row order
df1['GDP'] = Series([100,200,300])
df1
        city  population  GDP
0    Beijing        1000  100
1   Shanghai        2000  200
2  Guangzhou        3000  300
# Add a GDPMap column keyed by city
# Advantage: order does not matter
gdp_map = {'Beijing':300,'Shanghai':400,'Guangzhou':500}
# This also adds a new column, but the values are assigned through map
df1['GDPMap'] = df1['city'].map(gdp_map)
df1
        city  population  GDP  GDPMap
0    Beijing        1000  100     300
1   Shanghai        2000  200     400
2  Guangzhou        3000  300     500
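
To see why map does not depend on row order, here is a minimal sketch using the same kind of data. The English labels are an assumption, and `Shenzhen` is a hypothetical extra city added only to show what happens when a key is missing from the dictionary.

```python
import pandas as pd

# Rows are deliberately in a different order than the dictionary below.
df = pd.DataFrame({'city': ['Guangzhou', 'Beijing', 'Shenzhen'],
                   'population': [3000, 1000, 4000]})

gdp_map = {'Beijing': 300, 'Shanghai': 400, 'Guangzhou': 500}

# map looks up each value of the 'city' column in the dictionary,
# so the result follows each row's city, not the dictionary's order.
df['GDPMap'] = df['city'].map(gdp_map)

# 'Shenzhen' is not a key in gdp_map, so its row gets NaN,
# and the new column is stored as float64 because of that NaN.
print(df)
```

Unused dictionary keys (here `Shanghai`) are simply ignored, so one lookup dictionary can serve several DataFrames.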

Inserting a column directly from a Series (pitfalls: 1. the order of the values; 2. the index labels must match)

# What if the index is customized?
# Create a DataFrame from a dictionary
df1 = DataFrame({'city':['Beijing','Shanghai','Guangzhou'],'population':[1000,2000,3000]},index=['A','B','C'])
df1
        city  population
A    Beijing        1000
B   Shanghai        2000
C  Guangzhou        3000
# Problem: the new column is all NaN
# Add a column to the DataFrame (direct assignment)
# Drawback: the values must be listed in row order
df1['GDP'] = Series([100,200,300])
df1
        city  population  GDP
A    Beijing        1000  NaN
B   Shanghai        2000  NaN
C  Guangzhou        3000  NaN
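# Fix: give the Series the same index as the DataFrame
# This is cumbersome, so mapping through a dictionary with map() is recommended
df1['GDP'] = Series([100,200,300], index=['A','B','C'])
df1
        city  population  GDP
A    Beijing        1000  100
B   Shanghai        2000  200
C  Guangzhou        3000  300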
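
The NaN result above comes from index alignment: a Series assigned to a column is matched by label, not by position. As a hedged sketch (not from the original post, with illustrative column names), assigning a plain list or the array returned by `to_numpy()` skips alignment and assigns by position. A Series whose labels are in a different order is still placed by label.

```python
import pandas as pd

df = pd.DataFrame({'city': ['Beijing', 'Shanghai', 'Guangzhou'],
                   'population': [1000, 2000, 3000]},
                  index=['A', 'B', 'C'])

# A Series with the default 0..2 index shares no labels with A, B, C,
# so label alignment leaves every row as NaN.
df['gdp_misaligned'] = pd.Series([100, 200, 300])

# A plain list or a NumPy array has no index, so values go in by position.
df['gdp_list'] = [100, 200, 300]
df['gdp_array'] = pd.Series([100, 200, 300]).to_numpy()

# Labels in a different order are still matched by label, not position.
df['gdp_by_label'] = pd.Series([300, 100, 200], index=['C', 'A', 'B'])

print(df)
```

Positional assignment is shorter, but it brings back the ordering pitfall, which is why a label-based lookup such as map() is usually the safer choice.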

Replace in Series

s1 = Series(np.arange(10))
s1
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
# Replace a single value (a dictionary also works)
s1.replace(1,np.nan)
0    0.0
1    NaN
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
8    8.0
9    9.0
dtype: float64
# Replace several values at once
s1.replace([1,2,3],[20,30,40])
0     0
1    20
2    30
3    40
4     4
5     5
6     6
7     7
8     8
9     9
dtype: int64
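
replace also accepts a dictionary, which keeps each old value next to its replacement. A minimal sketch, assuming the same 0 to 9 Series as above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10))

# Keys are the values to find; dictionary values are their replacements.
replaced = s.replace({1: 20, 2: 30, 3: 40})

# replace returns a new Series; the original is left unchanged.
print(replaced.tolist())
print(s.tolist())
```

This is equivalent to the two-list form, but a dictionary is harder to get wrong when the lists grow long.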

Origin blog.csdn.net/weixin_43469680/article/details/105607137