pandas data operations

References

https://blog.csdn.net/xiaodongxiexie/article/details/53108959

https://www.cnblogs.com/chaosimple/p/4153083.html

1. Read data from a CSV file

pd.read_csv("path", encoding='utf-8') # returns a pd.DataFrame
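A minimal sketch of item 1. To keep it self-contained, it reads from an in-memory string instead of a file on disk; the column names `name` and `age` are made up for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file (hypothetical columns).
csv_text = "name,age\nalice,30\nbob,25\n"

# read_csv accepts a path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```

With a real file, the first argument would be the path, and `encoding='utf-8'` can be passed as in the note above.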

2. Create a DataFrame

df = pd.DataFrame(np.arange(20).reshape(5,4), columns=['a', 'b', 'c', 'd'], index=['x1', 'x2', 'x3', 'x4', 'x5'])
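As a runnable version of item 2, the sketch below builds the same 5x4 frame and prints its shape and labels. The later examples in these notes assume this layout, where row i holds the values 4*i to 4*i+3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),           # values 0..19 in 5 rows, 4 columns
    columns=['a', 'b', 'c', 'd'],          # column labels
    index=['x1', 'x2', 'x3', 'x4', 'x5'],  # row labels
)

print(df)
print(df.shape)          # (rows, columns)
print(list(df.columns))
print(list(df.index))
```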

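3. Get a value from a DataFrame --- get one specific value (row x, column y)

df.iloc[x, y] === df.iat[x, y] # by integer position

df.loc[r, c] === df.at[r, c] === df[c][r] # by row label r and column label c; note that df[c][r] takes the column first, then the row

# df.ix was removed in pandas 1.0; use iloc (position) or loc (label) instead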
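A short sketch of scalar access on the frame from item 2, comparing position-based and label-based lookups. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['x1', 'x2', 'x3', 'x4', 'x5'],
)

# Position-based: second row, third column.
by_position = df.iloc[1, 2]
by_position_fast = df.iat[1, 2]   # scalar-only accessor

# Label-based: row 'x2', column 'c'.
by_label = df.loc['x2', 'c']
by_label_fast = df.at['x2', 'c']  # scalar-only accessor

# Chained indexing selects the column first, then the row label.
chained = df['c']['x2']

print(by_position, by_position_fast, by_label, by_label_fast, chained)
```

All five expressions address the same cell. `iat` and `at` are intended for single values, while `iloc` and `loc` also accept slices and lists.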

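4. Get a column from a DataFrame --- get the column named x, or the column at position x (returns a Series)

df['x'] === df.x # by name; df.x only works when the name is a valid Python identifier and does not clash with a DataFrame attribute

df.iloc[:, x] # by integer position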
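The following sketch shows three ways to pull one column out of the frame from item 2. It assumes the column of interest is `'a'`, which sits at position 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['x1', 'x2', 'x3', 'x4', 'x5'],
)

col_by_name = df['a']         # bracket access works for any column name
col_by_attr = df.a            # attribute access, identifier-like names only
col_by_pos = df.iloc[:, 0]    # all rows, first column

print(type(col_by_name))
print(col_by_name.tolist())
```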

5. Get a row from a DataFrame --- get row x (returns a Series)

df.iloc[x] # by integer position; use df.loc[label] to select by index label (df.ix was removed in pandas 1.0)

6. Get part of a row from a DataFrame --- get columns x, y, z of row m (returns a Series)

df.iloc[m, [x, y, z]] # by integer position; the label-based form is df.loc[m, [x, y, z]]
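Items 5 and 6 can be sketched together on the frame from item 2. The row and column choices below are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['x1', 'x2', 'x3', 'x4', 'x5'],
)

# Whole row by position: a Series indexed by the column labels.
row = df.iloc[2]

# Selected columns of one row, by position and by label.
part_by_pos = df.iloc[2, [0, 2]]
part_by_label = df.loc['x3', ['a', 'c']]

print(row.tolist())
print(part_by_pos.tolist())
```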

7. Get a sub-DataFrame --- select columns x, y, z

df[[x, y, z]] # x, y, z are column names; by integer position use df.iloc[:, [x, y, z]]

8. Get a sub-DataFrame --- select rows x through z

df.iloc[x:z+1] # by integer position, end exclusive; df.loc[x:z] slices by label and includes both ends (df.ix was removed in pandas 1.0)
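A small sketch of items 7 and 8 on the frame from item 2. It also illustrates that a positional slice excludes its end while a label slice includes it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['x1', 'x2', 'x3', 'x4', 'x5'],
)

# A list of column names returns a DataFrame, not a Series.
sub_cols = df[['a', 'c']]

# Rows at positions 1, 2, 3 (the end position 4 is excluded).
sub_rows_pos = df.iloc[1:4]

# The same rows by label; both 'x2' and 'x4' are included.
sub_rows_label = df.loc['x2':'x4']

print(sub_cols.shape)
print(list(sub_rows_pos.index))
```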

9. Create a Series

series = pd.Series(range(3), index=list('abc'))

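10. Get a value from a Series --- by index label x (for example a str) or by integer position x

series['x'] or series.x # by index label

series.iloc[x] # by integer position; plain series[x] with an integer on a non-integer index is deprecated in recent pandas versions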
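A minimal sketch of items 9 and 10: build the small Series and read one value by label and by position.

```python
import pandas as pd

series = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

by_label = series['b']        # lookup by index label
by_attr = series.b            # attribute access, identifier-like labels only
by_position = series.iloc[1]  # lookup by integer position

print(by_label, by_attr, by_position)
```

Using `iloc` for positions keeps the intent explicit and avoids ambiguity when the index itself contains integers.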

11. Convert a Series to a DataFrame

df = series.to_frame()
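The sketch below shows what `to_frame` produces for the Series from item 9. The column name `'value'` is a made-up label for illustration.

```python
import pandas as pd

series = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

# Without a name, the single column is labelled 0.
unnamed = series.to_frame()

# Passing name= sets the column label explicitly.
named = series.to_frame(name='value')

print(list(unnamed.columns))
print(named)
```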

12. Derive a boolean Series from a Series

series.isin([0, 4]) # tests whether each value equals 0 or 4; returns a Series with the same index and True/False values

13. Filter a Series

series[series.isin([0, 4])] # keeps the values equal to 0 or 4; returns a Series that is a subset of series
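Items 12 and 13 can be sketched on the Series from item 9: `isin` builds a boolean mask, and indexing with that mask keeps only the matching entries.

```python
import pandas as pd

series = pd.Series([0, 1, 2], index=['a', 'b', 'c'])

# Boolean mask with the same index as the original Series.
mask = series.isin([0, 4])

# Indexing with the mask keeps only the entries where it is True.
filtered = series[mask]

print(mask.tolist())
print(filtered)
```

The value 4 does not occur in this Series, so only the entry with value 0 survives.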

14. Filter a DataFrame

df[series.isin([0])] # filter by specific value: keeps rows where the column behind series equals 0; returns a DataFrame

df[df['x'] > 4] === df[series > 4] # filter by range: keeps rows where column x (or the column behind series) is greater than 4; returns a DataFrame

df[~series.isin([0])] # drops rows where the column behind series equals 0; returns a DataFrame

df[df['b'].isin([1]) & series.isin([0, 4])] # keeps rows where series equals 0 or 4 and column b equals 1; returns a DataFrame
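A runnable sketch of the four filters in item 14, using the frame from item 2. It assumes `series` is column `'a'` of the same frame, so the mask and the frame share an index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    columns=['a', 'b', 'c', 'd'],
    index=['x1', 'x2', 'x3', 'x4', 'x5'],
)
series = df['a']  # values 0, 4, 8, 12, 16

equal_zero = df[series.isin([0])]        # rows where a == 0
greater_than_4 = df[series > 4]          # rows where a > 4
not_zero = df[~series.isin([0])]         # ~ negates a boolean mask
combined = df[df['b'].isin([1]) & series.isin([0, 4])]  # both conditions

print(list(equal_zero.index))
print(list(greater_than_4.index))
print(list(not_zero.index))
print(list(combined.index))
```

Each condition needs its own parentheses when it contains a comparison such as `df['a'] > 4`, because `&` and `|` bind more tightly than comparison operators in Python.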

15. Replace values in a DataFrame column

df[col_name] = df[col_name].replace({'C': 1, 'Q': 2, 'S': 3}) # in column col_name, replace C with 1, Q with 2, and S with 3
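A minimal sketch of item 15. The column name `'Embarked'` and the sample values are assumptions made for illustration; the notes only give the mapping itself.

```python
import pandas as pd

# Hypothetical column holding the codes C, Q, S.
df = pd.DataFrame({'Embarked': ['C', 'Q', 'S', 'C']})

# replace() with a dict maps each key to its value; unlisted values stay as they are.
df['Embarked'] = df['Embarked'].replace({'C': 1, 'Q': 2, 'S': 3})

print(df['Embarked'].tolist())
```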

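16. Filter NaN values out of a DataFrame

df = df[df[column_name1].notna() & df[column_name2].notna()] # equivalent: df.dropna(subset=[column_name1, column_name2])

# Drops every row where column_name1 or column_name2 is NaN. Note that isin(['NaN']) compares against the string 'NaN' and does not match real missing values.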

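df = df[df.Age.isnull()] # keeps only the rows where Age is NaN

df = df[df.Age.notnull()] # keeps only the rows where Age is not NaN

# These two lines are alternatives: the first filters out rows where Age is not NaN, the second filters out rows where Age is NaN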
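The NaN filters in item 16 can be sketched as follows. The `Age` column comes from the notes; the `Fare` column and all sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [22.0, np.nan, 35.0],
    'Fare': [7.25, 8.05, np.nan],  # hypothetical second column
})

# Keep rows where neither column is missing (two equivalent spellings).
complete = df.dropna(subset=['Age', 'Fare'])
complete_mask = df[df['Age'].notna() & df['Fare'].notna()]

# Rows where Age is missing, and rows where it is present.
missing_age = df[df['Age'].isna()]
has_age = df[df['Age'].notna()]

print(list(complete.index))
print(list(missing_age.index))
print(list(has_age.index))
```

`isnull`/`notnull` are aliases of `isna`/`notna`, so either spelling gives the same masks.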

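17. Extract the index from a Series

index = df.column_name.index # a RangeIndex when df uses the default index

index_list = list(index) # avoid naming the variable list, which would shadow the built-in

18. Create a Series from a list

s = pd.Series(index_list)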
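Items 17 and 18 can be sketched end to end. The frame below is a made-up example with a default index, which is the case where the column's index is a `RangeIndex`.

```python
import pandas as pd

# Hypothetical frame with the default 0..n-1 index.
df = pd.DataFrame({'column_name': [10, 20, 30]})

# A column is a Series, and its index is the frame's row index.
index = df.column_name.index
index_list = list(index)

# Build a new Series from the plain Python list.
s = pd.Series(index_list)

print(index)
print(index_list)
print(s)
```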