A summary of basic data-processing operations in pandas

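Copyright notice: I write this blog to keep myself studying and to rescue my weak memory. Reposting is welcome (please credit the source), and so is discussion! If anything here is wrong, please contact me right away! And please don't be stingy with your likes! https://blog.csdn.net/qq_38150441/article/details/88966776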

This post introduces some basic pandas operations, which are also the ones used most often. It covers the following points:

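1. Inspecting data types and related attributes in pandas

2. Building a complete dataframe in pandas

3. Three ways to select rows and columns in pandas

4. where-style filtering and logical operators in pandas

5. Getting the value of one column from a dataframe that has a single row after a where filter

6. Removing duplicates in pandas

7. Group operations in pandas

8. Renaming columns in pandas

9. Modifying data globally and locally in pandas

10. Merging dataframes in pandas

11. Sorting data in pandas

12. Taking the top k values in pandas

13. Setting a custom number of rows in pandas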
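
Items 7 through 12 in the list above do not appear in the code excerpt in this post. The following is a minimal sketch of what they might look like, not the author's code: the frame `tips`, the lookup table `codes`, and every variable name are assumptions made for illustration, and "global" versus "local" modification is read here as "replace a value everywhere" versus "assign to a selected subset".

```python
import pandas as pd

# Hypothetical sample data, modeled on the tips-style frames used in this post.
tips = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                     'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                     'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})

# 7. group: mean tip for each value of 'sex'
mean_tip = tips.groupby('sex')['tip'].mean()

# 8. rename a column (returns a new frame; tips itself is unchanged)
renamed = tips.rename(columns={'total_bill': 'bill'})

# 9a. "global" modification (assumed meaning): replace a value wherever it occurs
global_mod = tips.replace('Male', 'M')

# 9b. "local" modification (assumed meaning): assign only to rows matching a condition
local_mod = tips.copy()
local_mod.loc[local_mod['tip'] > 3.5, 'tip'] = 3.5

# 10a. merge by stacking rows
stacked = pd.concat([tips, tips], ignore_index=True)

# 10b. merge by joining on a key column; codes is a hypothetical lookup table
codes = pd.DataFrame({'sex': ['Female', 'Male'], 'code': [0, 1]})
joined = pd.merge(tips, codes, on='sex')

# 11. sort by total_bill descending, breaking ties by tip ascending
sorted_df = tips.sort_values(by=['total_bill', 'tip'], ascending=[False, True])

# 12. the k rows with the largest tip (k = 2 here)
top2 = tips.nlargest(2, 'tip')

print(mean_tip)
print(sorted_df)
print(top2)
```

`rename`, `replace`, `sort_values`, and `nlargest` all return new objects, so the original frame stays intact unless the result is assigned back.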

The full code is available on GitHub (follows, stars, and forks are welcome):

Part of the code is shown below:

import pandas as pd
import numpy as np

df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
df2 = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
df3 = pd.DataFrame({'total_bill': [16, 10, 23, 23, 24],
                   'tip': [1.1, 1.6, 3.5, 3.3, 3.6],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
'''Inspecting data types and related attributes'''
print(df)
print(df.dtypes)
print(df.index)
print(df.columns)


'''Building a complete dataframe'''
da = pd.DataFrame([[1, 2, 3], [1, 3, 4], [2, 4, 3]],
                  index=['one', 'two', 'three'], columns=['A', 'B', 'C'])
print(da)
# va is a 2-D NumPy array of the values; it can be treated much like a nested list
va = df.values
print(va)


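'''Three ways to select rows and columns'''
# loc: label-based selection for both rows (index labels) and columns (column names);
# iloc: integer-position-based selection for rows and columns;
# ix: a hybrid of loc and iloc; deprecated in pandas 0.20 and removed in 1.0, so use loc or iloc instead.
print(df.loc[1:3, ['total_bill', 'tip']])  # 1 and 3 are row labels; both endpoints are included
print(df.iloc[1:3, [1, 2]])  # 1 and 3 are row positions; the end position is excluded
print(df.loc[1:3, df.columns[[1, 2]]])  # replaces df.ix[1:3, [1, 2]]: rows by label, columns by position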


'''where-style filtering and logical operators in pandas'''
print(df['sex'] == 'Female')  # returns a boolean Series
print(df[df['sex'] == 'Female'])
print(df[df['total_bill'] > 20])
# and
print(df[(df['sex'] == 'Female') & (df['total_bill'] > 20)])
# or
print(df[(df['sex'] == 'Female') | (df['total_bill'] > 20)])
# in
print(df[df['total_bill'].isin([21.01, 23.68, 24.59])])
# not
print(df[~(df['sex'] == 'Male')])  # use ~ for "not"; unary minus on a boolean Series raises TypeError in current pandas
print(df[~df['total_bill'].isin([21.01, 23.68, 24.59])])


'''Getting the value of one column from a dataframe that has a single row after a where filter'''
print(df[df['tip'] == 1.66])
print(df.loc[df['tip'] == 1.66])
total = df.loc[df['tip'] == 1.66, 'total_bill'].values
print(total[0])


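'''Removing duplicates'''
# drop_duplicates removes duplicate rows from a dataframe, optionally judged on selected columns only
# subset: the columns used to identify duplicates; defaults to all columns
# keep: one of {'first', 'last', False}; keep the first duplicate, keep the last, or drop all of them
# inplace: defaults to False, which returns a new dataframe; if True, the dataframe is modified in place and None is returned
print('-------------')
# print(df.drop_duplicates(subset=['sex'], keep='first', inplace=True))  # would print None and modify df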
print('-------------')
print(df)
print('-------------')
print(df.drop_duplicates(subset=['sex'], keep='first'))
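
To make the `keep` and `inplace` options described in the comments above concrete, here is a small self-contained sketch. It rebuilds the same sample frame so it can run on its own; the variable names (`first`, `last`, `dropped_all`, `by_bill`, `work`, `result`) are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})

# keep='first': the first row seen for each 'sex' value survives (index 0 and 1)
first = df.drop_duplicates(subset=['sex'], keep='first')

# keep='last': the last row for each 'sex' value survives (index 3 and 4)
last = df.drop_duplicates(subset=['sex'], keep='last')

# keep=False: every row whose 'sex' value occurs more than once is dropped;
# both values repeat here, so the result is empty
dropped_all = df.drop_duplicates(subset=['sex'], keep=False)

# keep=False on 'total_bill': only the two 23.68 rows are dropped
by_bill = df.drop_duplicates(subset=['total_bill'], keep=False)

# inplace=True modifies the frame itself and returns None
work = df.copy()
result = work.drop_duplicates(subset=['sex'], inplace=True)

print(list(first.index), list(last.index), len(dropped_all))
print(list(by_bill.index))
print(result, len(work))
```

Because `inplace=True` returns `None`, wrapping such a call in `print` shows `None` rather than the deduplicated frame; printing the frame afterwards shows the result.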


Reposted from blog.csdn.net/qq_38150441/article/details/88966776