A Few Notes on Using pandas

I have been using pandas a lot lately and came across a few practical methods, so I am writing them down here.

1. Replacing column headers with rename

import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]}, index=range(3))
a
   a  b
0  1  2
1  2  2
2  3  2
# map old column names to new ones (表头 means "header")
columns = {'a': '表头1', 'b': '表头2'}

b = a.rename(columns=columns)
b
   表头1  表头2
0    1    2
1    2    2
2    3    2

This replaces the column headers in one step, which is convenient when tidying up data.
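
Two details of rename are easy to miss, so here is a minimal sketch of them. The column names header1 and header2 and the key 'missing' are made up for illustration. It shows that rename returns a new DataFrame rather than changing the original, and that keys matching no column are ignored unless errors='raise' is passed.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]})

# rename returns a new DataFrame; df itself keeps its column names.
renamed = df.rename(columns={'a': 'header1', 'b': 'header2'})

# Keys that match no existing column are silently ignored by default...
ignored = df.rename(columns={'missing': 'x'})

# ...unless errors='raise' is passed, in which case a KeyError is raised.
try:
    df.rename(columns={'missing': 'x'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(renamed.columns))
print(list(df.columns))
```

Passing errors='raise' can help catch a typo in the mapping before it goes unnoticed.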

2. Replacing matching values with replace

# replace every 2 with the string "替换下" ("replaced")
b.replace(to_replace=2, value="替换下")
   表头1  表头2
0    1  替换下
1  替换下  替换下
2    3  替换下
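
The call above replaces the value in every column. As a rough sketch, assuming you only want to touch one column or map several values at once, replace also accepts dicts. The frame and the names h1 and h2 below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'h1': [1, 2, 3], 'h2': [2, 2, 2]})

# Nested dict {column: {old: new}}: replace only inside that column.
only_h2 = df.replace({'h2': {2: 0}})

# Flat dict {old: new}: apply several replacements to every column.
everywhere = df.replace({2: 20, 3: 30})

print(only_h2)
print(everywhere)
```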

3. Converting a Django ORM QuerySet to a DataFrame

Here xxxx stands for a model (table), and xxx=xxx for a filter condition. values() turns each row into a dict.

a = pd.DataFrame(list(xxxx.objects.filter(xxx=xxx).values()))
# a is now a DataFrame with one column per model field
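
Since the snippet above needs a configured Django project, here is a minimal sketch that runs without one. The rows list is a hypothetical stand-in for what list(queryset.values()) produces, namely one dict per row keyed by field name. The field names id, name, and score are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for list(Model.objects.filter(...).values()):
# values() yields one dict per row, keyed by field name.
rows = [
    {'id': 1, 'name': 'alice', 'score': 90},
    {'id': 2, 'name': 'bob', 'score': 85},
]

df = pd.DataFrame(rows)

# An empty queryset gives an empty list, so the DataFrame would have no
# columns at all; passing columns= keeps the expected headers in that case.
empty = pd.DataFrame([], columns=['id', 'name', 'score'])

print(df)
print(empty.shape)
```

In Django itself, values() also accepts field names, for example values('id', 'name'), which limits the resulting columns.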

4. Comparing two DataFrames with datacompy

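At first I planned to use pandas's own compare to compare two DataFrames, but it turned out to be awkward: both frames must have identical index and columns (so the same number of rows), and the result only lists the individual cells that differ. It is hard to use for comparing tables whose rows do not line up.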
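
To make the limitation concrete, here is a small sketch with made-up frames. It assumes pandas 1.1 or later, where DataFrame.compare is available.

```python
import pandas as pd

A = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]})
B = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 5, 2]})

# Same index and columns: compare returns only the differing cells,
# with 'self' and 'other' sub-columns for each affected column.
diff = A.compare(B)
print(diff)

# Different row count: compare refuses to run at all.
C = pd.DataFrame({'a': [1, 2], 'b': [2, 2]})
try:
    A.compare(C)
    compare_failed = False
except ValueError:
    compare_failed = True

print(compare_failed)
```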

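I recommend datacompy instead. This third-party library works well: when the two DataFrames A and B share the same column headers, it compares them quickly.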

Install it with: pip install datacompy

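import datacompy

# join_columns lists the key column(s) used to match rows between A and B
diff_data = datacompy.Compare(A, B, join_columns=["a"])
# print the comparison report (report() returns a string)
print(diff_data.report())
# rows that exist only in A (the set difference A - B)
diff_data.df1_unq_rows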

Example:

A = pd.DataFrame({'a': [1, 2, 3, 8], 'b': [2, 2, 2, 2]}, index=range(4))
B = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 2, 2, 5]}, index=range(4))
diff_data = datacompy.Compare(A, B, join_columns=['a'])
print(diff_data.report())
DataComPy Comparison
--------------------
DataFrame Summary
-----------------
  DataFrame  Columns  Rows
0       df1        2     4
1       df2        2     4
Column Summary
--------------
Number of columns in common: 2
Number of columns in df1 but not in df2: 0
Number of columns in df2 but not in df1: 0
Row Summary
-----------
Matched on: a
Any duplicates on match values: No
Absolute Tolerance: 0
Relative Tolerance: 0
Number of rows in common: 3
Number of rows in df1 but not in df2: 1
Number of rows in df2 but not in df1: 1
Number of rows with some compared columns unequal: 0
Number of rows with all compared columns equal: 3
Column Comparison
-----------------
Number of columns compared with some values unequal: 0
Number of columns compared with all values equal: 2
Total number of values which compare unequal: 0
Sample Rows Only in df1 (First 10 Columns)
------------------------------------------
   a    b
3  8  2.0
Sample Rows Only in df2 (First 10 Columns)
------------------------------------------
   a    b
4  4  5.0


print(diff_data.df1_unq_rows)
   a    b
3  8  2.0

print(diff_data.df2_unq_rows)
   a    b
4  4  5.0
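
If installing datacompy is not an option, the "only in A" and "only in B" split can be approximated in plain pandas with an outer merge and indicator=True. This is a rough sketch of that alternative and makes no claim about how datacompy works internally. The suffixes _df1 and _df2 are chosen here only to mirror the report's naming.

```python
import pandas as pd

A = pd.DataFrame({'a': [1, 2, 3, 8], 'b': [2, 2, 2, 2]})
B = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 2, 2, 5]})

# Outer join on the key column; the _merge column records whether each key
# was found in the left frame only, the right frame only, or both.
merged = A.merge(B, on='a', how='outer',
                 suffixes=('_df1', '_df2'), indicator=True)

only_in_a = merged.loc[merged['_merge'] == 'left_only', 'a'].tolist()
only_in_b = merged.loc[merged['_merge'] == 'right_only', 'a'].tolist()

print(only_in_a)
print(only_in_b)
```

Unlike datacompy, this gives no tolerance handling or per-column summary, so it only covers the simplest case.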
