Summary of some uses of pandas
I have been using pandas a lot recently and found some practical methods, so I am recording them here.
1. Use rename to replace column headers
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]}, index=range(3))
a
   a  b
0  1  2
1  2  2
2  3  2
# '表头1' and '表头2' mean "Header 1" and "Header 2"
columns = {'a': '表头1', 'b': '表头2'}
b = a.rename(columns=columns)
b
   表头1  表头2
0    1    2
1    2    2
2    3    2
This replaces the headers directly and returns a new DataFrame, which makes it more convenient to tidy up a table.
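As a small sketch of how rename behaves (the frame df and the names 'first', 'zzz' and 'unused' are made up for illustration): with the default settings, keys in the mapping that do not match an existing column are ignored, and the original frame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]})

# 'zzz' is not a column of df, so that entry is skipped
# (rename uses errors='ignore' by default).
renamed = df.rename(columns={'a': 'first', 'zzz': 'unused'})

print(list(renamed.columns))  # ['first', 'b']
print(list(df.columns))       # ['a', 'b'], the original is unchanged
```

If a typo in the mapping should fail loudly instead, passing errors='raise' to rename makes missing keys raise a KeyError.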
2. Use replace to substitute matching values
# '替换下' means "replaced"
b.replace(to_replace=2, value="替换下")
   表头1  表头2
0    1  替换下
1  替换下  替换下
2    3  替换下
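The call above replaces the value 2 in every column. When only one column should change, replace also accepts a nested dictionary of the form {column: {old: new}}. A minimal sketch, using an illustrative frame df and replacement value 20:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]})

# Nested dict: look only in column 'b' for the value 2 and replace it with 20.
only_b = df.replace({'b': {2: 20}})

print(only_b)
```

Column 'a' still contains its original 2 because the mapping names column 'b' only.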
3. Convert a Django ORM QuerySet into a DataFrame
Here xxxx stands for a model (a table). Calling values() turns each row into a dictionary.
a = pd.DataFrame(list(xxxx.objects.filter(xxx=xxx).values()))
# This yields a DataFrame
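Since the line above needs a configured Django project to run, here is a pandas-only sketch of the same idea. The list rows is a hypothetical stand-in for what list(queryset.values()) returns (one dictionary per row, keyed by field name), and the field names id, name and score are invented for illustration.

```python
import pandas as pd

# Stand-in for list(SomeModel.objects.filter(...).values()):
# one dict per row, keyed by field name (all names here are hypothetical).
rows = [
    {'id': 1, 'name': 'alice', 'score': 90},
    {'id': 2, 'name': 'bob', 'score': 85},
]

df = pd.DataFrame(rows)
print(df)

# An empty query result produces a frame with no columns at all,
# so pass columns= explicitly if later code expects them to exist.
empty = pd.DataFrame([], columns=['id', 'name', 'score'])
print(list(empty.columns))
```

In Django itself, values() also accepts field names, for example values('id', 'name'), which keeps the resulting DataFrame limited to the columns that are actually needed.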
4. Use datacompy to compare two dataframes
At first I planned to use pandas' DataFrame.compare to compare two dataframes, but it turned out to be awkward. Both dataframes must have identical row and column labels (so the same index and the same number of rows), and it is difficult to compare the value at a particular row and column.
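The restriction can be seen in a minimal sketch, assuming pandas 1.1 or later (where DataFrame.compare is available). The frames left, right and shorter are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 2, 2]})
right = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 5, 2]})

# Same index and columns: compare() returns only the cells that differ,
# under 'self' and 'other' sub-columns.
diff = left.compare(right)
print(diff)

# Different number of rows: compare() raises instead of comparing.
shorter = right.iloc[:2]
try:
    left.compare(shorter)
    raised = False
except ValueError as exc:
    raised = True
    print("compare failed:", exc)
```

So compare is fine for two versions of the same table, but not for tables whose rows have to be matched on a key first.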
I recommend using datacompy instead. This third-party library works well: when two dataframes A and B have the same headers, it can compare them quickly.
Install command: pip install datacompy
import datacompy

diff_data = datacompy.Compare(A, B, join_columns=["a"])
# Print the comparison report
print(diff_data.report())
# Rows that are in A but not in B (the set difference A - B)
diff_data.df1_unq_rows
Example:
A = pd.DataFrame({'a': [1, 2, 3, 8], 'b': [2, 2, 2, 2]}, index=range(4))
B = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 2, 2, 5]}, index=range(4))
diff_data = datacompy.Compare(A, B, join_columns=['a'])
print(diff_data.report())
DataComPy Comparison
--------------------

DataFrame Summary
-----------------

  DataFrame  Columns  Rows
0       df1        2     4
1       df2        2     4

Column Summary
--------------

Number of columns in common: 2
Number of columns in df1 but not in df2: 0
Number of columns in df2 but not in df1: 0

Row Summary
-----------

Matched on: a
Any duplicates on match values: No
Absolute Tolerance: 0
Relative Tolerance: 0
Number of rows in common: 3
Number of rows in df1 but not in df2: 1
Number of rows in df2 but not in df1: 1

Number of rows with some compared columns unequal: 0
Number of rows with all compared columns equal: 3

Column Comparison
-----------------

Number of columns compared with some values unequal: 0
Number of columns compared with all values equal: 2
Total number of values which compare unequal: 0

Sample Rows Only in df1 (First 10 Columns)
------------------------------------------

   a    b
3  8  2.0

Sample Rows Only in df2 (First 10 Columns)
------------------------------------------

   a    b
4  4  5.0
print(diff_data.df1_unq_rows)
   a    b
3  8  2.0
print(diff_data.df2_unq_rows)
   a    b
4  4  5.0
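For cases where datacompy cannot be installed, the two "only in" results can be approximated with plain pandas. This is a rough sketch using an outer merge with indicator=True. It is not how datacompy computes its result, and the suffixes _df1 and _df2 and the variable names are my own choices.

```python
import pandas as pd

A = pd.DataFrame({'a': [1, 2, 3, 8], 'b': [2, 2, 2, 2]})
B = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [2, 2, 2, 5]})

# Outer merge on the join column; the '_merge' column records whether
# each key was found in the left frame, the right frame, or both.
merged = A.merge(B, on='a', how='outer',
                 suffixes=('_df1', '_df2'), indicator=True)

only_in_a = merged[merged['_merge'] == 'left_only']
only_in_b = merged[merged['_merge'] == 'right_only']

print(only_in_a[['a', 'b_df1']])  # rows whose key exists only in A
print(only_in_b[['a', 'b_df2']])  # rows whose key exists only in B
```

Unlike datacompy, this gives no report and no tolerance handling. It only separates the keys that appear in one frame from those that appear in both.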