pandas Tips: Finding the Row-wise Maximum and Its Index

Article link: https://blog.csdn.net/weixin_37536446/article/details/82774659

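After training a model, we often need to post-process its predictions. A common case is taking the row-wise maximum of the predicted class probabilities and storing the column where that maximum occurs in a separate column.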

The data looks like this:

array
array([[ 0.47288769,  0.23982215,  0.2261405 ,  0.06114962],
       [ 0.67969596,  0.11435176,  0.17647322,  0.02947907],
       [ 0.00621393,  0.01652142,  0.31117165,  0.66609299],
       [ 0.24093366,  0.23636758,  0.30113828,  0.22156043],
       [ 0.44093642,  0.2245989 ,  0.24515967,  0.08930501],
       [ 0.05540339,  0.10013942,  0.30361843,  0.54083872],
       [ 0.11221886,  0.75674808,  0.09237131,  0.03866173],
       [ 0.24885316,  0.28243011,  0.28312165,  0.18559511],
       [ 0.01205211,  0.03740638,  0.271065  ,  0.67947656]], dtype=float32)

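The goal is to append two columns to a DataFrame built from this array: one holding each row's maximum, and one holding the column position of that maximum.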

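First, a quick look at the argmax function.

argmax(a, axis=None)

# a is the input array (or array-like data)

# axis is the axis to search along. The default, None, flattens the array first; axis=1 returns one result per row, and axis=0 returns one result per column.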
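
To make the effect of axis concrete, here is a minimal sketch on a small made-up array. The name demo and its values are chosen only for this illustration and are not part of the article's data.

```python
import numpy as np

# Small illustrative array (values chosen for this example only).
demo = np.array([[0.1, 0.7, 0.2],
                 [0.6, 0.3, 0.1]])

flat_pos = np.argmax(demo)          # axis=None: position in the flattened array
col_pos = np.argmax(demo, axis=0)   # one result per column: row position of each column's maximum
row_pos = np.argmax(demo, axis=1)   # one result per row: column position of each row's maximum

print(flat_pos)  # 1
print(col_pos)   # [1 0 0]
print(row_pos)   # [1 0]
```

For the task in this article, axis=1 is the one we need, since each row is one sample and each column is one class.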

For a DataFrame, the steps are shown in the code that follows:

# Import libraries
import pandas as pd
import numpy as np
# Convert the array to a DataFrame
arr=pd.DataFrame(array,columns=["one","two","three","four"])
# Compute each row's maximum and the column position of that maximum
arr['max_value']=arr.max(axis=1)
arr['max_index']=np.argmax(array,axis=1)
# This gives the following result:
arr
Out[28]: 
        one       two     three      four  max_value  max_index
0  0.472888  0.239822  0.226140  0.061150   0.472888          0
1  0.679696  0.114352  0.176473  0.029479   0.679696          0
2  0.006214  0.016521  0.311172  0.666093   0.666093          3
3  0.240934  0.236368  0.301138  0.221560   0.301138          2
4  0.440936  0.224599  0.245160  0.089305   0.440936          0
5  0.055403  0.100139  0.303618  0.540839   0.540839          3
6  0.112219  0.756748  0.092371  0.038662   0.756748          1
7  0.248853  0.282430  0.283122  0.185595   0.283122          2
8  0.012052  0.037406  0.271065  0.679477   0.679477          3
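
If the column label is more useful than its integer position, pandas's DataFrame.idxmax is one option. The following is a minimal sketch using the first three rows of the data above. The names probs, cols, df and max_label are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# First three rows of the probability array from the article.
probs = np.array([[0.47288769, 0.23982215, 0.2261405, 0.06114962],
                  [0.67969596, 0.11435176, 0.17647322, 0.02947907],
                  [0.00621393, 0.01652142, 0.31117165, 0.66609299]],
                 dtype=np.float32)
cols = ["one", "two", "three", "four"]
df = pd.DataFrame(probs, columns=cols)

# Restrict both computations to the original columns so that
# newly added columns never feed back into max/idxmax.
df["max_value"] = df[cols].max(axis=1)
df["max_label"] = df[cols].idxmax(axis=1)   # column label of each row's maximum

print(df["max_label"].tolist())  # ['one', 'one', 'four']
```

Selecting df[cols] explicitly also guards against the order-of-assignment problem: if an integer index column is added first and the maximum is then taken over the whole DataFrame, that index column can be picked up as the row maximum.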

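Now suppose we also want each row's second-largest value and its index. How can we do that?

Approach: set each row's maximum to 0, then find the row-wise maximum and its index again. This works here because every probability is positive, so a zeroed entry can never become the new maximum. Note that it overwrites array in place.

The code is as follows: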

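# Set each row's maximum to 0 (this modifies array in place)
# np.arange gives the row positions, so this also works if arr has a non-default index
array[np.arange(len(array)),np.argmax(array,axis=1)]=0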
array
array([[ 0.        ,  0.23982215,  0.2261405 ,  0.06114962],
       [ 0.        ,  0.11435176,  0.17647322,  0.02947907],
       [ 0.00621393,  0.01652142,  0.31117165,  0.        ],
       [ 0.24093366,  0.23636758,  0.        ,  0.22156043],
       [ 0.        ,  0.2245989 ,  0.24515967,  0.08930501],
       [ 0.05540339,  0.10013942,  0.30361843,  0.        ],
       [ 0.11221886,  0.        ,  0.09237131,  0.03866173],
       [ 0.24885316,  0.28243011,  0.        ,  0.18559511],
       [ 0.01205211,  0.03740638,  0.271065  ,  0.        ]], dtype=float32)
# Get the second-largest value and its index
arr['second_value']=array.max(axis=1)
arr['second_index']=np.argmax(array,axis=1)
arr
Out[208]: 
        one       two     three      four  max_value  max_index  second_value  \
0  0.472888  0.239822  0.226140  0.061150   0.472888          0      0.239822   
1  0.679696  0.114352  0.176473  0.029479   0.679696          0      0.176473   
2  0.006214  0.016521  0.311172  0.666093   0.666093          3      0.311172   
3  0.240934  0.236368  0.301138  0.221560   0.301138          2      0.240934   
4  0.440936  0.224599  0.245160  0.089305   0.440936          0      0.245160   
5  0.055403  0.100139  0.303618  0.540839   0.540839          3      0.303618   
6  0.112219  0.756748  0.092371  0.038662   0.756748          1      0.112219   
7  0.248853  0.282430  0.283122  0.185595   0.283122          2      0.282430   
8  0.012052  0.037406  0.271065  0.679477   0.679477          3      0.271065   

   second_index  
0             1  
1             2  
2             2  
3             0  
4             2  
5             2  
6             0  
7             1  
8             2
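
The zero-out approach overwrites the original array and depends on the values being non-negative. One possible alternative, sketched here rather than taken from the original post, is to sort the column positions with np.argsort and read off the second-to-last one. The sample uses rows 0, 2 and 6 of the data above, and the names probs, order and rows are illustrative.

```python
import numpy as np

# Rows 0, 2 and 6 of the probability array from the article.
probs = np.array([[0.47288769, 0.23982215, 0.2261405, 0.06114962],
                  [0.00621393, 0.01652142, 0.31117165, 0.66609299],
                  [0.11221886, 0.75674808, 0.09237131, 0.03866173]],
                 dtype=np.float32)

# Column positions of each row, ordered by ascending value.
order = np.argsort(probs, axis=1)

# The second-to-last position in that ordering is the second-largest value.
second_index = order[:, -2]
rows = np.arange(probs.shape[0])
second_value = probs[rows, second_index]

print(second_index)  # [1 2 0]
```

Because probs is only read and never written, the original probabilities remain available afterwards. The same pattern extends to the k-th largest value by using order[:, -k].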

If you know a better approach, feel free to share it.
