Anomaly detection: getting a probability out of IsolationForest

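import numpy as np
from sklearn.ensemble import IsolationForest

# The three methods this post relies on. X is toy data added for illustration.
X = np.random.RandomState(0).normal(size=(100, 2))
model = IsolationForest(random_state=0).fit(X)   # train the forest
labels = model.predict(X)                        # +1 for inliers, -1 for outliers
scores = model.decision_function(X)              # lower means more abnormal

def sigmoid(x):
    """Map a real number into the open interval (0, 1)."""
    return 1.0 / (1 + np.exp(-x))

print(sigmoid(-3))  # about 0.047
print(sigmoid(3))   # about 0.953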

Let's look at the source and docstring of predict:

    def predict(self, X):
        """
        Predict if a particular sample is an outlier or not.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features)
            The input samples. Internally, it will be converted to
            ``dtype=np.float32`` and if a sparse matrix is provided
            to a sparse ``csr_matrix``.

        Returns
        -------
        is_inlier : ndarray of shape (n_samples,)
            For each observation, tells whether or not (+1 or -1) it should
            be considered as an inlier according to the fitted model.
        """
        check_is_fitted(self)
        decision_func = self.decision_function(X)
        is_inlier = np.ones_like(decision_func, dtype=int)
        is_inlier[decision_func < 0] = -1
        return is_inlier

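predict returns -1 or 1, and -1 marks an outlier. The key line in the source is is_inlier[decision_func < 0] = -1: the lower the score, the more likely the sample is an anomaly. decision_function is therefore the function that returns the anomaly score. Because lower scores mean more abnormal, applying sigmoid to the negated score, sigmoid(-score), gives a value in (0, 1) that grows with abnormality and can be read as a rough, uncalibrated anomaly probability.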
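The idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name anomaly_probability and the toy data are made up for this example, and the result is a heuristic squashing of the score, not a calibrated probability.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def anomaly_probability(model, X):
    """Hypothetical helper: squash the negated decision score into (0, 1)."""
    # Negate first, because a LOWER decision score means MORE abnormal.
    return sigmoid(-model.decision_function(X))


rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))          # toy inliers (assumption)
X_test = np.array([[0.0, 0.0], [8.0, 8.0]])  # a central point and a far-away point

model = IsolationForest(random_state=0).fit(X_train)
proba = anomaly_probability(model, X_test)
labels = model.predict(X_test)

print(proba)   # the far-away point should get the larger value
print(labels)  # -1 exactly where the decision score is negative
```

A value above 0.5 corresponds to a negative decision score, so it lines up with predict returning -1. With the default contamination="auto", the decision scores lie roughly within [-0.5, 0.5], so the values produced this way stay in a fairly narrow band around 0.5.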

The docstring of decision_function reads:

    def decision_function(self, X):
        """
        Average anomaly score of X of the base classifiers.

        The anomaly score of an input sample is computed as
        the mean anomaly score of the trees in the forest.

        The measure of normality of an observation given a tree is the depth
        of the leaf containing this observation, which is equivalent to
        the number of splittings required to isolate this point. In case of
        several observations n_left in the leaf, the average path length of
        a n_left samples isolation tree is added.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape (n_samples, n_features)
            The input samples. Internally, it will be converted to
            ``dtype=np.float32`` and if a sparse matrix is provided
            to a sparse ``csr_matrix``.

        Returns
        -------
        scores : ndarray of shape (n_samples,)
            The anomaly score of the input samples.
            The lower, the more abnormal. Negative scores represent outliers,
            positive scores represent inliers.
        """

The return value scores holds the anomaly score of each sample: the lower the score, the more abnormal the sample.
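As a small check of "the lower, the more abnormal", the sketch below plants one obvious outlier in toy data and sorts the samples by score. The data, the outlier position, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = np.vstack([
    rng.normal(size=(100, 2)),   # toy inliers (assumption)
    [[10.0, 10.0]],              # one planted outlier, at index 100
])

model = IsolationForest(random_state=42).fit(X)
scores = model.decision_function(X)

# Sort ascending: the most abnormal samples come first.
order = np.argsort(scores)
most_abnormal = order[0]
print(most_abnormal, scores[most_abnormal])
```

Sorting by the raw score like this is often enough when only a ranking of suspicious samples is needed, and it avoids any question of how to interpret a squashed value.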
