ML with sklearn: a detailed guide to the parameters and usage of commonly used functions in sklearn.metrics (confusion_matrix, etc.)

Table of Contents

Commonly used function parameters in sklearn.metrics

confusion_matrix

Recommended articles
ML: Introduction to evaluation metrics for classification problems (ER, confusion matrix, PR-F1, ROC-AUC, RP, mAP), with usage, code implementations, and case studies.
CNN performance metrics: an introduction to commonly used metrics for convolutional neural networks (IoU, AP, mAP, confusion matrix) and a detailed guide to using them.

Commonly used function parameters in sklearn.metrics

Explanation of confusion_matrix function

Return value: the confusion matrix. The entry in the i-th row and j-th column is the number of samples whose true label is the i-th class and whose predicted label is the j-th class.

                  prediction
                  0                 1
real    0         TN (C[0,0])       FP (C[0,1])
        1         FN (C[1,0])       TP (C[1,1])
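
To make the row/column convention concrete, here is a minimal sketch that computes a binary confusion matrix and checks every cell against a manual count. The label lists are made-up illustration data, not taken from the original article, and the variable names are arbitrary.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up illustration data: 0 = negative class, 1 = positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred)

# Manual count: cell (i, j) is the number of samples whose true label is i
# and whose predicted label is j.
manual = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    manual[t, p] += 1

# Rows are true labels and columns are predicted labels, so flattening the
# 2x2 matrix row by row yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = cm.ravel()

print(cm)
print("matches manual count:", bool((cm == manual).all()))
print("tn, fp, fn, tp =", tn, fp, fn, tp)
```

Because the rows correspond to the true labels, the sum of row 1 (fn + tp) is the number of actual positives, while the sum of column 1 (fp + tp) is the number of predicted positives.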

 

confusion_matrix is defined in sklearn.metrics._classification:

@_deprecate_positional_args
def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None,
                     normalize=None):
    """Compute confusion matrix to evaluate the accuracy of a classification.
    
    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}` is equal to the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.
    
    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.
    
    Read more in the :ref:`User Guide <confusion_matrix>`.
    
    Parameters
    ----------
    y_true : array-like of shape (n_samples,)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,)
        Estimated targets as returned by a classifier.

    labels : array-like of shape (n_classes), default=None
        List of labels to index the matrix. This may be used to reorder
        or select a subset of labels. If ``None`` is given, those that
        appear at least once in ``y_true`` or ``y_pred`` are used in
        sorted order.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

        .. versionadded:: 0.18

    normalize : {'true', 'pred', 'all'}, default=None
        Normalizes confusion matrix over the true (rows), predicted (columns)
        conditions or all the population. If None, confusion matrix will not
        be normalized.

    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
        Confusion matrix whose i-th row and j-th column entry indicates the
        number of samples with true label being i-th class and predicted
        label being j-th class.
    
    References
    ----------
    .. [1] `Wikipedia entry for the Confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_  (Wikipedia and other references may use a different convention for axes)

    Examples
    --------
    >>> from sklearn.metrics import confusion_matrix
    >>> y_true = [2, 0, 2, 2, 0, 1]
    >>> y_pred = [0, 0, 2, 2, 0, 2]
    >>> confusion_matrix(y_true, y_pred)
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])

    >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
    >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
    >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])
    array([[2, 0, 0],
           [0, 0, 1],
           [1, 0, 2]])
    
    In the binary case, we can extract true positives, etc as follows:
    
    >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
    >>> (tn, fp, fn, tp)
    (0, 2, 1, 1)
 
    """
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    if y_type not in ("binary", "multiclass"):
        raise ValueError("%s is not supported" % y_type)
    if labels is None:
        labels = unique_labels(y_true, y_pred)
    else:
        labels = np.asarray(labels)
        n_labels = labels.size
        if n_labels == 0:
            raise ValueError("'labels' should contains at least one label.")
        elif y_true.size == 0:
            # np.int was removed in NumPy 1.24; the builtin int is equivalent.
            return np.zeros((n_labels, n_labels), dtype=int)
        elif np.all([l not in y_true for l in labels]):
            raise ValueError("At least one label specified must be in y_true")
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0], dtype=np.int64)
    else:
        sample_weight = np.asarray(sample_weight)
    check_consistent_length(y_true, y_pred, sample_weight)
    if normalize not in ['true', 'pred', 'all', None]:
        raise ValueError("normalize must be one of {'true', 'pred', "
            "'all', None}")
    n_labels = labels.size
    label_to_ind = {y:x for x, y in enumerate(labels)}
    # convert yt, yp into index
    y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
    # intersect y_pred, y_true with labels, eliminate items not in labels
    ind = np.logical_and(y_pred < n_labels, y_true < n_labels)
    y_pred = y_pred[ind]
    y_true = y_true[ind]
    # also eliminate weights of eliminated items
    sample_weight = sample_weight[ind]
    # Choose the accumulator dtype to always have high precision
    if sample_weight.dtype.kind in {'i', 'u', 'b'}:
        dtype = np.int64
    else:
        dtype = np.float64
    cm = coo_matrix((sample_weight, (y_true, y_pred)),
                    shape=(n_labels, n_labels), dtype=dtype).toarray()
    with np.errstate(all='ignore'):
        if normalize == 'true':
            cm = cm / cm.sum(axis=1, keepdims=True)
        elif normalize == 'pred':
            cm = cm / cm.sum(axis=0, keepdims=True)
        elif normalize == 'all':
            cm = cm / cm.sum()
        cm = np.nan_to_num(cm)
    return cm
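
The three optional parameters documented above (labels, sample_weight, normalize) are easier to follow with a small experiment. The sketch below reuses the cat/ant/bird lists from the docstring example. The weights and the variable names are made up for illustration, and the behaviour described in the comments follows the function body shown above, so it may differ in details across sklearn versions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Same labels as the docstring example.
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

# labels: choose the row/column order explicitly ("cat" first here).
cm_reordered = confusion_matrix(y_true, y_pred, labels=["cat", "bird", "ant"])

# labels: keep only a subset. A sample whose true or predicted label is not
# listed is dropped, so the single "bird" sample disappears.
cm_subset = confusion_matrix(y_true, y_pred, labels=["ant", "cat"])

# sample_weight: each sample adds its weight instead of 1.
# The weights are made up; the last ("bird") sample counts three times.
weights = [1, 1, 1, 1, 1, 3]
cm_weighted = confusion_matrix(y_true, y_pred, sample_weight=weights)

# normalize="true": divide each row by its sum (per true class).
cm_row = confusion_matrix(y_true, y_pred, normalize="true")

# normalize="pred": divide each column by its sum (per predicted class).
# "bird" is never predicted, so its column would be 0/0; the function
# replaces the resulting NaN values with 0.
cm_col = confusion_matrix(y_true, y_pred, normalize="pred")

# normalize="all": divide every cell by the total number of samples.
cm_all = confusion_matrix(y_true, y_pred, normalize="all")

for name, cm in [("reordered", cm_reordered), ("subset", cm_subset),
                 ("weighted", cm_weighted), ("normalize='true'", cm_row),
                 ("normalize='pred'", cm_col), ("normalize='all'", cm_all)]:
    print(name)
    print(np.round(cm, 3))
```

With normalize="true" the diagonal of the result is the per-class recall, and with normalize="pred" it is the per-class precision, which is often the main reason to normalize at all.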

Origin blog.csdn.net/qq_41185868/article/details/108765521