AI Made Easy: How many types of classification are there? How do they differ and how are they related? (with code)

In this article, we'll explore three important types of classification problems in machine learning: binary classification, multiclass classification, and multilabel classification. We will look at their definitions, applications, similarities, and differences. Finally, we'll work through some Python examples to get hands-on experience.

What is classification in machine learning?

Classification is a supervised learning technique in which a model learns to assign data to predefined classes or groups. The goal is to teach a model to predict the class of an object based on its characteristics. There are three main types of classification problems: binary, multiclass, and multilabel.

Binary classification

Binary classification is the simplest form of classification, where our goal is to predict one of two possible classes. For example, we might want to predict whether an email is spam or not spam, or whether a tumor is malignant or benign.

Multi-class classification

In multi-class classification, we have more than two classes and each instance belongs to only one class. Examples include recognizing handwritten digits (0-9), recognizing different kinds of animals, or classifying news articles into different categories (sports, politics, etc.).

Multi-label classification

Multi-label classification differs from the other two types because each instance can belong to more than one class. An example could be music genre classification, where a song might belong to multiple genres, such as pop, rock, and electronic.

Now that we have a basic understanding of the three categories of classification problems, we will further discuss their similarities and differences.

Similarities

All three types of classification share a common goal of predicting one or more classes of an instance based on specific features.

In all these cases, we can use popular algorithms like logistic regression, support vector machines, decision trees, and random forests while tuning hyperparameters to achieve the best results.
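Performance metrics such as accuracy, precision, recall, and F1-score are used for all three types of classification tasks to determine the effectiveness of the selected model. For multiclass and multilabel tasks, precision, recall, and F1 are computed per class and then averaged (for example with macro or micro averaging).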
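
As a rough illustration of how the same metric functions apply to all three task types, here is a minimal sketch using scikit-learn. The small label arrays are made up for the example and do not come from any real model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Binary: 1-D arrays of 0/1 labels (toy values, assumed for illustration)
y_true_bin = np.array([1, 0, 1, 1, 0])
y_pred_bin = np.array([1, 0, 0, 1, 0])
bin_acc = accuracy_score(y_true_bin, y_pred_bin)  # 4 of 5 correct -> 0.8
bin_f1 = f1_score(y_true_bin, y_pred_bin)

# Multiclass: 1-D arrays of class indices; F1 needs an averaging strategy
y_true_mc = np.array([0, 1, 2, 2, 1])
y_pred_mc = np.array([0, 2, 2, 2, 1])
mc_acc = accuracy_score(y_true_mc, y_pred_mc)  # 4 of 5 correct -> 0.8
mc_f1 = f1_score(y_true_mc, y_pred_mc, average="macro")

# Multilabel: 2-D indicator arrays, one column per label
y_true_ml = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred_ml = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0]])
ml_acc = accuracy_score(y_true_ml, y_pred_ml)  # only exact row matches count
ml_f1 = f1_score(y_true_ml, y_pred_ml, average="micro")

print("binary:    ", bin_acc, bin_f1)
print("multiclass:", mc_acc, mc_f1)
print("multilabel:", ml_acc, ml_f1)
```

Note that for multilabel targets, accuracy_score is strict: a sample counts as correct only if every one of its labels is predicted correctly.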

Differences

Data representation:

In binary classification, the target variable is usually a one-dimensional array containing 0 or 1 (or -1 and 1) representing the two possible classes.

In multiclass classification, the target variable is still a one-dimensional array, but it contains integer values (0 to n-1, where n is the number of classes) representing multiple classes.

In multi-label classification, the target variable is a two-dimensional array where each row contains a binary vector representing the presence or absence of each class for that instance.
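
The three target layouts described above can be sketched with a few toy labels. The animal names and song genres below are invented for illustration; LabelEncoder and MultiLabelBinarizer are scikit-learn utilities that happen to produce these layouts, and other encodings are possible.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Binary: a 1-D array of 0/1 values
y_binary = np.array([0, 1, 1, 0])

# Multiclass: a 1-D array of integers 0..n-1
# (LabelEncoder assigns indices in sorted order: bird=0, cat=1, dog=2)
animals = ["cat", "dog", "bird", "dog"]
y_multiclass = LabelEncoder().fit_transform(animals)

# Multilabel: a 2-D indicator array, one row per sample, one column per label
# (columns follow sorted label order: electronic, pop, rock)
song_genres = [["pop", "rock"], ["electronic"], ["pop", "electronic", "rock"], ["rock"]]
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(song_genres)

print(y_binary.shape)      # (4,)
print(y_multiclass)        # [1 2 0 2]
print(mlb.classes_)
print(y_multilabel.shape)  # (4, 3)
print(y_multilabel)
```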

Algorithm:

For binary classification, algorithms such as logistic regression can be used directly without any changes.

For multiclass classification, some algorithms such as logistic regression need to be adapted to handle multiple classes, for example by using a "one-vs-rest" (OvR) or "one-vs-one" (OvO) strategy, or by directly optimizing a multinomial (softmax) cross-entropy loss over all classes.

Multi-label classification often requires adaptation, such as using OneVsRestClassifier, which essentially treats the problem as multiple independent binary classification tasks, where each classifier predicts the presence or absence of a particular class.
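
To make the one-vs-rest and one-vs-one strategies more concrete, the sketch below counts how many binary classifiers each wrapper trains. The synthetic 4-class dataset from make_classification is an assumption made for the example, not data used elsewhere in this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem (illustrative only)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=4, random_state=0
)

# One-vs-rest: one binary classifier per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: one binary classifier per pair of classes, n * (n - 1) / 2 in total
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print("OvR classifiers:", len(ovr.estimators_))  # 4
print("OvO classifiers:", len(ovo.estimators_))  # 6
```

With n classes, OvR scales linearly in the number of classifiers while OvO scales quadratically, although each OvO classifier sees only the samples of its two classes.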

Loss function:

In binary classification, we often use binary cross-entropy loss, which measures the difference between the true and predicted probabilities for a single target class.

For multi-class classification, we use categorical cross-entropy loss, which measures the difference between the true and predicted probabilities for each (mutually exclusive) class.

Multi-label classification typically uses a binary cross-entropy loss for each class independently (as in binary classification), in a sense combining losses from separate binary classification tasks.
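
The three losses can be written in a few lines of NumPy. This is a simplified sketch for intuition, not how any particular library implements them; the function names and the clipping constant eps are choices made for this example.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; p_pred holds predicted P(class = 1)."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def categorical_cross_entropy(y_true_idx, p_pred, eps=1e-12):
    """Mean categorical cross-entropy; p_pred has one probability row per sample."""
    p = np.clip(p_pred, eps, 1.0)
    rows = np.arange(len(y_true_idx))
    # Only the probability assigned to the true class contributes
    return float(np.mean(-np.log(p[rows, y_true_idx])))

def multilabel_cross_entropy(Y_true, P_pred, eps=1e-12):
    """Binary cross-entropy applied to each label independently, then averaged."""
    return binary_cross_entropy(Y_true, P_pred, eps)

# Toy inputs: each true outcome is given probability 0.5, so every loss equals ln(2)
bce = binary_cross_entropy(np.array([1, 0]), np.array([0.5, 0.5]))
cce = categorical_cross_entropy(np.array([0]), np.array([[0.5, 0.25, 0.25]]))
mlce = multilabel_cross_entropy(np.array([[1, 0, 1]]), np.array([[0.5, 0.5, 0.5]]))
print(bce, cce, mlce)
```

The key structural difference is that the categorical loss assumes each probability row sums to 1 (mutually exclusive classes), while the multilabel loss treats every column as its own yes/no question.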

Now that we've explored the similarities and differences, let's dig into some Python examples.

# Import the necessary libraries
import numpy as np 
import pandas as pd 
from sklearn.datasets import load_breast_cancer, load_digits 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from sklearn.multiclass import OneVsRestClassifier 
from sklearn.metrics import accuracy_score, classification_report

Binary Classification Example

In this example, we will use the Wisconsin breast cancer dataset, which is a binary classification problem where we can predict whether a tumor is malignant or benign.

# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
binary_model = LogisticRegression(max_iter=1000)
binary_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = binary_model.predict(X_test)

# Calculate the accuracy and print the report
binary_accuracy = accuracy_score(y_test, y_pred)
print("Binary Classification Accuracy:", binary_accuracy)
print(classification_report(y_test, y_pred))

Multi-Class Classification Example

For the multiclass classification problem, we will use scikit-learn's handwritten digits dataset (load_digits), a small MNIST-like collection of 8x8 images in which each instance is a handwritten digit (0-9).

# Load the digits dataset
data = load_digits()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a one-vs-rest logistic regression model (one binary classifier per digit)
multi_class_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
multi_class_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = multi_class_model.predict(X_test)

# Calculate the accuracy and print the report
multi_class_accuracy = accuracy_score(y_test, y_pred)
print("Multi-Class Classification Accuracy:", multi_class_accuracy)
print(classification_report(y_test, y_pred))

Multi-Label Classification Example

For this example, let's consider a hypothetical dataset with three labels and four features.

# Create a hypothetical dataset
np.random.seed(42)
X = np.random.randn(100, 4)
y = np.random.randint(0, 2, (100, 3))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression model with OneVsRestClassifier (one binary classifier per label)
multi_label_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
multi_label_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = multi_label_model.predict(X_test)

# Calculate the accuracy and print the report
# (for multi-label targets, accuracy_score is subset accuracy: every label must match)
multi_label_accuracy = accuracy_score(y_test, y_pred)
print("Multi-Label Classification Accuracy:", multi_label_accuracy)
print(classification_report(y_test, y_pred))
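
Because exact-match accuracy can look harsh on multi-label problems, it is often reported alongside a per-label metric such as Hamming loss. The sketch below compares the two on made-up predictions; the arrays are illustrative and are not outputs of the model above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Toy multi-label ground truth and predictions (4 samples, 3 labels)
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

# Subset accuracy: a row counts only if all three labels match (rows 0 and 3)
subset_acc = accuracy_score(y_true, y_pred)

# Hamming loss: fraction of individual label slots that are wrong (2 of 12)
ham = hamming_loss(y_true, y_pred)

print("Subset accuracy:", subset_acc)  # 0.5
print("Hamming loss:", ham)
```

Here half of the samples are counted as wrong by subset accuracy even though only 2 of the 12 individual label predictions are incorrect.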

To summarize, we explored three types of classification problems (binary, multiclass, and multilabel) and demonstrated how to implement each with logistic regression in the Scikit-Learn library. We also discussed their similarities and differences in data representation, algorithms, and loss functions. Remember that for each problem, you can also try other algorithms and fine-tune the model for better performance. Understanding these classification tasks is critical to building powerful machine learning models.


Origin blog.csdn.net/robot_learner/article/details/131151238