Machine Learning - Classification

Classification, as the name implies, is the task of "categorizing" things into subcategories. But done by a machine! If that doesn't sound like much, imagine your computer being able to tell the difference between you and a stranger, between potatoes and tomatoes, or between A and F grades. Now, that sounds like fun. In machine learning and statistics, classification is the problem of identifying which of a set of classes (subpopulations) a new observation belongs to, based on a training dataset containing observations whose class memberships are known.

Types of Classification

There are two categories:

  1. Binary classification: the data must be sorted into 2 different classes. Example - based on a particular health condition of a person, we have to determine whether that person has a certain disease.
  2. Multi-class classification: the number of classes is more than 2. Example - based on data about different species of flowers, we have to determine which species a new observation belongs to.
    Figure: Binary and multi-class classification, where x1 and x2 are the features used to separate the classes. (A short sketch contrasting the two label layouts follows.)
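
To make the two settings concrete, here is a minimal sketch using two datasets that ship with scikit-learn: load_breast_cancer is a disease-style binary problem (two classes), while load_iris is a multi-class problem (three flower species). The particular datasets are illustrative choices, not part of the original example.

# Binary vs. multi-class labels, shown with two built-in datasets
import numpy as np
from sklearn.datasets import load_breast_cancer, load_iris

# Binary: every observation is labeled 0 (malignant) or 1 (benign)
binary = load_breast_cancer()
print(np.unique(binary.target))   # [0 1] -> two classes

# Multi-class: every observation is one of three iris species
multi = load_iris()
print(np.unique(multi.target))    # [0 1 2] -> three classes
print(multi.target_names)         # ['setosa' 'versicolor' 'virginica']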

How does classification work?

Suppose we have to predict whether a given patient has a certain disease based on 3 variables, called features.

This means there are two possible outcomes:

  1. The patient has the disease. The outcome is labeled "Yes" or "True".
  2. The patient is healthy. The outcome is labeled "No" or "False".

This is a binary classification problem.
We have a set of observations called the training dataset, which contains sample data together with the actual classification results. We train a model on this dataset, called a classifier, and use that model to predict whether a new patient has the disease. (A minimal code sketch of this workflow appears after the list below.)
So the result now depends on:

  1. How these features are "mapped" to the result
  2. The quality of our dataset. By quality I mean statistical and mathematical quality.
  3. The extent to which our classifier generalizes this relationship between features and outcomes.
  4. The values of x1 and x2.
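
As mentioned above, here is a minimal sketch of the disease example. Since no real patient records are at hand, make_classification generates a synthetic stand-in with 3 features, and Logistic Regression is one arbitrary choice of classifier among many:

# Minimal binary classifier for the hypothetical disease example
# (synthetic data stands in for real patient records)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 200 synthetic "patients" with 3 features; label 0 = healthy, 1 = diseased
X, y = make_classification(n_samples=200, n_features=3,
                           n_informative=3, n_redundant=0,
                           random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)         # learn the feature -> outcome mapping
print(clf.predict(X_test[:5]))    # predicted labels for five unseen patients
print(clf.score(X_test, y_test))  # fraction of test patients classified correctly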

The following is a generic block diagram for a classification task; a code sketch mapping each block onto scikit-learn follows the list below.

Figure: Generalized classification block diagram

  1. X: the pre-classified input data, an N*M matrix, where N is the number of observations and M is the number of features.
  2. y: an N-dimensional vector holding the actual class label for each of the N observations.
  3. Feature extraction: Extract valuable information from the input X using a series of transformations.
  4. ML model: The "classifier" we will train.
  5. y': the label predicted by the classifier.
  6. Quality Metrics: Metrics used to measure model performance.
  7. ML Algorithm: The algorithm that iteratively updates the model's weights w so that the model "learns".
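
To show how these blocks fit together in practice, here is a sketch that expresses the diagram as a scikit-learn Pipeline. StandardScaler and PCA stand in for the feature-extraction block, LogisticRegression is the model (the solver inside its fit() call plays the role of the "ML algorithm" updating the weights), and accuracy_score serves as the quality metric. Each component is an illustrative choice, not the only possibility:

# The block diagram expressed as a scikit-learn pipeline
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)    # X: N*M matrix, y: N actual labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

pipe = Pipeline([
    ('scale', StandardScaler()),     # feature extraction:
    ('reduce', PCA(n_components=2)), #   a series of transformations on X
    ('model', LogisticRegression())  # the classifier itself
])

pipe.fit(X_train, y_train)           # fit() runs the "ML algorithm" that
                                     # updates the model's weights w
y_pred = pipe.predict(X_test)        # y': labels predicted by the classifier
print(accuracy_score(y_test, y_pred))  # quality metric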

Types of Classifiers (Algorithms)

There are various types of classifiers. Some of them are:

  • Linear classifiers: Logistic Regression
  • Tree-based classifiers: Decision Tree Classifier
  • Support Vector Machines
  • Artificial Neural Networks
  • Bayesian Regression
  • Gaussian Naive Bayes Classifier
  • Stochastic Gradient Descent (SGD) Classifier
  • Ensemble methods: Random Forest, AdaBoost, Bagging Classifier, Voting Classifier, ExtraTrees Classifier

A detailed description of these methods is beyond the scope of a single article!
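
One point is worth a quick sketch anyway: in scikit-learn all of these classifiers share the same fit/predict interface, and the ensemble bullet's VotingClassifier can combine several of them into a single model. The particular combination below is an arbitrary illustration:

# Three classifier families combined through a single voting ensemble
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# A linear model, a Bayesian model, and a tree ensemble vote together
ensemble = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=1000)),
    ('gnb', GaussianNB()),
    ('rf', RandomForestClassifier(random_state=0)),
])
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))   # accuracy of the combined vote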

Practical Applications of Classification

  • Google's self-driving cars use deep learning-enabled classification techniques that allow them to detect and classify obstacles.
  • Spam filtering is one of the most widespread and recognized uses of classification technology.
  • Detecting health issues, facial recognition, speech recognition, object detection, and sentiment analysis all center on classification.

Code:

# Python program to perform classification on Iris dataset

# Run this program on your local Python interpreter
# provided you have installed the required libraries

# Importing the required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
from sklearn import svm
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# import the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
	X, y, test_size=0.3, random_state=1)

# GAUSSIAN NAIVE BAYES
gnb = GaussianNB()
# train the model
gnb.fit(X_train, y_train)
# make predictions
gnb_pred = gnb.predict(X_test)
# print the accuracy
print("Accuracy of Gaussian Naive Bayes: ", accuracy_score(y_test, gnb_pred))

# DECISION TREE CLASSIFIER
dt = DecisionTreeClassifier(random_state=0)
# train the model
dt.fit(X_train, y_train)
# make predictions
dt_pred = dt.predict(X_test)
# print the accuracy
print("Accuracy of Decision Tree Classifier: ", accuracy_score(y_test, dt_pred))

# SUPPORT VECTOR MACHINE
svm_clf = svm.SVC(kernel='linear') # Linear Kernel
# train the model
svm_clf.fit(X_train, y_train)
# make predictions
svm_clf_pred = svm_clf.predict(X_test)
# print the accuracy
print("Accuracy of Support Vector Machine: ",
	accuracy_score(y_test, svm_clf_pred))
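
Accuracy is only one of the quality metrics from the block diagram above. As a follow-on to the program (reusing its y_test, svm_clf_pred, and iris variables), a confusion matrix and per-class report give a more detailed picture:

# Per-class detail beyond plain accuracy (continues the program above)
from sklearn.metrics import classification_report, confusion_matrix

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_test, svm_clf_pred))

# Precision, recall, and F1 score for each iris species
print(classification_report(y_test, svm_clf_pred,
                            target_names=iris.target_names))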


Reprinted from: blog.csdn.net/weixin_43367756/article/details/126004318