Deep learning in simple terms


Deep learning is a branch of machine learning. Its core idea is to use deep neural networks to model and learn from data in order to perform tasks such as recognition, classification, and prediction. Over the past few years, deep learning has achieved breakthrough results in fields such as image recognition, speech recognition, natural language processing, and game AI.

This article will briefly introduce the basic principles of deep learning and demonstrate how to implement a simple neural network model using the TensorFlow library in Python.

1. Basic principles

The most basic model in deep learning is the neural network (Neural Network), whose structure mimics the human nervous system and contains multiple layers (Layer).

The basic unit of a neural network is the neuron (Neuron). Each neuron receives multiple inputs, computes a weighted sum plus a bias term, and passes the result through an activation function (Activation Function) to produce its output, as in the sketch below.
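
To make this concrete, here is a minimal sketch of a single neuron in Python with NumPy; the input, weight, and bias values are made up purely for illustration:

import numpy as np

def sigmoid(z):
    # the sigmoid activation squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# illustrative values: three inputs, three weights, one bias
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.2                          # bias

# weighted sum plus bias, then the activation function
output = sigmoid(np.dot(w, x) + b)
print(output)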

Multiple neurons form a layer, and neurons in adjacent layers are connected to form a complete neural network.

The "depth" in deep learning refers to the number of layers in the neural network. Generally speaking, the more layers, the stronger the network's expressive power.

Training a neural network uses the backpropagation algorithm (Backpropagation): the error signal is propagated backward through the network to update the weights (Weight) and biases (Bias), so that the model's output moves closer to the true value.
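
As a rough sketch of what one such parameter update looks like (not part of the original walkthrough), the gradients of a loss with respect to a weight and a bias can be computed with tf.GradientTape and applied with plain gradient descent; the parameter values, training example, and learning rate below are arbitrary:

import tensorflow as tf

# illustrative parameters and a single training example
w = tf.Variable(0.5)
b = tf.Variable(0.0)
x, y_true = 2.0, 3.0
learning_rate = 0.1

with tf.GradientTape() as tape:
    y_pred = w * x + b                 # forward pass
    loss = (y_pred - y_true) ** 2      # squared error

# backpropagation: gradients of the loss with respect to w and b
dw, db = tape.gradient(loss, [w, b])

# gradient descent moves the parameters to reduce the loss
w.assign_sub(learning_rate * dw)
b.assign_sub(learning_rate * db)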

The most commonly used neural network structure in deep learning is the multilayer perceptron (MLP), a network composed of multiple layers of neurons in which adjacent layers are fully connected: the input layer receives the data, the output layer produces the results, and the hidden layers in the middle perform nonlinear transformations and feature extraction on the input. Training an MLP usually uses the backpropagation algorithm (Backpropagation, BP) to optimize its parameters.

2. Advantages of deep learning

  1. Can learn and extract features autonomously

One of the biggest advantages of deep learning is that it can learn and extract features from data autonomously. Compared with traditional machine learning methods, which require manual feature extraction, deep learning can automatically extract the most relevant features. This has made deep learning a great success in many fields, such as image recognition, natural language processing, etc.

  2. Can handle large-scale data

Deep learning can handle large-scale data, and its performance generally improves as the amount of data grows. This makes deep learning widely used in many fields, such as speech recognition, natural language processing, and image recognition.

  3. Can handle non-linear relationships

Many traditional machine learning algorithms can only capture linear or relatively simple relationships, while deep learning can model complex nonlinear relationships. This makes deep learning perform well in many fields, such as image recognition and speech recognition.

  4. End-to-end learning is possible

Deep learning supports end-to-end learning: the entire pipeline from raw input data to final output can be learned by a single model. This makes deep learning well suited to complex tasks such as natural language processing and speech recognition.

3. Disadvantages of deep learning

  1. High data requirements

Deep learning models require large amounts of high-quality training data. If the data is of low quality, for example containing substantial noise or errors, performance suffers greatly. Labeling requirements are also high: inaccurately labeled data can degrade what the model learns.

  2. Computational resource requirements are high

Deep learning models usually involve a large amount of computation, so they demand substantial computing resources. Training deep learning models on CPUs alone is often very slow, so hardware accelerators such as GPUs or TPUs are needed to speed up training. Training also requires a large amount of storage capacity.

  3. Model is too complex

Deep learning models are often very complex, with a large number of parameters and layers, making it difficult to understand their inner workings. This makes deep learning models less interpretable and difficult to analyze and debug. In addition, overly complex models are also prone to overfitting, resulting in poor performance on new data.

  4. Less reliance on human knowledge

Deep learning can autonomously extract features from data, eliminating the tedious process of manual feature engineering. However, this also makes deep learning models less dependent on human knowledge, which means they may miss important features that are not obvious in the data. At the same time, deep learning is susceptible to biases in the dataset itself, which can lead to inaccurate predictions.

4. Applications of deep learning

Deep learning can be applied in various fields, such as image recognition, natural language processing, speech recognition, etc. In the field of image recognition, deep learning can be used to identify objects in images, thereby helping computers understand image content autonomously. In the field of natural language processing, deep learning can be used for tasks such as automatic translation, question answering, and text generation. In the field of speech recognition, deep learning can be used to recognize human voice commands, thus helping people to interact with computers more conveniently.

5. Handwritten digit recognition

TensorFlow is an open-source machine learning library developed by Google that can be used for a variety of machine learning tasks, including deep learning. At its core is a computational graph (Graph) model: users build nodes (Node) and edges (Edge) in the graph, and TensorFlow carries out the computations.
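
As a minimal sketch (not from the original article), in current TensorFlow 2.x versions a plain Python function can be traced into such a graph with the tf.function decorator:

import tensorflow as tf

# tf.function traces this Python function into a TensorFlow graph
@tf.function
def affine(x):
    return 2.0 * x + 1.0

print(affine(tf.constant(3.0)))  # tf.Tensor(7.0, shape=(), dtype=float32)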

In TensorFlow, a neural network model is composed of a series of layers (Layer). Each layer contains multiple neurons (Neuron), and the output of each neuron is transformed by an activation function (Activation Function). TensorFlow provides a variety of commonly used activation functions, such as sigmoid, ReLU, and tanh.
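
For example, these built-in activation functions can be applied directly to a tensor (a small illustrative snippet, not part of the original code):

import tensorflow as tf

z = tf.constant([-2.0, 0.0, 2.0])

print(tf.nn.sigmoid(z))  # values squashed into (0, 1)
print(tf.nn.relu(z))     # negative values clipped to 0
print(tf.nn.tanh(z))     # values squashed into (-1, 1)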

Handwritten digit recognition is a classic deep learning problem: given an image of a handwritten digit, identify which of the ten digits 0-9 it shows. In this article we use the MNIST dataset, a collection of labeled images of handwritten digits, each 28x28 pixels in size.

First, we need to import the necessary libraries:

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

Then, we need to load the handwritten digit dataset MNIST and preprocess the data:

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# scale pixel values from the 0-255 range into 0-1
train_images = train_images / 255.0
test_images = test_images / 255.0

# flatten each 28x28 image into a 784-dimensional vector
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

Next, we can define our neural network model:

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

This model has one hidden Dense layer with 128 neurons and the ReLU activation function, followed by a Dropout layer (randomly dropping 50% of activations during training) to reduce overfitting; the output layer has 10 neurons and uses the softmax activation function.
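
If you want to check the layer structure and parameter counts, an optional call (not in the original walkthrough) is:

model.summary()  # lists each layer, its output shape, and its parameter count

For this model, the hidden Dense layer has 784 × 128 + 128 = 100,480 parameters and the output layer has 128 × 10 + 10 = 1,290.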

Next, we need to compile the model, and train it:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10, batch_size=64,
                    validation_data=(test_images, test_labels))

We compile the model with the Adam optimizer, the sparse categorical cross-entropy loss, and accuracy as the evaluation metric. We then call the fit method to train the model, passing in the training set, using the test set as validation data, and training for 10 epochs with a batch size of 64.
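
Since matplotlib is already imported, an optional addition (not in the original article) is to plot the accuracy curves recorded in the returned history object; note that older TensorFlow versions use the keys 'acc' and 'val_acc' instead of 'accuracy' and 'val_accuracy':

# plot training and validation accuracy per epoch
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()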

Finally, we can use the trained model to predict handwritten digits:

predictions = model.predict(test_images)

print(np.argmax(predictions[:10], axis=1))
print(test_labels[:10])

We call the predict method on the test set and use np.argmax to take the index of the largest output value for each image as the predicted class. Finally, we print the first 10 predictions alongside their ground-truth labels.
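
As an optional check (not part of the original code), you can also report the overall test accuracy and display one flattened test image next to its predicted digit:

# overall accuracy on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=0)
print('test accuracy:', test_acc)

# show the first test image together with its predicted digit
plt.imshow(test_images[0].reshape(28, 28), cmap='gray')
plt.title('predicted: {}'.format(np.argmax(predictions[0])))
plt.show()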

Full code:

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images / 255.0
test_images = test_images / 255.0

train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10, batch_size=64,
                    validation_data=(test_images, test_labels))
                    
predictions = model.predict(test_images)

print(np.argmax(predictions[:10], axis=1))
print(test_labels[:10])
