Summary of key points of "Introduction to Artificial Intelligence" course

Table of contents

Genetic Algorithm Related Parameter Problems

The influence of population size, number of iterations, crossover rate, and mutation rate on the algorithm

The role of the open list and closed list in the A* algorithm

Why is the open list growing in the A* algorithm

The influence of the value of the heuristic function h(n)

The conditions under which the A* algorithm can find the optimal solution

Components of a pattern recognition system

Difference between supervised and unsupervised learning

Three Principles of Model Evaluation

Precision and Recall

Influence of K value selection in KNN

The difference between regression and classification problems

Steps in Machine Learning for Linear Regression Problems

The meaning of batch and epoch during model training

The difference between the sigmoid function and the ReLU function

Neuron model

Forward Propagation and Back Propagation of BP Learning Algorithm

The flow of BP algorithm

The role of convolutional and pooling layers in CNN

Key Techniques in CNN

Where CNN is superior to fully connected neural network

GAN workflow

Genetic Algorithm Related Parameter Problems

Encoding

  • Binary encoding: The solution is represented as a bit string of 0s and 1s, with each bit corresponding to a gene. Binary encoding is simple and easy to use, but it suffers from the Hamming cliff problem: the binary codes of adjacent integers can differ in many bits (a large Hamming distance), so a small step in the solution space is hard to reach by crossover and mutation.
  • Gray encoding: The solution is also represented as a bit string of 0s and 1s, but the codes of any two adjacent integers differ in exactly one bit, which improves the local search ability of the algorithm. Gray encoding avoids the Hamming cliff problem but requires additional encoding and decoding steps.
  • Real-number encoding: The solution is represented directly as real (floating-point) numbers, which correspond to the parameter space of the problem without any encoding or decoding step. Real-number encoding is suitable for continuous optimization problems, but it requires specially designed crossover and mutation operators.
  • Permutation encoding: The solution is represented as a permutation or sequence, with each position corresponding to a gene. Permutation encoding is suitable for combinatorial optimization problems such as the traveling salesman problem, but it also requires specially designed crossover and mutation operators.
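
The Hamming cliff can be made concrete with a short sketch. The helper names below (binary_to_gray, gray_to_binary, hamming_distance) are illustrative and do not come from the course material; the conversion uses the standard reflected Gray code.

```python
def binary_to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# The Hamming cliff between 7 (0111) and 8 (1000) in plain binary:
print(hamming_distance(7, 8))                                  # 4
# The same neighbours in Gray code differ in a single bit:
print(hamming_distance(binary_to_gray(7), binary_to_gray(8)))  # 1
```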

Selection methods

  • Roulette wheel selection: This is the simplest and most commonly used selection method. Each individual's selection probability is the ratio of its fitness value to the total fitness of the population, so individuals with higher fitness are more likely to be selected. The steps are as follows:
    • Sum the fitness values of all individuals in the population to obtain the total fitness
    • Divide each individual's fitness value by the total fitness to obtain its selection probability
    • Compute the cumulative probabilities of the individuals to construct the roulette wheel
    • Generate a random number in [0, 1] and select the individual whose cumulative-probability interval contains it
  • Tournament selection: Selected individuals are determined by repeated small competitions, which makes it easy to control selection pressure and maintain diversity. The steps are as follows:
    • Randomly pick k individuals from the population, where k is a preset parameter, usually 2 or 3
    • Compare the fitness values of these k individuals and choose the one with the highest fitness as the winner
    • Copy the winner into the new population and repeat the steps above until the new population is full
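
The two selection procedures above can be sketched as follows. This is a minimal illustration, assuming non-negative fitness values with a positive total; the function names and the rng parameter are hypothetical and only there to make the example reproducible.

```python
import random
from itertools import accumulate


def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes every fitness value is non-negative and the total is positive.
    """
    total = sum(fitness)
    cumulative = list(accumulate(f / total for f in fitness))
    r = rng.random()                       # random number in [0, 1)
    for individual, threshold in zip(population, cumulative):
        if r < threshold:
            return individual
    return population[-1]                  # guard against floating-point round-off


def tournament_select(population, fitness, k=2, rng=random):
    """Pick k distinct individuals at random and return the fittest one."""
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]
```

Calling either function once per slot of the new population fills the next generation. A larger k in tournament_select raises the selection pressure.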

Crossover strategy

  • Single-point crossover: Randomly select one crossover point on the chromosomes of two individuals, then exchange the genes after this point.
  • Two-point crossover: Randomly select two crossover points on the chromosomes of two individuals, then exchange the genes between these two points.
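
A minimal sketch of both crossover operators on list-based chromosomes. It assumes the two parents have equal length (at least 2 genes for single-point, at least 3 for two-point); the function names are illustrative.

```python
import random


def single_point_crossover(p1, p2, rng=random):
    """Swap the tails of two equal-length parents after one random cut."""
    point = rng.randint(1, len(p1) - 1)      # cut strictly inside the chromosome
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def two_point_crossover(p1, p2, rng=random):
    """Swap the segment between two random cuts of equal-length parents."""
    a, b = sorted(rng.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])


c1, c2 = single_point_crossover([0] * 8, [1] * 8)
print(c1, c2)  # e.g. [0, 0, 0, 1, 1, 1, 1, 1] [1, 1, 1, 0, 0, 0, 0, 0]
```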

Mutation strategy

  • Bit-flip mutation: Randomly select one or more gene bits on an individual's chromosome and invert them, that is, 0 becomes 1 and 1 becomes 0.
  • Swap mutation: Randomly select two gene positions on an individual's chromosome and swap their values.
  • Insertion mutation: Randomly select a gene on an individual's chromosome, remove it, and insert it at another random position; the genes between the two positions shift by one place to fill the gap.
  • Inversion mutation: Randomly select a substring of an individual's chromosome and reverse its order.
  • Uniform mutation: Go through the genes of an individual's chromosome one by one and, with a certain small probability, replace each gene with a random value drawn uniformly from its allowed range.
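
Four of these operators are short enough to sketch directly. The function names and the rng parameter are illustrative assumptions; each function returns a mutated copy and leaves the parent unchanged.

```python
import random


def bit_flip_mutation(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def swap_mutation(perm, rng=random):
    """Exchange the genes at two distinct random positions."""
    child = list(perm)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def insertion_mutation(perm, rng=random):
    """Remove one random gene and re-insert it at a random position."""
    child = list(perm)
    gene = child.pop(rng.randrange(len(child)))
    child.insert(rng.randrange(len(child) + 1), gene)
    return child


def inversion_mutation(perm, rng=random):
    """Reverse the order of the genes in a random substring."""
    child = list(perm)
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    child[i:j] = reversed(child[i:j])
    return child
```

Swap, insertion, and inversion keep the set of genes intact, which is why they suit permutation encodings such as traveling salesman tours.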

The influence of population size, number of iterations, crossover rate, and mutation rate on the algorithm

  • Population size: The population size is the fixed number of individuals in each generation, which is also the number of initial solutions. If the population is too small, the search space is sampled too sparsely and the algorithm easily falls into a local optimum; if it is too large, the amount of computation increases and convergence slows down. The population size should be chosen according to the complexity of the problem and the size of the search space, and typically lies between 10 and 100.
  • Number of iterations: The number of iterations is the number of generations the genetic algorithm evolves. If it is too small, the search is insufficient and a good solution cannot be found; if it is too large, computation time is wasted on generations after the population has already converged. The number of iterations should be chosen according to the difficulty of the problem and how quickly it converges, and typically lies between 50 and 500.
  • Crossover rate: The crossover rate is the proportion of individuals that undergo crossover in each generation. Crossover is the most important operation in the genetic algorithm: it generates new individuals by recombining existing ones, which increases the diversity of the population and improves the search ability. If the crossover rate is too low, few new individuals are produced, so the search stagnates and may fall into a local optimum; if it is too high, good individuals are broken up too often, which slows convergence. The crossover rate should be chosen according to the characteristics of the problem and the size of the search space, and typically lies between 0.6 and 0.9.
  • Mutation rate: The mutation rate is the proportion of genes that undergo mutation in each generation. Mutation is an auxiliary operation that complements crossover: it introduces new genes, increases the diversity of the population, and helps the search escape from local optima. If the mutation rate is too low, too few new genes appear and the search may stay in a local optimum; if it is too high, good individuals are destroyed and the algorithm degenerates toward random search, which slows convergence. The mutation rate should be chosen according to the characteristics of the problem and the size of the search space, and typically lies between 0.01 and 0.1.
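
To see where the four parameters enter the algorithm, here is a minimal sketch of a genetic algorithm on a toy problem (OneMax: maximise the number of 1-bits). The problem, the function name run_ga, and the default values are assumptions chosen for illustration, not settings taken from the course.

```python
import random


def run_ga(n_bits=20, pop_size=30, generations=60,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Maximise the number of 1-bits (OneMax) with a simple GA."""
    rng = random.Random(seed)
    fitness = sum  # fitness of a bit list = number of ones
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:      # single-point crossover
                cut = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                 # bit-flip mutation per gene
                children.append([1 - g if rng.random() < mutation_rate else g
                                 for g in child])
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)      # remember the best ever seen
    return best


best = run_ga()
print(sum(best), "of 20 bits set")
```

Re-running with, say, pop_size=4 or mutation_rate=0.5 is a quick way to observe the effects described above.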

The role of the open list and closed list in the A* algorithm

The A* algorithm is a heuristic search algorithm that finds the shortest path from a start node to a goal node in a graph. It uses two lists, the open list and the closed list.

The open list is a priority queue that stores all nodes that have been generated and evaluated but not yet expanded. Each node n is scored by the evaluation function f(n) = g(n) + h(n), where g(n) is the cost of the best path found so far from the start node to n, and the heuristic value h(n) is an estimate of the cost from n to the goal node. At each step, A* selects the node with the smallest f(n) from the open list as the current node, removes it from the open list, and adds it to the closed list.

The closed list is a set that stores all nodes that have already been expanded, which avoids expanding the same node repeatedly and improves search efficiency. A* generates all successors of the current node, computes their f values, and checks whether each successor is already in the open list or the closed list. If it is in neither, it is added to the open list. If it is already present and the new path to it is cheaper, its values are updated and its parent pointer is set to the current node.

This process repeats until the goal node is taken from the open list or the open list becomes empty. If the goal node is reached, tracing back along the parent pointers gives the shortest path; if the open list is empty, no feasible path exists.

In short, the open list lets A* process nodes in order of their f values, and the closed list records the nodes that have already been expanded, so that the shortest path can be found efficiently.
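
The interplay of the two lists can be sketched on a small grid. This is a minimal illustration, assuming 4-connected moves of unit cost and a Manhattan-distance heuristic; the function name a_star and the grid layout are hypothetical. Instead of updating an entry inside the priority queue, the sketch pushes a new entry and skips the stale one when it is popped.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (wall) cells.

    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    def h(cell):  # Manhattan distance: admissible for unit-cost grid moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # open list: (f, g, cell), smallest f first
    g_cost = {start: 0}                  # best known cost from the start
    parent = {start: None}
    closed = set()                       # closed list: already expanded cells

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current in closed:
            continue                     # stale entry left by a later improvement
        if current == goal:
            path = []
            while current is not None:   # follow parent pointers back to the start
                path.append(current)
                current = parent[current]
            return path[::-1]
        closed.add(current)
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            in_bounds = 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
            if not in_bounds or grid[nxt[0]][nxt[1]] == 1 or nxt in closed:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g      # found a cheaper route to nxt
                parent[nxt] = current
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None                          # open list exhausted: no path exists


grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (2, 2), (2, 1), (2, 0)]
```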

Why is the open list growing in the A* algorithm

The open list grows because every time a node is expanded, its adjacent reachable nodes are added to the open list (unless they are obstacles or are already in the closed list). Each expansion removes one node from the open list but usually adds several, so the number of nodes in the open list increases as the search proceeds, until the goal is found or no expandable nodes remain.

The influence of the value of the heuristic function h(n)

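h(n) carries a large weight in f(n) = g(n) + h(n): the search is directed more strongly toward the goal and the workload is reduced, but if h(n) overestimates the true cost, the optimal solution may not be found.

h(n) carries a small weight: the workload generally increases, and in the extreme case h(n) = 0 the search degenerates into a blind search, but as long as h(n) does not overestimate the true cost, the optimal solution is still found.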

The conditions under which the A* algorithm can find the optimal solution

The heuristic function must be admissible: it must never overestimate the cost from the current node to the goal node, that is, the estimated cost cannot be greater than the actual minimum cost. If the heuristic is also consistent, A* is guaranteed to find the shortest path without ever having to re-expand a node that is already in the closed list.

The search must not loop forever in cycles: in a graph that contains cycles, a search that does not record expanded nodes may go around a loop indefinitely. Keeping a closed list and requiring every edge cost to be positive ensures that A* terminates and still finds the shortest path even when the graph contains cycles.
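
For reference, the conditions on the heuristic can be written compactly. The symbols h*(n) (true minimum cost from n to the goal) and c(n, n') (cost of the edge from n to n') are notation introduced here for illustration and are not used elsewhere in these notes.

```latex
\begin{align}
  f(n) &= g(n) + h(n) \\
  \text{admissible:}\quad & 0 \le h(n) \le h^{*}(n) \quad \text{for every node } n \\
  \text{consistent:}\quad & h(n) \le c(n, n') + h(n') \quad \text{for every edge } (n, n'),
      \qquad h(\text{goal}) = 0
\end{align}
```

Every consistent heuristic is also admissible, and h(n) = 0 trivially satisfies both conditions, which is why the blind-search extreme still returns an optimal path.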

Components of a pattern recognition system

Example: palmprint recognition

Difference between supervised and unsupervised learning

Supervised learning and unsupervised learning are two important concepts in machine learning.

Supervised learning requires labeled data as training data, while unsupervised learning does not.

The goal of supervised learning is to predict the labels of unseen data from known labeled data, while the goal of unsupervised learning is to discover patterns and structure in the data.

Three Principles of Model Evaluation

Occam's razor: among models that fit the data equally well, the simpler one is preferred

The sampling principle when splitting the data set: the distributions of the training set, validation set, and test set should be as consistent as possible

The principle of using the test set: never look at the test set for any reason during the training phase; repeatedly evaluating on the test set to tune the model also counts as peeking

Precision and Recall

  • Precision is the proportion of samples predicted as positive that are actually positive. It reflects the accuracy of the positive predictions, that is, how well the model avoids misjudging negative examples as positive. Precision can be improved by raising the threshold for predicting a positive example, so that a sample is judged positive only when its predicted positive probability is high. This reduces the number of false positives, but at the same time increases the number of false negatives, that is, positive examples misjudged as negative.
  • Recall is the proportion of actually positive samples that are predicted as positive. It reflects the completeness of the predictions, that is, how many of the positive examples are covered. Recall can be improved by lowering the threshold for predicting a positive example, so that a sample is judged positive as long as it has some positive probability. This reduces the number of false negatives, but at the same time increases the number of false positives, that is, negative examples misjudged as positive.

Precision and recall are therefore in a trade-off relationship: increasing one tends to decrease the other. Different tasks place different requirements on the two. For example, in spam filtering we prefer higher precision, to avoid misjudging normal emails as spam; in disease diagnosis we prefer higher recall, to avoid misjudging sick people as healthy.
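
The effect of the threshold can be checked with a few lines of code. The labels and scores below are made-up illustration data, and precision_recall is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when score >= threshold is predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels (1 = positive) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

print(precision_recall(y_true, scores, 0.7))   # (1.0, 0.5)
print(precision_recall(y_true, scores, 0.35))  # (0.8, 1.0)
```

With the high threshold every positive prediction is correct but half of the positives are missed; with the low threshold all positives are found at the cost of one false positive.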

Influence of K value selection in KNN

The difference between regression and classification problems

The difference between a regression problem and a classification problem lies in the type of the output variable. Predicting a quantitative output is called regression, or continuous-variable prediction; predicting a qualitative output is called classification, or discrete-variable prediction.

Classification problems predict a class (or the probability of each class) rather than a continuous value.

Regression problems predict a numerical value, such as a house price or a future temperature.

Steps in Machine Learning for Linear Regression Problems

1 Define a function (model) with unknown parameters

2 Define the loss function

3 Obtain the optimal parameters based on the optimization method
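
The three steps can be sketched for a one-variable model. This is a minimal illustration, assuming the model y = w * x + b, a mean-squared-error loss, and plain gradient descent as the optimization method; the data, learning rate, and step count are made up.

```python
def fit_linear_regression(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0                      # step 1: model with unknown parameters
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # step 2: loss L = mean(error^2); its gradients w.r.t. w and b are:
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # step 3: move the parameters against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]          # generated from y = 2x + 1
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))          # 2.0 1.0
```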

The meaning of batch and epoch during model training

In deep learning, a batch is a group of samples that is fed to the model together in one training step, after which the parameters are updated once. Batch size is the number of samples in each batch. A larger batch size processes the data set faster on parallel hardware but consumes more memory; a smaller batch size consumes less memory but needs more update steps to go through the data set.

An epoch is one complete pass through all of the training data. In each epoch, the model performs a forward pass and a backward pass on every training sample and updates the model parameters. The number of epochs is the number of times the entire training set is passed through.
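
The relationship between batch size, epochs, and parameter updates can be sketched as a loop. The data set of 10 samples and the helper iterate_minibatches are illustrative assumptions; a real training loop would usually also shuffle the data at the start of each epoch.

```python
import math


def iterate_minibatches(data, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = list(range(10))        # 10 hypothetical training samples
batch_size, epochs = 4, 3

updates = 0
for epoch in range(epochs):                  # one epoch = one pass over all data
    for batch in iterate_minibatches(data, batch_size):
        updates += 1                         # one parameter update per batch

print(math.ceil(len(data) / batch_size))     # 3 batches per epoch
print(updates)                               # 9 updates in total
```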

The difference between the sigmoid function and the ReLU function

The main difference between the sigmoid function and the ReLU function lies in their shape and properties. The output of the sigmoid function approaches 1 when the input is a large positive value and 0 when the input is a large negative value; when the input is close to 0, the output is close to 0.5. In both saturated regions the gradient becomes very small, which causes the vanishing gradient problem. The ReLU function does not saturate for positive inputs: its gradient is 1 when the input is positive and 0 when the input is negative, so gradients passing through active units are not shrunk. This makes networks with ReLU easier to train and can speed up convergence.

In addition, the ReLU function is sparse and cheap to compute while still being nonlinear. Sparsity means that ReLU sets the output of some neurons to exactly 0, making the activations of the network sparser; nonlinearity means that, like the sigmoid, it introduces the nonlinear factors the network needs to model complex functions. These properties make ReLU widely used in deep learning, where in practice it usually works better than the sigmoid function in hidden layers.
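
The saturation behaviour is easy to see numerically. The sketch below defines both functions and their derivatives with the standard formulas; the function names are illustrative, and this simple sigmoid is only meant for moderate inputs (math.exp overflows for very large negative x).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # subgradient 0 chosen at x = 0


for x in (-10.0, 0.0, 10.0):
    print(x, round(sigmoid(x), 5), round(sigmoid_grad(x), 5), relu(x), relu_grad(x))
```

At x = 10 the sigmoid gradient is already below 0.0001, while the ReLU gradient is still 1.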

Neuron model

Forward Propagation and Back Propagation of BP Learning Algorithm

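Forward propagation: The input information is passed from the input layer through the hidden layers, and the result is produced at the output layer.

Backpropagation: The error between the output and the target is propagated backward from the output layer, layer by layer, and the weights of the neurons in each layer are adjusted to minimize this error.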
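
Both passes can be sketched for a tiny network. This is a minimal illustration, assuming one hidden layer with sigmoid activations, no biases, and the squared error E = 0.5 * (output - target)^2; the weights, the sample, and the learning rate are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, W2):
    """Input -> hidden (sigmoid) -> output (sigmoid); biases omitted for brevity."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)))
    return hidden, output


def backward(x, target, W1, W2):
    """Gradients of E = 0.5 * (output - target)^2 with respect to W1 and W2."""
    hidden, output = forward(x, W1, W2)
    delta_out = (output - target) * output * (1 - output)       # output-layer error
    grad_W2 = [delta_out * h for h in hidden]
    delta_hidden = [delta_out * w * h * (1 - h)                 # error sent backward
                    for w, h in zip(W2, hidden)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, grad_W2


def train_step(x, target, W1, W2, lr=0.5):
    """One BP update: forward pass, backward pass, gradient-descent step."""
    grad_W1, grad_W2 = backward(x, target, W1, W2)
    W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, grad_W1)]
    W2 = [w - lr * g for w, g in zip(W2, grad_W2)]
    return W1, W2


# Hypothetical weights and a single training sample, for illustration only.
W1, W2 = [[0.1, -0.2], [0.4, 0.3]], [0.5, -0.6]
x, target = [1.0, 0.5], 1.0
before = forward(x, W1, W2)[1]
for _ in range(100):
    W1, W2 = train_step(x, target, W1, W2)
after = forward(x, W1, W2)[1]
print(before, after)   # the output should move toward the target of 1.0
```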

The flow of BP algorithm

The role of convolutional and pooling layers in CNN

In a convolutional neural network (CNN), both convolutional and pooling layers are used to extract image features. The convolutional layer extracts local features of the image by sliding convolution kernels over it, and the pooling layer reduces the size of the feature maps by downsampling the output of the convolutional layer. Pooling also helps reduce overfitting and improves the generalization ability of the model.
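
A minimal sketch of the two operations on plain nested lists, assuming stride 1 and no padding for the convolution and non-overlapping windows for max pooling. The image and the edge-detecting kernel are made-up examples; as in most deep learning libraries, the "convolution" here is computed without flipping the kernel.

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1]] * 5
edge_kernel = [[-1, 1],
               [-1, 1]]          # responds to a dark-to-bright step from left to right

features = conv2d(image, edge_kernel)   # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 after 2x2 pooling
print(features[0])   # [0, 2, 0, 0]
print(pooled)        # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the edge is, and pooling keeps that response while halving each spatial dimension.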

Key Techniques in CNN

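Local connectivity: each neuron is connected only to a small local region of the previous layer

Weight sharing: the same convolution kernel (template) is applied at every position of the input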

Multiple convolution kernels

Where CNN is superior to fully connected neural network

Convolutional layers have the following advantages over fully connected layers:

  1. Parameter sharing: Sharing parameters across positions greatly reduces the number of parameters a convolutional layer has to learn and improves the generalization ability of the model. With fewer parameters than a fully connected layer, a convolutional layer is easier to train and less prone to overfitting.
  2. Spatial locality: Convolutional layers capture spatially local information in the input data. In image recognition tasks, the relationship between adjacent pixels is very important, and the convolutional layer exploits this relationship to extract image features, whereas a fully connected layer treats all inputs alike and has no built-in notion of neighborhood.
  3. Computational complexity: Because of parameter sharing and local connectivity, convolutional layers have lower computational complexity than fully connected layers and are more efficient on large-scale data.
  4. Model generalization ability: The convolutional layer learns local features of the input data, so it generalizes better. Convolutional layers handle noise and distortion in the input data more robustly than fully connected layers.
  5. Interpretability: The output of a convolutional layer can be viewed as feature maps of the input data, which makes the predictions of the model easier to explain. Convolutional layers are easier to understand and interpret than fully connected layers.

To sum up, the convolutional layer has better generalization ability, higher computational efficiency, and better interpretability than the fully connected layer, and is suitable for processing two-dimensional or three-dimensional data such as images and audio.
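
The saving from parameter sharing can be checked with simple arithmetic. The layer sizes below are a hypothetical example, and the two helper functions are illustrative.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2-D convolutional layer with square kernels."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Hypothetical layer: 32x32 RGB input mapped to 16 feature maps of 32x32.
h, w, c_in, c_out = 32, 32, 3, 16
print(dense_params(h * w * c_in, h * w * c_out))  # 50348032
print(conv_params(c_in, c_out, kernel_size=3))    # 448
```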

GAN workflow

A Generative Adversarial Network (GAN) is a deep learning model that consists of two neural networks: a generator and a discriminator. The role of the generator is to produce new data similar to the training data, while the role of the discriminator is to distinguish real data from data produced by the generator. Training alternates between the two networks until the generated data can pass for real data and the generator and discriminator reach an equilibrium. The workflow of GAN is as follows:

  1. Initialize the parameters of the generator and discriminator.
  2. Draw n samples from the training set, and sample n noise samples from the noise distribution.
  3. Fix the generator and train the discriminator to distinguish real from fake as much as possible.
  4. Fix the discriminator and train the generator to fool the discriminator as much as possible.
  5. Repeat steps 2 to 4. After many update iterations, the discriminator can no longer tell whether a sample comes from the real training set or from the generator G. At this point the discriminator outputs a probability of 0.5, and training is complete.
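
The alternating training in steps 3 and 4 corresponds to a minimax game. The value function below is the one from the original GAN formulation; the symbols p_data (data distribution), p_z (noise distribution), and p_g (distribution of generated samples) are notation introduced here and are not defined in these notes.

```latex
\begin{align}
  \min_{G} \max_{D} V(D, G) &=
    \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
    + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr] \\
  D^{*}(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{g}(x)}
    \quad\Longrightarrow\quad
    D^{*}(x) = \tfrac{1}{2} \ \text{when } p_{g} = p_{\text{data}}
\end{align}
```

The second line is the optimal discriminator for a fixed generator, which is where the value 0.5 in step 5 comes from.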

Origin blog.csdn.net/qq_51235856/article/details/130441496