Detailed explanation of genetic algorithm (GA)

Genetic Algorithm

        As usual, the scientific definition is given first:

       The Genetic Algorithm (GA) originated from computer simulation studies of biological systems. It is a stochastic global search and optimization method that imitates the mechanism of biological evolution in nature, drawing on Darwin's theory of evolution and Mendel's theory of genetics. In essence it is an efficient, parallel, global search method that automatically acquires and accumulates knowledge about the search space as it searches, and adaptively controls the search in order to approach the best solution.

 

     Next, the relevant terms (just skim them for now; they come up again later and are explained in detail there):

Genotype: the internal representation of a trait, i.e. the chromosome;

Phenotype: The external manifestation of a chromosomally determined trait, or the external manifestation of an individual formed according to genotype;

Evolution: The population gradually adapts to the living environment, and the quality is continuously improved. The evolution of organisms takes place in the form of populations.

Fitness: A measure of how well a species adapts to its living environment.

Selection: Selecting several individuals from a population with a certain probability. Generally, selection is a survival-of-the-fittest process based on fitness.

Replication (reproduction): When a cell divides, the genetic material DNA is transferred to the newly generated cell through replication, and the new cell inherits the genes of the old cell.

Crossover: DNA is cut at the same position on two chromosomes, and the two strings are crossed and combined to form two new chromosomes. Also known as genetic recombination or hybridization;

Mutation: Some replication errors may (with a small probability) occur during replication, and mutation produces new chromosomes that exhibit new traits.

Coding: The genetic information in DNA is arranged in a certain pattern on a long chain. Genetic coding can be viewed as a mapping from phenotype to genotype.

Decoding: Mapping of genotypes to phenotypes.

Individual: An entity that carries a chromosome with its characteristics;

Population: A collection of individuals; the number of individuals in the collection is called the population size.

 

       Genetic algorithms have many interesting applications, such as pathfinding, the 8-puzzle, the prisoner's dilemma, motion control, finding the center of the largest circle contained in an irregular polygon, the traveling salesman problem (TSP), production scheduling, and artificial life simulation. Let me explain genetic algorithms using kangaroos as an example (because kangaroos can jump).

 

     Each chromosome in a genetic algorithm corresponds to one solution to the problem, and we generally use a fitness function to measure how good that solution is. So there is a mapping from a genome to the fitness of its solution. Running a genetic algorithm can be seen as searching for the optimum of a multivariate function. Imagine that this multi-dimensional surface has countless "mountains"; their peaks correspond to local optima. One "mountain" has the highest altitude, and its peak is the global optimum. The task of the genetic algorithm is to climb to the highest peak as far as possible instead of getting stuck on some small peak. (Note that a genetic algorithm does not have to look for the "highest mountain": if smaller fitness values are better, the global optimum is the minimum of the function, and the genetic algorithm is then looking for the "deepest valley".)

                                                          


Problems and solutions:

   Let us first consider the solution to the following problem.

           Given a function of one variable:

      

We want to find the maximum value of this function on the interval [-1, 2].


                                                    

The "Kangaroo Jump" Problem

        We can think of the function curve as a mountain range of peaks and valleys. Each candidate solution is then a kangaroo, and we want the kangaroos to keep jumping higher until they reach the highest peak (even though the kangaroos themselves may not want to). In this way, finding the maximum becomes a "kangaroo jump" process.

As a comparison, here are a few ways of "kangaroo jumping".

 1. Hill climbing (steepest-ascent hill climbing):

      Randomly generate neighboring points in the search space, select the one with the best solution, replace the current individual with it, and repeat this process. Because hill climbing only compares "adjacent" points, it is short-sighted and usually converges only to a local optimum close to the starting position. For problems with many local optima, the chance of finding the global optimum with such simple iteration is very slim. (With hill climbing, the kangaroo can hope to reach the summit closest to its starting point, but there is no guarantee that this summit is Mount Everest, or even a very high mountain, because the kangaroo only goes uphill, never downhill.)
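The hill-climbing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article's formula did not survive, so the objective `f(x) = x*sin(10*pi*x) + 2` is an assumption (it is consistent with the interval [-1, 2] and the maximum f(1.85) = 3.85 reported later), and the names `hill_climb`, `step`, and `neighbours` are hypothetical.

```python
import math
import random

def f(x):
    # ASSUMED objective: the original formula is missing from the article.
    return x * math.sin(10 * math.pi * x) + 2.0

def hill_climb(func, x0, lo=-1.0, hi=2.0, step=0.01, neighbours=10,
               iters=1000, rng=random):
    """Steepest-ascent hill climbing: move only if the best neighbour is better."""
    x = x0
    for _ in range(iters):
        # Sample a few nearby points, clamped to the search interval.
        candidates = [min(hi, max(lo, x + rng.uniform(-step, step)))
                      for _ in range(neighbours)]
        best = max(candidates, key=func)
        if func(best) > func(x):   # never accept a downhill move
            x = best
    return x

random.seed(0)
x_end = hill_climb(f, x0=0.0)
print(x_end, f(x_end))
```

Starting at x = 0, the kangaroo ends on the small peak next to its starting point and never sees the much higher peaks further right, which is exactly the short-sightedness described above.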

2. Simulated Annealing:

     This method is inspired by the heat treatment of metals. When the temperature of a metal exceeds its melting point, the atoms move violently and randomly. As in all other physical systems, this motion tends toward a minimum of energy. At the beginning of the process the temperature is very high, so the atoms have high energy. As the temperature drops and the metal gradually cools, the energy of the atoms becomes smaller and smaller and finally reaches the lowest possible level. When using simulated annealing, we let the algorithm start with large jumps so that it has enough "energy" to escape the local optima it passes on the way instead of being trapped in them; once it is near the global optimum, we gradually reduce the jump size so that it "settles" on the global optimum. (In simulated annealing, the kangaroo is drunk and jumps around randomly for a long time. With any luck, it jumps from one mountain to another, higher one. In the end it gradually sobers up and jumps toward the summit of the mountain it is on.)
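A minimal simulated-annealing sketch, under assumptions: the objective `f` is the assumed function `x*sin(10*pi*x) + 2` (the original formula is missing), the geometric cooling schedule and the Metropolis acceptance rule are common textbook choices rather than anything specified in this article, and all parameter names are hypothetical.

```python
import math
import random

def f(x):
    # ASSUMED objective: the original formula is missing from the article.
    return x * math.sin(10 * math.pi * x) + 2.0

def simulated_annealing(func, x0, lo=-1.0, hi=2.0, t_start=1.0,
                        t_end=1e-3, cooling=0.995, rng=random):
    """Maximise func; jump size and tolerance for bad moves shrink with t."""
    x, best, t = x0, x0, t_start
    while t > t_end:
        # Large jumps while hot, small jumps once cooled down.
        cand = min(hi, max(lo, x + rng.gauss(0.0, t)))
        delta = func(cand) - func(x)
        # Always accept uphill moves; accept downhill moves with
        # probability exp(delta / t), which vanishes as t drops.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            x = cand
        if func(x) > func(best):
            best = x               # remember the best point seen so far
        t *= cooling
    return best

random.seed(0)
x_sa = simulated_annealing(f, x0=0.0)
print(x_sa, f(x_sa))
```

Unlike hill climbing, the "drunk" kangaroo may jump downhill early on, which is what lets it leave a small peak; there is still no guarantee that it ends on the highest one.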

3. Genetic algorithm:

    Genetic algorithms simulate the biological process of evolution by natural selection. They perform a multi-directional search by maintaining a population of candidate solutions, and they support the formation and exchange of information between these directions. A search that works on a whole population (a "surface") is better at finding the global optimum than a search that works on a single point. (In a GA, there are a lot of kangaroos, and they are dropped at random places in the Himalayas. These kangaroos don't know that their mission is to find Mount Everest. But every few years, some kangaroos at lower altitudes are shot, in the hope that the survivors are prolific and have babies where they are.) (Or, to put it another way: once upon a time, a large group of kangaroos was inexplicably scattered across the Himalayas and had to struggle to survive there. The low altitudes are filled with a colorless, odorless poisonous gas, and the higher the altitude, the thinner the gas. The poor kangaroos are completely unaware of this and keep jumping around as usual. So kangaroos keep dying at lower altitudes, while the higher a kangaroo is, the longer it lives and the better its chance of having children. After many years, the kangaroos have unknowingly gathered on the peaks. However, among all of them, only the kangaroos gathered on Mount Everest are brought back to beautiful Australia.)

 

 

The implementation process of genetic algorithm

        The implementation of a genetic algorithm really does resemble evolution in nature. Start by finding a way to "digitally" encode potential solutions to the problem (that is, establish the mapping between phenotype and genotype). Then initialize a population with random numbers (the first batch of kangaroos is scattered randomly over the mountains); the individuals in the population are these digital codes. Next, after an appropriate decoding step (obtaining each kangaroo's position coordinates), use the fitness function to evaluate every individual (the higher a kangaroo climbs, the more we like it, so the higher its fitness). Use the selection function to pick the best according to some rule (at regular intervals we shoot some kangaroos at lower altitudes to keep the total number of kangaroos constant). Mutate the individuals' genes (make the kangaroos jump randomly). Then produce offspring (hopefully the surviving kangaroos are prolific and have offspring where they are). A genetic algorithm does not guarantee that you obtain the optimal solution, but its biggest advantage is that you do not have to understand or worry about how to "find" the optimal solution (you don't have to tell the kangaroos which direction to jump in or how far). You simply "eliminate" the individuals that perform poorly. (Shooting the kangaroos that always love going downhill is the essence of genetic algorithms!)

 

 So we summarize the general steps of the genetic algorithm:

       Start looping until a satisfactory solution is found.

1. Assess the fitness of the individuals corresponding to each chromosome.

2. According to the principle that the higher the fitness, the greater the selection probability, select two individuals from the population as the father and mother.

3. Extract the chromosomes of both parents and cross them to produce offspring.

4. Mutate the chromosomes of the offspring.

5. Repeat steps 2, 3, and 4 until a new population is generated.

End the loop.
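The steps above can be sketched as a generic loop in which every operator is passed in as a function. This is a sketch, not a reference implementation: the function and parameter names are hypothetical, a fixed number of generations stands in for "until a satisfactory solution is found", and the demo problem (maximise the number of 1 bits in a 16-bit list) is a toy chosen only to make the loop runnable.

```python
import random

def genetic_algorithm(init, fitness, select, crossover, mutate,
                      pop_size=20, generations=50):
    """Generic GA loop; all operators are supplied by the caller."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        # Step 1: evaluate the fitness of every individual.
        scores = [fitness(ind) for ind in population]
        new_population = []
        # Step 5: repeat steps 2-4 until the new population is full.
        while len(new_population) < pop_size:
            # Step 2: fitter individuals are more likely to become parents.
            father = select(population, scores)
            mother = select(population, scores)
            # Step 3: cross the parents; step 4: mutate each child.
            for child in crossover(father, mother):
                new_population.append(mutate(child))
        population = new_population[:pop_size]
    return max(population, key=fitness)

# Toy operators (assumptions for the demo only).
N_BITS = 16

def init():
    return [random.randint(0, 1) for _ in range(N_BITS)]

def fitness(bits):
    return sum(bits)

def select(population, scores):
    # +1 keeps every weight positive even when a score is 0.
    return random.choices(population, weights=[s + 1 for s in scores], k=1)[0]

def crossover(a, b):
    cut = random.randint(1, N_BITS - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rate=0.02):
    return [1 - b if random.random() < rate else b for b in bits]

random.seed(0)
best = genetic_algorithm(init, fitness, select, crossover, mutate)
print(best, fitness(best))
```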

                                                

Next, we will dissect every detail of the genetic algorithm process in detail.

Encoding the kangaroo's chromosomes - how genes are encoded

      Inspired by the structure of human chromosomes, we can imagine a chromosome with only two bases, "0" and "1", linked in order into a chain. Because each unit carries 1 bit of information, a long enough chromosome can describe all the characteristics of an individual. This is the binary encoding method, and a chromosome looks roughly like this:

010010011011011110111110

     Although the above encoding is simple and intuitive, when individual characteristics are complex a very long code is needed to describe them accurately, and the corresponding decoding process (similar to translation of DNA in biology, i.e. mapping genotype to phenotype) becomes too complicated. To reduce the computational cost of the genetic algorithm and improve efficiency, floating-point encoding was proposed. A chromosome then looks roughly like this:

1.2 - 3.3 - 2.0 - 5.4 - 2.7 - 4.3

(Note: there is another encoding method called symbol encoding)

      So how do we use these two encodings to encode the chromosomes of kangaroos? The purpose of encoding is to establish a mapping between phenotype and genotype, and the phenotype is generally understood as the characteristics of an individual. For example, the human genotype is described by 46 chromosomes, yet it decodes into a living person with distinct eyes, ears, mouth, and nose. So to encode the chromosome of a "kangaroo", we must first consider what the "individual characteristics" of a kangaroo are. Some people may say that kangaroos have many characteristics, such as sex, body length, and weight, and perhaps even what they like to eat counts as one. But for this particular problem we should think further: whether a kangaroo is long or short, fat or thin, black or white, it will be shot as long as it is at a low altitude. Likewise, there is no rule that a longer kangaroo jumps farther and a shorter one jumps less far. And of course what it likes to eat doesn't matter at all. From start to finish we care about only one thing: where the kangaroo is. Because as long as we know where the kangaroo is, we can do the two things that must be done:

(1) Know the altitude of the kangaroo by consulting the map of the Himalayas (the value of the fitness function is obtained through the independent variable.) to judge whether we need to shoot it.

(2) Know which new position the kangaroo goes to after a hop (crossover and mutation).

      If we can't tell which "individual characteristics" are necessary and which are not, the following way of thinking often helps. Suppose you think a kangaroo's coat color is necessary. Imagine two kangaroos whose other characteristics are exactly the same, one black and the other not so black. You'll see right away that this does not affect their fate in the slightest: they have an equal chance of being shot, simply because they are in the same place. (It is worth mentioning that if your genetic encoding does include whether the kangaroo is black, this will not affect the kangaroos' evolution either: whether the kangaroo that ends up on Mount Everest is black or white is completely random, while its location is quite certain.)

   This is the kind of thinking that usually goes into designing the encoding for a genetic algorithm: abstract the specific problem into a mathematical model, keep what is essential, and discard what is secondary. Only then can the problem be solved concisely and effectively.

     Now that the kangaroo's position has been chosen as its individual characteristic (specifically, the position is its abscissa), we need to establish the mapping between phenotype and genotype, that is, how to represent the kangaroo's abscissa with a code. Since the abscissa is a real number, what we need, put bluntly, is to encode this real number. Looking back at the two encoding methods introduced above, the first thing that comes to mind is that binary encoding will be more complicated, while floating-point encoding will be more concise. As you can imagine, floating-point encoding needs only a single floating-point number. The following describes how to build a mapping from a binary code to a real number.

  Obviously, a binary string of a given length can only represent real numbers to a given precision. Suppose we require the solution to be accurate to six decimal places. Since the length of the interval is 2 - (-1) = 3, the interval [-1, 2] must be divided into at least 3 × 10^6 equal parts to meet the accuracy requirement. Also, because

$$2^{21} = 2097152 < 3 \times 10^{6} < 2^{22} = 4194304$$

the encoded binary string needs at least 22 bits.

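       A binary string (b_21 b_20 ... b_0), where b_0 is the rightmost bit, is converted into the corresponding real value in the interval [-1, 2] in two steps.

    (1) Convert the binary number represented by the string to a decimal integer x':

$$x' = (b_{21} b_{20} \ldots b_{0})_2 = \sum_{i=0}^{21} b_i \cdot 2^{i}$$

    (2) Map x' to the corresponding real number in the interval:

$$x = -1 + x' \cdot \frac{2 - (-1)}{2^{22} - 1}$$

      (This resembles an analog-to-digital conversion.)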

   For example, the binary string <1000101110110101000111> represents the real value 0.637197:

$$x' = (1000101110110101000111)_2 = 2288967$$

$$x = -1 + 2288967 \cdot \frac{3}{2^{22} - 1} \approx 0.637197$$

       The binary strings <0000000000000000000000> and <1111111111111111111111> represent the two endpoints of the interval -1 and 2, respectively.
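The two-step mapping can be checked with a short Python sketch. The function names `decode` and `encode` are hypothetical; the constants (22 bits, interval [-1, 2]) come from the text above.

```python
N_BITS = 22
LO, HI = -1.0, 2.0

def decode(bits: str) -> float:
    """Map a 22-character bit string to a real number in [LO, HI]."""
    x_int = int(bits, 2)                                # step (1): binary -> integer
    return LO + x_int * (HI - LO) / (2 ** N_BITS - 1)   # step (2): scale into interval

def encode(x: float) -> str:
    """Inverse mapping: nearest 22-bit string for a real number in [LO, HI]."""
    x_int = round((x - LO) / (HI - LO) * (2 ** N_BITS - 1))
    return format(x_int, f"0{N_BITS}b")

print(round(decode("1000101110110101000111"), 6))  # 0.637197
print(decode("0" * N_BITS), decode("1" * N_BITS))  # -1.0 2.0
```

Dividing by 2^22 - 1 rather than 2^22 is what makes the all-ones string land exactly on the right endpoint.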

     OK, we have now studied the kangaroos' chromosomes thoroughly. Let us continue following the kangaroos' evolutionary journey.

Natural selection - fitness score and selection function.

1. Competition - the fitness function

   Competition in nature usually involves two aspects: the struggle among organisms and the struggle between organisms and their environment. But in our example, you can imagine that the kangaroos are very friendly to each other and don't need to fight one another for the right to survive. Their life and death depend on your judgment. Since you have to decide which kangaroos should be killed and which should not, you need a yardstick. For this problem the yardstick is easy to define: the altitude of the kangaroo. (After all, you simply want the kangaroos to climb as high as possible.) So we use each kangaroo's altitude directly as its fitness score; that is, the fitness function simply returns the function value.
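In code, "the fitness function simply returns the function value" is a one-liner. The sketch below assumes the objective is `f(x) = x*sin(10*pi*x) + 2`; the article's formula is missing, and this choice is only inferred from the interval [-1, 2] and the reported maximum f(1.85) = 3.85. The name `fitness` is hypothetical.

```python
import math

def f(x):
    # ASSUMED objective: the original formula is missing from the article.
    return x * math.sin(10 * math.pi * x) + 2.0

def fitness(x):
    """Altitude of a kangaroo standing at abscissa x (higher is fitter)."""
    return f(x)

print(round(fitness(1.85), 6))
```

If the goal were the deepest valley rather than the highest peak, one simple option would be to return `-f(x)` instead, though proportional selection schemes then need the scores shifted to be non-negative.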

2. Natural selection - the selection function

    In nature, the fitter an individual is, the more likely it is to reproduce. But a higher fitness does not guarantee more offspring; it only makes them more probable. (After all, some kangaroos at lower altitudes are lucky enough to escape your eyes.) So how do we establish this probabilistic relationship? Below we introduce a commonly used method: roulette wheel selection.

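     For example, suppose we have 5 chromosomes whose fitness scores are 5, 7, 10, 13, and 15.

       The total fitness is:

$$F = 5 + 7 + 10 + 13 + 15 = 50$$

       So the probability of each individual being selected is:

$$P_1 = \frac{5}{50} = 10\%,\quad P_2 = \frac{7}{50} = 14\%,\quad P_3 = \frac{10}{50} = 20\%,\quad P_4 = \frac{13}{50} = 26\%,\quad P_5 = \frac{15}{50} = 30\%$$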

  

You can imagine spinning the wheel: when it stops, the pointer points at random to the region belonging to some individual, and that lucky individual is selected. (Obviously, individuals with higher fitness scores own larger regions and are more likely to be selected.)

Note: There is also an elitist selection mechanism.
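Roulette wheel selection can be sketched with cumulative sums and a binary search. The function names are hypothetical; the fitness scores are the ones from the example above, and the sketch assumes all scores are non-negative with a positive total.

```python
import bisect
import itertools
import random

def selection_probabilities(fitnesses):
    """Share of the wheel owned by each individual."""
    total = sum(fitnesses)
    return [fit / total for fit in fitnesses]

def roulette_select(fitnesses, rng=random):
    """Return the index of one individual, chosen with probability
    proportional to its fitness."""
    cumulative = list(itertools.accumulate(fitnesses))  # e.g. [5, 12, 22, 35, 50]
    spin = rng.random() * cumulative[-1]                # where the pointer stops
    index = bisect.bisect_right(cumulative, spin)       # slice containing the pointer
    return min(index, len(fitnesses) - 1)               # guard against float rounding

scores = [5, 7, 10, 13, 15]
print(selection_probabilities(scores))  # [0.1, 0.14, 0.2, 0.26, 0.3]
```

Python's standard library offers the same behaviour through `random.choices(population, weights=scores)`; writing it out by hand just makes the wheel visible.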

 

Genetic variation - genetic recombination (crossover) and genetic mutation.

  These two steps are the fundamental reason offspring differ from their parents (note that I did not say offspring are superior to their parents; only after natural selection do offspring tend to be superior). Binary encoding and floating-point encoding handle these two genetic operations very differently. The genetic operations of binary encoding are closer to the processes in nature. The two are described separately below.

1. Recombination/crossover

   (1) Binary encoding

    The gene exchange process under binary encoding is very similar to the synapsis of homologous chromosomes taught in high school biology: several codes at the same positions are randomly exchanged to produce new individuals.
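One common way to "exchange several codes at the same positions" is two-point crossover, which swaps one contiguous segment between the parents. The text does not say which variant is meant, so this choice is an assumption, and the function name is hypothetical.

```python
import random

def two_point_crossover(a: str, b: str, rng=random):
    """Swap the segment [i, j) between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    # Pick two distinct cut points, 0 <= i < j <= len(a).
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    child1 = a[:i] + b[i:j] + a[j:]
    child2 = b[:i] + a[i:j] + b[j:]
    return child1, child2

random.seed(1)
c1, c2 = two_point_crossover("000000000000", "111111111111")
print(c1, c2)
```

At every position the two children together carry exactly the two parental bits, so no genetic material is created or lost; only its arrangement changes.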



(2) Floating point encoding

     If a chromosome contains multiple floating-point genes, crossover can be performed in a similar way to the above, except that the basic unit of crossover is a floating-point number rather than a binary digit. For crossover of a single floating-point gene there are other recombination methods, such as intermediate recombination: a value between the father's gene value and the mother's gene value is generated at random and used as the offspring's gene value. For example, crossing 5.5 and 6 might produce 5.7 and 5.6.
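Intermediate recombination for a single floating-point gene can be sketched as drawing a uniform random value between the two parental values. The function name is hypothetical, and the uniform distribution is an assumption; the text only says the value lies between the parents.

```python
import random

def intermediate_recombination(father: float, mother: float, rng=random) -> float:
    """Return a child gene drawn uniformly between the two parent genes."""
    lo, hi = min(father, mother), max(father, mother)
    return rng.uniform(lo, hi)

random.seed(0)
children = [intermediate_recombination(5.5, 6.0) for _ in range(2)]
print(children)
```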

   Consider the specifics of the "kangaroo jump" problem - the individual characteristics of a kangaroo are only represented by its location. It is conceivable that the genes of kangaroos in the same position are exactly the same, and after two identical genes are crossed, it is equivalent to doing nothing, so we do not intend to use the genetic operation step of crossover in this example. (Of course, it is not impossible to insist on this operation step. You can catch two kangaroos in different places, let them mate, and then produce offspring, and then send them to where they should be.)

2. Mutation

  (1) Binary encoding

     Gene mutation process: Gene mutation is a change in a gene at a certain locus of a chromosome. Mutations turn a gene into its allele and usually cause certain phenotypic changes. As mentioned above, the genetic manipulation process of binary coding is very similar to the process in biology, and the "0" or "1" on the gene string has a certain chance to become the opposite "1" or "0". For example, the following binary code:

101101001011001

After genetic mutation, it may become the following new code:

001101011011001
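Bit-flip mutation can be sketched as follows: every bit is flipped independently with a small probability. The function names and the default rate are hypothetical.

```python
import random

def bit_flip_mutation(bits: str, rate: float = 0.05, rng=random) -> str:
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# The example above differs from its parent in exactly two positions.
print(hamming("101101001011001", "001101011011001"))  # 2

random.seed(0)
print(bit_flip_mutation("101101001011001"))
```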

(2) Floating point encoding

      The gene mutation process of floating-point encoding generally adds or subtracts a small random number to the original floating-point number. For example, the original floating-point string is as follows:

1.2,3.4,5.1, 6.0, 4.5

After mutation, the following floating-point strings may be obtained:

1.3,3.1,4.9, 6.3, 4.4

  Of course, this small random number has a magnitude, which we generally call the "step size". (Think of the "kangaroo jump" problem: the length of a kangaroo's jump is the step size.) Generally speaking, the larger the step size, the faster evolution proceeds at the beginning, but the harder it is to converge to a precise point later, whereas a small step size converges to a point more accurately. Therefore, to speed up evolution while still converging accurately to the optimal solution in the later stage, the step size is often changed dynamically. This is similar to the simulated annealing process described earlier.
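Floating-point mutation with a shrinking step size might look like the sketch below. The geometric decay schedule is only one possible way to "dynamically change the step size" and is an assumption, as are the function names and parameter values.

```python
import random

def mutate(genes, step, rng=random):
    """Add a random offset in [-step, step] to every gene."""
    return [g + rng.uniform(-step, step) for g in genes]

def step_size(step0: float, decay: float, generation: int) -> float:
    """Geometric decay: big jumps early, fine adjustments later."""
    return step0 * decay ** generation

random.seed(0)
genes = [1.2, 3.4, 5.1, 6.0, 4.5]
early = mutate(genes, step_size(0.3, 0.95, generation=0))
late = mutate(genes, step_size(0.3, 0.95, generation=100))
print(early)
print(late)
```

With these numbers the step starts at 0.3 and has shrunk to well under 0.01 by generation 100, so late mutations barely move the kangaroo.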

  So far, encoding, fitness evaluation, selection, and variation have each been covered. What remains is to assemble these "parts" of the genetic process (that is, to write the code).
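One way the parts could be assembled for the kangaroo problem is sketched below. It is not the author's program: the objective `f(x) = x*sin(10*pi*x) + 2` is an assumption (the original formula is missing), the population size, step size, and decay rate are arbitrary, and crossover is left out because the text above chooses not to use it for this example. The sketch uses floating-point encoding, roulette wheel selection, and mutation with a shrinking step.

```python
import math
import random

LO, HI = -1.0, 2.0

def f(x):
    # ASSUMED objective: the original formula is missing from the article.
    return x * math.sin(10 * math.pi * x) + 2.0

def run_ga(pop_size=50, generations=100, step0=0.5, decay=0.97, seed=None):
    rng = random.Random(seed)
    # Scatter the first kangaroos at random over the mountain range.
    population = [rng.uniform(LO, HI) for _ in range(pop_size)]
    best = max(population, key=f)
    step = step0
    for _ in range(generations):
        # Fitness is the altitude; f is positive on [LO, HI], so the
        # scores can be used directly as roulette wheel weights.
        scores = [f(x) for x in population]
        parents = rng.choices(population, weights=scores, k=pop_size)
        # Each child jumps a random distance from its parent's position.
        population = [min(HI, max(LO, p + rng.uniform(-step, step)))
                      for p in parents]
        best = max(population + [best], key=f)  # remember the best ever seen
        step *= decay                           # smaller jumps over time
    return best

x_best = run_ga(seed=1)
print(x_best, f(x_best))
```

Because the scores only range from about 0.05 to 3.85, plain proportional selection applies fairly gentle pressure here; ranking or scaling the scores would be a natural refinement.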

 

Here is the result of running the above example:


The red point marks the true maximum, which can be found by differentiation: approximately f(1.85) = 3.85.










Summary:

Encoding principles
Completeness: All solutions of the problem space can be represented as designed genotypes;
Soundness: Any genotype corresponds to a possible solution;
Non-redundancy: There is a one-to-one correspondence between the problem space and the expression space.

Importance of the fitness function
The choice of fitness function directly affects the convergence speed of the genetic algorithm and whether it can find the optimal solution. Generally speaking, the fitness function is derived from the objective function.

Improper design of the fitness function may lead to deception problems:
(1) In the early stage of evolution, a few exceptionally fit individuals dominate the selection process;
(2) In the late stage of evolution, differences between individuals are too small, and the search gets stuck at a local extremum.

Example of a deception problem:

Take the kangaroo problem again. Poisonous smog at low altitudes kills the kangaroos, and only those that climb to the top of Mount Everest survive.

Because the Himalayas have many peaks and we use altitude as fitness: in case (1), if kangaroos that are not on Mount Everest stand higher than kangaroos halfway up Mount Everest, then, because the population size stays constant, the kangaroos on Mount Everest may be eliminated; in case (2), none of the 100 kangaroos is on Mount Everest.

1. The role of selection: survival of the fittest;

2. The role of crossover: to ensure the stability of the population and to evolve towards the optimal solution;

3. The role of variation: to ensure the diversity of the population and avoid local convergence that may result from crossover.
