Detailed Explanation and Implementation of Genetic Algorithm

Introduction to Genetic Algorithms

Genetic algorithms are a family of search algorithms inspired by the theory of natural evolution. By mimicking the processes of natural selection and reproduction, genetic algorithms can produce high-quality solutions to a variety of problems involving search, optimization, and learning. Because they are modeled on natural evolution, they can also overcome some of the obstacles encountered by traditional search and optimization algorithms, especially on problems with a large number of parameters and complex mathematical representations.

Darwinian evolution analogy

Darwin's theory of evolution

A genetic algorithm is a simplified, computational analogue of Darwinian evolution as it occurs in nature.
The principles of Darwin's theory of evolution can be summarized as follows:

  1. Variation: The characteristics (traits, attributes) of individual samples in a population may vary, which causes samples to differ to some extent from each other.
  2. Inheritance: Individuals can pass certain traits on to their offspring, which makes the offspring resemble their parents to some degree.
  3. Selection: Populations typically compete for resources in a given environment. Individuals that are better adapted to the environment have an advantage in survival and thus produce more offspring.

In other words, evolution maintains a population of individuals that differ from one another. Individuals that are better adapted to their environment are more likely to survive, reproduce, and pass their traits on to the next generation. In this way, as generations pass, the species becomes better adapted to its environment. An important driver of evolution is crossover (also called recombination or hybridization), in which traits from both parents are combined to produce offspring. Crossover helps maintain the diversity of the population and, over time, blends better traits together. In addition, mutation (random variation in traits) can play an important role in evolution by introducing chance changes.

Corresponding concepts in genetic algorithms

Genetic algorithms try to find the best solution for a given problem. Darwinian evolution preserves the individual traits of a population, while a genetic algorithm preserves a set of candidate solutions (also known as individuals) for a given problem. These candidate solutions are iteratively evaluated and used to create the next generation of solutions. A better solution has a greater chance to be selected, and its characteristics are passed to the next generation of candidate solution sets. In this way, with generational updates, the set of candidate solutions can better solve the current problem.
Genotype
In nature, breeding, reproduction, and mutation operate through the genotype: the collection of genes that are grouped into chromosomes.
In a genetic algorithm, each individual is composed of chromosomes representing a set of genes. For example, a chromosome can be represented as a binary string, where each bit represents a gene:
[Figure: a chromosome represented as a binary string]
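As a concrete illustration, here is a minimal sketch in plain Python (no library assumed, and the chromosome length of 8 is an arbitrary choice) of how such a binary-string chromosome can be created at random:

```python
import random

def random_chromosome(length=8):
    """Create a chromosome: a list of randomly chosen bits (genes)."""
    return [random.randint(0, 1) for _ in range(length)]

print(random_chromosome())  # e.g. [1, 0, 1, 1, 0, 0, 1, 0]
```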
Population
A genetic algorithm maintains a population of individuals: the collection of candidate solutions for the problem at hand. Since each individual is represented by a chromosome, this population of individuals can be viewed as a collection of chromosomes:
[Figure: a population represented as a set of chromosomes]
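Continuing the sketch above, a population is then simply a list of such chromosomes; the population size of 10 below is again an arbitrary, illustrative value:

```python
def random_population(pop_size=10, chromosome_length=8):
    """Create a population: a list of randomly generated chromosomes."""
    return [random_chromosome(chromosome_length) for _ in range(pop_size)]

population = random_population()
```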

Fitness function
At each iteration of the algorithm, individuals are evaluated using a fitness function (also called the objective function). The fitness function is the function being optimized, or the problem the algorithm is trying to solve.
Individuals with higher fitness scores represent better solutions; they are more likely to be selected for breeding, and their traits are more likely to appear in the next generation. As the genetic algorithm proceeds, the quality of the solutions improves and the fitness values increase. Once a solution with a satisfactory fitness value is found, the genetic algorithm terminates.
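For instance, in the OneMax problem mentioned at the end of this post, the fitness of a binary chromosome is simply the number of 1-bits it contains; a minimal sketch, reusing the population created above:

```python
def one_max_fitness(chromosome):
    """Fitness of a binary chromosome: the number of 1-bits (higher is better)."""
    return sum(chromosome)

fitness_values = [one_max_fitness(individual) for individual in population]
```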

Selection
After the fitness of every individual in the population has been calculated, a selection process determines which individuals will be used to reproduce and create the next generation. Individuals with higher fitness values are more likely to be selected and to pass their genetic material on to the next generation.
There is still a chance of selecting individuals with low fitness values, though with a lower probability; in this way, their genetic material is not discarded entirely.
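One common way to implement this kind of fitness-proportionate selection is the roulette-wheel method sketched below; this is a standard textbook technique, not a specific library API, and it assumes non-negative fitness values:

```python
import random

def roulette_wheel_select(population, fitness_values):
    """Pick one individual with probability proportional to its fitness."""
    pick = random.uniform(0, sum(fitness_values))
    cumulative = 0.0
    for individual, fitness in zip(population, fitness_values):
        cumulative += fitness
        if cumulative >= pick:
            return individual
    return population[-1]  # fallback in case of floating-point rounding
```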
Crossover
To create a pair of new individuals, two parents are chosen from the current generation and parts of their chromosomes are exchanged (crossed over) to create two new chromosomes representing the offspring. This operation is called crossover or recombination:
[Figure: crossover between two parent chromosomes producing two offspring]
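A minimal sketch of single-point crossover, one of the simplest crossover variants (the cut point is chosen at random along the chromosome):

```python
import random

def single_point_crossover(parent1, parent2):
    """Exchange the tails of two parent chromosomes at a random cut point."""
    cut = random.randint(1, len(parent1) - 1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2
```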

Mutation
The purpose of the mutation operation is to periodically and randomly refresh the population, introducing new patterns into the chromosomes and encouraging the search to explore unknown regions of the solution space.
A mutation appears as a random change in a gene. It is implemented by randomly changing one or more of the chromosome's values, for example, by flipping a bit in a binary string:
[Figure: bit-flip mutation of a binary chromosome]
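A minimal sketch of bit-flip mutation; each gene is flipped independently with a small probability (the rate of 0.01 is an arbitrary illustrative value):

```python
import random

def bit_flip_mutation(chromosome, mutation_rate=0.01):
    """Flip each bit independently with probability mutation_rate."""
    return [1 - gene if random.random() < mutation_rate else gene
            for gene in chromosome]
```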

Genetic Algorithm Theory

The theoretical hypothesis behind genetic algorithms, the building-block hypothesis, is that the optimal solution to a problem is assembled from smaller components (building blocks); the more of these components a candidate solution contains, the closer it is to the optimal solution.
Individuals in the population carry some of the components required for the optimal solution, and the repeated selection and crossover operations let individuals pass these components on to the next generation, possibly combining them with other essential components of the optimal solution. This creates genetic pressure that steers more and more individuals in the population toward containing the components that make up the optimal solution.

Schema theorem

A more formal statement of the building-block hypothesis is Holland's schema theorem, also known as the fundamental theorem of genetic algorithms.
A schema is a pattern (or template) that can be found within chromosomes; each schema represents a subset of chromosomes that share a certain similarity.
For example, if a set of chromosomes is represented by binary strings of length 4, the schema 1*01 represents all chromosomes whose leftmost position is 1 and whose rightmost two positions are 01; the second position from the left may be either 1 or 0, since the * symbol represents a wildcard.
For each schema, there are the following two metrics:

  1. Order: the number of fixed digits (positions that are not wildcards)
  2. Defining length: the distance between the two furthest fixed digits

The following table provides a few examples of four-bit binary schemas and their measurements:
Schema   Order   Defining Length
1101     4       3
1*01     3       3
*101     3       2
**01     2       1
1***     1       0
****     0       0
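To make the two metrics concrete, here is a small sketch that computes the order and defining length of a schema given as a string of '0', '1', and '*' characters; it reproduces the values in the table above:

```python
def schema_metrics(schema):
    """Return (order, defining length) of a schema such as '1*01'."""
    fixed_positions = [i for i, symbol in enumerate(schema) if symbol != '*']
    order = len(fixed_positions)
    defining_length = fixed_positions[-1] - fixed_positions[0] if order > 1 else 0
    return order, defining_length

for schema in ["1101", "1*01", "*101", "**01", "1***", "****"]:
    print(schema, schema_metrics(schema))
```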

Each chromosome in the population corresponds to multiple schemas. For example, chromosome 1101 matches every schema that appears in this table. If this chromosome has high fitness, both it and all the schemas it matches are more likely to survive the selection operation. When the chromosome crosses over with another or is mutated, some of these schemas are preserved while others disappear. Schemas of low order and short defining length are the most likely to survive.
Thus, the schema theorem states that schemas of low order, short defining length, and above-average fitness increase in frequency exponentially over successive generations. In other words, as the genetic algorithm progresses, the smaller, simpler building blocks that represent the traits of better solutions become increasingly common in the population.

Differences between Genetic Algorithms and Traditional Algorithms

There are some important differences between genetic algorithms and traditional search and optimization algorithms such as gradient-based algorithms.

  1. Population-based
    The genetic search is performed against a set of candidate solutions (individuals) rather than a single candidate. During the search, the algorithm keeps the set of individuals that forms the current generation, and each iteration of the genetic algorithm creates the next generation of individuals.
    In contrast, most other search algorithms maintain a single solution and modify it iteratively to find the best solution. For example, the gradient descent algorithm iteratively moves the current solution in the direction of the current steepest descent, which is the negative of the gradient of the given function.
  2. Genetic representations
    Genetic algorithms do not operate directly on candidate solutions, but on their representations (or codings), often called chromosomes. Chromosomes can be manipulated genetically using crossover and mutation.
    The downside of using a genetic representation is that it decouples the search process from the original problem domain. Genetic algorithms don't know what chromosomes represent and don't try to interpret them.
  3. Fitness function
    The fitness function represents the problem to be solved. The purpose of the genetic algorithm is to find the individual with the highest score obtained by using the fitness function.
    Unlike many traditional search algorithms, genetic algorithms consider only the values returned by the fitness function and do not rely on derivatives or any other information. This makes them suitable for functions that are difficult or impossible to differentiate.
  4. Probabilistic Behavior
    While many traditional algorithms are deterministic in nature, the rules that genetic algorithms use to generate the next generation from one generation to the next are probabilistic.
    For example, selected individuals will be used to create the next generation, and the probability of selecting an individual increases with the individual's fitness score, but it is still possible to select an individual with a lower score.
    Despite the probabilistic nature of the process, a search based on a genetic algorithm is not random; instead, it uses randomness to direct the search toward areas of the search space that have a better chance of improving results.

Pros and Cons of Genetic Algorithms

Advantages

  1. Global Optimum
    In many cases, optimization problems have local maxima and minima; these represent solutions that are better than their surrounding solutions but not optimal.
    Most traditional search and optimization algorithms, especially gradient-based ones, are prone to getting stuck in a local maximum instead of finding the global one.
    Genetic algorithms are more likely to find the global maximum, because they use a set of candidate solutions rather than a single one, and because crossover and mutation will in many cases produce candidate solutions that differ from the previous ones. As long as the diversity of the population is maintained and premature convergence is avoided, it is possible to reach the globally optimal solution.
  2. Handling Complex Problems
    Because genetic algorithms require only the fitness score of each individual and are independent of other properties of the fitness function (such as its derivatives), they can be used on functions with complex mathematical representations that are difficult or impossible to differentiate.
  3. Dealing with problems that lack mathematical representation
    Genetic algorithms can be used for problems that lack a mathematical representation altogether, because the fitness score can be supplied by a human. For example, to find the most attractive color combination, we can try different combinations and ask users to rate their attractiveness. Using these opinion-based scores as the fitness function, a genetic algorithm can search for the best-scoring combination even though the fitness function has no mathematical representation and scores cannot be computed directly from a given color combination.
    Genetic algorithms can even handle cases where the fitness of each individual is not available, as long as there is a way to compare two individuals and determine which is better. For example, an ML algorithm that drives a car in a simulated race can be optimized and tuned by a genetic-algorithm-based search that pits different versions of the algorithm against each other to determine which one is better.
  4. Noise Tolerance
    Some problems contain noise, meaning that even for similar input values, the output values may differ from one evaluation to the next. This can happen, for example, when data comes from noisy sensor measurements or when scores are based on human opinion.
    Although this behavior can interfere with many traditional search algorithms, genetic algorithms are generally robust to this, thanks to the repeated crossover and re-evaluation of individuals.
  5. Parallelism
    Genetic algorithms are well suited for parallelization and distributed processing. Fitness is calculated independently for each individual, which means that all individuals in the population can be evaluated simultaneously.
    In addition, the operations of selection, crossover, and mutation can be performed simultaneously on individuals and pairs of individuals in the population, respectively (a minimal parallel-evaluation sketch follows this list).
  6. Continuous Learning
    Evolution never stops: as environmental conditions change, the population gradually adapts to them. A genetic algorithm can likewise run continuously in a changing environment, and the current best solution can be obtained and used at any point in time. This requires, however, that the environment change more slowly than the genetic algorithm can search.
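As noted in the parallelism point above, fitness evaluation can be distributed across workers because each individual is scored independently. A minimal sketch using Python's standard multiprocessing module (the OneMax-style fitness function here is just a stand-in):

```python
from multiprocessing import Pool

def evaluate(chromosome):
    """Stand-in fitness function; replace with the real objective."""
    return sum(chromosome)

if __name__ == "__main__":
    population = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
    with Pool() as pool:
        # Every individual is scored concurrently by the worker processes.
        fitness_values = pool.map(evaluate, population)
    print(fitness_values)  # [3, 1, 4]
```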

Limitations

  1. Special Definitions Required
    When applying genetic algorithms to a given problem, it is necessary to create suitable representations for them—defining the fitness function and chromosome structure, as well as the selection, crossover, and mutation operators applicable to the problem.
  2. Hyperparameter Tuning
    The behavior of a genetic algorithm is controlled by a set of hyperparameters, such as population size and mutation rate, etc. There are no standard rules for setting hyperparameters when applying genetic algorithms to specific problems.
  3. Computationally Intensive
    Operating on large populations can be computationally intensive, and it can take a long time before good results are achieved.
    These issues can be mitigated through choice of hyperparameters, parallel processing, and in some cases caching of intermediate results.
  4. Premature Convergence
    If an individual's fitness is much higher than that of the rest of the population, its genes may be duplicated so often that they take over the entire population. This can cause the genetic algorithm to converge prematurely to a local maximum instead of finding the global one.
    To prevent this from happening, the diversity of the population must be maintained.
  5. Unguaranteed solution quality
    Using a genetic algorithm does not guarantee that the global maximum of the problem at hand will be found (but this is true of almost all search and optimization algorithms, apart from analytical solutions to particular types of problems).
    In general, genetic algorithms can provide good solutions in a reasonable amount of time when used appropriately.

Genetic Algorithm Application Scenarios

  1. Problems with complex mathematical representations: Since genetic algorithms require only the value of the fitness function, they can be used on problems whose objective functions are difficult or impossible to differentiate, on problems with a large number of parameters, and on problems with a mix of parameter types.
  2. Problems without a mathematical representation: Genetic algorithms do not require a mathematical representation of the problem, as long as a score can be obtained or there is a way to compare two solutions.
  3. Problems involving noisy data: Genetic algorithms can cope with data that may be inconsistent, such as data derived from sensor outputs or based on human scoring.
  4. Problems involving environments that change over time: Genetic algorithms can respond to gradual environmental changes by continually creating new generations that adapt to those changes.

But when the problem has known and specialized solutions, using existing traditional methods or analytical methods may be a more effective option.

Components of a Genetic Algorithm

At the heart of a genetic algorithm is a loop: the genetic operators of selection, crossover, and mutation are applied in turn, the individuals are then re-evaluated, and the process continues until a stopping condition is met.

The basic flow of the algorithm

The following flowchart shows the main stages of a basic genetic algorithm process:

[Flowchart: Start → Create initial population → Calculate the fitness of every individual → Selection → Crossover → Mutation → Calculate the fitness of every individual → Termination condition met? If no, return to Selection; if yes, select the individual with the highest fitness → End]

Create initial population

The initial population is a randomly chosen set of valid candidate solutions (individuals). Since a genetic algorithm uses a chromosome to represent each individual, the initial population is effectively a set of chromosomes.

Calculating fitness

The value of the fitness function is calculated for each individual. For the initial population this is done once; afterwards, it is repeated for every new generation once the genetic operators of selection, crossover, and mutation have been applied. Because each individual's fitness is independent of the others, the calculations can be performed in parallel.
Since the selection stage that follows the fitness calculation usually regards individuals with higher fitness scores as better solutions, genetic algorithms focus on finding the maximum of the fitness score. If the problem requires a minimum, the fitness calculation should negate the original value, for example by multiplying it by (-1).
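A minimal sketch of this negation trick, assuming a hypothetical cost function that should be minimized:

```python
def cost(solution):
    """Hypothetical objective to be minimized (for example, an error measure)."""
    return sum((x - 0.5) ** 2 for x in solution)

def fitness(solution):
    """Maximizing this fitness is equivalent to minimizing the cost."""
    return -cost(solution)
```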

Selection, crossover, and mutation

Applying the genetic operators of selection, crossover, and mutation to the population produces a new generation based on the better individuals of the current one.
The selection operation is responsible for choosing advantageous individuals from the current population.
The crossover (or recombination) operation creates offspring from the selected individuals. This is usually done by having two selected individuals swap parts of their chromosomes to create two new chromosomes that represent the offspring.
The mutation operation randomly changes one or more chromosome values (genes) of each newly created individual. Mutation usually occurs with a very low probability.

Termination conditions

Several conditions can be checked to decide whether the algorithm may stop. The two most commonly used stopping conditions are:

  1. The maximum number of generations has been reached. This also serves to limit the runtime and computing resources consumed by the algorithm.
  2. There has been no noticeable improvement in the individuals over the last few generations. This can be implemented by storing the best fitness value obtained in each generation and comparing the current best value with the best value obtained a predetermined number of generations earlier. If the difference is smaller than a certain threshold, the algorithm can stop.

Other stopping conditions:

  1. A predetermined amount of time has elapsed since the algorithm started.
  2. A certain cost or budget, such as CPU time and/or memory, has been consumed.
  3. The best solution has taken over a portion of the population larger than a preset threshold.
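Putting the stages above together, the following is a condensed, self-contained sketch of the basic loop for the OneMax problem, using roulette-wheel selection, single-point crossover, bit-flip mutation, and a maximum-generation stopping condition with an early exit once a perfect chromosome is found; all parameter values are illustrative:

```python
import random

POP_SIZE, CHROM_LEN, MAX_GENERATIONS, MUTATION_RATE = 50, 20, 100, 0.01

def fitness(individual):                 # OneMax: count the 1-bits
    return sum(individual)

def select(population, fits):            # roulette-wheel selection
    pick, cumulative = random.uniform(0, sum(fits)), 0.0
    for individual, fit in zip(population, fits):
        cumulative += fit
        if cumulative >= pick:
            return individual
    return population[-1]

def crossover(parent1, parent2):         # single-point crossover
    cut = random.randint(1, CHROM_LEN - 1)
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

def mutate(individual):                  # bit-flip mutation
    return [1 - g if random.random() < MUTATION_RATE else g for g in individual]

# Create the initial population and run the generational loop.
population = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
              for _ in range(POP_SIZE)]
for generation in range(MAX_GENERATIONS):
    fits = [fitness(ind) for ind in population]
    if max(fits) == CHROM_LEN:           # stop early: perfect chromosome found
        break
    offspring = []
    while len(offspring) < POP_SIZE:
        child1, child2 = crossover(select(population, fits),
                                   select(population, fits))
        offspring += [mutate(child1), mutate(child2)]
    population = offspring

best = max(population, key=fitness)
print("best fitness:", fitness(best))
```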

Other topics

Elitism

Although the average fitness of a genetic algorithm's population generally increases from generation to generation, the best individuals of the current generation can be lost at any point. This happens because the selection, crossover, and mutation operators alter individuals while creating the next generation. In many cases the loss is temporary, since these individuals (or better ones) are reintroduced into the population in a later generation.
However, if we want to guarantee that the best individuals always make it into the next generation, the elitism strategy can be applied. This means that the top n individuals (where n is a predefined parameter) are copied into the next generation before the rest of the population is filled with offspring created through selection, crossover, and mutation. The copied elite individuals remain eligible for the selection process and can therefore still serve as parents of new individuals.
The elitism strategy can sometimes have a significant positive impact on the algorithm's performance, as it avoids the time potentially wasted on rediscovering good solutions that were lost during the genetic process.
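A minimal sketch of elitism, assuming a population list, the offspring created by the genetic operators, a fitness function as used elsewhere in this post, and a hypothetical parameter n_elite playing the role of n:

```python
def apply_elitism(population, offspring, fitness, n_elite=2):
    """Copy the n_elite best current individuals into the next generation,
    then fill the remaining slots with the newly created offspring."""
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    return elite + offspring[:len(population) - n_elite]
```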

Niching and sharing

In nature, any environment can be further divided into multiple sub-environments, or niches, populated by a variety of species that exploit the unique resources available in each niche, such as food and shelter. A forest environment, for example, consists of the treetops, the shrubs, the forest floor, the tree roots, and so on; each of these hosts different species that live in that niche and use its resources.
When several different species coexist in the same niche, they all compete for the same resources, which creates a tendency to seek out new, unpopulated niches and fill them.
In the domain of genetic algorithms, this niching phenomenon can be used to maintain the diversity of the population and to find several optimal solutions, each of which is regarded as a niche.
For example, suppose a genetic algorithm is trying to maximize a fitness function with several distinct peaks:
[Figure: a fitness function with several peaks]

Since the genetic algorithm tends to find the global maximum, after a while most of the individuals will be concentrated around the highest peak. This is shown in the figure by the x-marked positions, which represent the individuals of the current generation.
Sometimes, however, in addition to the global maximum we want to find some (or all) of the other peaks as well. To do so, each peak can be treated as a niche that offers resources in proportion to its height. We then need a way to share (or distribute) these resources among the individuals occupying each peak; ideally, this drives the population to spread out accordingly, with the highest peak attracting the most individuals because it offers the greatest reward, and the other peaks hosting correspondingly fewer individuals because they offer smaller rewards:
[Figure: individuals distributed among several peaks (niches)]

One way to implement this sharing mechanism is to divide each individual's raw fitness value by (some function of) its combined distance from all other individuals. Another option is to divide each individual's raw fitness by the number of other individuals within a certain radius around it.
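The second option (dividing raw fitness by the number of neighbors within a given radius) might be sketched as follows; the use of Hamming distance and the radius value are illustrative assumptions for binary chromosomes:

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(index, population, raw_fitness, radius=3):
    """Divide an individual's raw fitness by the number of individuals
    (including itself) that lie within `radius` of it."""
    neighbors = sum(1 for other in population
                    if hamming_distance(population[index], other) <= radius)
    return raw_fitness[index] / neighbors
```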

Genetic Algorithm Practice

As a practical exercise, the "Hello World" of genetic algorithms, the OneMax problem, can be implemented using the deap framework (the full code implementation is linked from the original post).
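For reference, here is a condensed sketch of what such a deap-based OneMax implementation typically looks like, closely following the standard OneMax example from the deap documentation; the chromosome length, population size, and operator probabilities are illustrative:

```python
import random
from deap import algorithms, base, creator, tools

# Maximize a single-objective fitness; individuals are lists of bits.
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bool, 100)                  # 100-bit chromosomes
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

toolbox.register("evaluate", lambda ind: (sum(ind),))     # OneMax fitness (tuple)
toolbox.register("mate", tools.cxOnePoint)                # single-point crossover
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)  # bit-flip mutation
toolbox.register("select", tools.selTournament, tournsize=3)

population = toolbox.population(n=50)
population, _ = algorithms.eaSimple(population, toolbox, cxpb=0.5, mutpb=0.2,
                                    ngen=40, verbose=False)
best = tools.selBest(population, k=1)[0]
print("best OneMax score:", sum(best))
```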
