The concept of the genetic algorithm and a Python implementation

    The genetic algorithm is a classic intelligent algorithm, mainly used to solve optimization problems. This article briefly introduces its principles and gives a Python template for solving optimization problems over real-valued parameters.

References in this article:

Principle: Detailed Introduction to Genetic Algorithms - Programmer Sought

A brief introduction

    The genetic algorithm borrows from genetics in biology. First, one or more populations are generated, each containing a number of individuals, that is, candidate solutions. A fitness is assigned to each solution, and genetic operators are used to generate new solutions. Good genes are continuously passed on within the population, and in this way the goal of optimization is eventually achieved.

    So in the end there are several populations whose individuals are all relatively good, while the poor ones have been eliminated.

Related concepts

    The following are some basic concepts, taken from the reference listed above. I have only extracted some of them and added a few notes; see the reference for the details.

① Chromosome: A chromosome can also be called a genotype individual. A certain number of individuals form a population, and the number of individuals in the population is called the population size.

② Bit string: the representation of an individual. It corresponds to the chromosome in genetics.

③ Gene: A gene is an element of a chromosome and represents a characteristic of the individual. For example, in the string (that is, chromosome) S=1011, the four elements 1, 0, 1, and 1 are called genes.

④ Feature value: When a string represents an integer, the feature value of a gene is the weight of its position in the binary number. For example, in the string S=1011, counting positions from the left, the 1 at gene position 3 has a feature value of 2, and the 1 at gene position 1 has a feature value of 8.

⑤ Fitness: The degree to which an individual is adapted to its environment is called fitness. To express this for chromosomes, a function that scores every chromosome in the problem is introduced, called the fitness function. This function is usually used to calculate the probability that an individual is selected from the population.

⑥ Genotype: also called the genetic type; the genetic makeup that defines an individual's hereditary characteristics and behavior. It corresponds to the bit string in GA.

⑦ Phenotype: the characteristics that an organism's genotype exhibits in a specific environment. It corresponds to the parameters obtained by decoding the bit string in GA.

A chromosome is an individual; in a concrete calculation it is a solution in n-dimensional space.

A bit string is the encoded representation of an individual.

A gene is a feature carried on a chromosome, that is, one encoded bit.

The feature value gives the meaning of that bit under the encoding rule.

The fitness determines the probability of being selected during the calculation; in general it reflects the quality of the solution.

The genotype is the encoded string.

The phenotype is the decoded parameters.
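To make the vocabulary concrete, here is a small sketch that treats S=1011 as a genotype, lists the feature value (binary weight) of each gene position, and decodes the phenotype. The function names `gene_weights` and `decode_bits` are made up for this illustration and do not come from the reference.

```python
def gene_weights(bits):
    """Feature value (binary weight) of each gene position, leftmost position first."""
    n = len(bits)
    return [2 ** (n - 1 - i) for i in range(n)]


def decode_bits(bits):
    """Phenotype: the integer obtained by adding the weights of the genes set to 1."""
    return sum(w for b, w in zip(bits, gene_weights(bits)) if b == "1")


S = "1011"                # genotype: the chromosome written as a bit string
print(gene_weights(S))    # [8, 4, 2, 1]
print(decode_bits(S))     # 11
```

Position 1 carries weight 8 and position 3 carries weight 2, which matches the feature values given in ④.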

Steps of the genetic algorithm

  1. Encoding and Decoding Rules

    The two processes are inverses of each other. Encoding converts a solution into a sequence, generally using binary encoding, and decoding converts the sequence back into the solution.
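As an illustration of a binary encoding rule for a real-valued parameter, the sketch below maps a value in [low, high] onto a fixed number of bits and back. The helper names and the choice of 10 bits are assumptions for the example; the template later in this article does not use binary encoding.

```python
def encode_real(x, low, high, n_bits):
    """Map a real x in [low, high] to a bit string of length n_bits."""
    levels = 2 ** n_bits - 1                      # largest integer the bits can hold
    k = round((x - low) / (high - low) * levels)  # nearest grid point
    return format(k, "0{}b".format(n_bits))


def decode_real(bits, low, high):
    """Inverse mapping: bit string back to a real value in [low, high]."""
    levels = 2 ** len(bits) - 1
    return low + int(bits, 2) / levels * (high - low)


bits = encode_real(42.0, 10, 60, 10)
x = decode_real(bits, 10, 60)
print(bits, x)  # x differs from 42.0 by less than one grid step, (60 - 10) / 1023
```

Decoding only recovers the value up to the grid resolution, so more bits give a finer search grid at the cost of longer chromosomes.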

  2. Generate the initial population

    This step generates the variables that are needed.

    Set the maximum number of generations T, the population size M, the crossover probability Pc and the mutation probability Pm, then randomly generate M individuals as the initial population P0.
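A minimal sketch of this initialization step for bit-string individuals follows. The concrete values of T, M, Pc and Pm, the 10-bit length and the helper name `init_population` are illustrative assumptions.

```python
import random


def init_population(M, n_bits, rng):
    """Randomly generate M bit-string individuals."""
    return ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(M)]


T, M, Pc, Pm = 100, 20, 0.8, 0.01   # illustrative settings, not recommendations
rng = random.Random(0)              # fixed seed so the run can be repeated
P0 = init_population(M, n_bits=10, rng=rng)
print(len(P0), P0[0])
```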

  3. Adjust the fitness (fitness scaling)

    This step adjusts the fitness of each individual so that the fitness values do not become nearly equal, in which case there would be no survival of the fittest.

    More precisely, the fitness of the individuals is rescaled appropriately at different stages of the iteration. This avoids the weakening of competition that occurs when the individuals in the population have similar fitness, which would otherwise cause the population to converge to a local optimum.

    There are many ways to do this, such as linear and nonlinear scaling.
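One common linear scheme, shown here only as a sketch and not taken from the reference, rescales fitness as f' = a*f + b so that the average stays the same while the best individual gets c times the average. The function name `linear_scale` and the default c = 2 are assumptions.

```python
import numpy as np


def linear_scale(fitness, c=2.0):
    """Linear scaling f' = a*f + b: keeps the mean, maps the maximum to c * mean."""
    f = np.asarray(fitness, dtype=float)
    f_avg, f_max, f_min = f.mean(), f.max(), f.min()
    if f_max == f_avg:               # all values equal: nothing to rescale
        return f.copy()
    a = (c - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (f_max - c * f_avg) / (f_max - f_avg)
    if a * f_min + b < 0:            # would go negative: map the minimum to 0 instead
        a = f_avg / (f_avg - f_min)
        b = -f_min * f_avg / (f_avg - f_min)
    return a * f + b


raw = np.array([98, 99, 100, 101, 102])
scaled = linear_scale(raw)
print(raw / raw.sum())         # selection shares before scaling: all close to 0.2
print(scaled / scaled.sum())   # shares after scaling: 0, 0.1, 0.2, 0.3, 0.4
```

With nearly equal raw fitness every individual would be selected with almost the same probability; after scaling the differences are large enough for selection to matter again.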

  4. Genetic operators

    1. Selection

      Selection picks good individuals from the old population and lets them reproduce, so that good traits are passed on to the next generation.

      Roulette-wheel selection is commonly used: the probability that an individual is selected equals its fitness divided by the total fitness of the population. The higher the fitness, the higher the probability of being selected, so good offspring are more likely.
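The roulette-wheel rule can be sketched as follows, assuming non-negative fitness values with a positive sum. The function name `roulette_select` and the toy population are made up for the example.

```python
import random
from itertools import accumulate


def roulette_select(population, fitness, k, rng):
    """Draw k individuals, each with probability fitness / total fitness."""
    total = sum(fitness)
    cumulative = list(accumulate(f / total for f in fitness))
    chosen = []
    for _ in range(k):
        r = rng.random()
        for individual, c in zip(population, cumulative):
            if r < c:                       # first slot whose cumulative share exceeds r
                chosen.append(individual)
                break
        else:                               # guards against rounding in the last slot
            chosen.append(population[-1])
    return chosen


rng = random.Random(0)
picks = roulette_select(["A", "B", "C"], [1, 2, 7], 10000, rng)
print(picks.count("C") / len(picks))        # close to 0.7
```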

    2. Crossover

      Crossover exchanges part of the genes between two chromosomes in order to generate new solutions. Single-point crossover is commonly used, among other variants.
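A sketch of single-point crossover on bit strings: both parents are cut at the same position and the tails are swapped. The function name and the example parents are illustrative.

```python
import random


def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length chromosomes after position `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


print(single_point_crossover("11111", "00000", 2))   # ('11000', '00111')

# In practice the cut point is drawn at random, strictly inside the chromosome
rng = random.Random(0)
point = rng.randint(1, len("11111") - 1)
print(single_point_crossover("11111", "00000", point))
```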

    3. Mutation

      Mutation means that a chromosome changes itself with a certain probability, for example by flipping one binary bit, which also produces a new solution.
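Bit-flip mutation can be sketched like this, where every bit is flipped independently with probability pm. This is one possible variant; the function name `bit_flip_mutation` is an assumption.

```python
import random


def bit_flip_mutation(bits, pm, rng):
    """Flip each bit of the chromosome independently with probability pm."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < pm else b
        for b in bits
    )


rng = random.Random(0)
print(bit_flip_mutation("1011", 0.0, rng))   # 1011 (nothing can flip)
print(bit_flip_mutation("1011", 1.0, rng))   # 0100 (every bit flips)
print(bit_flip_mutation("1011", 0.1, rng))   # usually unchanged or one bit different
```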

      These three operators are mainly used to generate new solutions, either by combining different individuals or by letting an individual change on its own. Each of them can be implemented in many ways, not only the ones mentioned here, and different problems call for different choices.

Important points

A genetic algorithm generally has four operating parameters that need to be set in advance, namely

M : population size

T : number of generations after which the genetic algorithm terminates

Pc: Crossover probability, generally 0.4~0.99

Pm: mutation probability, generally 0.001~0.1
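To show where the four parameters enter, here is a deliberately small binary GA, written as a sketch under several assumptions: non-negative fitness, roulette-wheel selection, single-point crossover, bit-flip mutation, and one elite individual copied unchanged into each new generation. The names `run_ga` and `square_fitness` and the toy objective are made up for the illustration.

```python
import random


def run_ga(fitness_fn, n_bits, M, T, Pc, Pm, seed=0):
    """Toy binary GA; returns the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(M)]
    history = []
    for _ in range(T):                                     # T generations
        scores = [fitness_fn(ind) for ind in pop]
        best = pop[scores.index(max(scores))]
        history.append(max(scores))
        weights = [s + 1e-9 for s in scores]               # offset so weights are never all zero
        next_pop = [best[:]]                               # elitism: the best always survives
        while len(next_pop) < M:                           # keep the population size at M
            a, b = rng.choices(pop, weights=weights, k=2)  # roulette-wheel selection
            child = a[:]
            if rng.random() < Pc:                          # single-point crossover
                point = rng.randint(1, n_bits - 1)
                child = a[:point] + b[point:]
            child = [1 - g if rng.random() < Pm else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    scores = [fitness_fn(ind) for ind in pop]
    return pop[scores.index(max(scores))], history


def square_fitness(ind):
    """Hypothetical objective: read the bits as an integer x and return x squared."""
    return int("".join(map(str, ind)), 2) ** 2


best, history = run_ga(square_fitness, n_bits=5, M=20, T=30, Pc=0.8, Pm=0.01)
print(best, history[-1])
```

Because the elite individual is copied unchanged, the recorded best fitness can never decrease from one generation to the next.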

Code

     The following is the concrete implementation. The encoding I use is to generate floating-point numbers directly. The selection operation keeps a set proportion of individuals each round. The crossover operation picks two individuals each time and, with a certain probability, exchanges parameters between them to generate a new individual. The mutation operation, with a certain probability, re-draws one parameter of an individual at random.

        Three quantities are recorded: the best fitness of each round, the best result, and the corresponding best parameters. A plot is drawn at the end.

The code

import warnings
from math import log2
import matplotlib.pyplot as plt
import pandas as pd
warnings.filterwarnings('ignore')
import heapq
import itertools
from random import randint, random, uniform
import numpy as np

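def select_op(fitness, op, select_rate, best_keep_rate):
    """Select survivors: always keep the fittest few, fill the rest by roulette wheel."""
    ans_list = []
    # First pick the part that is always kept (the individuals with the highest fitness)
    keep_num = int(best_keep_rate * len(op) * select_rate)
    index_list = list(map(list(fitness).index, heapq.nlargest(keep_num, fitness)))
    for index in index_list:
        ans_list.append(op[index])
    # Roulette wheel for the remaining slots
    p = fitness / sum(fitness)  # share of the total fitness held by each individual
    p = np.array(list(itertools.accumulate(p)))  # cumulative distribution
    for i in range(int(len(op) * select_rate) - keep_num):  # number still to be drawn
        r = random()
        index = np.argmax(p > r)  # first position whose cumulative share exceeds r
        if index == 0 and p[0] < r:  # argmax also returns 0 when nothing exceeds r
            continue
        ans_list.append(op[index])
    return ans_list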

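def cross_op(op, cross_rate, num):
    """Produce num children by single-point crossover of randomly paired parents."""
    ans_list = []
    num_op = len(op)  # current number of individuals
    while num > 0:
        # Try at most 5 times to draw two different parents; if they are still
        # identical, use them anyway (that individual is simply very common)
        max_ = 5
        while max_ > 0:
            n1 = randint(0, num_op - 1)
            n2 = randint(0, num_op - 1)
            max_ -= 1
            if op[n1] != op[n2]:
                break
        father = op[n1]
        mother = op[n2]
        if random() < cross_rate:
            location = randint(0, len(father) - 1)  # random crossover position
            # the child takes father[0..location] and mother[location+1..]
            tmp = father[0:location + 1] + mother[location + 1:len(father)]
            ans_list.append(tmp)
            num -= 1
    return ans_list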

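def variation_op_10(new_op, variation_rate, low, high):
    """With probability variation_rate, re-draw one parameter of each individual."""
    for index, it in enumerate(new_op):
        if random() < variation_rate:
            location = randint(0, len(it) - 1)  # which parameter mutates
            value = uniform(low[location], high[location])  # new random value within the bounds
            new_op[index][location] = value
    return new_op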


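# Generate a random initial population
def ini_op(low_paras, high_paras, max_op_size):
    # Work out how many bits each parameter would occupy in a binary encoding
    # (para_range is only returned for reference; the float encoding below does not use it)
    st = 0
    ed = -1  # so that the first st is 0
    para_range = []
    for i in range(len(low_paras)):
        low_it = low_paras[i]
        high_it = high_paras[i]
        num = int(log2(high_it - low_it + 1)) + 1  # number of binary digits
        st = ed + 1
        ed += num
        para_range.append((st, ed))  # start and end position of this parameter in the sequence
    op = []
    for i in range(max_op_size):
        # each individual is a list of floats drawn uniformly within the bounds
        tmp = [uniform(low_paras[k], high_paras[k]) for k in range(len(low_paras))]
        op.append(tmp)
    return op, para_range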


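def cal_fitness(op):
    ans_list = np.zeros((len(op), 1))
    for index, it in enumerate(op):  # evaluate each individual
        if un_suit(it):  # the constraints are violated
            # assign a very large value, handled uniformly below
            # (this penalty only makes sense when model_dir is 'min')
            ans_list[index] = 1000000000
            continue
        y = func(it)
        ans_list[index] = y
    ans_list = func_fitness(ans_list)
    return ans_list


# Custom fitness transformation
def func_fitness(ans_list):
    if model_dir == 'min':
        # smaller objective value -> larger fitness (assumes func(x) > 0)
        for index, it in enumerate(ans_list):
            ans_list[index] = 1 / it
    return ans_list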


def un_suit(x):  # returns True when x violates a constraint
    # bounds on the parameters
    for i in range(len(low_paras)):
        if x[i] < low_paras[i] or x[i] > high_paras[i]:
            return True
    # ...add further constraints here as needed
    return False


# Objective function
def func(x):
    return x[0] ** 2 + x[1] ** (3 / 2) + x[2] + x[3] ** (1 / 2)


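# --- problem configuration
paras_name = ['x1', 'x2', 'x3', 'x4']
high_paras = [60,60,40,30]  # upper bounds of the parameters
low_paras = [10, 21,3,10]  # lower bounds of the parameters
model_dir = 'min' # 'max': larger is better, 'min': smaller is better
# --- genetic algorithm configuration
max_op_size = 200  # population size (a single population is optimized here)
max_epochs = 200  # number of iterations, that is, generations
cross_rate = 0.8  # crossover rate: once two individuals are picked, they cross with this probability (the child takes part of its genes from each)
select_rate = 0.4  # selection rate: fraction of the population that is kept (drawn with random numbers, not by sorting)
variation_rate = 0.1  # mutation rate: each new individual mutates with this probability (one parameter is re-drawn)
best_keep_rate = 0.1  # fraction of each selection that is always kept (the top-ranked individuals)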
# --- solve with the genetic algorithm
if __name__ == '__main__':
    op, para_range = ini_op(low_paras, high_paras, max_op_size)  # initial population, plus each parameter's position range [(l1,r1),(l2,r2)...]
    best_ans_history = []  # best objective value of each epoch
    best_para_history = []  # parameters of the best individual of each epoch
    best_fitness_history = []  # best fitness of each epoch
    for i in range(1, max_epochs + 1):
        if i % 50 == 0:
            print('epoch:', i)
        fitness = cal_fitness(op)  # compute the fitness
        index = np.argmax(np.array(fitness))  # open question: the best individuals are kept, so why does the final plot still jump up and down?
        best_fitness_history.append(fitness[index])
        best_para_history.append(op[index])
        best_ans_history.append(func(op[index]))
        op = select_op(fitness, op, select_rate, best_keep_rate)  # select the survivors
        # crossover: produce offspring
        new_op = cross_op(op, cross_rate, max_op_size - len(op))  # last argument: number of children needed
        # mutation
        new_op = variation_op_10(new_op, variation_rate, low_paras, high_paras)
        op.extend(new_op)  # add the new individuals to the population

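    plt.rcParams['font.sans-serif'] = ['SimHei']  # needed to render the Chinese labels
    plt.rcParams['axes.unicode_minus'] = False  # needed to render the minus sign
    # epoch with the highest fitness (the smallest objective value when model_dir is 'min')
    index = np.argmax(best_fitness_history)
    print('最优结果为:', best_ans_history[index])  # "Best result:"
    print('最优参数如下')  # "Best parameters:"
    for name, value in zip(paras_name, best_para_history[index]):
        print('{}={}'.format(name, value))
    plt.plot(best_fitness_history, label='适应度曲线变化')  # label: "fitness curve"
    plt.legend()
    plt.show()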

Example

Take a monotonic function as an example:

x1 ** 2 + x2 ** (3 / 2) + x3 + x4 ** (1 / 2)

The setting range is as follows:

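high_paras = [60,60,40,30]  # upper bounds of the parameters
low_paras = [10, 21,3,10]  # lower bounds of the parameters

Stop after two hundred iterations and output the result (the labels read "Best result:" and "Best parameters:"). Because the function is increasing in every variable on this range, its exact minimum lies at the lower bounds (10, 21, 3, 10), where it equals about 202.40; the run recorded here printed a larger value: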

最优结果为: 262.08470155055244
最优参数如下
x1=10.92155546612619
x2=22.81339324058242
x3=30.588146368196167
x4=10.573758934746433
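As a quick sanity check on these numbers, the sketch below evaluates the objective at the printed parameters and at the lower bounds. It only reuses `func` and the values shown above; the variable name `reported` is introduced for the example.

```python
def func(x):
    return x[0] ** 2 + x[1] ** (3 / 2) + x[2] + x[3] ** (1 / 2)


reported = [10.92155546612619, 22.81339324058242,
            30.588146368196167, 10.573758934746433]
low_paras = [10, 21, 3, 10]

print(func(reported))    # about 262.08, the value printed above
print(func(low_paras))   # about 202.40, the value at the lower bounds
```

The printed parameters do reproduce the printed result, and the value at the lower bounds is smaller, so a search aiming at the minimum still has room to improve on this run.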

Output the best-fitness curve:

[Figure: best fitness per iteration]

Origin blog.csdn.net/weixin_60360239/article/details/130244042