RL+CO survey: Reinforcement Learning for Combinatorial Optimization: A Survey

N. Mazyavkina, S. Sviridov, S. Ivanov, and E. Burnaev, ‘Reinforcement Learning for Combinatorial Optimization: A Survey’. arXiv, Dec. 24, 2020. Accessed: Jul. 19, 2023. [Online]. Available: http://arxiv.org/abs/2003.03600

Table of contents

First of all, what is the CO problem?

Why use RL to solve combinatorial optimization (CO) problems?

How is RL used to solve CO problems? The general idea


The paper mainly discusses the basic ideas of applying RL to combinatorial optimization (CO) problems and the progress of RL-based algorithms on several classic problems. This post summarizes the ideas from the first half of the paper and, for now, skips the progress on the individual classic problems covered later in the paper.

First of all, what is the CO problem?

A simple mathematical explanation: given a function f with a feasible set V, the goal is to find an element (or set of elements) of V at which f attains its optimum, where V is a finite set. In plain terms: whenever the decision variables of an optimization problem are discrete, usually integer or binary variables, it is a CO problem. Mixed-integer programming, where the decision contains both continuous and discrete variables, also counts as a CO problem.
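To make the definition concrete, here is a toy brute-force illustration (my own example, not from the survey): the symmetric TSP viewed as a CO problem, where f is the tour length and the feasible set V is the finite set of all tours, so for a tiny instance the optimum can be found by plain enumeration.

```python
# Toy illustration (not from the survey): the TSP as a CO problem.
# f = tour length, V = the finite set of all tours starting at city 0.
from itertools import permutations
import math

cities = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 1.5)]

def tour_length(order):
    # Sum of edge lengths along the tour, including the edge back to the start.
    return sum(math.dist(cities[a], cities[b])
               for a, b in zip(order, order[1:] + order[:1]))

# V = every ordering that starts at city 0; |V| = (n-1)! is finite but grows fast.
V = [(0,) + p for p in permutations(range(1, len(cities)))]
best = min(V, key=lambda order: tour_length(list(order)))
print(best, round(tour_length(list(best)), 3))
```

Because |V| grows factorially, such enumeration only works for tiny instances, which is exactly why heuristics and, as discussed next, RL-based methods are of interest.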

Why use RL to solve combinatorial optimization (CO) problems?

CO problems are generally NP-hard, so an optimal solution cannot be found in polynomial time. The traditional approach is to design heuristic algorithms that find suboptimal solutions to the CO problem, but this requires hand-crafting the heuristics and tuning many parameters. RL offers a good alternative: it automates the heuristic search process by training agents in a supervised or self-supervised manner.

Note: Personally, I think the advantage of solving CO with RL is that RL trains a policy rather than a single solution, so it is less sensitive to perturbations of the original problem: even if a small part of the problem data changes slightly, the policy can still produce a reasonably good solution, whereas a heuristic generally has to be rerun from scratch.

How is RL used to solve CO problems? The general idea

A general procedure for solving CO problems with RL:

1. Model the CO problem as an MDP, i.e., define the problem's states, actions, and rewards (a toy sketch of the whole pipeline is given after step 4).


state: The state space of a CO problem is usually defined in one of two ways. One family of methods constructs solutions incrementally, so a state is a partial solution of the problem (e.g., a partial tour in constructive methods for the TSP). The other family starts from a suboptimal solution and iteratively improves it (e.g., a suboptimal TSP tour).

action: An action either extends a partial solution or modifies a complete one (e.g., changing the order in which nodes are visited in a TSP tour).

reward: The reward indicates how much the action chosen in a given state improves or worsens the solution of the problem (e.g., the change in tour length for the TSP).

transition function: In combinatorial optimization, transition dynamics are usually deterministic and known in advance.


2. Define an encoder for the states, i.e., a parametric function that encodes the input state and outputs a numeric vector (a Q-value or a probability for each action).

A state represents information about the problem, such as a given graph or the current TSP tour, while a Q-value or an action is a number. Therefore, an RL algorithm must include an encoder: a function that maps a state to numbers. Many encoders have been proposed for CO problems, including recurrent neural networks (RNNs), graph neural networks (GNNs), attention-based networks, and multi-layer perceptrons (the sketch after step 4 uses a toy linear scorer in place of these).

3. Let the agent learn the parameters of the encoder and use them to make decisions in the MDP.

4. After the agent chooses an action, the environment moves to a new state and the agent receives a reward for the action it took.
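Putting steps 1-4 together, here is a minimal, self-contained sketch (my own toy code, not the survey's method): the TSP modeled as an MDP with a partial-tour state, a next-city action, and a negative-added-distance reward, plus a tiny linear encoder with random parameters standing in for the RNN/GNN/attention encoders mentioned above. All names (TSPEnv, LinearEncoder, rollout) are illustrative; a real agent would learn the encoder parameters, e.g., with Q-learning or policy gradients.

```python
# Toy sketch of the RL-for-CO pipeline on the TSP (illustrative only).
import numpy as np


class TSPEnv:
    """MDP: state = partial tour, action = next unvisited city, reward = negative added distance."""

    def __init__(self, coords):
        self.coords = np.asarray(coords, dtype=float)
        self.n = len(coords)
        self.reset()

    def reset(self):
        self.tour = [0]                      # start the partial solution at city 0
        self.visited = {0}
        return self._state()

    def _state(self):
        return {"tour": list(self.tour),
                "unvisited": [c for c in range(self.n) if c not in self.visited]}

    def step(self, city):
        # The transition is deterministic and known: append the chosen city to the tour.
        prev = self.tour[-1]
        reward = -np.linalg.norm(self.coords[prev] - self.coords[city])
        self.tour.append(city)
        self.visited.add(city)
        done = len(self.tour) == self.n
        if done:  # close the tour back to the start city
            reward -= np.linalg.norm(self.coords[city] - self.coords[self.tour[0]])
        return self._state(), reward, done


class LinearEncoder:
    """Maps (state, candidate action) to a score; stands in for an RNN/GNN/attention encoder."""

    def __init__(self, rng):
        self.w = rng.normal(size=3)          # parameters an RL agent would learn

    def scores(self, env, state):
        cur = env.coords[state["tour"][-1]]
        start = env.coords[state["tour"][0]]
        feats = []
        for c in state["unvisited"]:
            p = env.coords[c]
            feats.append([np.linalg.norm(p - cur), np.linalg.norm(p - start), 1.0])
        return np.asarray(feats) @ self.w    # one Q-value-like score per candidate city


def rollout(env, encoder):
    """Greedy rollout: at each step pick the action with the highest encoder score."""
    state, total = env.reset(), 0.0
    done = False
    while not done:
        s = encoder.scores(env, state)
        action = state["unvisited"][int(np.argmax(s))]
        state, r, done = env.step(action)
        total += r
    return env.tour, total


rng = np.random.default_rng(0)
coords = rng.random((8, 2))                  # a random 8-city instance
tour, ret = rollout(TSPEnv(coords), LinearEncoder(rng))
print(tour, round(-ret, 3))                  # the tour and its length
```

The rollout here is greedy and the encoder parameters are never updated; the sketch only shows how the MDP pieces and the encoder fit together, not how the agent is trained.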

The paper then surveys the progress of RL approaches on several classic problems: the Traveling Salesman Problem (TSP), the Maximum Cut (Max-Cut) problem, the Maximum Independent Set (MIS) problem, the Minimum Vertex Cover (MVC) problem, and the Bin Packing Problem (BPP). Readers interested in these can refer to the original text.

Origin: blog.csdn.net/qq_38480311/article/details/131822858