Practice of Combinatorial Optimization Method Based on Deep Reinforcement Learning in Industrial Applications

An example from "Coordinated Method Pinghua", which was included in Chinese textbooks, describes "boiling water and making tea" as five processes: 1. boiling water, 2. washing the teapot, 3. washing the teacup, 4. taking tea leaves, 5. making tea. The first four processes are prerequisites for making tea, and each takes a different amount of time. How the processes are arranged directly affects how soon the tea can be drunk.

The picture above shows two ways of ordering the "boiling water and making tea" processes. The first clearly takes the least time and is the most efficient.
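To make the comparison concrete, the sketch below computes the time until the tea is ready for two arrangements: doing everything one after another, and doing the hand-work while the water heats. The durations are illustrative assumptions, not values from the original example, and `serial_time` and `overlapped_time` are hypothetical helper names.

```python
# Durations in minutes. These numbers are illustrative assumptions only.
DURATIONS = {
    "boil_water": 15,
    "wash_teapot": 1,
    "wash_teacup": 1,
    "take_tea_leaves": 2,
    "make_tea": 1,
}

# Hand-work that can be done while the water is heating.
PREP_TASKS = ["wash_teapot", "wash_teacup", "take_tea_leaves"]


def serial_time(durations):
    """Do every process one after another, waiting idly while the water boils."""
    return sum(durations.values())


def overlapped_time(durations):
    """Start the water first and do the hand-work while it heats.

    Making tea can only start once both the water and the preparation are
    finished, so the waiting phase lasts as long as the slower of the two.
    """
    prep = sum(durations[task] for task in PREP_TASKS)
    return max(durations["boil_water"], prep) + durations["make_tea"]


print("one after another:", serial_time(DURATIONS), "min")
print("overlapped:", overlapped_time(DURATIONS), "min")
```

With these assumed durations the overlapped arrangement finishes sooner, because the short tasks are hidden inside the time the water needs anyway.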

We encounter many similar problems in daily life, such as matching clothes, combining dishes, or ordering housework. These seemingly simple matters can leave us with "difficulty in choosing". In such scenarios, combining and permuting a few simple choices produces a set of feasible solutions that grows explosively (factorially or exponentially) and is often of enormous size. From this set we try to find one solution, or a group of solutions, that achieves goals such as the highest income, the least time, or the lowest cost. Problems of this kind are the combinatorial optimization problems within the broader field of optimization.
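A quick way to see this growth: if the only decision is the order of n distinct tasks, there are n! (n factorial) candidate orderings. The snippet below is a minimal illustration; `count_orderings` is a hypothetical helper name.

```python
import math


def count_orderings(n):
    """Number of ways to order n distinct tasks, i.e. n factorial."""
    return math.factorial(n)


for n in (5, 10, 20):
    print(f"{n} tasks -> {count_orderings(n):,} possible orderings")
```

Even at 20 tasks, checking every ordering one by one is already impractical, which is why the size of the feasible set matters so much.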

Tetris is actually a dynamic combinatorial optimization problem

01 Common combinatorial optimization problems

Beyond daily life, research on combinatorial optimization problems has even broader practical significance in industry, spanning transportation, information technology, economic management, industrial engineering, communication networks and many other fields. Here we discuss only the most common case, discrete combinatorial optimization.

Although the industries differ, the underlying problems are essentially the same. The basic models of classic combinatorial optimization can be divided into four categories according to the optimization operation involved: selection problems, allocation problems, sorting problems and mixed problems. Real application scenarios usually involve multiple factors and multiple operations. For example, selection and sorting are often required together, so the mixed problem is the most common type in practice.

Common Combinatorial Optimization Problems

Transportation

Online car-hailing dispatch is a typical hybrid combinatorial optimization problem that combines selection (accepting orders), allocation (matching passengers to drivers) and sorting (path planning). In real travel scenarios, factors such as passenger demand, vehicle distribution, pick-up distance and congestion make the decision very complex. The system must dispatch drivers and passengers quickly, efficiently and dynamically in real time, while also meeting a series of profit requirements such as order income.

Network communication field

Resource allocation is a typical combinatorial optimization problem in network communication, and it is also a mixed problem. It refers to allocating limited resources such as CPU, memory and bandwidth to different users or tasks. With today's highly diverse and complex communication demands, making full use of service resources while keeping the communication network running efficiently is an important issue that directly affects user experience.

Manufacturing field

Manufacturing contains a large number of combinatorial optimization problems. For example, in modern large-scale manufacturing, multi-variety, small-batch production driven by personalized consumer demand is replacing single-variety, high-volume production. Given the complexity of production and the supply chain, how to make more products with fewer resources, in less time and with less inventory is one of the combinatorial optimization problems that production scheduling must face.

Financial investment field

Portfolio optimization has always been a concern for investors. In modern financial markets, investors who want higher returns without excessive risk need to spread their assets across different securities markets and products in appropriate proportions, so as to maximize expected return over a given period. "Don't put all your eggs in the same basket" is the simplest heuristic rule for the "risk minimization" objective in portfolio optimization.

02 Solving methods for traditional combinatorial optimization problems

Combinatorial optimization problems in real production and daily life are often large in scale, with complex constraints and diverse objectives. If they can be solved efficiently, however, the "decision strategy" alone can yield additional benefit at no extra cost. This has driven the continued development of combinatorial optimization theory and algorithms in both academia and industry. At present, the mainstream solution methods fall into the following three categories:

  1. Exact algorithm: Represented by the branch and bound method. In theory it obtains the exact optimal solution, and it is usually combined with specially designed heuristics to reduce solution time. It is the foundation of most solvers. The disadvantage is that the amount of computation grows enormously as the problem scale increases, so large-scale problems are difficult to solve.
  2. Heuristic algorithm: Applies optimization operations based on rules designed specifically for the scenario. It can give an approximately optimal solution within polynomial time, with solution quality guaranteed within a certain range. The disadvantage is that effective heuristic rules are hard to design for problems with very complex logic, and solution quality is not stable.
  3. Meta-heuristic algorithm: Represented by population-based search algorithms; examples include particle swarm optimization, simulated annealing, evolutionary algorithms, tabu search and neighborhood search. Because it depends mainly on the design of the fitness function and the solution encoding rather than on the design of the solution process, it works remarkably well on some complex problems and can obtain an approximately optimal solution. The disadvantage is that search is slow for large-scale problems and the search for the optimum is random.
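The three categories above can be compared on a toy problem. The sketch below applies one representative of each to a seven-city traveling salesman instance: exhaustive enumeration stands in for the exact approach (simpler than branch and bound), nearest neighbor is the heuristic, and simulated annealing is the meta-heuristic. The city coordinates, function names and parameter values are all assumptions chosen for illustration.

```python
import itertools
import math
import random

# Hypothetical city coordinates for a toy traveling salesman instance.
CITIES = [(0, 0), (2, 6), (5, 1), (6, 5), (8, 2), (3, 3), (7, 8)]


def tour_length(tour):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(CITIES[tour[i]], CITIES[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def exact_search():
    """Exact method: enumerate every tour starting at city 0 (tiny n only)."""
    rest = range(1, len(CITIES))
    return min(((0,) + p for p in itertools.permutations(rest)), key=tour_length)


def nearest_neighbor():
    """Heuristic: always travel to the closest unvisited city."""
    tour, unvisited = [0], set(range(1, len(CITIES)))
    while unvisited:
        nxt = min(unvisited, key=lambda c: math.dist(CITIES[tour[-1]], CITIES[c]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tuple(tour)


def simulated_annealing(steps=5000, start_temp=5.0, seed=0):
    """Meta-heuristic: random segment reversals, accepting some worse tours early."""
    rng = random.Random(seed)
    current = list(range(len(CITIES)))
    best = current[:]
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        i, j = sorted(rng.sample(range(1, len(CITIES)), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        delta = tour_length(candidate) - tour_length(current)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if tour_length(current) < tour_length(best):
                best = current[:]
    return tuple(best)


for name, solver in [("exact", exact_search),
                     ("nearest neighbor", nearest_neighbor),
                     ("simulated annealing", simulated_annealing)]:
    print(f"{name}: tour length {tour_length(solver()):.2f}")
```

The exact search already has to examine 720 tours for seven cities, and that count grows factorially, which is the scaling problem described in the list. The other two methods are cheaper but give no guarantee of reaching the optimum.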

Although these three types of algorithms each have their own advantages and applicable scenarios, they share a common defect: when the problem is large, they either consume a huge amount of computation or cannot give a sufficiently good solution within the specified time. As large-scale combinatorial optimization problems keep emerging in real production and daily life, these three traditional approaches are increasingly unable to cope with them.

For example, in planning and scheduling decisions in manufacturing, the scheduling problems of modern large-scale production are enormous. Take Lenovo, the world's No. 1 company in PC sales, and the production scheduling scenario at its largest PC manufacturing base, the Hefei Lianbao factory:

"There are two shifts per day, with more than 5,000 orders per shift on average and 500 to 600 product types. There are 4 factories and 43 production lines with different equipment configurations to choose from. For the daily production schedule of the Lianbao factory, the total number of possible scheduling plans is about 10 to the 190th power."

Comparing the complexity of production scheduling problem and game tree of chess games

The main data of a work order at the Lianbao factory include the model, production quantity, delivery date, material preparation time and so on. Production scheduling also requires information about production capacity, that is, production line data. Typical features include which models a line can produce, the speed at which it produces each model (units per hour, determined by differences in equipment and personnel), the line's opening hours and so on. Described this way, scheduling is a large-scale, multi-constraint, multi-objective optimization problem. Faced with such complexity, traditional manufacturers have relied on manual scheduling. When the Lianbao factory scheduled production manually, it took up to 6 hours a day, and the rationality and accuracy of the schedule were hard to guarantee.

The Lianbao factory is undergoing an intelligent upgrade and places very strict requirements on scheduling time: the scheduling results must be pushed within 15 minutes. After allowing for data pre-processing, post-processing and report generation, fewer than 10 minutes remain for the solver itself. None of the current mainstream combinatorial optimization methods can meet this requirement, so the task is a considerable technical challenge.

03 Combinatorial optimization method based on deep reinforcement learning

In recent years, deep reinforcement learning has shown strong learning and decision-making ability in Go, robotics and other fields. It has exceeded human experts in specific domains and is even regarded as having the potential to lead to artificial general intelligence (AGI). People have therefore begun to explore deep reinforcement learning for solving combinatorial optimization problems, which has made it a research hotspot in recent years, with a series of related studies and cases emerging. Against this background, Lenovo's researchers looked beyond traditional operations research methods and considered solving large-scale production scheduling problems efficiently with deep reinforcement learning.

Build a scheduling problem model

Production scheduling can be modeled as the process of ordering work orders one by one and placing each at a suitable position on a production line, that is, turning an unordered sequence into an ordered one. In traditional operations research this is called an allocation model, and it is considered a very inefficient formulation. Because machine learning methods solve problems differently from traditional methods, however, the allocation model opens a new way to approach the problem.

If a deep neural network is used as the decision-making mapping model, one option is a sequence-to-sequence approach. The researchers designed a sequence-to-sequence scheduling AI with the following structure. The model takes two kinds of input: first, production scheduling information such as process data, the production calendar, work order data and equipment data; second, the unordered sequence of work orders with their corresponding features. After receiving this information, the AI model reorders the work orders and inserts marker tokens between them to indicate a switch of production line, so that the complete schedule of every work order on every production line is fully expressed. This accomplishes both decisions at once: which production line each work order is produced on, and the order of the work orders on that line.

Input-Output Relationship of Sequence-to-Sequence Scheduling Model

AI model: Pointer Network

The AI model used by the researchers is a deep learning network called the Pointer Network. Combinatorial optimization problems often involve sequential decision-making, and pointer networks are a kind of neural network well suited to them. Their characteristics are: 1) the input is an unordered sequence; 2) the output sequence is a reordering of the input sequence indices; 3) they generalize across problem scales for similar problems. Early applications of this network include routing problems such as the vehicle routing problem, and production scheduling resembles vehicle routing in model structure: if a production line is compared to a vehicle and a work order to a customer location, the production scheduling decision is in effect a vehicle path planning decision.

How the Pointer Network works
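The core "pointing" step can be sketched in a few lines. The code below follows the additive attention form commonly given for pointer networks, score_j = v . tanh(W_enc e_j + W_dec d), followed by a softmax over the input positions. It is a minimal illustration with untrained random weights and made-up states, not the researchers' actual architecture; all names and dimensions are assumptions.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(encoder_states, decoder_state, w_enc, w_dec, v):
    """Additive attention used as a pointer: one probability per input element.

    score_j = v . tanh(w_enc @ e_j + w_dec @ d)
    """
    projected_dec = matvec(w_dec, decoder_state)
    scores = []
    for e_j in encoder_states:
        projected_enc = matvec(w_enc, e_j)
        hidden = [math.tanh(a + b) for a, b in zip(projected_enc, projected_dec)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return softmax(scores)


def random_matrix(rng, dim):
    """Square matrix of untrained weights, for illustration only."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim = 4
w_enc, w_dec = random_matrix(rng, dim), random_matrix(rng, dim)
v = [rng.uniform(-1, 1) for _ in range(dim)]
# Six made-up work-order encodings and one made-up decoder state.
encoder_states = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
decoder_state = [rng.uniform(-1, 1) for _ in range(dim)]

probs = pointer_distribution(encoder_states, decoder_state, w_enc, w_dec, v)
print("pointer probabilities over the 6 inputs:", [round(p, 3) for p in probs])
```

Because the output is a distribution over however many inputs are supplied, the same weights work for 6 work orders or 6,000, which is the scale generalization mentioned above.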

The researchers used the pointer network in a similar way for production scheduling optimization. Because the pointer network generalizes across problem scales and the length of its output sequence is equal to the length of the input sequence, any number of work orders can be handled: a single forward pass through the network yields the corresponding scheduling result with no additional computation, which makes it well suited to large-scale problems. In experiments, the model needed only about 10 seconds to handle a shift-scale scheduling problem (5,000 to 10,000 work orders, 15 production lines). The next important question is that any network must be trained to acquire this ability, so how should this one be trained? The difficulty is twofold:

1) For production scheduling problems of real scale it is almost impossible to obtain the optimal solution as a label. One cannot search among roughly 10 to the 190th power candidate plans for the best one and hand it to the network as a training example, so the network cannot be trained with a supervised learning algorithm;

2) The Lianbao factory produces two shifts of data per day, so during the two years of the project at most about 700 data samples could be collected. For a deep learning network, data on this scale is far from enough to support training.

Applications of Reinforcement Learning Techniques

This is where Reinforcement Learning (RL) comes in. It should be emphasized that reinforcement learning refers not only to a class of algorithms but to a large class of learning problems or tasks that exist in real life: we do not necessarily know "how to do" something, and more often we are driven to learn by a limited reward signal received after the behavior ends. To apply reinforcement learning sensibly, therefore, one must first ask whether the problem is essentially a reinforcement learning problem.

Compared with supervised learning, reinforcement learning does not require explicit behavioral labels. Any exploratory behavior can be tried: positive reward signals strengthen good behaviors and negative reward signals weaken bad ones, and self-learning is achieved through this mechanism. The most famous success of reinforcement learning is AlphaGo, the Go AI from the Google DeepMind research group. The production scheduling AI uses the same training mode as AlphaGo and is formulated as a similar reinforcement learning problem.

The analogy between intelligent scheduling and Go AI in reinforcement learning methods
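The idea that positive rewards strengthen a behavior and negative rewards weaken it can be shown on a toy problem. The sketch below uses the REINFORCE policy-gradient rule on a one-step task with three actions. It is only an illustration of reward-driven learning, not the training algorithm used for the scheduling AI; the reward values and hyperparameters are assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(rewards, steps=2000, learning_rate=0.1, seed=0):
    """REINFORCE on a one-step problem: try an action, observe its reward,
    and shift the action preferences in proportion to that reward."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        # d log pi(action) / d logit_k = (1 if k == action else 0) - probs[k]
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            logits[k] += learning_rate * reward * (indicator - probs[k])
    return softmax(logits)


# Hypothetical rewards: action 0 is bad, action 1 is neutral, action 2 is good.
final_probs = train_policy(rewards=[-1.0, 0.0, 1.0])
print("learned action probabilities:", [round(p, 3) for p in final_probs])
```

No label ever tells the policy which action is correct. It only sees the reward for what it tried, yet the probability mass moves toward the rewarded action.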

Real data are too scarce to support deep learning training, but one feature of production scheduling is that it is easy to simulate. Because the production process involves no dynamics, there are no complex differential equations to solve. By fitting distributions to the parameters of work orders and production lines, large amounts of virtual data can be generated, and virtual scheduling interactions can then be run on the simulation, so that any amount of data can be produced to train the deep network. In actual development, each training round used virtual data on the order of 10 to the 6th power, regenerated in every round. The real data were used only for fitting the distributions and did not take part in training directly. This very extravagant use of data helped ensure the optimality and generalization of the production scheduling AI.
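The fit-then-sample idea can be sketched as follows. The article does not say which distributions or parameters were fitted, so the normal distribution, the sample quantities and the helper names here are all assumptions, and the round size is kept far below the 10 to the 6th power mentioned above so the example runs instantly.

```python
import random
import statistics

# Hypothetical historical order quantities (units per work order); not factory data.
REAL_QUANTITIES = [120, 80, 200, 150, 95, 110, 175, 130]


def fit_normal(samples):
    """Fit a normal distribution by its sample mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def generate_orders(mean, std, count, seed=None):
    """Draw synthetic work-order quantities, clamped to at least 1 unit."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, std))) for _ in range(count)]


# The real data are used only here, to fit the distribution.
mean, std = fit_normal(REAL_QUANTITIES)

# A fresh virtual data set is generated for every training round.
for training_round in range(3):
    orders = generate_orders(mean, std, count=1000, seed=training_round)
    print(f"round {training_round}: {len(orders)} virtual orders, "
          f"mean quantity {statistics.mean(orders):.1f}")
```

Regenerating the data each round means the network never sees the same instance twice, which is one plausible reason this approach helps generalization.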

To handle the many types and large number of constraints, the researchers placed a mask mechanism at the output of the pointer network and used tensor operations to restrict the policy space. During output decoding, the mask mechanism checks, based on the current state, whether placing each unscheduled work order at the given position on the production line would violate a constraint, and uses a 0-1 mask to control whether that work order may be placed there. For example, dividing the number of units in a work order by the production rate of the line gives the time the order needs, and comparing this with the line's remaining time tells whether the line has enough time to complete the order.
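A minimal sketch of such a mask, under stated assumptions: a single capacity constraint (time needed = units / production rate, compared with the line's remaining time), plain Python lists instead of tensors, and hypothetical function names and numbers.

```python
import math


def feasibility_mask(order_units, line_rate, line_remaining_hours, already_scheduled):
    """Return 1 for orders that may still be placed on the line, 0 otherwise."""
    mask = []
    for idx, units in enumerate(order_units):
        hours_needed = units / line_rate
        fits = hours_needed <= line_remaining_hours
        mask.append(1 if fits and idx not in already_scheduled else 0)
    return mask


def masked_softmax(scores, mask):
    """Softmax over the allowed entries only; masked entries get probability 0."""
    allowed = [s for s, m in zip(scores, mask) if m]
    if not allowed:
        raise ValueError("no feasible work order for this production line")
    top = max(allowed)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical state: four work orders, a line producing 50 units per hour
# with 5 hours left, and work order 2 already scheduled elsewhere.
order_units = [100, 400, 50, 250]
mask = feasibility_mask(order_units, line_rate=50, line_remaining_hours=5,
                        already_scheduled={2})
scores = [0.2, 2.0, 1.5, 0.2]  # made-up pointer scores
probs = masked_softmax(scores, mask)
print("mask:", mask)
print("probabilities:", probs)
```

Order 1 has the highest raw score but needs 8 hours, so the mask removes it before the softmax. The policy can never select an infeasible order, and the constraint does not have to be learned from data.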

As in the example in the figure below, under the action of the mask mechanism and the softmax function, the decoding process outputs the sequence 0-5-1-0-0-3-2-0-4-0. That is, work orders 5 and 1 are arranged on production line L1, L2 is left empty, work orders 3 and 2 are arranged on L3, and L4 produces work order 4. Implementing constraints through the mask greatly reduces the number of input features, and therefore the data size as well. Training for the Lianbao production scheduling problem was completed on a single 2080 graphics card with 11 GB of video memory.

The principle of Mask mechanism for constraint processing
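The output sequence can be turned back into a per-line schedule with a short decoder. The sketch below assumes that 0 is the line-switch marker and that the sequence starts and ends with it; `decode_schedule` is a hypothetical helper, not part of the researchers' system.

```python
def decode_schedule(sequence, separator=0):
    """Split a decoded sequence into per-line work-order lists.

    The sequence is assumed to start and end with the separator; each
    separator after the first closes the current production line.
    """
    if not sequence or sequence[0] != separator or sequence[-1] != separator:
        raise ValueError("sequence must start and end with the separator")
    lines, current = [], []
    for token in sequence[1:]:
        if token == separator:
            lines.append(current)
            current = []
        else:
            current.append(token)
    return {f"L{i}": orders for i, orders in enumerate(lines, start=1)}


sequence = [0, 5, 1, 0, 0, 3, 2, 0, 4, 0]
print(decode_schedule(sequence))
# {'L1': [5, 1], 'L2': [], 'L3': [3, 2], 'L4': [4]}
```

Two consecutive markers mean a line receives no work order, so one flat sequence encodes both the line assignment and the order within each line.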

Application results

After the AI scheduling algorithm was deployed at the Lianbao factory, test results showed that production scheduling time fell from 6 hours a day to less than 5 minutes, first-shift output rose by 23%, and the backlog of production orders fell by 20%. The number of orders completed on the same day increased to 3.5 times the original, and the set goal was achieved. In the subsequent implementation stage, the researchers continued to add practical functions to the AI production scheduling project, such as reducing the manual adjustments that production schedulers make to the scheduling results, completing all fixture restriction logic, and integrating KPIs with the manufacturing and quality departments. They also added multi-factory joint production scheduling.

04 Summary and Outlook

Because many combinatorial optimization problems are NP-hard, traditional methods struggle to solve them exactly. Limited by algorithm capability, they can only make compromises among solution speed, solution quality and functionality within a limited range. For large-scale, highly complex combinatorial optimization problems in particular, traditional methods find it hard to reach the optimal solution within an acceptable time.

Methods based on deep reinforcement learning offer fast solution speed and strong model generalization, and provide a new way of thinking about combinatorial optimization problems. The production scheduling example above is one of the application cases of such methods in recent years.

Throughout history, the emergence of new technologies and inventions has made social life more and more complicated, which promoted the rise and development of operations research. As a part of operations research, combinatorial optimization has seen its theories, methods and models emerge, develop rapidly and become widely used. With the rapid development of science and technology and the emergence of all kinds of large-scale, complex combinatorial optimization problems, the combinatorial optimization method based on deep reinforcement learning has not only become a research hotspot in recent years but is also a promising research direction for the future.

Origin blog.csdn.net/mlorworld/article/details/125973902