Academic Achievements of Shangjian Intelligence | Lenovo Schedules Laptop Manufacturing Using Deep Reinforcement Learning


  This paper, the latest research result of Shangjian Intelligence, was published in the leading operations research journal INFORMS Journal on Applied Analytics. It marks the first application of deep reinforcement learning to a large-scale manufacturing scheduling scenario. The scheduling project was shortlisted for the Franz Edelman Award, the highest INFORMS award for applied operations research, and has been widely reported by People's Daily and other media as a typical case of technological transformation in a manufacturing enterprise.

The first author, Liang Yi, is CEO and CTO of Shangjian Intelligence and an expert in artificial intelligence and operations optimization algorithms. He holds a bachelor's degree in physics from Zhu Kezhen College, Zhejiang University, a master's degree in theoretical physics from McMaster University, and a doctorate in high-energy physics from the University of Alberta. He is a postdoctoral fellow at the University of Chinese Academy of Sciences. He has published more than ten papers in high-energy physics and artificial intelligence, with an average citation count above 15. He was previously the chief algorithm researcher at the AI Lab of Lenovo Research Institute, focusing on the application of artificial intelligence in manufacturing.

Summary

  Lenovo Research Institute worked with the operations team of Lianbao Technology (LCFC), Lenovo's largest computer manufacturing factory, to replace traditional manual production scheduling with a decision support platform built on a deep reinforcement learning architecture. The system schedules the production orders of all 43 assembly lines in the factory and balances the relative priorities of output, changeover cost, and order delivery rate, solving the multi-objective scheduling problem with a deep reinforcement learning model. The method combines high computational efficiency with a novel masking mechanism that enforces operational constraints, so the model does not waste time exploring infeasible solutions. The new model changed the original production management process, yielding a 20% reduction in production order backlog and a 23% increase in delivery rate. It also shortened the entire scheduling process from 6 hours to 30 minutes while retaining multi-objective flexibility, so the factory can adjust quickly to changing targets. The work boosted the plant's revenue by $1.91 billion in 2019 and $2.69 billion in 2022.

Background

  Lenovo's Hefei factory, LCFC, is Lenovo's largest computer manufacturing site. It has 4 manufacturing plants and 43 assembly lines. On average, it receives about 5,000 computer orders every day, accounting for more than half of Lenovo's computer production and at least one-eighth of the world's computers. These computers span more than 20 product series and 550 product models. Prior to production, the orders are broken down into production work orders (MOs); a work order can contain thousands of computers, all with the same model number and a similar promised ship date.

  The computer production process can be roughly divided into three stages:

  • The first stage, motherboard production, is handled by the surface-mount technology (SMT) workshop. Production at this stage is mostly automated and highly stable, and needs no human intervention;
  • The second stage is completed by the component shop, where workers attach the laptop's casing to the monitor and keyboard;
  • The third stage, assembly, installs the internal components of the laptop. It is the most time-consuming and unstable stage and requires a lot of manual intervention, so its efficiency is usually the bottleneck of the entire manufacturing process.

  In the third stage, semi-finished products and spare parts are allocated to the 43 production lines according to work orders. On each line, workers process work orders one at a time: the next work order can start only after the current one has been assembled. Assembly efficiency for a given computer model varies with the production line it is assigned to. The units-per-hour (UPH) matrix records the efficiency of each product on each production line. UPH is susceptible to fluctuations in employee attendance, machine status on the line, and the availability of tools and materials. Each work order corresponds to one job. As shown in Figure 1, when work order 4 moves from production line B to production line A, its production time becomes shorter because the UPH is higher. Moreover, the sequencing of work orders on each production line can significantly affect the total production time.
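The relationship between UPH, line assignment, and production time can be sketched as follows. This is a minimal illustration only: the model names, line names, and UPH values are made up and are not LCFC data, and the helper names are hypothetical.

```python
# Hypothetical UPH matrix: units per hour for each (model, line) pair.
# All values are illustrative assumptions, not factory data.
UPH = {
    "model_x": {"line_A": 120.0, "line_B": 80.0},
    "model_y": {"line_A": 60.0, "line_B": 90.0},
}


def production_hours(model: str, quantity: int, line: str) -> float:
    """Hours needed to assemble `quantity` units of `model` on `line`."""
    return quantity / UPH[model][line]


def line_finish_times(line: str, work_orders: list[tuple[str, int]]) -> list[float]:
    """Cumulative finish time of each work order processed in turn on one line."""
    finish, clock = [], 0.0
    for model, quantity in work_orders:
        # The next work order starts only after the current one is done.
        clock += production_hours(model, quantity, line)
        finish.append(clock)
    return finish


# Moving the same work order from line B to line A shortens its production time
# because the UPH of this model is higher on line A.
hours_on_b = production_hours("model_x", 2400, "line_B")  # 30.0
hours_on_a = production_hours("model_x", 2400, "line_A")  # 20.0
```

Because work orders on a line run back to back, the finish time of every later work order depends on everything scheduled before it, which is why both the assignment and the sequence matter.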

  Switching a production line to a different model incurs a changeover cost, so scheduling performance can be improved through sensible work order assignment. Given the number of production lines and the volume of orders to dispatch, the optimization problem is computationally intractable. Management of the third-stage assembly section is therefore the focus, and the most challenging part, of production management in all Lenovo factories.
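To make the effect of sequencing on changeover cost concrete, here is a small sketch. The pairwise cost table and the function name are assumptions for illustration; the article does not specify how LCFC measures changeover cost.

```python
def changeover_cost(sequence: list[str], cost: dict[tuple[str, str], float]) -> float:
    """Sum the switching cost between consecutive models on one line."""
    total = 0.0
    for prev_model, next_model in zip(sequence, sequence[1:]):
        if prev_model != next_model:  # no cost when the model stays the same
            total += cost[(prev_model, next_model)]
    return total


# Hypothetical pairwise changeover costs (for example, minutes of lost time).
COST = {("x", "y"): 15.0, ("y", "x"): 20.0}

interleaved = ["x", "y", "x", "y"]  # three changeovers
grouped = ["x", "x", "y", "y"]      # one changeover

interleaved_cost = changeover_cost(interleaved, COST)  # 50.0
grouped_cost = changeover_cost(grouped, COST)          # 15.0
```

The same four work orders cost less when work orders of the same model are grouped, which is the kind of structure the scheduler has to discover across all lines at once.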

Traditional approaches cannot address existing challenges

  At Lenovo, production scheduling based on human experience and judgment required hours of work. Modern manufacturing companies are under enormous pressure from the fluctuating supply of production resources. Lenovo therefore needs a production management system with the following characteristics:

  1. It can solve large-scale scheduling problems. For an enterprise like Lenovo with increasingly complex production, a factory must be able to process up to tens of thousands of orders every day;
  2. Quick response. Volatility on the supply side requires the production scheduling system to respond quickly to changes in component supply. Lenovo's previous scheduling process relied on the experience and judgment of planners, who could not respond to supply-side changes in a timely and adequate manner;
  3. Better KPI performance. It can simultaneously optimize total output, order delivery rate, changeover cost, and so on;
  4. Flexible configuration of multi-criteria optimization targets. Freed from mechanical work, planners have more time for strategic work. They can take an active part in decision-making by interacting with the system; for example, they can configure KPI thresholds and set relative priorities (weights) for the optimization goals. This is critical to building planners' trust in the system, increasing their job satisfaction, and making the scheduling process more efficient.

  Traditional methods struggle to meet these demands. They fall into exact methods and approximate methods. Exact methods, such as branch-and-bound and cutting-plane methods, pursue globally optimal solutions and are limited to small-scale problems. To solve large-scale problems, developers of traditional solutions seek near-optimal solutions through rule-based or heuristic approaches. However, some approximate methods, such as tabu search and path relinking, perform well on small and medium-sized problem sets but are often too slow on large-scale problems to meet fast response requirements. Other approximate methods that solve both large and small problems in a reasonable amount of time usually perform poorly in terms of KPI optimization. With traditional methods, this conflict between response speed and solution quality becomes even more pronounced for multi-objective optimization problems. In short, these deficiencies posed considerable challenges to Lenovo's supply chain management.

Solution

  To address these challenges, the **production line planning problem (PLPP)** is modeled as a Markov decision process (MDP).

  Suppose a factory has $K$ production lines and $N$ work orders. The MDP corresponding to the production scheduling problem can be expressed as $\left\{\mathbf{X}_{\mathbf{t}}, \mathbf{A}, \mathbf{P}, \mathbf{R}\right\}$, where:

$\mathbf{X}_{\mathbf{t}}$: the set of states at each time step $t$, composed of a series of vectors $\boldsymbol{X}_t=\left\{\boldsymbol{x}_t^i\right\}$, where $\boldsymbol{x}_t^i$ is a set of features describing the state of input $i$. In PLPP, $\boldsymbol{x}_t^i$ is a snapshot of work order $i$: series, model, quantity, UPH, and remaining capacity for each line.

$\mathbf{A}$: the set of actions. $\mathbf{A}$ can be directly equated with the policy function $P(\boldsymbol{y} \mid \boldsymbol{x})$ in the MDP, where $\boldsymbol{x}$ and $\boldsymbol{y}$ represent the encoder and decoder states, respectively, and $P(\cdot \mid \cdot)$ is a conditional probability. According to the chain rule, given the initial state $\boldsymbol{x}_0$, the process of obtaining a complete solution with the sequential decision-making model is:

$$P\left(\boldsymbol{y} \mid \boldsymbol{x}_0\right)=\prod_{t=0}^{N} P\left(\boldsymbol{y}_{t+1} \mid \boldsymbol{y}_{t}, \boldsymbol{x}_t\right)$$

$\mathbf{P}$: the state transition probability function. For this problem, the state transition $P(\boldsymbol{y} \mid \boldsymbol{x})$ is deterministic, so there are no random state transitions.

$\mathbf{R}$: the set of reward functions. $r(\boldsymbol{y}) \in \mathbf{R}$ is the value of the reward function when the system transitions to state $\boldsymbol{y}$. For multi-objective optimization problems, $r(\boldsymbol{y})$ can be defined as a vector containing weighted values of multiple production indicators.
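Two pieces of this formulation lend themselves to a short sketch: the chain-rule factorization of the probability of a complete solution, and a reward built from weighted production indicators. The function names, KPI names, weights, and the sign convention (changeover cost entered as a negative value so that higher is better) are all assumptions for illustration and are not taken from the paper.

```python
import math


def sequence_log_prob(step_probs: list[float]) -> float:
    """log P(y | x_0) as the sum of per-step log-probabilities (chain rule)."""
    return sum(math.log(p) for p in step_probs)


def weighted_reward(kpis: dict[str, float], weights: dict[str, float]) -> list[float]:
    """Reward vector: each production indicator scaled by its preference weight."""
    return [weights[name] * value for name, value in kpis.items()]


# Probability the policy assigned to each decision of one complete schedule.
step_probs = [0.5, 0.5, 0.25]
solution_prob = math.exp(sequence_log_prob(step_probs))  # product of the steps

# Hypothetical normalized KPIs for that schedule.
kpis = {"output": 0.9, "delivery_rate": 0.8, "changeover_cost": -0.3}
weights = {"output": 0.5, "delivery_rate": 0.3, "changeover_cost": 0.2}
reward_vector = weighted_reward(kpis, weights)
scalar_reward = sum(reward_vector)  # one common way to collapse the vector
```

Working with log-probabilities avoids numerical underflow when the product runs over thousands of work orders, and it is the quantity that policy-gradient methods typically differentiate.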

  In the MDP formulation, a solution is the sequence of work orders assigned to each production line. A near-optimal solution is obtained with a machine learning model that learns, within a reinforcement learning (RL) framework, to increase the probability of generating the desired sequence.

  The production scheduling task can be seen as learning how to arrange the sequence of work orders: given an initial ordering, output a new one. A sequence-to-sequence (S2S) model is therefore a natural candidate.

  A typical S2S model consists of an encoder and a decoder. The encoder learns to encode an input sequence into a fixed-size vector and passes it to the decoder, and the decoder learns to convert this vector back into an output sequence. In our problem, the input to the encoder is an initial sequence of work orders, and the decoder generates an optimized sequence of work orders. The output sequence is a permutation of work order indices and separator marks. The indices from the first position up to the first separator are the work orders assigned to production line 1, the indices between the first and second separators are the work orders assigned to production line 2, and so on, as shown in the figure below.
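The separator-based output encoding described above can be decoded with a few lines of code. This is a sketch under assumptions: the separator value `SEP` and the function name are hypothetical, and a schedule for K lines is assumed to contain exactly K - 1 separators.

```python
SEP = -1  # hypothetical separator token; any value that is not a work order index


def split_by_line(output_sequence: list[int], num_lines: int) -> list[list[int]]:
    """Split a decoder output into one ordered work order list per production line."""
    lines: list[list[int]] = [[]]
    for token in output_sequence:
        if token == SEP:
            lines.append([])        # a separator starts the next production line
        else:
            lines[-1].append(token)  # work order index goes on the current line
    if len(lines) != num_lines:
        raise ValueError(f"expected {num_lines} lines, got {len(lines)}")
    return lines


# Five work orders (0..4) scheduled on three lines.
schedule = split_by_line([3, 0, SEP, 4, SEP, 1, 2], num_lines=3)
# schedule == [[3, 0], [4], [1, 2]]
```

A single flat sequence therefore carries both decisions at once: which line each work order goes to, and the order in which each line processes its work orders.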

  The encoder network iteratively converts the input sequence into a high-dimensional tensor. The decoder network generates a probability distribution for selecting each MO through an attention mechanism.
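How an attention mechanism turns encoder outputs into a probability distribution over MOs can be sketched as below. Note the simplification: this sketch scores each MO with a plain dot product, whereas pointer networks commonly use a learned additive attention; the article does not specify which form EEPN uses. All names and numbers are illustrative assumptions.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_distribution(
    decoder_state: list[float], encoder_outputs: list[list[float]]
) -> list[float]:
    """Probability of selecting each MO next, from dot-product attention scores."""
    scores = [
        sum(d * e for d, e in zip(decoder_state, mo_embedding))
        for mo_embedding in encoder_outputs
    ]
    return softmax(scores)


# Three MOs embedded in a 2-dimensional space (toy values).
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, 2.0]
probs = pointer_distribution(decoder_state, encoder_outputs)
next_mo = probs.index(max(probs))  # greedy choice; sampling is used during training
```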

  Once well trained, the model keeps its learned parameters and quickly generates optimized sequences. This provides a computational time advantage over traditional OR methods. In our model, the running time does not increase exponentially with the problem size, which enables the model to be trained on relatively small problems and applied to larger problems.

  The input of the model includes order-related and factory-related information. The order-related information includes the required product quantity, product series, and product ID of each order in the plan. The factory-related information includes the number of production lines, the production efficiency of each model on each production line, the cost of switching between each pair of production models, and the manufacturing rules.

  Order information and corresponding production information, including machine availability status (e.g., whether a machine is available for production, under maintenance, or repair), UPH, and production calendar, are combined into the MO unit in the system.

  We call the above model the Encoder Enhanced Pointer Network (EEPN). It is trained by reinforcement learning to optimize the plan by reordering the input MO sequence and inserting markers (white cubes) that indicate the boundary between two adjacent lines.

Improve model expression ability

  Many key operations in optimizing production scheduling (e.g., switching cost calculation, production line selection) are difficult for models based on previous deep reinforcement learning methods to learn. These operations are highly nonlinear, so simple network structures cannot model them well. EEPN therefore upgrades the traditional encoder to a two-layer nonlinear convolutional neural network. With this improved ability to abstract information, EEPN uses the captured problem structure to obtain high-quality production scheduling solutions immediately after training.
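A two-layer nonlinear convolutional encoder can be sketched in plain Python as follows. The article gives no kernel sizes, channel counts, or activation function, so the ReLU nonlinearity, the odd kernel widths with zero padding, and all weights below are assumptions chosen only to keep the example small and checkable.

```python
def conv1d_relu(seq, kernels, bias):
    """1-D convolution along the MO sequence with zero padding, followed by ReLU.

    seq:     list of T feature vectors, each with C_in values
    kernels: kernels[o][k][c] is the weight for output channel o, tap k, input c
    bias:    list of C_out biases
    The kernel width is assumed odd so the output keeps length T.
    """
    width = len(kernels[0])
    pad = width // 2
    c_in = len(seq[0])
    zero = [0.0] * c_in
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        row = []
        for o, kernel in enumerate(kernels):
            acc = bias[o]
            for k in range(width):
                for c in range(c_in):
                    acc += kernel[k][c] * padded[t + k][c]
            row.append(max(0.0, acc))  # ReLU nonlinearity
        out.append(row)
    return out


def encode(seq, layer1, layer2):
    """Two stacked convolution + ReLU layers."""
    return conv1d_relu(conv1d_relu(seq, *layer1), *layer2)


# Three MOs with two features each (toy values).
mos = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
layer1 = ([[[1.0, 0.0]], [[0.0, -1.0]]], [0.0, 0.0])       # width 1, 2 -> 2 channels
layer2 = ([[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]], [0.0])   # width 3, 2 -> 1 channel
embedding = encode(mos, layer1, layer2)
# embedding == [[1.0], [4.0], [3.0]]
```

The second layer mixes information from neighbouring MOs, which is one way an encoder can represent pairwise effects such as changeovers between adjacent work orders.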

Masking mechanism for complex constraints

  Given the scale of LCFC production scheduling, generating a good production plan in such a large and complex production system is challenging. At the same time, scheduling must follow complex rules as constraints. The four most important constraints are listed below:

  1. Production time: The production time of each order cannot exceed its predefined time window, which is determined by the earliest start time and the available time of all shifts. Each shift has a configured total time, which includes worker breaks and shift handover times;
  2. Production quantity: When a product model requires special equipment, its total production quantity within a specified period may be limited (for example, up to 200 units per two hours). Once the limit is reached, production of that model stops until the end of the period, which facilitates quality control;
  3. Allocated production lines: Each order can only be allocated to a production line that has the capability and capacity to handle the corresponding model. Additionally, some models can only be produced on a fixed number of lines in a given shift because the number of fixtures (i.e., equipment dedicated to holding PCs in place during production) is limited;
  4. Related orders: Some orders are marked as related, meaning they must be fulfilled at the same factory within the specified time frame.

  These constraints are associated with order, production line, time, and quantity, and the number of constraints may exceed $10^6$.

  In EEPN, these constraints are handled by a new masking mechanism. Its core is a controllable mask tensor (i.e., a multi-dimensional array). Each element of the mask tensor can be thought of as a gate that controls whether placing an order at a particular position on a particular line is feasible. At each time step in which the model processes an order, the gate is open if placing the order on the line violates no constraint; otherwise the gate is closed.

  EEPN therefore selects only among orders whose gates are open and places them on the production lines step by step.
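The gate idea can be sketched as a masked softmax: closed gates receive zero probability, so an infeasible order can never be selected. The function name and the toy scores are assumptions; the article does not describe how EEPN applies the mask internally.

```python
import math


def masked_softmax(scores: list[float], gate_open: list[bool]) -> list[float]:
    """Softmax over feasible entries only; closed gates get probability 0."""
    if not any(gate_open):
        raise ValueError("no feasible order at this step")
    top = max(s for s, is_open in zip(scores, gate_open) if is_open)
    exps = [
        math.exp(s - top) if is_open else 0.0
        for s, is_open in zip(scores, gate_open)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# The second order has the highest raw score but its gate is closed,
# so it cannot be selected at this step.
scores = [2.0, 5.0, 1.0]
gate_open = [True, False, True]
probs = masked_softmax(scores, gate_open)
```

Because infeasible choices are removed before sampling, every schedule the model generates during training already satisfies the constraints, and no reward signal is spent on learning to avoid violations.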

  As shown in the figure above, a combined mask consists of several sub-masks combined by logical addition, with each sub-mask representing one constraint. The masking mechanism considers multiple constraints simultaneously during solution generation and excludes infeasible solutions, which greatly reduces the computation time of model training.
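Combining sub-masks can be sketched as an element-wise logical OR. One convention has to be assumed for this to match "logical addition": here each sub-mask marks blocked (order, line) pairs with `True`, so a pair is blocked if any constraint blocks it, and a gate is open only where no sub-mask blocks. The two example constraints and all values are illustrative.

```python
def combine_submasks(submasks: list[list[list[bool]]]) -> list[list[bool]]:
    """Element-wise logical OR of sub-masks of identical shape [order][line].

    Assumed convention: True means "blocked by this constraint".
    """
    n_orders = len(submasks[0])
    n_lines = len(submasks[0][0])
    return [
        [any(mask[i][j] for mask in submasks) for j in range(n_lines)]
        for i in range(n_orders)
    ]


def open_gates(blocked: list[list[bool]]) -> list[list[bool]]:
    """A gate is open exactly where no constraint blocks the placement."""
    return [[not b for b in row] for row in blocked]


# Two orders, two lines.
line_capability_block = [[False, True], [False, False]]  # order 0 cannot run on line 1
time_window_block = [[False, False], [True, False]]      # order 1 is too late for line 0

blocked = combine_submasks([line_capability_block, time_window_block])
gates = open_gates(blocked)
# gates == [[True, False], [False, True]]
```

Adding a new rule then amounts to computing one more sub-mask, which fits the article's later remark that the approach adapts to other factories by modifying the masking mechanism.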

Fast model training

  During the algorithm testing phase, the run time of AI benchmark scheduling was evaluated with and without masking across various problem sizes.

  The results show a slight increase in run time for tests using the masking mechanism compared with tests without it. As the problem size increases, run time grows at roughly the same rate in both tests, increasing approximately linearly for larger problems. Although the masking mechanism adds computation time when solving, it significantly reduces training time through effective constraint enforcement. Furthermore, for larger problems, the masking mechanism does not lead to a significant increase in model run time.

Configure multi-objective scheduling optimization

  In each scheduling run, EEPN should simultaneously generate a set of solutions under different target priorities. When given a set of target priorities, the decision maker should be able to flexibly configure the preference weight of each target and intuitively choose the desired optimal solution.

  Therefore, one idea is to update EEPN to be able to learn optimal scheduling policies for different priority sets in multi-objective scenarios.

  This can be done by using various target preference weights as additional input data for machine learning models.

  According to previous research, this objective requires the design of multiple EEPN instances, each of which is responsible for completing the optimization under a specific set of objective function priorities. However, this method is very time-consuming and requires a lot of computing resources.

  Instead, Lenovo's research team decided to use a single EEPN to achieve this goal. The multi-objective version of EEPN takes the priorities of the objective function criteria (i.e., preference weights) as input. EEPN therefore continuously learns from various combinations of objective priorities and scheduling data in a time-varying environment.
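One way to feed preference weights to a single model is to append them to every MO feature vector and to vary them between training episodes. The sketch below illustrates that idea only; the article does not say how EEPN encodes the weights, and the sampling scheme (normalizing random draws) and function names are assumptions.

```python
import random


def sample_preference_weights(num_objectives: int, rng: random.Random) -> list[float]:
    """Draw positive weights that sum to 1 (simple normalization, for illustration)."""
    # 1.0 - random() lies in (0, 1], which avoids a zero total.
    raw = [1.0 - rng.random() for _ in range(num_objectives)]
    total = sum(raw)
    return [r / total for r in raw]


def augment_with_weights(
    mo_features: list[list[float]], weights: list[float]
) -> list[list[float]]:
    """Append the same preference weights to every MO feature vector."""
    return [features + weights for features in mo_features]


rng = random.Random(0)
training_weights = sample_preference_weights(3, rng)  # new weights per episode

# At run time the planner supplies the weights, e.g. output, delivery rate,
# and changeover cost priorities.
planner_weights = [0.5, 0.3, 0.2]
model_input = augment_with_weights([[1.0, 2.0], [3.0, 4.0]], planner_weights)
```

With this design, changing priorities requires only a new forward pass with different weights rather than retraining or maintaining one model per priority set.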

  Given the same scheduling data, EEPN can quickly generate a scheduling result for each configured set of target priorities. With this learning-based approach, the algorithm successfully solves the multi-objective optimization problem.

Conclusion

  To sum up, the EEPN framework developed by Lenovo and tested by LCFC for intelligent scheduling through OR and AI has been proven to improve efficiency, increase revenue, save human capital, and protect the environment. Such solutions have enormous potential to go on to solve some of the most complex problems facing business and society.

  The solution was not only implemented in the Lianbao factory but also migrated to and tested in the production scenarios of other Lenovo factories, such as the Shenzhen and Huiyang factories. Results from the POC stage showed that the KPIs of both factories improved substantially. Beyond the PC industry, the solution is also applicable to the mobile phone, semiconductor, and discrete machining industries. From an OR perspective, the production scheduling problems of these industries may differ from the PLPP of the Lianbao factory, because each factory has its own production processes and KPI preferences, but the solution can adapt to these differences by modifying the masking mechanism and the objective function.


Origin blog.csdn.net/hba646333407/article/details/128529557