Summary of 2021 Huawei Software Elite Challenge


Preface


With the preliminary round over, our journey has come to an end. Although we did not make it to the semi-finals, we finished 52nd in the Hangzhou-Xiamen Division (moving up three places after the duplicate check), which still counts as top 64 in the regional competition. It is a bit of a pity, but not a bad result for a sophomore taking part in this kind of competition for the first time.

I learned a lot from this competition; both my coding and my way of thinking improved greatly. At the same time, I also realized my shortcomings and the gap between myself and the top players. In short, I have many thoughts, so this article serves as a review and summary of my experience in this competition.

1. The competition problem

The problem for this competition is server resource allocation and scheduling in a cloud-computing setting. For details, please see the official website. I will also put the contest files together with my code on my Gitee as a reference for those who come later.

2. Competition review

Here I mainly describe how the competition went for us. (My approach to the problem itself is covered in the thinking process section.)

1. Team up

Before the problem was released, a PhD student from Zhejiang University was looking for a teammate in the competition group, and I happened to meet her requirements (mainly because she was not good at writing code), so I chatted with her privately and we more or less formed a team. The original agreement was that after the problem came out we would discuss and analyze it together, but every time I sent her a message she barely replied, saying she was too busy (maybe she really was), and a week later she decided not to participate after all.

Fortunately, I did not just sit around during this period; I wrote up my own analysis and thoughts on the problem.


Then I asked in the competition group whether any team was still short of a member. Someone sent me an invitation shortly afterwards, and that is how I found my current teammates.

2. Meetings and discussion

By the time the team was formed, a week had passed, and there was only a little more than a week left before the official competition. On the evening we formed the team, we held a meeting to briefly discuss the problem. However, because we each used different languages, and the other two teammates and I were not from the same school (the two of them knew each other), it was inconvenient to collaborate closely, so in the end we decided that each of us would write our own code.

Of course, there were several more meetings over the following days, but I won't go into them here.

3. Code iteration and bug fixing

I programmed according to my design (explained in the thinking process section). After much hardship, I finally produced a baseline version that could be submitted, and then kept optimizing on top of that baseline. Of course, this process was full of all kinds of bugs and unexpected difficulties.

4. The struggle of the last few days

To be honest, without teammates I probably could not have persisted through the last few days, because teams kept climbing past us on the leaderboard. The pressure at that point is enormous, especially when your code runs into all kinds of bugs and you are at a loss. You get very anxious at this stage, and so did I; I really almost gave up many times. (In fact, many teams did give up halfway through.)

After a week of high-intensity optimization, the last few days were exhausting. I felt it most on the final day: we were only about 20 million away from the top 32, but every plan to close the gap fell through. Wanting to get into the top 32 and repeatedly failing was very uncomfortable, and at that point I really wanted to give up.

3. The thinking process

Here I roughly record my thinking process at the time.

1. Preliminary thinking

When I first analyzed the problem, I treated it as a 0-1 knapsack problem and wanted to solve it with dynamic programming. After more careful consideration, however, I found that it differs from the usual 0-1 knapsack problem: it is multi-dimensional and has various complicated constraints. After consulting relevant material, I finally rejected the dynamic-programming approach, because the optimal state of each stage cannot be derived directly from one or a few states of the previous stage.
For details, you can refer to the blog post "Dynamic programming (DP) explained in plain language".

After ruling out dynamic programming, I decided to change perspective: instead of looking at each virtual machine and choosing which server to place it on, I looked at each server and thought about which virtual machines to place on it.

Based on the problem statement, I divided the work into three steps: purchase, migration, and deployment.

Each step only needs to do the following:
1. Purchase: spend as little as possible while satisfying the day's requests
2. Migration: consolidate resources as much as possible so that servers are freed up and can better accommodate the upcoming virtual machines
3. Deployment: deploy the day's virtual machine requests, using existing resources as much as possible

How is this done?
My initial idea was to migrate and consolidate resources first, then use the current resources as much as possible for the initial deployment. When a virtual machine request could not be accommodated, I would purchase a new server and deploy the request onto it.
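To make this concrete, here is a rough Java sketch of that per-day flow. The class and method names are illustrations I am adding for this write-up, not the actual competition code:

```java
// Per-day flow: migrate first, then deploy onto existing servers,
// and only purchase a new server when a request cannot be accommodated.
// Request, Server and the helper methods are illustrative assumptions.
void handleDay(List<Request> addRequests) {
    migrateAndConsolidate();                          // migration: consolidate existing servers first
    for (Request req : addRequests) {
        boolean placed = tryDeployOnExisting(req);    // deployment: prefer resources we already own
        if (!placed) {
            Server bought = purchaseServerFor(req);   // purchase: only when nothing existing fits
            deployOn(bought, req);
        }
    }
}
```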

At the same time, I sort the requests from largest to smallest (dual-node requests first, then by resource size), so as to fill each server as much as possible.
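In Java, that ordering can be expressed with a comparator along these lines (the Request type and its accessors are assumptions used only for illustration):

```java
// Order the day's requests from "largest" to "smallest":
// dual-node requests come first, then larger total resource demand first.
// The Request type and its accessors are illustrative assumptions.
requests.sort(Comparator
        .comparingInt((Request r) -> r.isDualNode() ? 0 : 1)      // dual-node requests first
        .thenComparingInt(r -> -(r.cores() + r.memory())));       // then bigger requests first
```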

2. The first version of the code

With a general idea in mind, I decided to write a first version of the code following this basic approach. Of course, I ran into many difficulties and many bugs along the way, but in the end I got a version done; the process took about two days. Although the first version was finished (it actually could not be submitted, for various reasons such as request-order issues and output-format issues), the local test results were not ideal.

3. Idea improvement

① Balanced deployment and unbalanced deployment

Running the first version of the code on small-scale data for debugging and analysis, I found a problem: server resource utilization was very unbalanced, and some virtual machines had a core-to-memory ratio of more than 100 or less than 0.01 (which is just nasty; in reality, how could there be a machine with 1000 cores and only a few units of memory?).
[screenshot]

As a result, when deploying virtual machines, server resources were often wasted because of these extreme virtual machines.

To solve this problem, I came up with a solution: balanced deployment.
Balanced deployment is an improvement aimed at the situation above. A balance check is performed before deployment: if inserting a virtual machine into a server would make that server's core-to-memory ratio too high or too low (i.e. unbalanced), the insertion is rejected.

The specific imbalance criterion is as follows: if the resources remaining after insertion are below a certain threshold, there is no need to check the core-to-memory ratio (it is meaningless at that point) and the insertion passes directly; otherwise, the core-to-memory ratio of the remaining resources is checked, and if it is smaller or larger than certain bounds, i.e. the server would be unbalanced (as shown in the figure above), the insertion is rejected.
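A minimal sketch of that check might look like the following, where SMALL_REMAINDER, lowerBound and upperBound stand in for the thresholds described above; all names are assumptions of mine, not the actual values or code:

```java
// Returns true if placing the VM on this server keeps it "balanced" enough.
// Thresholds (SMALL_REMAINDER, lowerBound, upperBound) and field accessors
// are illustrative assumptions.
boolean isBalancedAfterInsert(Server s, Request vm) {
    int remCores = s.freeCores() - vm.cores();
    int remMemory = s.freeMemory() - vm.memory();
    if (remCores < 0 || remMemory < 0) return false;            // does not fit at all
    if (remCores + remMemory <= SMALL_REMAINDER) return true;   // nearly full: ratio check is pointless
    if (remMemory == 0) return false;                           // cores left but no memory: unbalanced
    double ratio = (double) remCores / remMemory;               // core-to-memory ratio of the remainder
    return ratio >= lowerBound && ratio <= upperBound;          // reject if too skewed either way
}
```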

At the same time, after the balanced-deployment pass, that is, after the day's requests have been tried against servers with the balance check, an unbalanced-deployment pass (without the balance check) is performed. This ensures that, as far as possible, suitable virtual machines go onto suitable servers while resources are still used as fully as possible.

After this improvement, resource utilization improved significantly. On some servers (server cores and memory are generally around 500 each), the remaining resources even came down to 1 or 2 units.
[screenshot]


② Dynamic strategy update

But at the same time I found another problem: after debugging, I noticed that the utilization of the first few servers was very high (as shown in the figure above), while the utilization of the later servers dropped off a cliff.

Reason: my guess was that the virtual machine requests are not evenly distributed across days, and the gap between the day's requests (in cores/memory) is too large.

This becomes a problem when buying servers, so what to do?

I thought of a method (one of my core ideas): dynamically updating the strategy.
Dynamic strategy update means that I adjust the purchase strategy and deployment strategy according to each day's requests, which concretely takes the form of updating the balance factor and the balance boundary.

Specific approach: my program has an updateStrategy method. What it does is compute the average cores and average memory of the day's remaining requests, update the balance factor and balance boundary from these values, and then adjust the purchase strategy and the balanced-deployment strategy accordingly. The balance factor is the ratio of the average cores to the average memory, and the balance boundary is their sum multiplied by a coefficient.
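Roughly, such an updateStrategy could look like this (the coefficient and field names are assumptions used only to illustrate the idea, not the actual implementation):

```java
// Recompute the balance factor and balance boundary from the day's remaining
// requests; the purchase and balanced-deployment logic then reads these fields.
// BOUNDARY_COEF and the field names are illustrative assumptions.
void updateStrategy(List<Request> remaining) {
    if (remaining.isEmpty()) return;
    double avgCores = remaining.stream().mapToInt(Request::cores).average().orElse(0);
    double avgMemory = remaining.stream().mapToInt(Request::memory).average().orElse(0);
    if (avgMemory == 0) return;
    balanceFactor = avgCores / avgMemory;                      // ratio of average cores to average memory
    balanceBoundary = (avgCores + avgMemory) * BOUNDARY_COEF;  // their sum times a tuning coefficient
}
```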

At the same time, to improve resource utilization, I changed the server-purchase choice to pick the server type whose core-to-memory ratio is closest to that of the current remaining requests.
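Selecting that server type can be sketched like this (ServerType and its accessors are assumptions added here for illustration):

```java
// Pick the purchasable server type whose core-to-memory ratio is closest to
// the target ratio of the current remaining requests (i.e. the balance factor).
// ServerType and its accessors are illustrative assumptions.
ServerType chooseByRatio(List<ServerType> types, double targetRatio) {
    ServerType best = null;
    double bestDiff = Double.MAX_VALUE;
    for (ServerType t : types) {
        double diff = Math.abs((double) t.cores() / t.memory() - targetRatio);
        if (diff < bestDiff) {       // keep the type with the smallest ratio difference
            bestDiff = diff;
            best = t;
        }
    }
    return best;
}
```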

After this improvement, server utilization increased greatly. Except for the last one or two servers, utilization was very high: remaining resources were generally between 10 and 20 units, and on some servers only 1 or 2.

After these improvements (there were of course many smaller improvements to details that I won't repeat here), our version without migration scored 1.19 billion in the practice stage and 1.53 billion in the official competition.



③ Migration optimization

For migration, building on the earlier ideas, I quickly worked out a strategy: first migrate the virtual machines that had previously been deployed through the unbalanced pass, and try to balance-deploy them onto other servers (with some extra checks; for example, if the server holding a previously unbalanced-deployed VM has since become balanced, it does not need to be migrated). This rebalances servers so that they can hold more virtual machines.

If there is migration quota left, perform a full migration pass: migrate virtual machines from servers holding fewer VMs to servers holding more. This not only consolidates resources so that more virtual machines fit, but also saves unnecessary energy consumption.
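Put together, the migration pass was roughly of this shape. This is heavily simplified; all names are illustrative, and the quota bookkeeping in the real code is more involved:

```java
// Simplified migration pass. First re-place VMs that were deployed without the
// balance check; then, if quota remains, empty lightly loaded servers into
// heavily loaded ones. Names and structure are illustrative assumptions.
void migrate(int quota) {
    // Pass 1: rebalance - move previously "unbalanced" placements to servers
    // where they now satisfy the balance check.
    for (Placement p : unbalancedPlacements()) {
        if (quota == 0) return;
        Server target = findBalancedTarget(p.vm());
        if (target != null) {
            move(p, target);
            quota--;
        }
    }
    // Pass 2: consolidate - migrate VMs from servers holding few VMs
    // to servers holding more, freeing whole servers where possible.
    for (Server light : serversSortedByVmCountAscending()) {
        for (Placement p : List.copyOf(light.placements())) {
            if (quota == 0) return;
            Server target = findDenserTarget(p.vm(), light);
            if (target != null) {
                move(p, target);
                quota--;
            }
        }
    }
}
```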

However, this migration scheme has an obvious problem: the time complexity is too high. Each migration pass took about 0.6 s, so the practice-stage data set needed 500+ s to run, which is fatal against the 90 s limit.

So I optimized the code, mainly in the following ways (a small example follows the list):
1. Outer-loop optimization: exit or skip unnecessary loops early, and prune some cases
2. Data-structure optimization: use more suitable data structures to reduce memory and time consumption
3. Inner-operation optimization: optimize inner operations such as the deployment routine, returning immediately in cases that clearly cannot succeed
4. Code-detail optimization: for example, moving variable declarations outside of loops (though I had already done this kind of thing long before)
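As a small example of points 1 and 3, the early exits look like this in spirit (purely illustrative; the names are not from the real code):

```java
// Prune hopeless candidates before doing any expensive work, and stop as soon
// as a placement succeeds. Names are illustrative assumptions.
for (Server target : servers) {
    if (target == source) continue;                          // never migrate within the same server
    if (target.freeCores() < vm.cores()
            || target.freeMemory() < vm.memory()) continue;  // cannot possibly fit: skip early
    if (tryBalancedDeploy(vm, target)) break;                // placed: no need to keep searching
}
```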

After the optimizations above, my code's running time dropped to around 20 s. It was an amazing improvement; I had never imagined my code could be optimized that much.

However, because the online data sets were much larger, a run that took just over 20 s locally still timed out online, so I had to reduce the number of migration operations: instead of triggering the full migration pass every day, I triggered it only on days when delete requests outnumbered add requests.

In the end our score came to 1.49 billion, and after further tuning it finally reached 1.48 billion, which was our best result.

④ The final optimization attempt

At this point there were only two days left before the end of the official competition. After all the optimizations, we found it hard to reduce costs further through the migration and deployment steps (for migration, mainly because of the running-time limit).

So I turned my attention to purchasing, because my purchase step chose servers only by the core-to-memory ratio of the current remaining requests, without considering cost. Utilization was high, but the cost would not come down, so I wanted to factor the purchase cost in (I did write a version that chose by cost-effectiveness, but the result was not ideal and utilization was very low).

Special handling for special circumstances

On the night before the competition ended, we talked with a strong player. He said my solution works well for relatively balanced virtual machines and servers, but suffers a lot on the more extreme ones. His suggestion was to pull out these extreme virtual machine requests, handle them separately, and deploy them onto matching servers.

Indeed, my plan chooses servers to match the core-to-memory ratio of the current virtual machine requests as closely as possible, but the problem is that once the remaining requests are averaged, the ratio ends up close to 1 (roughly 0.8-1.2). Even if there are virtual machines with extreme ratios, their characteristics are neutralized by extreme VMs in the other direction or diluted by the relatively balanced ones. The program has a hard time picking up on these characteristics, so it generally buys servers with a core-to-memory ratio close to 1, and such servers are often not cost-effective.

Only when the day's requests have mostly been deployed and the characteristics of the remaining requests stand out does the program buy servers with more extreme core-to-memory ratios, which is why my purchase cost could not come down.

Handling them separately really is a good method. The point is not that resources were being badly wasted (my resource utilization was not low), but that using servers with extreme core-to-memory ratios to host virtual machines with similarly extreme ratios would bring the cost down.

So early the next morning I started changing the code, but I gradually realized that a morning plus an afternoon would not be enough (this idea touched my deployment strategy, which is closely tied to the other two steps and is the core of the whole code, so it was hard to change and hard to debug). At that point I did not even know whether the change would actually reduce the cost, so after a whole morning of changes I finally decided to abandon this line of thought.

Stacking requests to select the optimal solution

Since our bottleneck was that cost was not considered when purchasing, I tried to bring cost in and came up with another plan, which works as follows:
I stack the requests one by one in order, and after each addition I look for the lowest-cost server type (hardware cost + energy consumption * remaining days) that can still hold the stacked virtual machines, recording the gap between the stacked resources and the resources of the selected server type. I keep stacking until no server type can hold the stack any more. The server type with the smallest resource gap seen during this process is kept, and that type is the optimal choice under this scheme.
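A sketch of this stacking selection, using the cost formula above (hardware cost plus daily energy cost times remaining days); the type names and accessors are assumptions for illustration:

```java
// Stack requests one by one; after each addition, find the cheapest server type
// that can still hold the whole stack, and remember the type that leaves the
// smallest unused-resource gap. All names are illustrative assumptions.
ServerType chooseByStacking(List<Request> requests, List<ServerType> types, int remainingDays) {
    int sumCores = 0, sumMemory = 0;
    ServerType best = null;
    long bestGap = Long.MAX_VALUE;
    for (Request r : requests) {
        sumCores += r.cores();
        sumMemory += r.memory();
        ServerType cheapest = null;
        long cheapestCost = Long.MAX_VALUE;
        for (ServerType t : types) {
            if (t.cores() < sumCores || t.memory() < sumMemory) continue;    // cannot hold the stack
            long cost = t.hardwareCost() + (long) t.dailyEnergyCost() * remainingDays;
            if (cost < cheapestCost) {
                cheapestCost = cost;
                cheapest = t;
            }
        }
        if (cheapest == null) break;                                         // nothing can hold the stack any more
        long gap = (cheapest.cores() - sumCores) + (cheapest.memory() - sumMemory);
        if (gap < bestGap) {                                                 // smallest resource gap wins
            bestGap = gap;
            best = cheapest;
        }
    }
    return best;
}
```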

However, the cost under this strategy turned out to be similar to that of my previous strategy. That does not mean the scheme is bad; it has room for improvement. For example, the smallest resource gap does not have to be the criterion for the best server type (the judgment could be changed), and the virtual machines do not have to be stacked in their original order. But there was not much time left, so I had to give it up. I still think the scheme is quite clever.

4. Summary of ideas

For this problem, I think two of my strategies worked well: one is balanced deployment, and the other is dynamic strategy update.

Imagine a riverbed in front of you with pits (servers) of different sizes, and a pile of stones of different sizes (virtual machine requests) rolling down from above. If a stone fits (balanced deployment), it rolls into a pit; after all the stones have rolled past, every pit except the last few is nearly full. And each time stones roll down, the pits change their shape (dynamic strategy update) so as to respond to the incoming stones in a targeted way.

Of course, there is also an imperfect purchase strategy, stacking requests to select the optimal server type, which is also a method worth considering.
(See the thinking process section for details.)

What I did not do well in this competition was the purchase strategy: I did not consider cost-effectiveness and did not handle extreme cases, and that is why we could not go further and break into the top 32.

Besides the gaps in my thinking, using Java for the competition also had drawbacks: I could not run the full migration pass and had to abandon part of the migration because of the running time. Without the time limit, we should have been able to improve by another 10 to 20 million.

5. The bug road is long

Along this road of optimization I ran into a great many bugs, which is exactly why our team is called "all bugs". Alas, it was a sad journey; all I can do is sigh: the bug road is long indeed!

Below are a few of my bug-hunting experiences.

1. Endless loop

At the time, the code kept timing out. I spent a long time optimizing: simplifying some logic, deleting unnecessary code, trimming the code from 700+ lines down to 380+ lines. But the problem remained unsolved.

The following is my bug record at the time:

[screenshot of the bug record]

2. Server overrun

I often encountered this bug when submitting.

This is generally caused by the order in which requests are processed.
The following is my bug record from the time:
[screenshot of the bug record]

At the time, I thought my request-handling logic was wrong and that I should process the requests strictly in order, but the result was still that virtual machine resources exceeded the server's limit.

In the end, the bug turned out to be caused by just one number!!! Just because of a single 1!!!
Oh, my god!

6. Suggestions for latecomers

If you want to take part in the Huawei Software Elite Challenge in the future, based on my experience, here are my suggestions:
1. Find a good teammate, and avoid the kind who gives up halfway. Often teammates cannot directly improve your ideas, but they can encourage you to keep going when you want to give up.
2. Do proper code management; good code management saves a lot of effort during later optimization.
3. Think things through as much as possible before writing code; large changes later are very expensive.
4. Communicate more with the strong players, not to copy them, but to think about whether there are better ideas on top of theirs and thereby improve your own thinking.
5. Think about the problem from multiple angles; often there is more than one possible solution.
6. When fixing bugs, test with small-scale data (whose expected behavior you can work out step by step). If the process does not match your expectation, there is a bug in that part of the code; then gradually narrow the scope to find it. Be careful and stay calm.
7. If you can use C/C++ for the competition, do so, because other languages are not as efficient. Of course, if like me you insist on a particular language, it does not matter too much; running time was not the decisive factor in this competition.

7. Summary of feelings

I have a lot of feelings about this competition. I worked hard for a week and skipped a week of classes (laughing through tears). Although we did not make the semi-finals in the end, it helped me a lot, especially with algorithms and code optimization.

I really admire the top teams on the leaderboard who managed to push the cost so low; that is still far beyond what I can do.

Of course, my current level is still not good enough, and I have little experience with this kind of tuning. I used to focus on learning Java back-end development and neglected writing and optimizing lower-level code, so going forward I need to shore up my knowledge of data structures and algorithms, study hard, and aim to reach the semi-finals when I compete again next year.


Finally, here is the Gitee address for my code, which also contains the relevant files for this competition; download them if you need them.


May we ride our dreams like horses and live up to our youth.
Let us encourage one another!


Origin blog.csdn.net/qq_46101869/article/details/115284543