Paper translation - STUN: Reinforcement-Learning-Based Optimization of Kernel Scheduler Parameters 4 (3)

Continued from the previous article: Paper translation and reading - STUN: Reinforcement-Learning-Based Optimization of Kernel Scheduler Parameters 4 (2)

4. Design of STUN

4.3 Reward Algorithm

The reward is a value that indicates whether the workload's performance is improving. By subdividing the reward levels, STUN can update the Q-table more efficiently and reduce the learning time. The reward and penalty rules of the STUN reward function follow these ideas:

  • A high reward for a significant performance improvement
  • A penalty for a significant performance drop
  • A reward based on the previous performance

To express the rules of the algorithm, the variable default_bench denotes the benchmark result of the test workload under the default Linux settings, with no parameters changed; in other words, it represents the baseline performance of the test workload. As in the filtering process, a 20% threshold is used to decide whether performance is significantly affected: a result more than 20% above default_bench is treated as exceeding the upper bound, and a result more than 20% below default_bench as falling under the lower bound. If the result is above the upper bound, the reward is 200, a large reward; if it is below the lower bound, the reward is −50, a penalty; if it is between the bounds but better than the previous result, the reward is 100; otherwise, 0 is returned. Algorithm 1 expresses these reward rules in algorithmic form.
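The rules above can be summarized in a short sketch. This is not the paper's Algorithm 1 verbatim; the function and variable names (reward, result, prev_result) are illustrative, and it assumes that a higher benchmark score means better performance:

```python
# Minimal sketch of the STUN reward rules described above (illustrative, not the authors' code).
# Assumptions: default_bench is the benchmark score of the test workload under the
# default kernel parameters, result is the score of the current step, and
# prev_result is the score of the previous step; higher scores are better.

def reward(result: float, prev_result: float, default_bench: float) -> int:
    upper_bound = default_bench * 1.2   # 20% improvement over the default settings
    lower_bound = default_bench * 0.8   # 20% degradation from the default settings

    if result > upper_bound:
        return 200   # large reward for a significant improvement
    if result < lower_bound:
        return -50   # penalty for a significant drop
    if result > prev_result:
        return 100   # moderate reward for beating the previous result
    return 0         # no reward otherwise
```

Splitting the reward into these levels lets clearly good or clearly bad parameter sets move the Q-table strongly, while small fluctuations contribute little, which is what shortens the learning time.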

Origin blog.csdn.net/phmatthaus/article/details/131449766