Summary of Interview Questions for "Deep Reinforcement Learning"

Original sources:
[1] Tencent Cloud. "Deep Reinforcement Learning" Interview Questions Summary
[2] Some Reinforcement Learning Interview Questions Encountered by a Reinforcement Learning Practitioner
[3] Zhihu. Reinforcement Learning Interview Questions


Source: Blog (AemaH, Su Ke)

Edit: DeepRL

Before you know it, the job-hunting season has been underway for quite a while. Algorithm positions are in high demand, yet interview questions on reinforcement learning remain relatively scarce. This article compiles about 50 questions for self-testing; you are also welcome to summarize and contribute answers!

  1. What is reinforcement learning?

  2. What is the difference between reinforcement learning, supervised learning, and unsupervised learning?

  3. What kinds of problems is reinforcement learning suitable for?

  4. What is the loss function of reinforcement learning? How does it relate to the loss function of deep learning?

  5. What is a POMDP? What is a Markov process? What is a Markov decision process? What is the essence of the "Markov" property in these models?

  6. What is the specific mathematical expression of the Bellman equation?
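
For self-checking, one standard way to write the answer (the Bellman expectation equation for $V^\pi$ and the Bellman optimality equation for $V^*$, in textbook notation, not from the original post):

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,\big[R(s,a,s') + \gamma V^{\pi}(s')\big]$$

$$V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s,a)\,\big[R(s,a,s') + \gamma V^{*}(s')\big]$$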

  7. Why are the optimal value function and the optimal policy equivalent?

  8. What is the difference between value iteration and policy iteration?

  9. What if the Markov property is not satisfied, i.e., the current state depends on many states long before it?

  10. What methods can solve a Markov decision process? What methods are available when a model is given? What is dynamic programming?

  11. Briefly describe the dynamic programming (DP) algorithm.

  12. Briefly describe the Monte Carlo (MC) estimation algorithm.

  13. Briefly describe the temporal difference (TD) algorithm.

  14. Briefly compare dynamic programming, Monte Carlo, and temporal difference methods (similarities and differences).

  15. Are MC and TD unbiased estimators, respectively?

  16. Which has the greater variance, MC or TD, and why?

  17. Briefly describe the difference between on-policy and off-policy.

  18. Briefly describe Q-learning and write its Q(s,a) update formula. Is it on-policy or off-policy, and why?
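
For reference, the tabular Q-learning update in its textbook form:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]$$

The target uses the greedy $\max_{a'}$ regardless of which behavior policy generated the data, which is the standard reason Q-learning counts as off-policy.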

  19. Write the formula for updating the current value function from the value function n steps ahead (i.e., the 1-step, 2-step, ..., n-step returns). As n grows, do the bias and the variance each become larger or smaller?
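
The n-step return and the corresponding update, in standard notation:

$$G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{\,n-1} r_{t+n} + \gamma^{\,n} V(s_{t+n}), \qquad V(s_t) \leftarrow V(s_t) + \alpha\big[G_t^{(n)} - V(s_t)\big]$$

Larger n means more sampled rewards and less bootstrapping, so the bias decreases while the variance increases.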

  20. In the TD(λ) method, which method is TD(λ) equivalent to when λ = 0? What about λ = 1?

  21. Write down the value-function update formulas for the three methods: Monte Carlo, TD, and TD(λ).
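
The three updates side by side, in textbook form, for self-checking:

$$\text{MC:}\quad V(s_t) \leftarrow V(s_t) + \alpha\big[G_t - V(s_t)\big]$$

$$\text{TD(0):}\quad V(s_t) \leftarrow V(s_t) + \alpha\big[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\big]$$

$$\text{TD}(\lambda)\text{:}\quad V(s_t) \leftarrow V(s_t) + \alpha\big[G_t^{\lambda} - V(s_t)\big], \qquad G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{\,n-1} G_t^{(n)}$$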

  22. What is the difference between value-based and policy-based methods?

  23. What are the two key tricks of DQN?

  24. Explain the roles of the target network and experience replay.
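
A minimal PyTorch-flavored sketch of these two tricks; the class names and hyperparameters are illustrative, not from the original article:

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience replay: store transitions and sample minibatches
    uniformly at random, which breaks the temporal correlation
    between consecutive samples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)


def sync_target(online_net, target_net):
    """Target network: a delayed copy of the online Q-network that keeps
    the TD regression target fixed between periodic synchronizations."""
    target_net.load_state_dict(online_net.state_dict())
```

Together, the two tricks make the regression problem inside DQN look more like supervised learning: i.i.d.-ish samples and a (temporarily) fixed target.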

  25. Derive the policy gradient by hand.
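
A compressed version of the derivation (the standard REINFORCE result, using the log-derivative trick $\nabla_\theta \pi_\theta = \pi_\theta \nabla_\theta \log \pi_\theta$ and the fact that the transition dynamics do not depend on $\theta$):

$$\nabla_\theta J(\theta) = \nabla_\theta\, \mathbb{E}_{\tau \sim \pi_\theta}\big[R(\tau)\big] = \mathbb{E}_{\tau \sim \pi_\theta}\Big[R(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\Big] = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\Big]$$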

  26. Describe the characteristics of stochastic policies and deterministic policies.

  27. If the correlations in the training data are not broken, why does the neural network train poorly?

  28. Draw a flowchart of DQN playing Flappy Bird. In this game, what is the state, and how do state transitions occur? How is the reward function designed, and is there a delayed-reward problem?
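
As an illustration for the reward-design part of this question, one hypothetical shaping scheme; the specific values are assumptions, not from the article:

```python
# Hypothetical reward design for DQN on Flappy Bird (values are illustrative).
def reward(alive: bool, passed_pipe: bool) -> float:
    if not alive:
        return -1.0   # crashing ends the episode with a penalty
    if passed_pipe:
        return 1.0    # the sparse "true" objective: clearing a pipe
    return 0.1        # small survival bonus to densify the delayed reward
```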

  29. What variants of DQN are there? What changes to states and rewards do they introduce?

  30. Briefly describe the principle of Double DQN.
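
The key to the answer is the decoupled target (standard Double DQN form): the online network $\theta$ selects the action, while the target network $\theta^-$ evaluates it:

$$y = r + \gamma\, Q_{\theta^-}\big(s',\, \arg\max_{a'} Q_{\theta}(s',a')\big)$$

In vanilla DQN, the target $y = r + \gamma \max_{a'} Q_{\theta^-}(s',a')$ uses one network for both selection and evaluation, which causes systematic overestimation.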

  31. How is the baseline determined in policy gradient methods?

  32. What is DDPG? Draw a diagram of the DDPG framework.

  33. What is the difference between the actor and the critic in Actor-Critic?

  34. What is the role of the critic in the Actor-Critic framework?

  35. Is DDPG on-policy or off-policy, and why?

  36. Are you familiar with the D4PG algorithm? Briefly describe how it works.

  37. Briefly describe the A3C algorithm. Is A3C on-policy or off-policy, and why?

  38. How does A3C perform asynchronous updates? Can you explain the difference between GA3C and A3C?

  39. Briefly describe the advantage function used in A3C.
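
For reference, the k-step advantage estimate used in A3C's standard formulation:

$$A(s_t, a_t) \approx \sum_{i=0}^{k-1} \gamma^{\,i} r_{t+i} + \gamma^{\,k} V(s_{t+k}) - V(s_t)$$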

  40. What is importance sampling?
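
The identity behind the answer (standard importance sampling, with $p$ the target distribution and $q$ the sampling distribution):

$$\mathbb{E}_{x \sim p}\big[f(x)\big] = \mathbb{E}_{x \sim q}\Big[\frac{p(x)}{q(x)}\, f(x)\Big]$$

In RL, this lets trajectories collected under a behavior policy be reweighted into estimates of expectations under a different target policy.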

  41. Why can TRPO guarantee that the return of the new policy improves monotonically?

  42. How does TRPO solve the learning-rate problem, i.e., finding around each local point the optimal step size that keeps the objective from getting worse?

  43. How do you understand replacing the maximum KL divergence with the average KL divergence?
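
For questions 41–43, the constrained problem that TRPO solves, in its standard formulation:

$$\max_{\theta}\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\Big[\frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s,a)\Big] \quad \text{s.t.} \quad \bar{D}_{\mathrm{KL}}\big(\pi_{\theta_{\text{old}}} \,\|\, \pi_\theta\big) \le \delta$$

The monotonic-improvement bound is stated in terms of the maximum KL over states, which cannot be estimated from samples; TRPO therefore substitutes the average KL $\bar{D}_{\mathrm{KL}}$ and uses a backtracking line search to pick a step size that actually improves the surrogate.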

  44. Briefly describe the PPO algorithm. What is its relationship to TRPO?
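
For reference, PPO's clipped surrogate objective (standard form), which replaces TRPO's hard KL constraint with clipping of the probability ratio:

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$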

  45. Briefly describe the relationship between DPPO and PPO.

  46. How can reinforcement learning be used in recommendation systems?

  47. How would you design the reward function in a recommendation scenario?

  48. What is the state in that scenario, and how does the current state transition to the next state?

  49. How would you model autonomous driving or robotics scenarios as reinforcement learning problems? Which variables in the real scene correspond to each element of the MDP?

  50. Reinforcement learning requires a lot of data. How can this data be generated or collected?

  51. Have you used a DRL algorithm to play the TORCS racing game? How did you solve it?

  52. Have you learned about reward shaping?

These questions are also mirrored on GitHub, and pull requests with the best answers are welcome! All contributors will be thanked together at the end of the article, and everyone is welcome to join the group discussion!

https://github.com/NeuronDance/DeepRL/blob/master/DRL-Interviews/drl-interview.md

Acknowledgements: This article was compiled with reference to the blogs of AemaH and Su Ke (the two links below). Many thanks to both!

https://zhuanlan.zhihu.com/p/33133828

https://aemah.github.io/2018/11/07/RL_interview/

Deep Reinforcement Learning Lab

Algorithms, frameworks, materials, cutting-edge information, etc.

GitHub repository

https://github.com/NeuronDance/DeepRL


Originally published at: https://blog.csdn.net/SL_World/article/details/112631061