[RL Series] Markov Decision Processes: Jack's Car Rental

Jack's Car Rental is a classic application of Markov decision processes (MDPs). We will simply call it the "car rental problem". It is stated as follows:

Jack's Car Rental: Jack manages two locations for a nationwide car rental company. Each day, some number of customers arrive at each location to rent cars. If Jack has a car available, he rents it out and is credited $10 by the national company. If he is out of cars at that location, then the business is lost.

Cars become available for renting the day after they are returned. To help ensure that cars are available where they are needed, Jack can move them between the two locations overnight, at a cost of $2 per car moved.

We assume that the number of cars requested and returned at each location are Poisson random variables, where λ is the expected number. 

Suppose λ is 3 and 4 for rental requests at the first and second locations and 3 and 2 for returns.
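As a quick illustration of what these rates imply, recall that a Poisson variable with mean λ takes the value n with probability λ^n e^(-λ) / n!. The sketch below computes that probability and the expected number of rentals when only a limited number of cars is on hand. The function names `poisson_pmf` and `expected_rentals` are illustrative choices, not part of the original problem statement.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """P(N = n) for a Poisson random variable with mean lam."""
    return lam ** n / math.factorial(n) * math.exp(-lam)

def expected_rentals(cars: int, lam: float) -> float:
    """E[min(requests, cars)]: rentals are capped by the cars on hand."""
    # Outcomes strictly below the cap contribute their own value.
    below = sum(k * poisson_pmf(k, lam) for k in range(cars))
    # All outcomes at or above the cap are lumped onto the cap itself.
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(cars))
    return below + cars * tail

# Request rates from the problem statement (location 1, location 2).
REQUEST_LAMBDA = (3, 4)
RENTAL_CREDIT = 10  # dollars credited per car rented

for cars in (0, 2, 5, 20):
    revenue = [RENTAL_CREDIT * expected_rentals(cars, lam) for lam in REQUEST_LAMBDA]
    print(cars, [round(r, 2) for r in revenue])
```

With few cars on hand the expected revenue is limited by the stock, and with many cars it approaches 10 times the request rate, which is why moving cars toward the busier location can pay off.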

To simplify the problem slightly, we assume that there can be no more than 20 cars at each location (any additional cars are returned to the nationwide company, and thus disappear from the problem) and a maximum of five cars can be moved from one location to the other in one night. We take the discount rate to be γ = 0.9 and formulate this as a continuing finite MDP, where the time steps are days, the state is the number of cars at each location at the end of the day, and the actions are the net numbers of cars moved between the two locations overnight. 
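The formulation above can be sketched as a one-day transition function. This is a minimal sketch under stated assumptions, not a definitive implementation: a positive action means cars move from location 1 to location 2, requests are served before the same day's returns are counted (returned cars only become available the next day), and the names `apply_day`, `sample_poisson`, and `step` are hypothetical.

```python
import math
import random

MAX_CARS = 20        # capacity per location
MAX_MOVE = 5         # cars that can be moved per night
RENTAL_CREDIT = 10   # dollars per rental
MOVE_COST = 2        # dollars per car moved

def apply_day(state, action, requests, returns):
    """Deterministic part of one step; returns (next_state, reward).

    state:   (cars at location 1, cars at location 2) at the end of the day
    action:  net cars moved overnight from location 1 to 2 (negative: 2 to 1)
    requests, returns: realized counts per location for the following day
    """
    n1, n2 = state
    if abs(action) > MAX_MOVE or action > n1 or -action > n2:
        raise ValueError("infeasible move")
    # Overnight move; cars above capacity leave the problem.
    cars = [min(n1 - action, MAX_CARS), min(n2 + action, MAX_CARS)]
    reward = -MOVE_COST * abs(action)
    for i in range(2):
        rented = min(requests[i], cars[i])  # unmet requests are lost business
        reward += RENTAL_CREDIT * rented
        # Assumption: returns are added after rentals, then capped at capacity.
        cars[i] = min(cars[i] - rented + returns[i], MAX_CARS)
    return (cars[0], cars[1]), reward

def sample_poisson(lam, rng):
    """Draw one Poisson sample with Knuth's multiplication method (small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def step(state, action, rng):
    """Sample one day's requests and returns, then apply them."""
    requests = [sample_poisson(lam, rng) for lam in (3, 4)]
    returns = [sample_poisson(lam, rng) for lam in (3, 2)]
    return apply_day(state, action, requests, returns)

rng = random.Random(0)
print(step((10, 10), 2, rng))
```

Separating the deterministic bookkeeping in `apply_day` from the random draws in `step` keeps the dynamics easy to check by hand, and the same bookkeeping can be reused when summing over Poisson probabilities instead of sampling.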

In brief:

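Jack has two rental locations, location 1 and location 2, each of which can hold at most 20 cars. Jack earns 10 dollars for every car he rents out, and the numbers of cars rented out and returned each day follow Poisson distributions. Each night Jack can move cars between the two locations, at most 5 cars per night, at a cost of 2 dollars per car. At location 1, rental requests follow a Poisson distribution with $ \lambda = 3 $, and returns have $ \lambda = 3 $. At location 2, the $ \lambda $ values for rental requests and returns are 4 and 2, respectively. The question is: which car-moving policy maximizes profit?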

Reposted from www.cnblogs.com/Jinyublog/p/9319484.html