Source: 专知
This article is a paper introduction; suggested reading time: 5 minutes. In this paper, we aim to study a robust stochastic control problem in which the agent does not know the parameter values of the underlying process.
In stochastic control problems, an agent chooses an optimal policy to maximize or minimize a performance criterion. The performance criterion can be an expectation of a reward function for standard control problems, or a nonlinear expectation for robust control problems. In parametric stochastic control problems, the agent needs to know the values of the model parameters of the stochastic system in order to correctly specify the optimal policy. In practice, however, the agent rarely knows these parameter values exactly.
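To make the robust criterion concrete, the sketch below evaluates a constant investment fraction by its worst-case expected CRRA utility over a small, hypothetical uncertainty set of drift values. The function name, the drift set, and all parameter values are illustrative assumptions, not taken from the paper; the point is only that the robust criterion replaces a single expectation by an infimum over models.

```python
import math
import random

def robust_criterion(policy_pi, drift_set=(0.0, 0.05, 0.10), sigma=0.2,
                     gamma=2.0, horizon=1.0, n_paths=20_000, seed=0):
    """Worst-case (minimum over a hypothetical drift uncertainty set)
    expected CRRA utility of terminal wealth under a constant fraction
    policy_pi invested in the risky asset (riskless rate taken as 0).
    All names and parameters are illustrative assumptions."""
    rng = random.Random(seed)
    worst = math.inf
    for mu in drift_set:
        total = 0.0
        for _ in range(n_paths):
            z = rng.gauss(0.0, 1.0)
            # log of terminal wealth for constant fraction pi under drift mu
            log_w = (policy_pi * mu - 0.5 * (policy_pi * sigma) ** 2) * horizon \
                    + policy_pi * sigma * math.sqrt(horizon) * z
            w = math.exp(log_w)
            total += w ** (1 - gamma) / (1 - gamma)   # CRRA utility, gamma != 1
        worst = min(worst, total / n_paths)           # infimum over models
    return worst
```

Because the uncertainty set here includes a zero drift, the all-cash policy `robust_criterion(0.0)` beats an aggressive one such as `robust_criterion(0.5)` under this criterion, illustrating how robustness penalizes policies that rely on a favorable parameter value.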
In this paper, our goal is to study a robust stochastic control problem in which the agent does not know the parameter values of the underlying process. We therefore formulate a stochastic control problem under this assumption, in which the agent uses the observable processes to estimate the model parameters while solving the control problem within a robust framework.
This new stochastic control problem has two key components. The first is the parameter estimation part, where the agent uses the realization of the underlying process to estimate the unknown parameters of the stochastic system. We pay special attention to online parameter estimation. Online estimators are an important ingredient of our stochastic control problems because they allow the agent to obtain an optimal policy in feedback form. The second is the stochastic control part, where the question is how to design a time-consistent stochastic control problem so that the agent can simultaneously estimate the parameters and optimize the strategy. We address each component in a continuous-time setting, and then take a close look at the utility maximization problem under this framework.
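The online estimation step can be sketched as follows. For a simple observable diffusion dX_t = θ dt + σ dW_t with known σ, the maximum likelihood estimator of the drift is θ̂_t = X_t / t, which can be updated recursively from each new increment without storing the path; this is what makes a feedback-form policy possible. The model, function name, and parameter values are illustrative assumptions, not the paper's specific setup.

```python
import math
import random

def simulate_online_drift_estimate(theta=0.5, sigma=0.2, dt=0.001,
                                   n_steps=100_000, seed=0):
    """Simulate dX_t = theta dt + sigma dW_t (illustrative model) and maintain
    the online MLE theta_hat_t = X_t / t, updated from each increment alone."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    theta_hat = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment
        x += theta * dt + sigma * dw         # observe the new increment of X
        t += dt
        theta_hat = x / t                    # online update; no path storage
    return theta_hat
```

Since θ̂_t depends on the observation only through the current state (X_t, t), a policy built on it remains Markovian, which is the feedback property emphasized above.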
In this paper, we study stochastic control problems in which the agent does not have sufficient knowledge of the parameter values of the model, and new observations arriving over time are used to estimate the parameters and simultaneously update the optimal policy. This question is interesting from both a theoretical and a practical point of view. Standard stochastic control problems often assume that the agent knows the values of the model parameters, a strong assumption that does not hold in practice. By relaxing the assumption of parameter knowledge, we can apply the new stochastic control framework to many classical stochastic control problems, such as utility maximization, in which the agent does not have full knowledge of the parameter values of the stochastic system. There are two key components in these stochastic control problems. First, the parameter values are estimated over time as more information becomes available. In this paper, we focus on online parameter estimation. Online estimators are an important component of the stochastic control problems we study because they allow the agent to obtain Markovian policies in feedback form. Second, we design a time-consistent stochastic control problem that allows the agent to estimate parameters online while simultaneously deriving an optimal policy. In this paper, we address each component of the above problem in a continuous-time setting.
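For the utility maximization application mentioned above, a minimal plug-in sketch combines the two components: at each step the agent re-estimates the drift of the risky asset from observed log-returns and invests the Merton-style feedback fraction π_t = (μ̂_t − r)/(γσ²). This is a naive plug-in heuristic for illustration, not the paper's robust solution; the volatility σ and rate r are treated as known, and all names and values are assumptions.

```python
import math
import random

def merton_plug_in(mu=0.08, r=0.02, sigma=0.2, gamma=2.0,
                   dt=1 / 252, n_steps=2520, seed=1):
    """Wealth under a plug-in Merton policy (illustrative sketch):
    re-estimate the drift mu_hat online from observed log-returns and
    invest the feedback fraction (mu_hat - r) / (gamma * sigma**2).
    sigma and r are treated as known for simplicity."""
    rng = random.Random(seed)
    wealth = 1.0
    sum_log_ret, t = 0.0, 0.0
    mu_hat = r                      # prior guess before any data: no excess return
    for _ in range(n_steps):
        pi = (mu_hat - r) / (gamma * sigma ** 2)      # feedback policy
        dw = rng.gauss(0.0, math.sqrt(dt))
        log_ret = (mu - 0.5 * sigma ** 2) * dt + sigma * dw   # stock log-return
        # Euler-style approximation of the rebalanced portfolio's growth
        wealth *= math.exp(pi * log_ret + (1 - pi) * r * dt)
        sum_log_ret += log_ret
        t += dt
        mu_hat = sum_log_ret / t + 0.5 * sigma ** 2   # online drift estimate
    return wealth, mu_hat
```

The estimate μ̂_t depends only on the running sum of log-returns and elapsed time, so the resulting policy is a function of the current (augmented) state, consistent with the feedback-form policies discussed above.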