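torch.max(action_value, 1) takes the maximum along dimension 1, i.e. the maximum of each row of action_value, and returns a (values, indices) pair.
torch.max(action_value, 1)[1] selects the indices part: the column index of each row's maximum.
.data.numpy()[0] converts that index tensor into a NumPy array and takes its first element. It does not convert a Variable into a tensor: .data returns the underlying tensor detached from the autograd graph (a leftover from the old Variable API), and .numpy() then turns it into a NumPy array.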
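A minimal sketch of what torch.max(..., 1) returns when the input has more than one row; the Q-values here are made-up numbers for illustration only.

```python
import torch

# Hypothetical Q-values for a batch of 2 states and 4 actions (made-up numbers)
q = torch.tensor([[0.1, 0.5, -0.2, 0.3],
                  [0.9, -0.1, 0.4, 0.0]])

result = torch.max(q, 1)   # reduce over dim 1, i.e. across each row
values, indices = result   # the result unpacks like a (values, indices) tuple

print(values)              # largest entry of each row
print(indices)             # column index of that entry; same tensor as result[1]
print(indices.numpy()[0])  # index for the first row, as a NumPy integer
```

Each row yields one value and one index, so with a single-state batch (as in choose_action) both tensors have length 1, which is why [0] is used to pull out the action.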
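# Q-value of every action for state x, shape (1, n_actions)
action_value = self.eval_net(x)  # calling the module runs forward()
# index of the largest Q-value in the row -> greedy action as a NumPy integer
action = torch.max(action_value, 1)[1].data.numpy()[0]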
print("<choose_action> action_value=", action_value, "torch.max(action_value, 1)=",torch.max(action_value, 1),"torch.max(action_value, 1)[1]=",torch.max(action_value, 1)[1], "action=", action)
<choose_action> action_value= tensor([[-0.2394, -0.3109, -0.3330, -0.0376]], grad_fn=<AddmmBackward0>) torch.max(action_value, 1)= torch.return_types.max(
values=tensor([-0.0376], grad_fn=<MaxBackward0>),
indices=tensor([3])) torch.max(action_value, 1)[1]= tensor([3]) action= 3
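The computation in the log can be reproduced without the network by feeding in the printed Q-values directly. This is a sketch: requires_grad=True only mimics the grad_fn seen in the log, and the argmax(dim=1).item() line is an alternative idiom, not what the original code uses.

```python
import torch

# Q-values copied from the printed action_value (one state, four actions)
action_value = torch.tensor([[-0.2394, -0.3109, -0.3330, -0.0376]],
                            requires_grad=True)

# Original approach: indices of the row maxima -> NumPy array -> first element
action = torch.max(action_value, 1)[1].data.numpy()[0]

# Alternative: argmax over dim 1, then .item() for a plain Python int
with torch.no_grad():
    action_alt = action_value.argmax(dim=1).item()

print(action, action_alt)
```

Both pick column 3, whose value -0.0376 is the largest in the row. The .item() form returns a built-in int rather than a NumPy scalar, which can be convenient when the action is passed to an environment's step function.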