Introduction to Reinforcement Learning

Category: Other | Posted: 2018-06-04 13:25:30

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.
These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.

One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation.
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.
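
To make the exploration-exploitation trade-off concrete, here is a minimal sketch of epsilon-greedy action selection on a multi-armed bandit. It is one common way to balance the two, not the only one. The arm means, the unit-variance Gaussian rewards, the value of epsilon and the names epsilon_greedy and run_bandit are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an arm: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: choose any arm uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: choose the arm with the highest current estimate.
    return max(range(len(estimates)), key=lambda a: estimates[a])


def run_bandit(true_means, epsilon, steps, seed=0):
    """Run a hypothetical Gaussian bandit and return (estimates, counts)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # estimated value of each arm
    counts = [0] * len(true_means)       # number of pulls of each arm
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward (assumed)
        counts[arm] += 1
        # Incremental sample-average update of the estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Illustrative arm means; the agent does not see them directly.
estimates, counts = run_bandit([0.2, 0.5, 0.8], epsilon=0.1, steps=2000)
```

With epsilon set to 0 the agent only exploits and can lock onto an arm that merely looked good early; with epsilon set to 1 it only explores and never uses what it has learned.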

One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

Features shared by problems to which reinforcement learning applies:
All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment.

Elements of reinforcement learning:

agent
policy
A policy is a mapping from perceived states of the environment to actions to be taken when in those states.
reward signal
A reward signal defines the goal in a reinforcement learning problem.
value function
Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.
model of the environment (optional)
Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners—viewed as almost the opposite of planning.
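
The difference between reward and value can be seen in a small sketch. It assumes a hypothetical three-state chain A -> B -> C with deterministic transitions, a fixed policy, and a discount factor of 0.9. None of these come from the text above. The policy is a plain state-to-action table, the model gives the next state and reward, and the values are computed by repeatedly backing up reward plus discounted next-state value.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Policy: a mapping from states to actions.
policy = {"A": "right", "B": "right"}

# Model of the environment: (state, action) -> (next_state, reward).
model = {
    ("A", "right"): ("B", 0.0),   # no immediate reward when leaving A
    ("B", "right"): ("C", 10.0),  # large reward when leaving B
}


def evaluate(policy, model, gamma, terminal=("C",), sweeps=50):
    """Estimate state values under a fixed policy by repeated backups."""
    values = {s: 0.0 for s in list(policy) + list(terminal)}
    for _ in range(sweeps):
        for state, action in policy.items():
            next_state, reward = model[(state, action)]
            # Value = immediate reward + discounted value of what follows.
            values[state] = reward + gamma * values[next_state]
    return values


values = evaluate(policy, model, GAMMA)
```

In this chain, state A yields an immediate reward of 0 but ends up with a value of about 9, because it leads to B, where the reward of 10 is available. Because evaluate consults the model, it is a planning-style computation; a model-free learner would have to estimate the same values from sampled experience.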

Reinforcement learning uses the formal framework of Markov decision processes to define the interaction between a learning agent and its environment in terms of states, actions and rewards.
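
The interaction described here can be sketched as a loop in which the agent observes a state, chooses an action, and receives a reward and the next state. The corridor environment below, its reward of 1.0 at the goal, and the names step, always_right and run_episode are hypothetical choices for illustration only.

```python
GOAL = 3  # states are 0..3; state 3 is terminal (assumed layout)


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # reward only on reaching the goal
    return next_state, reward, done


def always_right(state):
    """A fixed policy: map every state to the action +1."""
    return 1


def run_episode(policy, start=0, max_steps=100):
    """Run the agent-environment loop and record (state, action, reward)."""
    trajectory = []
    state = start
    for _ in range(max_steps):
        action = policy(state)                          # agent acts
        next_state, reward, done = step(state, action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


trajectory = run_episode(always_right)
total_reward = sum(r for _, _, r in trajectory)
```

The next state and reward here depend only on the current state and action, which is the Markov property that the MDP framework assumes.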

Reposted from blog.csdn.net/weixin_42018112/article/details/80456762
