Machine Learning Foundations (Hsuan-Tien Lin) Notes 3: Types of Learning

Last time we introduced PLA, a simple method for solving linear classification problems. When the sample data is linearly separable, PLA can find a line in the plane that classifies all of it correctly. For linearly inseparable data, the Pocket Algorithm can be used instead. This lesson covers the different types of machine learning and summarizes them.

One, Learning with Different Output Space Y (divided by output space)

Take the example of a bank deciding, based on an applicant's personal circumstances, whether to issue a credit card. This is a typical binary classification (Binary Classification) problem: there are only two possible outputs, y = {-1, +1}, where -1 means the card is not issued (negative class) and +1 means the card is issued (positive class).

Binary classification problems are very common. Examples include credit card approval, spam detection, patient diagnosis, and estimating whether an answer is correct. Binary classification is a central and fundamental problem in machine learning. It can be solved with linear models or with nonlinear models; which one to choose depends on the problem at hand.
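As a minimal sketch of what an output in y = {-1, +1} looks like, the following toy linear classifier turns an applicant's features into a decision. The feature meanings, weights, and threshold are assumptions made up for illustration; they do not come from the course.

```python
def sign(value):
    """Return +1 for a positive score and -1 otherwise (a score of 0 maps to -1 here)."""
    return 1 if value > 0 else -1


def approve_credit(features, weights, threshold):
    """Linear binary classifier: +1 = issue the card, -1 = do not issue it."""
    score = sum(w * x for w, x in zip(weights, features)) - threshold
    return sign(score)


# Hypothetical applicants: (annual income in units of 10k, years employed)
weights = [0.5, 1.0]   # assumed weights, for illustration only
threshold = 4.0        # assumed threshold

print(approve_credit([6.0, 2.0], weights, threshold))  # score = 3 + 2 - 4 = 1, prints 1
print(approve_credit([2.0, 1.0], weights, threshold))  # score = 1 + 1 - 4 = -2, prints -1
```

Whatever the model looks like inside, the defining property of binary classification is that the final answer is one of exactly two values.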

Besides binary classification there is the multiclass classification (Multiclass Classification) problem. As the name implies, multiclass classification has more than two possible outputs: y = {1, 2, ..., K}, K > 2. Typical applications include digit recognition and image content recognition.

Binary and multiclass classification are both classification problems; their outputs are discrete values.

In other cases, such as training a model to predict house prices or stock returns, the output space is y = R, the entire set of real numbers, and is continuous. We call this kind of problem regression (Regression).

Linear regression, the simplest case, is a typical regression model.
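To make the contrast with classification concrete, here is a small sketch of one-dimensional linear regression using the closed-form least-squares fit. The house areas and prices are invented data chosen to lie on a line; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: area in square meters -> price (arbitrary units)
areas = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90.0 + intercept  # a real-valued output, not a class label
print(slope, intercept, predicted)
```

The prediction for an unseen area is a real number, which is what makes this a regression problem rather than a classification problem.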

Beyond classification and regression, fields such as natural language processing also use another kind of machine learning problem: structured learning (Structured Learning). In structured learning the output space has some structure inside it, and solution methods are usually extended from multiclass classification.

To summarize briefly, divided by output space, machine learning includes binary classification, multiclass classification, regression, structured learning, and other types. Of these, binary classification and regression are the two most basic, core types.

Two, Learning with Different Data Label yn (divided by data labels)

If the training sample D contains both the input features xn and the outputs yn, we call this type of learning supervised learning (Supervised Learning).

Supervised learning can be binary classification, multiclass classification, or regression; what matters most is that the output labels yn are known.

The counterpart of supervised learning is unsupervised learning (Unsupervised Learning).

Unsupervised learning has no output labels yn. Typical unsupervised problems include clustering (Clustering), such as automatically grouping news web pages; density estimation, such as traffic condition analysis; and anomaly detection, such as monitoring users' network traffic. Unsupervised learning is usually more complex, and many unsupervised problems can be tackled by borrowing ideas from supervised learning algorithms.
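As an illustration of learning without labels, the sketch below groups unlabeled 2-D points with k-means. The notes do not name a specific clustering method, so k-means is an assumed choice. The points are invented, and initializing the centroids from the first k points is a simplification that keeps the run deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [points[i] for i in range(k)]  # assumed init: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
              for p in points]
    return centroids, labels


# Invented unlabeled data: only inputs xn, no yn
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The cluster indices are arbitrary names produced by the algorithm; nothing in the data said which group is which.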

Between supervised and unsupervised learning lies semi-supervised learning (Semi-supervised Learning).

As the name suggests, in semi-supervised learning part of the data has output labels yn and the rest does not. In practice semi-supervised learning is sometimes necessary. For example, when a pharmaceutical company tests certain drugs, cost and limits on the experimental population mean that only part of the data can be labeled with yn.

In addition, there is another very important type: reinforcement learning (Reinforcement Learning).

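In reinforcement learning we give the model or system some inputs, but we cannot give it the true output y that we want. Instead we respond to the model's output: if the result is good and closer to the true output, we give it a positive reward; if the result is poor and deviates from the true output, we give it a negative reward. Through repeated rounds of feedback and correction the model learns better step by step; this is the core of reinforcement learning. Reinforcement learning can be compared to training a pet. Say we want to train a dog to sit, but the dog cannot directly understand our command "sit down". During training we give the dog a cue. If it behaves well, we reward it; if it does something completely unrelated to sitting down, we give it a small punishment. By continually correcting the dog's actions, we can eventually get it to act on our command. Real life offers many examples of reinforcement learning, such as an advertising system that keeps improving based on users' clicks and choices.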
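The feedback-and-correction loop can be sketched with a toy version of the dog example. This is a deliberately simplified, deterministic illustration and not a full reinforcement learning algorithm: the action names, the reward values, and the greedy choice rule are all assumptions.

```python
def train_with_feedback(actions, reward_fn, rounds=5, step=1.0):
    """Try the currently preferred action, then adjust its preference by the reward."""
    preference = {action: 0.0 for action in actions}
    history = []
    for _ in range(rounds):
        # Greedy choice; on ties max() returns the first action in the list.
        action = max(actions, key=lambda a: preference[a])
        reward = reward_fn(action)            # the only feedback: good or bad
        preference[action] += step * reward   # reinforce or discourage the action
        history.append(action)
    return preference, history


def trainer_feedback(action):
    """Hypothetical trainer: reward sitting, mildly punish anything else."""
    return 1.0 if action == "sit" else -1.0


preference, history = train_with_feedback(["bark", "roll", "sit"], trainer_feedback)
print(history)  # ['bark', 'roll', 'sit', 'sit', 'sit']
```

The learner is never told that "sit" is the correct answer; it only sees rewards for the actions it actually tried.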

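To summarize briefly, divided by the data output label yn, machine learning includes supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Of these, supervised learning is the most widely used.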

Three, Learning with Different Protocol f(xn,yn) (divided by how the data is obtained)

By protocol, machine learning can be divided into three types:

  • Batch Learning
  • Online Learning
  • Active Learning

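Batch learning is a common type. In batch learning the training data D arrives as one batch: we receive the whole of D at once, learn from it, and obtain our final machine learning model. Batch learning is the most widely used protocol in practice.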

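Online learning is a model in which the data is updated in real time: as the examples come in one by one, the algorithm is updated in step with them. Take an online email filtering system. For each email, the current algorithm judges from the content whether it is spam, and then the algorithm is updated promptly according to the user's feedback. This is a dynamic process. Both PLA, which we introduced earlier, and reinforcement learning can be used in the online setting.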
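Since PLA fits the online protocol, a minimal sketch of a single online PLA step may help: predict on the incoming example, then correct the weights only if the revealed label disagrees. The small stream of examples is invented, and each input carries a leading constant 1.0 as the bias term.

```python
def online_pla_update(weights, x, y):
    """One online PLA step: update the weights only when (x, y) is misclassified."""
    score = sum(w * xi for w, xi in zip(weights, x))
    prediction = 1 if score > 0 else -1
    if prediction != y:
        weights = [w + y * xi for w, xi in zip(weights, x)]  # PLA correction
    return weights


# Invented stream: examples arrive one at a time, e.g. emails plus user feedback
stream = [
    ([1.0, 2.0, 1.0], 1),
    ([1.0, -1.0, -2.0], -1),
    ([1.0, 1.5, 0.5], 1),
]

weights = [0.0, 0.0, 0.0]
for x, y in stream:
    weights = online_pla_update(weights, x, y)

print(weights)  # [1.0, 2.0, 1.0]: only the first example triggered a correction
```

Unlike batch learning, the model here is usable, and keeps changing, while the data is still arriving.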

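Active learning is a type of machine learning that has emerged in recent years: the machine is given the ability to ask questions on its own initiative. In handwritten digit recognition, for example, the machine can generate a digit itself, or pick a handwritten character it is unsure about, and ask for its label. One advantage of active learning is that when labels are hard to obtain, it saves time and cost by requesting labels only for the important examples.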
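One common way to decide which example to ask about is to pick the one the current model is least sure of. The sketch below does this for a linear model by choosing the unlabeled point with the smallest absolute score. This selection rule (often called uncertainty sampling) is an assumed choice, and the weights and pool are invented.

```python
def most_uncertain(weights, unlabeled):
    """Return the unlabeled input whose linear score is closest to the decision boundary."""
    def margin(x):
        return abs(sum(w * xi for w, xi in zip(weights, x)))
    return min(unlabeled, key=margin)


weights = [1.0, -1.0]                          # assumed current model
pool = [(3.0, 0.5), (1.0, 0.9), (0.2, 2.0)]    # invented unlabeled inputs

query = most_uncertain(weights, pool)
print(query)  # (1.0, 0.9): its score is about 0.1, the closest to zero
```

Only the queried example needs a human-provided label, which is where the savings in time and cost come from.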

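To summarize briefly, by protocol machine learning can be divided into batch, online, and active. These three types can be compared, respectively, to cramming, being taught by a teacher, and actively asking questions.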

Four, Learning with Different Input Space X (divided by input data)

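The first type of input X is concrete features, for example the size and weight of coins in a coin classification problem, or patient information in disease diagnosis. Concrete features are the easiest for machine learning to understand and use.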

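The second type is raw features, for example the m x n pixel values of each digit image in handwritten digit recognition, or the spectrum of a speech signal. Raw features are generally more abstract and often need to be converted, by a person or by a machine, into corresponding concrete features. This conversion process is called Feature Transform.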
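As a sketch of such a feature transform, the code below reduces a tiny pixel grid to two hand-crafted concrete features: average intensity and left-right asymmetry. These two features and the 3 x 3 images are illustrative assumptions; the notes do not specify which features to extract.

```python
def intensity(image):
    """Average pixel value over the whole image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def asymmetry(image):
    """Mean absolute difference between the image and its mirror; 0.0 means symmetric."""
    diffs = [abs(p - q) for row in image for p, q in zip(row, reversed(row))]
    return sum(diffs) / len(diffs)


def feature_transform(image):
    """Map raw pixels to a short vector of concrete features."""
    return (intensity(image), asymmetry(image))


# Invented 3 x 3 binary images: a vertical stroke and a lopsided shape
stroke = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
lopsided = [[1, 1, 0],
            [1, 0, 0],
            [1, 1, 0]]

print(feature_transform(stroke))
print(feature_transform(lopsided))
```

A learner can then work with two meaningful numbers per image instead of every raw pixel value.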

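The third type is abstract features. For example, when a shopping site runs a purchase prediction competition, what it gives participants are abstract, encrypted record numbers or IDs. These features X are completely abstract and have no physical meaning, so they are harder for machine learning and require more feature transformation and extraction.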

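To summarize briefly, by the type of input X, features can be divided into concrete, raw, and abstract. Converting abstract features into concrete ones is a very important step in the machine learning process. We will cover it in detail in the Machine Learning Techniques course.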

Five, Summary:

This lesson introduced the types of machine learning along four dimensions: Output Space, Data Label, Protocol, and Input Space.

Origin www.cnblogs.com/cchenyang/p/11453571.html