Do you know how machines learn?

At a family dinner, my father remarked that science and technology are remarkably advanced these days, and artificial intelligence has even arrived: it can recognize faces, estimate spending power, and even beat top human players at chess. Then he raised a question: how did these AIs get so smart? Did they learn it all by themselves?

I froze for a moment. Indeed, if machine learning can "think," how does it think?

Take machine learning, the most widely used form of artificial intelligence. Across the whole learning process, it does not actually learn everything on its own. Roughly speaking, it can be divided into two categories: supervised learning and unsupervised learning. Supervised learning is like teaching a babbling child to recognize its parents: we label the two people "Dad" and "Mom" respectively. Or it is like teaching arithmetic: if potatoes cost 2 dollars a pound, how much do 5 pounds cost? Learning from data whose meaning is specified in advance is supervised learning. Unsupervised learning, by contrast, finds its own rules, dividing the data by its distinguishing characteristics; this is the part that really "learns by itself."

But machine learning does not acquire everything by itself either; the process involves several steps: data preprocessing, model building, model validation, and model optimization. In fact, this is roughly how we humans learn, though in some respects it is even more rigorous. Data preprocessing makes the data acceptable and interpretable for learning. The model is like the subjective experience a person forms; of course, subjective experience is not necessarily accurate, so the model must be validated. Model optimization, finally, makes that experience more refined.
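As a taste of these four steps, here is a minimal sketch in Sklearn (the framework used later in the book); the Iris dataset and the logistic-regression model are illustrative assumptions, not the book's own example.

```python
# A minimal sketch of the four steps using Sklearn; the Iris dataset
# and logistic-regression model here are illustrative choices only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# 1. Data preprocessing: scale features so the model can "interpret" them.
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Model building: fit a model, i.e., form the "subjective experience".
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 3. Model validation: check that experience against held-out data.
print("test accuracy:", model.score(X_test, y_test))

# 4. Model optimization: search for better hyperparameters.
search = GridSearchCV(LogisticRegression(max_iter=200),
                      {"C": [0.1, 1.0, 10.0]}, cv=5).fit(X_train, y_train)
print("best C:", search.best_params_)
```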

Below, let us use a bit of machine learning to interpret some familiar human phenomena.

▊ Why are some people more sensitive to certain types of information?

Xiao Ming's father is a chef, so perhaps Xiao Ming is more sensitive to taste; Xiao Hu's mother is a photographer, so perhaps Xiao Hu is more sensitive to color. If a model is usually steeped in a certain type of data, will it become sensitive to that type of data? I believe there is some connection: at least in machine-learning models, this situation does occur. And when "life experience" is insufficient, that is, when the data is imbalanced, what methods does machine learning use? Two of them, sketched below, are oversampling and undersampling. Oversampling makes multiple copies of the minority class to increase its sample count; undersampling removes some samples from the majority class, or selects only part of the majority class. Deliberately increasing or reducing certain portions of the data balances its overall distribution.
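As a concrete illustration, here is a minimal sketch of random over- and under-sampling, assuming the third-party imbalanced-learn (imblearn) library; the imbalanced toy dataset is made up.

```python
# Minimal sketch of random over- and under-sampling with the
# imbalanced-learn library; the toy imbalanced dataset is made up.
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Roughly 900 majority vs. 100 minority samples.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
print(Counter(y))

# Oversampling: duplicate minority samples until the classes balance.
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_over))

# Undersampling: drop majority samples until the classes balance.
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print(Counter(y_under))
```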

Of course, oversampling and undersampling are not single specific methods but families of methods. Below we introduce two concrete sampling methods from these families.

Take, for example, the oversampling method SMOTE (Synthetic Minority Oversampling Technique), which generates new samples by interpolating between individual samples of the original dataset. The process is essentially as follows (a code sketch follows the steps):

First, separate the minority-class samples from the majority-class samples;

Then, for each minority-class sample, select one of its nearest minority-class neighbors and synthesize a new sample by interpolating between the two points;

Finally, synthesize new samples in the same manner at the other original minority-class sample points.
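A minimal sketch of SMOTE in code, again assuming the imbalanced-learn library; the toy dataset and the neighbor count are illustrative.

```python
# Minimal SMOTE sketch with imbalanced-learn: new minority samples are
# synthesized by interpolating between a minority point and one of its
# nearest minority-class neighbors. The dataset below is made up.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)
print(Counter(y))

# k_neighbors controls how many nearest minority-class neighbors are
# considered when interpolating each synthetic sample.
X_res, y_res = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
print(Counter(y_res))  # classes now balanced
```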

For undersampling, the NearMiss method reduces the information lost when samples are removed from the model's training data. Like many neighbor-based methods, it first computes the distances between all majority-class instances and the minority-class instances; then, based on those distances, it selects and keeps the majority-class sample points that lie closest to the minority class and removes the rest of the majority class.

The method has several variants, but the core idea is the same: select majority-class sample points by their distance to nearby minority-class sample points. The variants differ only in how that "shortest distance" is measured (a code sketch follows the list):

NearMiss-1 computes, for each majority-class sample point, the average distance (or distance sum) to its nearest minority-class sample points, sorts the majority-class points by this value from small to large, and finally extracts and retains the required number of majority-class samples.

NearMiss-2 computes, for each majority-class sample point, its distance to the farthest minority-class sample points, sorts the majority-class points from small to large, and finally extracts and retains the required number of majority-class samples.

NearMiss-3 approaches the problem differently from the first two: it starts from the minority-class sample points, and for each minority-class point it extracts a specific number of the nearest majority-class sample points, likewise in ascending order of distance.
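A minimal sketch of the three variants, assuming imbalanced-learn's NearMiss implementation, whose version parameter selects among them; the toy dataset is made up.

```python
# Minimal NearMiss sketch with imbalanced-learn; the `version`
# argument selects among the three variants described above.
from collections import Counter

from imblearn.under_sampling import NearMiss
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=0)

for version in (1, 2, 3):
    X_res, y_res = NearMiss(version=version).fit_resample(X, y)
    print(f"NearMiss-{version}:", Counter(y_res))
```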

"Sea tactical" ▊ model

Having talked about balancing the data, the next question is: with data quality assured, how do we strengthen the model's robustness? Just as we do when studying for exams, models can also drill through a "sea of questions."

Leo Breiman proposed the bagging method in 1994, which improves classification by combining classifiers trained on randomly generated training sets.

Bagging (bootstrap aggregating) is a machine-learning ensemble algorithm designed to improve the stability and accuracy of machine-learning algorithms in statistical classification and regression. It also reduces variance and helps avoid overfitting. Although it is usually applied to decision-tree methods, it can be used with any type of method. Bagging is a special case of model averaging.

The method is implemented as follows (see the schematics below).

(Figure: schematic of the bagging method)

(Figure: schematic of bootstrap sampling)
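A minimal bagging sketch using Sklearn's BaggingClassifier; the dataset and the choice of decision trees as base learners are illustrative assumptions.

```python
# Minimal bagging sketch with Sklearn: each of the 10 decision trees is
# trained on a bootstrap sample of the training set, and their votes
# are combined. The dataset here is made up.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=10,
                            bootstrap=True,  # sample with replacement
                            random_state=0).fit(X_train, y_train)
print("bagging accuracy:", bagging.score(X_test, y_test))
```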

Bagging improves a model's performance by working through a large number of "practice problems." Sometimes, however, the crude sea-of-questions tactic alone does not earn a high score, and the "notebook of wrong answers" approach offers a different line of thought.

The earliest idea behind boosting originated in a question posed by Kearns and Valiant: can a group of weak learners be combined into a single strong learner? A weak learner is defined as a classifier that labels samples only slightly better than random guessing, while a strong learner is a classifier whose output is close to the actual classification; a learner here is essentially a classification model. Robert Schapire answered Kearns and Valiant's question affirmatively in his 1990 paper "The Strength of Weak Learnability."

Boosting is a machine-learning ensemble algorithm based on combining multiple weak models into one strong model. Most boosting algorithms iteratively learn weak classifiers with respect to some distribution and add them to a final strong classifier. As they are added, they are usually weighted in a way that reflects each weak learner's accuracy. After a weak learner is added, the data weights are readjusted, a step known as re-weighting: misclassified training samples gain weight, while correctly classified samples have their weight reduced. Subsequent weak learners (base models) therefore focus more on the samples that earlier weak learners misclassified.

(Figure: schematic of boosting)
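A minimal boosting sketch using AdaBoost, a classic re-weighting boosting algorithm available in Sklearn; the dataset is illustrative.

```python
# Minimal boosting sketch with Sklearn's AdaBoostClassifier: each new
# weak learner focuses more on the samples its predecessors
# misclassified, and the learners are combined with accuracy-based
# weights. The dataset here is made up.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak learners (shallow decision trees by default).
boosting = AdaBoostClassifier(n_estimators=50,
                              random_state=0).fit(X_train, y_train)
print("boosting accuracy:", boosting.score(X_test, y_test))
```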

▊ Now that machines have "nerves," can they have a "nervous breakdown"?

Before exploring neural networks, let us first introduce the perceptron (neuron), the basic unit of a neural network. It is essentially a filter: there is a threshold n (usually 0), and depending on whether the weighted input is above or below this threshold, the output is 1 or -1, as follows:

f(x) = sign(w·x + b), where sign(z) = 1 if z > n and -1 otherwise

Assuming the dataset is linearly separable, the goal of perceptron learning is to find a hyperplane that completely separates the positive and negative instance points of the training set.
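A minimal sketch of the perceptron decision rule just described, with made-up weights; this illustrates only the threshold filter, not the full perceptron training algorithm.

```python
# Minimal sketch of the perceptron decision rule: a weighted sum
# passed through a sign-style threshold filter.
import numpy as np

def perceptron_predict(x, w, b, threshold=0.0):
    """Return 1 if w.x + b clears the threshold, else -1."""
    return 1 if np.dot(w, x) + b > threshold else -1

# Made-up weights for a 2-feature input.
w = np.array([0.5, -0.3])
print(perceptron_predict(np.array([1.0, 1.0]), w, b=0.1))   # 1
print(perceptron_predict(np.array([-1.0, 1.0]), w, b=0.1))  # -1
```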

A perceptron layer, as the name suggests, combines multiple perceptrons into a single layer. The layer is fully or partially connected: it takes as input either the output of the previous perceptron layer or the raw input data, and its output feeds directly into the next layer or serves as the model's output. Stacking multiple perceptron layers produces a multilayer perceptron (MLP).

Reading from left to right, the first layer of the network is the input layer: the raw data, usually converted into a numeric matrix, is submitted to the input layer; its neurons respond to the stimulus and propagate it onward through the hidden layers, and the training result finally appears at the output layer.

So what exactly happens along the way?

A neural network mainly undergoes two kinds of propagation: forward propagation and back propagation.

Stage 1: forward propagation, also called activation propagation

(Figure: the forward propagation process)

Stage 2: back propagation

(Figure: the back propagation process)

In the same way, the weights and bias coefficients of every neuron in the network are processed as above, and the weights are updated. Each update is performed on one batch of data, and a complete pass through all the training data is called an epoch. Forward and back propagation alternate until the prescribed number of epochs is reached or the network's response to its input falls within the target range, as in the sketch below.
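A minimal Keras sketch tying these terms together: an MLP whose fit() call runs alternating forward and back propagation, with an explicit batch size and epoch count. The data and layer sizes are made-up assumptions.

```python
# Minimal Keras MLP sketch: layers of perceptrons trained by
# alternating forward and back propagation. Data is random, for
# illustration only.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 20)                               # 1000 samples, 20 features
y = np.random.randint(0, 2, size=1000).astype("float32")   # binary labels

model = keras.Sequential([
    keras.Input(shape=(20,)),                      # input layer
    keras.layers.Dense(16, activation="relu"),     # hidden perceptron layer
    keras.layers.Dense(1, activation="sigmoid"),   # output layer
])
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=["accuracy"])

# Each gradient update uses one batch of 32 samples; one full pass
# over all 1000 samples is one epoch.
model.fit(X, y, batch_size=32, epochs=5)
```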

Does this mean that the more layers a neural network has, the stronger its expressive power? Only within a certain range, because a neural network can also develop a "nervous disorder." In the back-propagation algorithm, the gradient of the error propagates from the output layer toward the input layer: the gradient of the loss function with respect to each weight in the network is computed, and those gradients are used to update each unit's weights. That completes one gradient-descent step; after many such steps, each unit's weight converges to some fixed range of values, and training is complete.

In practical applications, especially in deep networks, the gradient may shrink as it propagates layer by layer. By the time it reaches a sufficiently low layer, the weights there barely change because the gradient is too small, and the fit shows no clear improvement even with more training epochs or more samples. This situation is called the vanishing gradient problem.

Of course, the opposite situation also exists: as propagation reaches the lower layers, the gradient grows larger and larger, so the network's weights are updated by large amounts during training, and even by the end of the training schedule no stable model has formed; this is the exploding gradient problem. Both effects are illustrated numerically below.
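A tiny numeric illustration of both effects: back propagation multiplies per-layer factors together, so repeatedly small factors drive the gradient toward zero, while repeatedly large ones blow it up. The factor values 0.5 and 1.5 are made-up examples.

```python
# Vanishing vs. exploding gradients: the chain rule multiplies one
# factor per layer, so small factors shrink the product toward 0 and
# large factors make it explode. The factors here are illustrative.
import numpy as np

layers = 30
print(np.prod(np.full(layers, 0.5)))  # ~9.3e-10: gradient vanishes
print(np.prod(np.full(layers, 1.5)))  # ~1.9e+05: gradient explodes
```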


This article is excerpted from the new Broadview book Machine Learning: From Getting Started to Getting Hired (《机器学习从入门到入职》).

Covering classification, regression, clustering, dimensionality reduction, deep learning, and more, the book introduces the main machine-learning concepts and model principles, with plenty of hands-on practice on mainstream machine-learning platforms, so newcomers can start practicing quickly while understanding the principles.


▊ About the Book

Machine learning has been a hot technical direction in recent years, but it is not itself a new discipline; rather, it is a synthesis of several mature disciplines (calculus, statistics and probability theory, linear algebra, and so on). Its body of knowledge is large and complex. To give readers a clear picture of machine learning's overall structure, this book organizes the knowledge architecture of machine learning as a whole and implements the relevant theoretical concepts in code with machine-learning frameworks such as Sklearn and Keras, combining theory with practice.

The book is divided into four parts. Chapters 1-3 introduce machine-learning concepts, setting up the development environment, and the basic workflow of model development. Chapters 4-7 cover the principles of regression, classification, clustering, and dimensionality reduction, along with their concrete implementation and application in the machine-learning framework Sklearn. Chapters 8-12 focus on deep learning, including the principles of convolutional neural networks, generative adversarial networks, and recurrent neural networks, and their concrete implementation and application in the deep-learning framework Keras. Chapter 13 briefly introduces tips for landing a machine-learning job.

The book can serve as a reference for machine-learning beginners, anyone interested in machine learning, and candidates applying for related positions.

▊ About the Author

Zhang Wei (Viking Zhang)

He has worked at IBM, Ping An Technology, Harvest Fund Management, and WeBank, and now works at an AI company with "Trump-certified quality." He holds several patents in artificial intelligence and is committed to making AI applications widely accessible, applying machine learning broadly to operations architecture, financial analysis, and more.



Origin blog.csdn.net/broadview2006/article/details/104210787