Google Machine Learning Notes (1)

These notes summarize what I have learned about machine learning so far; along the way, a plug for Google's site: developers.google.com/machine-learning/crash-course
/* accessing it from mainland China requires getting over the Great Firewall */

Concept Explanations

1. Labels & Features & Samples & Models
label: the value to be predicted; the y variable in basic linear regression
(the direct result, e.g. whether an email is spam)
features: input variables used to describe the data; the {x1, x2, ...} variables in basic linear regression
(auxiliary information, e.g. the email header, sender and recipient, routing information, etc.)
sample: a particular instance of the data, x
(the object a result is given for)
- labeled sample (x, y) -> the result is known; can be used to train the model
- unlabeled sample (x, ?) -> used to make predictions on new data
model: maps samples to predicted labels, y'
(the tool/method that performs the prediction)
- a regression model predicts continuous values ("how much"), e.g. 1. predicting house prices; 2. the probability that a user clicks an ad
- a classification model predicts discrete values ("which one"), e.g. 1. whether a given email is spam; 2. deciding whether a picture shows a dog, a cat, or a mouse
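
As a minimal sketch of these terms in Python (the variable names, feature values, and the 0.5 threshold here are my own invention for illustration, not from the crash course):

```python
# A labeled sample: a feature vector x plus a known label y (e.g. spam = 1).
labeled_sample = {"x": [0.3, 1.7, 5.2], "y": 1}

# An unlabeled sample: features only; the label is what we want to predict.
unlabeled_sample = {"x": [0.9, 0.1, 3.4]}

def predict(weights, bias, x):
    """A model maps a sample's features to a predicted label y' = w·x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

weights, bias = [0.5, -0.2, 0.1], 0.05
y_pred = predict(weights, bias, unlabeled_sample["x"])  # continuous value -> regression
is_spam = y_pred > 0.5                                  # thresholded -> classification
print(y_pred, is_spam)
```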

Training: training a model means learning (determining) ideal values for all of its weights and biases from labeled samples.
Loss: a measure of how accurate the model's prediction is on a single sample.

Deeper Understanding

  • Linear regression: a method for finding the straight line or hyperplane that best fits a set of points
    - L2 loss (a common regression loss function): the squared error for a given sample = the square of the difference between the label and the prediction = (observation - prediction)² = (y - y')²
    Mean squared error (MSE): the average of the per-sample squared losses over the whole dataset
    MSE = (1/N) · Σ over (x, y) ∈ D of (y - prediction(x))²
    · (x, y) is one sample, where x is the set of features (e.g. age) and y is the sample's label (e.g. chirps per minute)
    · prediction(x) is the function that combines the weights and bias with the feature set x
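
A small sketch of the L2 loss and MSE definitions above, in plain Python (the toy weights w = 2.0, b = 3.0 and the data points are invented):

```python
def l2_loss(y, y_pred):
    """Squared error for a single sample: (y - y')^2."""
    return (y - y_pred) ** 2

def mse(dataset, prediction):
    """Mean squared error: the average L2 loss over all (x, y) in D."""
    return sum(l2_loss(y, prediction(x)) for x, y in dataset) / len(dataset)

def prediction(x):
    """A toy linear model, prediction(x) = w*x + b with w = 2.0, b = 3.0."""
    return 2.0 * x + 3.0

# x = a feature (e.g. temperature), y = the label (e.g. chirps per minute).
data = [(20.0, 44.0), (25.0, 52.0), (30.0, 63.0)]
print(mse(data, prediction))
```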

  • Reducing Loss
    Each newly selected set of hyperparameters should produce a smaller error than the previous set.
    Gradient descent: iteratively take small steps (gradient steps) to adjust the predictions.
    Mini-batch gradient descent: each step uses only a small subset of the samples, so the gradient and the Loss strike a balance between single-sample noise and full-batch cost.
    Key point: first make an initial guess for the machine learning model's weights and biases, then repeatedly adjust those guesses until finding the weights and biases with the lowest possible loss.
    Gradient: the vector of partial derivatives
    Hyperparameters: the knobs you turn to tune learning (the learning rate is one kind of hyperparameter)
    Learning rate: also called the step size
    too small: learning takes too long
    too large: the steps bounce around near the bottom and the minimum is hard to find
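
Putting the pieces above together, here is a minimal mini-batch gradient descent sketch for one-feature linear regression under the MSE loss (the dataset, batch size, and learning rate of 0.05 are all invented for illustration):

```python
import random

def gradient_step(data, w, b, learning_rate, batch_size=2):
    """One mini-batch gradient-descent step for MSE on y' = w*x + b."""
    batch = random.sample(data, batch_size)        # mini-batch: a small random subset
    # Partial derivatives of the MSE with respect to w and b (the gradient).
    dw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
    db = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
    # Step opposite the gradient, scaled by the learning rate (the step size).
    return w - learning_rate * dw, b - learning_rate * db

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]   # roughly y = 2x
w, b = 0.0, 0.0                 # initial guesses for the weight and the bias
for _ in range(1000):
    w, b = gradient_step(data, w, b, learning_rate=0.05)
print(w, b)                     # w ends up near 2, b near 0
```

Raising the learning rate toward 0.1 makes this toy example start to bounce; lowering it to 0.001 makes the 1000 iterations visibly insufficient, matching the "too large / too small" trade-off above.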

  • Generalization
    refers to a model's ability to fit previously unseen new data (drawn from the same distribution as the data used to create the model).

  • Overfitting: the parameters are fitted rigidly to one portion of the samples, so the model is neither accurate nor general, and problems appear once new test samples are added. The loss on the training samples can be low while the results on lots of new data are very bad.

  • Rules that should be followed:
    1. Draw random samples that are independent and identically distributed (i.i.d.); the samples do not influence one another.
    2. The distribution is stationary, i.e. it does not change within the dataset.
    3. Draw samples from partitions of the same distribution.

  • Confidence interval: an interval estimate of a population parameter, constructed from a sample statistic.

  • Validation (another way of partitioning)
    By splitting the data into a training set and a test set, you can judge whether a model generalizes well to new data.
    Validate on the test set after every iteration, then keep tuning the parameters; take care not to overfit without noticing.
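
A sketch of this partitioning workflow in plain Python (the 60/20/20 split ratios are just illustrative, not prescribed by the course):

```python
import random

def split(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then partition into train / validation / test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)                  # an i.i.d.-style random draw from one pool
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, test = split(list(range(100)))
# Tune hyperparameters against `val` between iterations; touch `test` as rarely
# as possible, so you don't overfit to it without noticing.
print(len(train), len(val), len(test))     # 60 20 20
```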

Feature Engineering

The process of extracting features from raw data.

  • OOV (Out of Vocabulary) bucketing: define a mapping from feature values (the possible vocabulary) to integers.
    One-hot encoding: dictionary-style, map each value (e.g. each street name) to an index in {0, 1, …, v-1}. Used to handle features that are strings.

For example, {0,0,0,…,1,0,0}, where the 1 stands for Main Street and the 0s stand for everything else; use Boolean values to define indicator features, and avoid magnitude-valued features such as "number of days" or "duration" where possible.
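
A minimal sketch of a vocabulary mapping with an OOV bucket plus one-hot encoding (the street names here are invented):

```python
# Vocabulary: map each known feature value (street name) to an integer index.
vocab = {"Main Street": 0, "Oak Avenue": 1, "Elm Road": 2}
OOV_INDEX = len(vocab)            # one extra bucket for out-of-vocabulary values

def one_hot(street):
    """Boolean indicator vector over {0, 1, ..., v-1} plus an OOV slot."""
    vec = [0] * (len(vocab) + 1)
    vec[vocab.get(street, OOV_INDEX)] = 1
    return vec

print(one_hot("Main Street"))   # [1, 0, 0, 0]
print(one_hot("Pine Lane"))     # [0, 0, 0, 1] -> falls into the OOV bucket
```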

  • Binning: in certain cases, replace a numeric feature value with the bin it falls into.
    Handling feature values: outliers can be capped at some upper limit, and anything above that limit is treated as the limit value.
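A sketch of both tricks, binning a value and clipping outliers at a cap (the boundaries and the cap of 4.0 are invented thresholds):

```python
def clip(value, cap=4.0):
    """Outlier handling: any value above the cap is treated as the cap itself."""
    return min(value, cap)

def bin_index(value, boundaries=(1.0, 2.0, 3.0, 4.0)):
    """Binning: replace a numeric value with the index of the bin it falls in."""
    for i, boundary in enumerate(boundaries):
        if value < boundary:
            return i
    return len(boundaries)

rooms_per_person = 55.0                       # an extreme outlier
print(bin_index(clip(rooms_per_person)))      # clipped to 4.0 -> the last bin
```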
  • Feature Crosses: when one straight line cannot cleanly separate the feature set (the vast majority of cases), use the form [A × B] to synthesize a new feature from other features.
    For example, to separate quadrants 1 and 3 from quadrants 2 and 4, multiply the x-axis and y-axis values and use the product as a new feature.
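
The quadrant example as a sketch, where the cross is simply the product of the two coordinates (the sample points are invented):

```python
def cross(x, y):
    """Synthetic feature [A x B]: here simply the product of two coordinates."""
    return x * y

# Quadrants 1 and 3 (coordinates share a sign) give a positive product;
# quadrants 2 and 4 (opposite signs) give a negative one, so a single
# threshold on the crossed feature now separates them linearly.
points = [(2, 3), (-2, -3), (-2, 3), (2, -3)]
print([cross(x, y) > 0 for x, y in points])   # [True, True, False, False]
```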

For some more complex cases

You can bin first, then cross the features of a specific situation (e.g. a city's latitude and longitude), so that the crossed feature keeps different regions from sharing the same effect.
If there are too many crosses, i.e. the model is too complex, the model gets the chance to fit the noise in the training data as well (overfitting), which lowers its performance on the test data.
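
A sketch of crossing two binned features (binned latitude × binned longitude) into one synthetic feature; the bin counts are invented:

```python
def binned_cross(lat_bin, lng_bin, n_lng_bins=10):
    """Cross two binned features into one cell index: one cell per region."""
    return lat_bin * n_lng_bins + lng_bin

# Every (lat_bin, lng_bin) pair gets its own index, so different regions
# (e.g. different cities) no longer share one effect. With many bins the
# number of crossed cells explodes, which is exactly how an over-complex
# model gains the capacity to fit noise in the training data.
print(binned_cross(3, 7))   # cell 37
```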
