Coursera Machine Learning (Andrew Ng) Week 3 Notes

Evaluating a Learning Algorithm

Evaluating a Hypothesis

Once we have done some troubleshooting for errors in our predictions by:

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Trying additional features: Fixes high bias
  • Trying polynomial features: Fixes high bias
  • Increasing λ: Fixes high variance
  • Decreasing λ: Fixes high bias

A hypothesis may already achieve a very low error on the training set and still be inaccurate, because it is overfitting. So, to evaluate a hypothesis, we split the dataset into two parts: a training set (70%) and a test set (30%).

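As a minimal sketch of this split (scikit-learn and the synthetic data are my choices; the course itself uses Octave/MATLAB), we can fit a hypothesis on the 70% and measure its error on the held-out 30%:

```python
# Minimal sketch of the 70/30 train/test split; the library and the
# toy data are assumptions, not the course's own implementation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # toy feature
y = 2.0 * X.ravel() + rng.normal(0, 1.0, 100)    # toy target with noise

# 70% training set / 30% test set, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Test-set cost J_test(Θ): half the mean squared error on the test set
j_test = np.mean((model.predict(X_test) - y_test) ** 2) / 2
print(f"J_test(Θ) = {j_test:.4f}")
```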

Model Selection and Train/Validation/Test Sets

One way to break down our dataset into the three sets is:

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%

We can now calculate three separate error values for the three different sets using the following method:

  1. Optimize the parameters in Θ using the training set for each polynomial degree.
  2. Find the polynomial degree d with the least error using the cross validation set.
  3. Estimate the generalization error using the test set with J_test(Θ^(d)), where d is the degree of the polynomial with the lowest cross validation error.

This way, the degree of the polynomial d has not been trained using the test set.
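A sketch of these three steps (scikit-learn, the synthetic data, and degrees 1 to 10 are my choices, used only to illustrate the procedure):

```python
# Sketch of the three model-selection steps above: fit each degree d
# on the training set, choose d on the CV set, report once on the test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)

# 60% / 20% / 20% split, as in the notes
idx = rng.permutation(len(X))
tr, cv, te = idx[:120], idx[120:160], idx[160:]

def cost(model, X, y):
    """Half mean squared error, J(Θ)."""
    return np.mean((model.predict(X) - y) ** 2) / 2

best_d, best_err, best_model = None, np.inf, None
for d in range(1, 11):                         # candidate polynomial degrees
    m = make_pipeline(PolynomialFeatures(d), LinearRegression())
    m.fit(X[tr], y[tr])                        # 1. optimize Θ on the training set
    err = cost(m, X[cv], y[cv])                # 2. J_cv(Θ^(d)) on the CV set
    if err < best_err:
        best_d, best_err, best_model = d, err, m

# 3. generalization error, estimated once on the untouched test set
print(f"d = {best_d}, J_test(Θ^(d)) = {cost(best_model, X[te], y[te]):.4f}")
```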

Bias vs. Variance

Diagnosing Bias vs. Variance

As the degree d of the polynomial increases, the training error J_train(Θ) keeps decreasing, while the cross validation error J_cv(Θ) first decreases and then increases. High bias (underfitting): both J_train(Θ) and J_cv(Θ) are high, with J_cv(Θ) ≈ J_train(Θ). High variance (overfitting): J_train(Θ) is low and J_cv(Θ) is much greater than J_train(Θ).

Regularization and Bias/Variance

As λ increases, J_train(Θ) grows (the fit is forced to be smoother), while J_cv(Θ) first falls and then rises again. A large λ gives high bias (underfitting); a small λ gives high variance (overfitting). To choose λ, try a list of values (e.g. 0, 0.01, 0.02, 0.04, …, 10.24), learn a Θ for each, pick the λ with the lowest cross validation error J_cv(Θ), and then report the error on the test set.
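A sketch of that λ-selection loop; scikit-learn's Ridge is my stand-in for regularized linear regression, and reading its alpha parameter as λ is my mapping:

```python
# Sketch of choosing λ on the CV set. Ridge's `alpha` plays the role
# of λ here (my mapping); the lecture's list also includes λ = 0.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)
idx = rng.permutation(len(X))
tr, cv = idx[:120], idx[120:160]

def cost(model, X, y):
    # J is evaluated WITHOUT the regularization term
    return np.mean((model.predict(X) - y) ** 2) / 2

best_lam, best_err = None, np.inf
for lam in [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]:
    m = make_pipeline(PolynomialFeatures(8), Ridge(alpha=lam))
    m.fit(X[tr], y[tr])
    err = cost(m, X[cv], y[cv])        # J_cv(Θ) for this λ
    if err < best_err:
        best_lam, best_err = lam, err

print(f"chosen λ = {best_lam}, J_cv = {best_err:.4f}")
```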

Learning Curves

High bias: with few training examples, J_train(Θ) is low and J_cv(Θ) is high; as the training set size m grows, the two curves converge to a similarly high error. If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

High variance: J_train(Θ) stays low while J_cv(Θ) remains considerably higher, leaving a large gap between the two curves. If a learning algorithm is suffering from high variance, getting more training data is likely to help.
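A sketch of how such a curve is computed (data, library, and the deliberately too-simple degree-2 model are my assumptions): train on the first m examples, then record the training error on those m examples and the CV error on the full CV set.

```python
# Learning-curve sketch: a degree-2 fit to sine data underfits,
# so J_train and J_cv should converge to a high error (high bias).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 200)
idx = rng.permutation(len(X))
tr, cv = idx[:120], idx[120:160]

def cost(model, X, y):
    return np.mean((model.predict(X) - y) ** 2) / 2

for m in range(10, 121, 10):
    model = make_pipeline(PolynomialFeatures(2), LinearRegression())
    model.fit(X[tr[:m]], y[tr[:m]])
    j_train = cost(model, X[tr[:m]], y[tr[:m]])   # error on the m examples used
    j_cv = cost(model, X[cv], y[cv])              # error on the full CV set
    print(f"m = {m:3d}  J_train = {j_train:.3f}  J_cv = {j_cv:.3f}")
```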

Diagnosing Neural Networks

  • A neural network with fewer parameters is prone to underfitting. It is also computationally cheaper.
  • A large neural network with more parameters is prone to overfitting. It is also computationally expensive. In this case you can use regularization (increase λ) to address the overfitting.
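As a sketch of the second bullet: in scikit-learn's MLPClassifier the L2 penalty is the alpha parameter; treating it as the λ above, and the toy data, are my assumptions.

```python
# A small network prone to underfitting vs. a large network whose
# extra capacity is held in check by a bigger L2 penalty (`alpha`,
# read here as λ; that mapping is mine).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

small_net = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000,
                          random_state=0)                 # few parameters
large_net = MLPClassifier(hidden_layer_sizes=(100, 100), alpha=1.0,
                          max_iter=2000, random_state=0)  # many parameters + regularization

for net in (small_net, large_net):
    net.fit(X, y)
    print(net.score(X, y))
```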

Building a Spam Classifier

Prioritizing What to Work On

  • Collect lots of data (for example the “honeypot” project, though more data doesn’t always help)
  • Develop sophisticated features (for example: using email header data in spam emails)
  • Develop algorithms to process your input in different ways (recognizing misspellings in spam).

A common representation: choose a vocabulary of frequently occurring words (in practice 10,000 to 50,000), and encode each email as a feature vector x, where x_j = 1 if vocabulary word j appears in the email and 0 otherwise.
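A toy sketch of those word-indicator features (the 5-word vocabulary and the helper name are mine, for illustration only):

```python
# x_j = 1 if vocabulary word j appears in the email, else 0.
import re

VOCAB = ["buy", "deal", "discount", "andrew", "now"]   # toy vocabulary

def email_to_features(text: str) -> list[int]:
    # Lowercase and keep alphabetic tokens so "now!" matches "now"
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if w in words else 0 for w in VOCAB]

print(email_to_features("Buy now!! Huge discount"))    # [1, 0, 1, 0, 1]
```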

Error Analysis

  • Start with a simple algorithm, implement it quickly, and test it early on your cross validation data.
  • Plot learning curves to decide if more data, more features, etc. are likely to help.
  • Manually examine the errors on examples in the cross validation set and try to spot a trend where most of the errors were made.

Handling Skewed Data

With skewed classes (e.g., only 0.5% of examples belong to the rare positive class y = 1), raw classification accuracy is misleading, so we evaluate precision and recall on the rare class instead:

  • Precision P = true positives / (true positives + false positives)
  • Recall R = true positives / (true positives + false negatives)

F1 Score: 2*P*R/(P+R), the harmonic mean of precision and recall, which combines the two into a single number for comparing algorithms.
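A self-contained sketch of these formulas on toy predictions (the labels are mine; class 1 is the rare/positive class):

```python
# Compute precision, recall, and F1 from toy labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

P = tp / (tp + fp)
R = tp / (tp + fn)
F1 = 2 * P * R / (P + R)
print(P, R, F1)   # 0.75 0.75 0.75
```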


Reposted from blog.csdn.net/qq_35564813/article/details/104226835