[Machine Learning] 6 Machine Learning System Design

1 Recommended approach

  1. Start with a simple algorithm that you can implement quickly. Implement it and test it on your cross-validation data.
  2. Plot learning curves to decide whether more data, more features, etc. are likely to help.
  3. Error analysis: manually examine the examples (in the cross-validation set) that your algorithm made errors on, and see whether you can spot any systematic trend in the types of examples it gets wrong.
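
A minimal sketch of the error-analysis step in Python, assuming each cross-validation example has already been hand-tagged with a category. The `error_analysis` function, the spam-style category names and the toy labels are all hypothetical. The point is only that counting misclassified examples per category makes a systematic trend easy to spot.

```python
from collections import Counter


def error_analysis(y_pred, y_true, categories):
    """Count cross-validation errors per hand-assigned category.

    y_pred, y_true: sequences of predicted / actual labels.
    categories: a hypothetical hand-assigned tag for each example.
    Returns (category, error_count) pairs, most frequent first.
    """
    errors = Counter(
        cat
        for pred, true, cat in zip(y_pred, y_true, categories)
        if pred != true  # keep only misclassified examples
    )
    return errors.most_common()


# Hypothetical spam-filter CV set: each example tagged by email type.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0, 0, 1, 0]
categories = ["phishing", "pharma", "ham", "phishing",
              "ham", "phishing", "replica", "ham"]
print(error_analysis(y_pred, y_true, categories))  # [('phishing', 3), ('ham', 1)]
```

Here most errors fall in the "phishing" category, which would suggest working on features for that kind of email first.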

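2 Error metrics for skewed classes

When one class is much rarer than the other (e.g. only 0.5% of patients have cancer), a classifier that always predicts the common class already achieves high accuracy, so accuracy alone is misleading. Instead, the rare class is treated as positive (y = 1) and precision and recall are measured.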

Case                  Predicted  Actual
True Positive (TP)    true       true
True Negative (TN)    false      false
False Positive (FP)   true       false
False Negative (FN)   false      true

(true = positive class, y = 1; false = negative class, y = 0)
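
The four cases can be counted directly from predicted and actual labels. This sketch assumes binary labels encoded as 1 (positive, the rare class) and 0. The function name `confusion_counts` is illustrative only.

```python
def confusion_counts(y_pred, y_true):
    """Return (TP, TN, FP, FN); assumes labels are 0/1 with 1 = positive."""
    tp = tn = fp = fn = 0
    for pred, actual in zip(y_pred, y_true):
        if pred == 1 and actual == 1:
            tp += 1  # predicted true, actually true
        elif pred == 0 and actual == 0:
            tn += 1  # predicted false, actually false
        elif pred == 1 and actual == 0:
            fp += 1  # predicted true, actually false
        else:
            fn += 1  # predicted false, actually true


    return tp, tn, fp, fn


y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
print(confusion_counts(y_pred, y_true))  # (2, 4, 1, 1)
```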

2.1 Precision ( 查准率 )

  • $Precision = \frac{TP}{TP+FP}$: of all examples predicted positive, the fraction that are actually positive.

2.2 Recall ( 查全率 )

  • $Recall = \frac{TP}{TP+FN}$: of all examples that are actually positive, the fraction the algorithm correctly predicted as positive.
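
A short sketch computing both metrics from the counts, with a hypothetical skewed data set to show why they matter: always predicting y = 0 yields 99% accuracy yet zero recall. Returning 0.0 when a denominator is zero is an assumed convention, since precision is undefined when nothing is predicted positive.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 when nothing is predicted positive (assumed convention)."""
    return tp / (tp + fp) if tp + fp > 0 else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 when there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if tp + fn > 0 else 0.0


# Hypothetical skewed set: 1000 examples, only 10 positives (1%).
# A "classifier" that always predicts 0 gives TP=0, FP=0, FN=10, TN=990.
tp, fp, fn, tn = 0, 0, 10, 990
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)           # 0.99
print(precision(tp, fp))  # 0.0
print(recall(tp, fn))     # 0.0
```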

2.3 Trading Off Precision and Recall

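  • F1 Score: $F_1 = 2\frac{PR}{P+R}$, where P is precision and R is recall. Unlike the plain average (P+R)/2, it is low whenever either P or R is low, so it gives a single number for comparing classifiers or thresholds.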
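
In the course, the trade-off comes from the threshold applied to the logistic-regression output $h_\theta(x)$: predict 1 when $h_\theta(x) \ge$ threshold. Raising the threshold tends to increase precision and lower recall. The sketch below (hypothetical scores, labels and function names) sweeps a few thresholds on a cross-validation set and keeps the one with the highest F1 score.

```python
def f1_score(p, r):
    """F1 = 2PR / (P + R); 0.0 when P + R == 0."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def best_threshold(scores, y_true, thresholds):
    """Return (threshold, precision, recall, F1) with the highest F1.

    scores: hypothetical classifier outputs h(x) in [0, 1];
    an example is predicted 1 when its score >= threshold.
    """
    best = None
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(y_pred, y_true) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(y_pred, y_true) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(y_pred, y_true) if p == 0 and y == 1)
        prec = tp / (tp + fp) if tp + fp > 0 else 0.0
        rec = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = f1_score(prec, rec)
        if best is None or f1 > best[3]:  # keep the first best on ties
            best = (t, prec, rec, f1)
    return best


scores = [0.95, 0.85, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
print(best_threshold(scores, y_true, [0.3, 0.5, 0.7, 0.9]))
```

With these toy numbers, as the threshold goes from 0.3 to 0.9, precision rises (0.67, 0.75, 1.0, 1.0) and recall falls (1.0, 0.75, 0.75, 0.25). The threshold 0.7 gives the best F1 (6/7 ≈ 0.857).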

3 Data for Machine Learning

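  • Algorithms compared on classifying easily confused words (e.g. {to, two, too}, {then, than}):
    (1) Perceptron (logistic regression)
    (2) Winnow
    (3) Memory-based
    (4) Naive Bayes
  • All four performed similarly, and each improved steadily as the training set grew. An "inferior" algorithm given more data could beat a "superior" one given less. Hence the saying: "It’s not who has the best algorithm that wins. It’s who has the most data."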
  • Large data rationale:
    (1) Use a learning algorithm with many parameters (low bias) → $J_{train}(\theta)$ will be small
    (2) Use a very large training set (unlikely to overfit, low variance) → $J_{train}(\theta) \approx J_{test}(\theta)$
    (3) From (1) + (2) → $J_{test}(\theta)$ will be small
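
The large-data argument can be checked numerically. This is a sketch under assumptions: the data are synthetic, generated by a linear model with 30 features plus Gaussian noise, so a 31-parameter linear regression has low bias. The model is fitted by least squares with NumPy, and `learning_curve` and its defaults are hypothetical. As m grows, $J_{train}(\theta)$ and $J_{test}(\theta)$ should converge toward the noise level $\sigma^2/2 = 0.005$.

```python
import numpy as np


def cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))


def learning_curve(sizes, n_features=30, noise=0.1, n_test=5000, seed=0):
    """Return [(m, J_train, J_test)] for linear regression with many parameters.

    Data are synthetic (hypothetical): y = X @ w_true + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features + 1)

    def make_data(m):
        # Prepend a column of ones for the bias term theta_0.
        X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n_features))])
        return X, X @ w_true + noise * rng.normal(size=m)

    X_test, y_test = make_data(n_test)
    curve = []
    for m in sizes:
        X_train, y_train = make_data(m)
        theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)  # least-squares fit
        curve.append((m, cost(X_train, y_train, theta), cost(X_test, y_test, theta)))
    return curve


for m, j_train, j_test in learning_curve([40, 200, 5000]):
    print(f"m={m:5d}  J_train={j_train:.5f}  J_test={j_test:.5f}")
```

With m = 40, the model fits the training set closely, giving a small $J_{train}$ and a much larger $J_{test}$. With m = 5000, the two costs are almost equal and both small, which is point (3) above.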

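4 Designing a high-accuracy learning system

  1. Check that y can be predicted from the features x at all. A useful test: given the input x, could a human expert confidently predict y?
  2. If so, use a large amount of data together with a learning algorithm that has many parameters.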

5 References

Andrew Ng (吴恩达), Machine Learning, Coursera
Huang Haiguang (黄海广), Machine Learning notes


Reposted from blog.csdn.net/qq_44714521/article/details/108461342