How to learn machine learning better?

Colorado Reed, the founder of Metacademy, published an article called "Level-Up Your Machine Learning", in which he answers a question that beginners often ask him: how can I get better at machine learning? This post summarizes Colorado's recommendations and walks through his roadmap step by step.

How to get a better grasp of machine learning

Colorado is a PhD candidate at Berkeley as well as the founder of Metacademy. Metacademy is an excellent open source platform on which many professionals collaborate to write wiki articles. At present, these articles mainly cover two topics: machine learning and artificial intelligence.

Colorado's advice is that the best way to get better at machine learning is to keep working through textbooks. He believes the purpose of reading is to internalize the material.

It is not surprising that a PhD student would give such advice, and this site may have made similar recommendations before. The advice is sound, but I don't think it suits everyone. If you are a developer who wants to implement machine learning algorithms, though, the books listed below are a good reference to work through step by step.

Machine Learning Roadmap

His machine learning roadmap is divided into 5 levels, and each level corresponds to one book that must be mastered. The 5 levels are as follows:

Level 0 (Novice): Read Data Smart: Using Data Science to Transform Information into Insight. Requires knowledge of spreadsheets and of the high-level data flow of some algorithms.
Level 1 (Apprentice): Read Machine Learning with R. Learn to apply different machine learning algorithms in R in different situations. Requires a little basic programming, linear algebra, calculus, and probability theory.
Level 2 (Journeyman): Read Pattern Recognition and Machine Learning. Understand how machine learning algorithms work from a mathematical perspective. Understand and debug the output of machine learning methods while gaining a deeper grasp of machine learning concepts. Requires algorithms, good linear algebra, some vector calculus, and some experience implementing algorithms.
Level 3 (Master): Read Probabilistic Graphical Models: Principles and Techniques. Dig into advanced topics such as convex optimization, combinatorial optimization, probability theory, differential geometry, and other mathematics. Study probabilistic graphical models in depth to understand when they should be used and how to interpret their output.
Level 4 (Grandmaster): Go and learn whatever you like, and remember to give back to the community.

For the book at each level, Colorado suggests which chapters to read and recommends related top projects worth getting to know.

Colorado later republished the roadmap as a blog post with a few modifications. He removed the last level and renamed the levels as follows: Curious, Novice, Apprentice, Journeyman, Master. The machine-learning curious at Level 0, he said, should not read books but should instead browse and watch top machine learning videos.

Overlooked topics in machine learning

Scott Locklin also read Colorado's blog post and was inspired to write a response titled "Neglected machine learning ideas" (nicely illustrated with artwork by Boris Artzybasheff).

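Scott argues that Colorado's recommendations do not give a full picture of the machine learning field. He thinks few books manage to do so, although he does like Peter Flach's Machine Learning: The Art and Science of Algorithms that Make Sense of Data, because it also touches on some of the more obscure techniques.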

Scott lists the topics that books neglect far too much. They are as follows:

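Online learning: Important for streaming data and big data; see Vowpal Wabbit.
Reinforcement learning: Discussed in robotics, but rarely in machine learning.
"Compression" sequence prediction techniques: Compress data to discover learned patterns; see CompLearn.
Techniques for time series.
Conformal prediction: Model accuracy estimation for online learning.
Machine learning in the presence of noise: For example, NLP and CV.
Feature engineering: The key to success in machine learning.
Unsupervised and semi-supervised learning.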

This list does a good job of pointing out areas of machine learning that receive little attention.
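
To make the first item on Scott's list more concrete, here is a minimal sketch of online learning: a model that sees one example at a time from a stream, predicts, and then updates itself immediately instead of training on a stored dataset. This is not how Vowpal Wabbit is implemented and it is not taken from Scott's article; it is plain logistic regression trained by stochastic gradient descent. The class `OnlineLogisticRegression`, the `synthetic_stream` generator, and the toy labeling rule are all assumptions made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Hypothetical example model: logistic regression updated per example."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y):
        """Take one gradient step on a single (x, y) pair, with y in {0, 1}."""
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Made-up data stream: the label is 1 when x0 + x1 > 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] > 0 else 0)


def run_stream(n=5000, last_k=1000):
    """Predict-then-update over the stream; return model and late accuracy."""
    model = OnlineLogisticRegression(n_features=2)
    correct = 0
    for i, (x, y) in enumerate(synthetic_stream(n)):
        # Predict before learning from the example, so the score is honest.
        prediction = 1 if model.predict_proba(x) >= 0.5 else 0
        if i >= n - last_k:
            correct += int(prediction == y)
        model.update(x, y)  # each example is used once, then discarded
    return model, correct / last_k


if __name__ == "__main__":
    model, accuracy = run_stream()
    print("weights:", model.weights, "accuracy on last examples:", accuracy)
```

Because every example is used once and then thrown away, memory use stays constant however long the stream runs, which is why this style of learning suits streaming data and very large datasets.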

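Finally, I should mention that I have my own machine learning roadmap. Like Colorado's, mine is limited to supervised machine learning of the classification/regression kind, but it is still a work in progress and needs further investigation and the addition of all the topics of interest. Unlike the "just read these books" approach above, this roadmap will lay out detailed steps.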

Original English article: Jason Brownlee.
