How to Get Started with Machine Learning

In this article, I will walk through the path from machine learning beginner to expert. We will talk about experience, talk about tools, and above all talk about methodology.

1. Getting started

As a complete beginner in machine learning, you may have nothing but curiosity and enthusiasm; ideally, though, you also have some background in linear algebra, calculus, and probability theory. You may worry that you learned those once but have forgotten them. Don't worry: that kind of knowledge doesn't really disappear, and it comes back quickly once you start using it. Or perhaps you never learned them at all; that's fine too. If you genuinely want to learn, it's never too late to start.

With that out of the way, let's get to the introductory phase. It consists of three main tasks:

  1. Quickly read Zhou Zhihua's "watermelon book" (Machine Learning);
  2. Take Andrew Ng's "Machine Learning" course on Coursera;
  3. Call library packages and run algorithms.

After reading this list you may be confused or have a great many doubts. Don't worry, we will answer them one by one.

1.1 Quickly reading the "watermelon book"

Question one: why the "watermelon book" (Machine Learning)?

Because this book is genuinely good and very suitable as an introduction. The equally famous "Statistical Learning Methods" is excellent too, but relatively hard for beginners. The "watermelon book" does contain formulas, but a beginner can skip the more complicated ones on a first pass, and the running watermelon examples throughout the book are a great help for understanding how each algorithm works.

Question two: why read it quickly?

I emphasize speed here because speed really matters. If you stretch the reading out too long, read too slowly, and lose track of where you are, you are very likely to develop an aversion to the material. The simplest example: think about memorizing English vocabulary. How many times have you started over from "abandon"? Every time a final exam or the CET-4/CET-6 approached, or you suddenly resolved to learn English, you picked up the wordbook again; and how many times did you never even make it to the letter "b" before abandoning it? A major reason this happens is the lack of a sense of accomplishment: grinding through a dull sea of knowledge with no immediate positive feedback to keep you going, you abandon the effort. This is especially true for students without much tolerance for delayed gratification; it is all too easy to stumble right at the start.

Question three: how exactly should you read quickly?

To be clear, reading quickly does not mean skimming carelessly at double speed, let alone "quantum-fluctuation speed reading" (tongue firmly in cheek). Read with questions in mind, for example: what can this algorithm be used for, and what are its inputs and outputs? A good approach is to record the thread of each chapter in a mind map as you go. Taking decision trees as an example, after reading the "watermelon book" we might have notes like the following:

Besides helping us take notes and organize our own thinking, recording with a mind map has two other very important benefits. The first: you won't feel empty when you finish. Imagine racing through a book without taking any notes; wouldn't you feel like Zhu Bajie gulping down the ginseng fruit without tasting it (inner voice: what was the watermelon book even about?)? With a mind map, you can see at a glance that a decision tree is a classifier built from nested if-statements (and you may also have noted that decision trees can be used for regression), that there are several criteria for choosing the splitting attribute, that pruning strategies during tree construction can reduce overfitting, that there are two approaches for handling continuous values, that two problems arise when handling missing values, and so on.
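To make one item in those decision-tree notes concrete, here is a small sketch (my own illustration, not from the book) showing that a pruned tree with limited depth can generalize better than a fully grown one, using scikit-learn's built-in breast cancer dataset:

```python
# A quick sketch: fully grown vs. depth-limited (pruned) decision trees.
# Exact scores depend on the random split; the pattern is what matters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A fully grown tree memorizes the training set (train accuracy 1.0).
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Limiting depth is a simple pre-pruning strategy against overfitting.
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("full   train/test:", full.score(X_tr, y_tr), round(full.score(X_te, y_te), 3))
print("pruned train/test:", round(pruned.score(X_tr, y_tr), 3), round(pruned.score(X_te, y_te), 3))
```

The full tree fits the training data perfectly while the pruned one does not, which is exactly the overfitting trade-off the book's pruning chapter discusses.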

The other benefit: it helps you iterate on your knowledge. Products are iterated on, and knowledge should be too. This is one of the reasons we divide the journey into three stages: introductory, advanced, and proficient. Rome wasn't built in a day, and nobody gets fat from a single meal.

Of course, as beginners, our mind maps may not be that detailed. For example, after reading the linear models chapter of the "watermelon book", ours might look like this:

That's fine. Knowledge is iterated on constantly, and a mind map is only an auxiliary tool for learning.

As for time, pace yourself. It took me less than a week, at roughly three to four hours a day.

"Machine Learning" on Coursera 1.2 Andrew Ng

Question one: why this course?

Note carefully: this is Coursera's "Machine Learning" course, not Stanford's CS229. The reasons we recommend it:

  • The content is foundational; the lectures are in English but have Chinese subtitles;
  • The quality is superb: 120,000 ratings with an average score of 4.9, and remember the maximum is 5;
  • Each lesson is short and comes with small exercises afterward;
  • Besides taking your own notes, you can find other students' class notes online to supplement yours;
  • One small pain point worth embracing: following Andrew Ng through the formula derivations takes effort, but it deepens understanding and prepares you for the advanced stage.

Some students may recommend other teachers' courses, or even send you a huge pile of hoarded materials. I'll just say: don't. We must resist hamster syndrome (hoarding like a hamster); too many resources distract us and give us the illusion of learning a lot. Less is more; study the best material and that's enough.

Question two: should this course also be taken quickly?

No! Take notes seriously (or make mind maps)! After finishing your own notes, read other people's and fill in what you missed. Treat a good course with a devout attitude; don't waste your first viewing of the videos, because you may not have the patience to watch them a second time.

Question three: how long will it take?

There are eleven weeks of lessons in total, each with an estimated time; adding in organizing your notes and reading other people's, you can estimate the total yourself. I recommend finishing within two weeks.

Although the course is in English and full of formulas, Andrew Ng teaches so well that it is actually easier than the watermelon book.

1.3 Calling packages to run algorithms

Question one: why call packages and run algorithms?

There are three reasons:

  • Building intuition: I remember when I first studied computing, a teacher loved to tell us: "write it yourself, run it, get a feel for it." Only hands-on experience makes things concrete;
  • Improving coding ability: computer science is a deeply engineering-oriented subject; you need not only theoretical knowledge but also the ability to implement things well in code;
  • A sense of accomplishment: doesn't being able to "predict" the unknown feel rewarding in itself?

Question two: where does the data come from?

There are plenty of sources of datasets, but again: no hamster syndrome; take only the data you actually need. I recommend Kaggle and Alibaba's Tianchi: look at their entry-level competitions and run on those datasets.

Question three: which algorithms should you run?

Since we are just calling packages, scikit-learn (sklearn) is what you'll use most. Try its algorithms one by one; running through them goes very quickly. Preferably cover the most commonly used algorithms, such as LR (logistic regression), SVM, and random forests.
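The package-calling workflow above can be sketched in a few lines (my own minimal example, assuming scikit-learn is installed; the dataset is one of sklearn's built-in toys):

```python
# "Call a package, run an algorithm": compare LR, SVM, and Random Forest
# on a built-in toy dataset with the uniform fit/score API of scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # LR and SVM benefit from standardized features, hence the pipelines.
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

Every estimator shares the same `fit`/`predict`/`score` interface, which is exactly why "run through them one by one" is so fast.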

Question four: how much time should this take?

That depends on your background. If your coding ability is decent, one day is enough; if not, it might take two or three days. If your coding ability is weak, remember to read code other people have written rather than fumbling around blindly on your own.

1.4 Summary

To sum up, the goals of the three tasks of the introductory phase:

  1. "Watermelon book": understanding and awareness algorithms, have a general understanding of machine learning;
  2. "Machine Learning": pushing under the leadership of the teacher Andrew Ng formula, deepen the understanding of the algorithm, to prepare for advanced study;
  3. Changeling run algorithms: algorithms have a perceptual awareness, and improve the ability.

Estimated time: 3 to 4 weeks.

What we used in the first phase:

  • Tools: the "watermelon book"; "Statistical Learning Methods"; mind maps; notes in Markdown or by hand;
  • Methodology: get positive feedback promptly; reject hamster syndrome.

PS: If you have spare time, you can also look at "Programming Collective Intelligence". The book is very approachable and comes with complete code implementations; read it or code along according to your preference.

2. Advanced

After completing the introductory phase, we have been promoted from complete beginner to machine learning initiate; words like SVM and LR no longer look unfamiliar. But the revolution is not yet won; comrades must still work hard.

So in the advanced stage, what do we learn, and how do we learn it?

Let's look at the tasks of the advanced stage:

  1. Learning "statistical learning methods," "Watermelon book" ;
  2. Learning algorithm commonly used outside of books ;
  3. Learning feature works play .

As before, let's answer the questions you may have.

2.1 Studying "Statistical Learning Methods" and the "watermelon book"

Question one: why study the "watermelon book" again?

Because we didn't study it carefully on the first pass; we merely skimmed it once. Besides, a good book deserves more than one reading. Trust that every pass will give you a different experience.

Question two: how should the two books be combined?

I recommend reading "Statistical Learning Methods" (preferably the second edition) as the main text, using its table of contents to organize your knowledge system, with the "watermelon book" as a supplement: whenever you study an algorithm in "Statistical Learning Methods", go to the corresponding part of the "watermelon book" and add to your notes. The reason is simple: the structure of "Statistical Learning Methods" builds a more coherent knowledge system.

Question three: how should you study?

At this stage we can no longer skip the complicated formulas as we did in the introductory phase. I know many students lack the mathematical background, and some derivations in the books skip many steps, which makes them hard to follow. This is where we borrow the wisdom of the crowd:

  • First, learn to use CSDN, Zhihu, and other professional platforms. Posts with many upvotes and high view counts on these platforms have been filtered into high-quality articles over time; many excellent blog posts explain things in a far more digestible way than books. You can also read a few classic papers, for example Xindong Wu's "Top 10 Algorithms in Data Mining";
  • Second, there is a well-known "pumpkin book" on GitHub, a companion that fills in the formula derivations of the "watermelon book"; Zhou Zhihua himself has recommended it;
  • Finally, make good use of mind maps: whenever you read a book or a blog post, remember to improve your notes. This is really important!

In addition, the "statistical learning methods" thick, especially in the second edition, we have to learn to choose, to distinguish what algorithm needs to look at, what algorithm can be put off. For example, I want to do after the recommendation, ad CTR, risk control that sort of thing, then we can put MCMC, LDA and the like algorithm first put off, and the limited time on LR, SVM, decision tree on.

When the mathematics runs deep, patch the gaps as needed; do not put machine learning aside to go study mathematics first. This is not to say mathematics is unimportant, but we must understand the goals and tasks of each stage; staying goal-oriented helps us learn.

Question four: to what depth should you learn?

I think the advanced stage should at least reach this depth:

  1. Learn and master the mathematical principles and formula derivations of each algorithm, along with its advantages and disadvantages;
  2. Understand what each algorithm does and in which scenarios it applies;
  3. Master the details of the algorithms, for example: SMO and kernel functions for SVM; for K-means, how to choose the value of k, how to prove convergence, and how to tune and improve it; for LR, the parameter update methods, L1 and L2 regularization, and how the parameters are optimized.
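One of the details listed above, choosing k for K-means, can be sketched with the classic "elbow" heuristic on the inertia curve (my own illustration, assuming scikit-learn; the blob centers are invented for the demo):

```python
# Elbow heuristic for choosing k in K-means: inertia (sum of squared
# distances to the nearest centroid) always decreases with k, but the
# drop flattens sharply after the true number of clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated synthetic clusters, so the true k is 4.
centers = [(-6, -6), (-6, 6), (6, -6), (6, 6)]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0,
                  random_state=0)

inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

print([round(i) for i in inertias])  # big drops until k=4, tiny ones after
```

Plotting `inertias` against k would show the "elbow" at k=4; in practice the heuristic is fuzzier on real data, which is exactly why this detail is worth studying.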

Taking LR as an example, after this stage of learning your mind map might look like this:

It may not look like much more content than before, but look carefully and you'll see a significant difference in depth compared with the introductory phase.

Question five: roughly how long does it take?

This depends on how many algorithms you select; on average, budget two to four days to learn and master each one.

2.2 Learning commonly used algorithms beyond the books

Question one: which algorithms?

The most famous are the three boosting brothers: XGBoost, LightGBM, and CatBoost;

Then there are FM, FFM, and their well-known derivatives.

Question two: Why learn?

First, these algorithms are widely used: they perform well, they are robust, and they appear constantly both in companies large and small and in major competitions (Kaggle, Tianchi, and so on);

Second, studying the ideas behind these algorithms helps us tune them. For example, having studied LightGBM, when we use XGBoost we know that discretizing continuous features can help improve its generalization;

Question three: how to learn them?

First, I recommend reading high-quality blog posts to get a rough idea of the three algorithms, and then reading the papers. The benefit of this order is that with a general understanding from the blogs, the papers are much easier to get into; besides, the original papers are a pleasure to read;

Second, you do not need to study every one of them. Students with spare capacity can learn a bit more; those short on time should focus on the essentials. Learn to choose.

2.3 Entering competitions

Question one: why compete?

First, competing is an important step on the path from theory to practice. Competition data is plentiful yet much cleaner than the data you will meet at work, which helps greatly in making the transition from theory to practice.

Secondly, the "no free lunch", that is no perfect algorithm can solve all problems. By playing the game, we can deepen the understanding of the algorithm, familiar with the areas of expertise of the algorithm.

Finally, by competing we learn a very important part of machine learning: feature engineering. As the saying goes, the features determine the upper bound, and the algorithm merely approximates it; that alone shows how important features are. So knowing the algorithms is not enough; you must also learn to do feature engineering. Moreover, different algorithms may call for different feature engineering, which in turn deepens our understanding of the algorithms.

Question two: how to learn?

First, I strongly recommend Kaggle, because many experienced competitors share write-ups there: not only baseline algorithms, but also data analysis, feature engineering experience, and mindset. I highly recommend going there to study and practice;

Second, feature engineering is a rather experience-driven craft. Students new to it can alternate between competing and reading books and blogs: there are well-known books on the subject, such as "Getting Started with Feature Engineering and Practice", and many students have compiled feature-engineering blog posts that are also worth reading;

Finally, once again: remember to organize your notes! Blogs and books contain a great deal of duplicated content, and only after careful synthesis does it become your own learning.

Question three: how much time?

As short as a few weeks, if you just read the books, run the demos, and try the algorithms; or as long as a year, if you play several competitions seriously.

2.4 Summary

To summarize the goals of the three tasks of the advanced stage:

  1. Learning "statistical learning methods," "Watermelon book" : a deeper understanding of the algorithm by pushing a formula to master the details of the algorithm;
  2. Book learning algorithm commonly used outside : a complement to the book's contents, to learn and master the more popular algorithm;
  3. Play : the learning algorithm transition from theory to practice, while learning feature works.

Estimated time: this fluctuates widely from person to person, but however long it takes, the knowledge gained at this stage is substantial.

Tools : "statistical learning methods"; "Watermelon book"; "Pumpkin book"; "feature works entry and Practice"; blog; mind mapping; paper;

Methodology: be goal-oriented; learn to choose; and of course, remember positive feedback.

3. Proficient

This stage is harder for me to talk about, because I am still traveling through it myself, but in my personal view it has the following tasks:

  1. Understand algorithm details in depth, and compare the differences between the major algorithms;
  2. Learn how algorithms are applied in industry, and read source code;
  3. Read papers;
  4. Write blog posts.

As before, let's go through them one by one.

3.1 Understanding algorithm details in depth, comparing the differences between the major algorithms

Question one: I can already derive the formulas; isn't that detailed enough?

First, what we usually call "deriving the formulas" often means merely being able to follow them, without knowing why they are derived that way: knowing the what but not the why. For example, we all know that SVM is solved by converting it into the dual problem via Lagrange multipliers, but we may not know that the Lagrange multiplier method was originally for equality-constrained optimization, not the inequality-constrained optimization found in SVM. So how is the method extended from equality constraints to inequality constraints (via the KKT conditions)?

Beyond that: why must we convert to the dual problem at all? Why convert the SVM from min max f(x) to max min f(x)? Isn't the original formulation good enough? Simply pushing through the formulas is not a deep understanding; at this stage you must ask why, and then ask why again. Doing so, we come to know: first, converting to the dual simplifies the problem, turning one whose complexity depends on both the feature dimension and the number of samples into one that depends only on the number of samples; second, the dual form makes it easy to introduce a kernel function, which implicitly maps the data from a low-dimensional space to a high-dimensional one where a separating solution is easier to find.
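To make the min-max/max-min question concrete, here is the standard textbook form of the hard-margin SVM primal and its Lagrangian dual (a reference sketch, not specific to any book mentioned above):

```latex
% Primal (hard-margin SVM):
\min_{w,b}\ \frac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\quad i = 1,\dots,n

% Lagrangian:
% L(w,b,\alpha) = \frac{1}{2}\|w\|^2 - \sum_{i} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right]

% Swapping min and max (strong duality holds here) and eliminating w, b
% via \partial L / \partial w = 0 and \partial L / \partial b = 0 gives the dual:
\max_{\alpha \ge 0}\ \sum_{i=1}^{n} \alpha_i
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Notice that in the dual the data appear only through the inner products \(x_i^\top x_j\); that is precisely where a kernel \(k(x_i, x_j)\) can be substituted, which answers "why the dual makes kernels easy".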

Second, many seemingly simple things deserve a second look. In the advanced stage we learned that LR introduces the L1 and L2 norms to constrain the weights to smaller values and reduce the risk of overfitting. But have we asked why small weights reduce the risk of overfitting? Why are the L1 and L2 norms effective at all?

After this stage, we may know: reducing the weights reduces the model's complexity, preventing the prediction from fluctuating wildly with small changes in the input values. We also know that the L1 and L2 norms are effective because they encode prior knowledge: a zero-mean Laplace prior and a zero-mean Gaussian prior on the weights, respectively.
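One observable consequence of those two priors is worth checking by hand (my own sketch, assuming scikit-learn): the Laplace prior (L1) drives many weights to exactly zero, while the Gaussian prior (L2) only shrinks them.

```python
# L1 vs L2 regularization in logistic regression: L1 yields a sparse
# weight vector (many exact zeros), L2 yields small but nonzero weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 30 features, only 5 informative: plenty of noise features to zero out.
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

print("exact-zero weights with L1:", int(np.sum(l1.coef_ == 0)))
print("exact-zero weights with L2:", int(np.sum(l2.coef_ == 0)))
```

Seeing the sparsity appear (and disappear as you raise `C`) is a quick way to connect the prior-knowledge explanation to actual model behavior.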

In short, the task at this stage is to keep asking yourself why. Why is it like this? Why is it like that? Could it have been done differently? Ask why, learn with a critical mind, and you will understand the details of the algorithms far better.

Question two: why compare the differences between the major algorithms?

One tree does not make a forest. We must learn to extend fragmented points of knowledge into lines, weave the lines into surfaces, and even stack the surfaces into a solid.

During the advanced stage we learned that XGBoost usually outperforms LR, and that SVM, despite its complete mathematical theory, is still not commonly used in industry. But do we know why? Why is XGBoost better than LR? Why is SVM rarely used? Why do naive Bayes and LR apply to different domains? Why is there "no free lunch"?

Understand the differences, connections, and trade-offs between algorithms: for example, the differences and connections between LR and SVM, between LR and the maximum entropy model, and so on.

Studying the differences and connections between the major algorithms helps us understand each of them more deeply.

Question three: to what depth?

There is no fixed answer; at the proficient stage, wisdom is in the eye of the beholder. Taking LR as an example, I think this stage should at least reach the depth of the LR example above.

3.2 Learning industrial applications, reading source code

Question one: why learn industrial applications?

Because most of us will eventually go into industry, and industry thinks differently from school.

First, industry emphasizes deployment more than raw accuracy. The algorithms we use to win competitions or publish papers are often very complex, but in industry the data easily reaches tens or even hundreds of millions of rows, and an overly complex algorithm hinders getting the model into production. Industry therefore demands a balance between accuracy and computational cost;

Second, industry cares about speed: the algorithm must be fast. How to parallelize an algorithm therefore becomes especially important. The common approaches are data parallelism and feature parallelism, which you can study on your own.

Question two: why read source code?

First, reading source code deepens your understanding of the algorithms. Second, you learn the optimization strategies and small tricks used when an algorithm is put into production: how is sorting done, with an O(n log n) quicksort? How is sampling done? How is weighted sampling done? How are random numbers generated? And so on. These can only be learned from the source.
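As one example of the kind of trick you might meet in library source code, here is a standalone sketch of weighted sampling without replacement using the exponent-key method (the Efraimidis-Spirakis trick); this version is my own illustration, not taken from any particular library:

```python
# Weighted sampling without replacement: assign each item the key
# u ** (1 / w) with u uniform in (0, 1), then keep the k largest keys.
# Each item's chance of selection is proportional to its weight.
import random

def weighted_sample(items, weights, k):
    """Sample k distinct items with probability proportional to weight."""
    keyed = [(random.random() ** (1.0 / w), item)
             for item, w in zip(items, weights)]
    keyed.sort(reverse=True)          # the largest keys win
    return [item for _, item in keyed[:k]]

random.seed(0)
print(weighted_sample(["a", "b", "c", "d"], [10, 1, 1, 1], 2))
```

Run it many times and the heavily weighted item "a" shows up in almost every draw; the same one-pass idea underlies reservoir-style weighted sampling over data streams.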

3.3 Reading papers

By now this needs no justification: many advanced machine learning and deep learning algorithms come from papers, from slightly older combinations like GBDT+LR and GBDT+FM (with GBDT replaceable by XGBoost) to younger models like DIN and DIEN.

Reading papers helps us follow the latest models and keep up with the frontier of research.

3.4 Writing blog posts

In fact, you can start writing blog posts in the advanced stage, or even in the introductory stage; it is just a matter of tidying up and publishing your notes. Note, however, that the blog posts we mean here are written for others to read. If they were only for yourself, you could write however you liked; but now your audience is other readers, and you should take a responsible attitude and write your posts well.

A well-written post should, at the very least, let the reader understand what you are talking about. Many people find this a nuisance, but it is not: everything you learned before is input, and you do not know how much of it you have digested. Writing a blog post is output, and what you can write down represents what you have truly absorbed.

Some students might say: "I understand it, I just can't express it; I'm like dumplings boiling in a teapot that won't pour out."

Sorry, but that is usually not the case. Thinking so mostly means you have unrealistic illusions about how well you actually understand.

Besides, writing blog posts is also a process of organizing your thoughts; it trains you to summarize well and deepens your understanding of the algorithms.

3.5 summary

Finally, let's summarize the tasks of the proficient stage:

  1. Understanding algorithm details in depth, comparing the differences between the major algorithms: string the points into lines and surfaces, and deepen understanding of the algorithms;
  2. Learning industrial applications, reading source code: understand deployment and the optimizations made when algorithms go into production, and deepen understanding;
  3. Reading papers: keep up with frontier algorithms;
  4. Writing blog posts: organize your knowledge and deepen understanding.

Tools: source code; blogs; papers.

Methodology: connect points, lines, and surfaces; turn knowledge input into output.

4. Recommended learning materials

Greedy Academy's "Machine Learning" course covers explanations of 16 classic algorithms, 20 hands-on practice cases, and 8 major project assignments, with a game-like, level-clearing learning mode that lets you get started with machine learning within two months.

 

For more useful content, follow the "Greedy Technology AI" WeChat official account.


Origin www.cnblogs.com/txkjai/p/12463901.html