A Data Analyst's Machine Learning Journey

Introduction

In my previous role as a data analyst, I focused on building up the skills that data analysis positions require. These included the basic "ESP" package (Excel + SQL + Python) and Python's "three musketeers" for data work (NumPy + Pandas + Matplotlib). On top of that, I learned Hive as an extension of SQL, Spark as an extension of Pandas, and Tableau as an extension of Matplotlib. For a pure data analysis role this stack is arguably sufficient, and from there the real path forward is to keep deepening your business knowledge. However, "involution" (ever-intensifying competition) is now the buzzword of the job market, and the technical bar for data analysts keeps rising. As a result, machine learning algorithms have become the first new target for many data analysts!

This article summarizes my experience learning machine learning algorithms. Earlier articles in this series include:

A data analyst's Python learning process

A data analyst's SQL learning journey

Written on 1024: A Data Analyst's Road of Practice!

Before tackling machine learning algorithms, I already had a solid Python foundation (NumPy in particular) and a systematic background in mathematics. Even so, some basic concepts and formulas had faded over time. Starting from that premise, my process of learning machine learning algorithms falls into three levels:

  • Watching videos, to quickly build a high-level framework and a systematic understanding

  • Reading books, to study the principles and their practical application

  • Studying documentation, to fill in gaps and keep deepening understanding

These three levels do not have to be followed strictly in order. In practice, iterating back and forth between them is often more important.

Note: This article is purely a set of recommendations based on my own study. None of the videos or books mentioned here are sponsored or advertised.

01 Video

Machine learning is a very popular subject right now. The related job titles at Internet companies, such as data mining engineer, algorithm engineer, and machine learning engineer, are also in high demand. Demand drives supply, so there are countless learning resources online. However, the more resources there are, the easier it is for beginners to end up confused. I therefore recommend three video courses here. I have worked through each of them systematically at least once, and together they balance explanations of principles with practical application:

  1. "Data Mining: Theory and Algorithms", taught by Yuan Bo of Tsinghua University's Institute of Advanced Research and published on XuetangX, with 110,000+ enrollees at the time of writing. The course explains machine learning algorithms mainly at the theoretical level. It traces ideas back to their origins and presents them in a philosophical, dialectical way. The course has been rated a national-level excellent course. PS: although it is officially hosted on XuetangX, copies can also be found on a certain video website!

  2. "Python3 Introduction to Machine Learning", the signature course by teacher bobo on MOOC. Compared with the purely academic first course, bobo's course pays more attention to implementation in Python 3. It still offers theoretical depth and even philosophical reflection. Just as importantly, I personally like bobo's teaching style a lot. The course is good, but unlike the first one, which is completely free, it is paid.

  3. "Cai Cai's Machine Learning sklearn Classroom", which many people working in machine learning have probably heard of. Cai Cai teaches at a training institution. As the course name suggests, this series focuses on hands-on machine learning, more precisely hands-on machine learning with sklearn. The explanations are very detailed and cover everything from principles to parameter settings, which makes the course well suited to beginners. This course is also paid, but the instructor has released it publicly on a certain website. You can study it there too, and paying for it is a way to show support.
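Courses 2 and 3 differ mainly in whether an algorithm is implemented by hand or called from sklearn. The sketch below makes that contrast concrete. It assumes k-nearest neighbours on sklearn's bundled iris dataset, an illustrative choice that is not necessarily what either course uses. `knn_predict` is a hypothetical helper written only for this example, while `KNeighborsClassifier` is sklearn's own estimator.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical hand-written kNN: majority vote among the k nearest
    training points, using Euclidean distance."""
    preds = []
    for x in X_query:
        # Distance from this query point to every training point (broadcasting).
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points.
        nearest = np.argsort(dists)[:k]
        # Most frequent label among those neighbours.
        votes = Counter(y_train[nearest])
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hand-written version (the "implement it yourself" style).
manual = knn_predict(X_train, y_train, X_test, k=5)

# Library version (the "use sklearn" style).
lib = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

print("hand-written accuracy:", (manual == y_test).mean())
print("sklearn accuracy:     ", (lib == y_test).mean())
print("agreement:            ", (manual == lib).mean())
```

The two sets of predictions should agree on almost every test point. Any small differences would come from how ties between equally distant neighbours are broken.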

02 Books

Just as machine learning videos abound, so do related books. To ease the burden of choice, I recommend two books here. I have read both of them almost in full, so I can speak about them with some authority:

  • "Machine Learning" by Zhou Zhihua, a bible-level machine learning book known as the "watermelon book". The book uses watermelon classification as its running example and explains the main algorithms in an accessible way. Its cover also features watermelons, hence the nickname. I find it excellent at explaining machine learning at the theoretical level, and it is arguably among the best Chinese-language books on the subject. Any further introduction seems superfluous.

There is also the comparable "Statistical Learning Methods" by Li Hang, but since I have not read it, I will not compare the two.

  • "Python Data Science Handbook", one of Turing's animal-cover books. Its structure is very simple, with five chapters in total. After an opening chapter on IPython and Jupyter, the remaining four chapters cover the basic use of four libraries: NumPy, Pandas, Matplotlib, and scikit-learn. The structure is simple and the content is not especially new. Even so, I think this structure suits a data analyst who wants to learn machine learning algorithms very well. The watermelon book is theory-oriented, whereas this one mainly provides basic demos of commonly used algorithms. You can think of it as a guide to using machine learning algorithms in sklearn.
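The sklearn chapter of a book like this revolves around a simple estimator workflow: choose a model, fit it on training data, and predict on new data. That workflow can be sketched as follows. The iris dataset and a Gaussian naive Bayes classifier are assumed purely for illustration, and the split ratio and random seed are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# 1. Load features X and labels y.
X, y = load_iris(return_X_y=True)

# 2. Hold out a test set so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 3. Choose a model class and instantiate it (hyperparameters go here).
model = GaussianNB()

# 4. Fit the model to the training data.
model.fit(X_train, y_train)

# 5. Predict labels for new data and score the predictions.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.3f}")
```

Swapping `GaussianNB` for another estimator usually changes only the import and constructor lines. This consistency is a large part of why the workflow is easy to pick up.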

03 Documentation

Watching videos and reading books are the usual ways to get started, but reading concise, high-quality documentation is actually a very efficient way to learn. When it comes to documentation, the first things that come to mind may be WeChat public-account articles, blog posts on various platforms, and study notes compiled by experienced practitioners. In my view, however, the best documentation for learning machine learning is sklearn's official documentation. (Many technologies' official docs could also claim to be the best, MySQL's for example, but not all of them.) Even compared with the official NumPy and Pandas documentation, I find sklearn's exceptionally well written. It covers not only the principles behind each algorithm but also a detailed description of each API's parameters. More importantly, it is clearly organized and easy to understand. Some even say it could be compiled, printed, and published directly as a book!

sklearn official documentation:

  • English documentation: https://scikit-learn.org/stable/

  • Chinese documentation: https://www.cntofu.com/book/170/index.html
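The sklearn API pages document the constructor parameters of each estimator. The sketch below shows how to work with those parameters in code. It assumes a decision tree on the iris data and an illustrative parameter grid, not a recommended one. `get_params`/`set_params` inspect and change an estimator's hyperparameters, and `GridSearchCV` tries parameter combinations with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Every estimator exposes its constructor parameters via get_params().
tree = DecisionTreeClassifier(random_state=0)
# Default max_depth is None: nodes are expanded until leaves are pure
# (or hold fewer than min_samples_split samples).
print("default max_depth:", tree.get_params()["max_depth"])

# set_params() changes hyperparameters after construction.
tree.set_params(max_depth=3)

# Illustrative grid: GridSearchCV fits one model per combination and per CV fold,
# then keeps the combination with the best mean validation score.
param_grid = {"max_depth": [1, 2, 3, 4, 5], "criterion": ["gini", "entropy"]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

The parameter names passed to `param_grid` are the same ones listed on each estimator's API page. That shared naming is what makes the documentation and the code easy to cross-reference.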

Attached are my personal sklearn study notes:

Finally, data analysis has a clearer and more fixed technical toolset. By comparison, the learning path for machine learning algorithms is clearly longer and steeper. If you also consider deep learning and even reinforcement learning, their endless iterations are daunting enough. But for a technical person with ambition, isn't this exactly what self-growth should look like!


Origin blog.csdn.net/weixin_43841688/article/details/119988710