[Book at the end of the article] Advanced Practice of Machine Learning

The beginning of 2023 marks a milestone in the explosive growth of artificial intelligence. Large models, represented by OpenAI's GPT series, have become extremely popular, and ChatGPT in particular has triggered heated public discussion well beyond the NLP field. The latest update, GPT-4, is a leap forward for large-scale multimodal models: it accepts image and text input together and outputs text responses. On the one hand, many practitioners in artificial intelligence are amazed by GPT-4's performance; on the other hand, they worry about their careers. If large models built on "big computing power + strong algorithms" are the future of artificial intelligence, are traditional machine learning algorithms still useful in real business scenarios, or will they be replaced by large models sooner or later? I don't think so. Every business scenario is unique, and the most valuable asset of an excellent algorithm engineer is a thorough understanding and long-term accumulation of business knowledge. Business knowledge is like the roots of the tree of a machine learning project, theoretical knowledge is like its branches, and algorithm applications are like the leaves on those branches. Only when the roots run deep enough can the tree spread its branches and leaves, blossom, and bear fruit. So far, large models have not reached the level of human algorithm engineers in understanding and reasoning about fast-changing, complex, and diverse businesses. Even if one day we can develop algorithm applications for various business scenarios on top of large models, algorithm engineers with strong business skills and solid theoretical knowledge of machine learning will still be needed to guide large models to learn specific business scenarios effectively.

Machine learning practitioners in this era of explosive growth in artificial intelligence are undoubtedly lucky. How to better integrate artificial intelligence into every aspect of human life is an important problem for this era to solve. Wang Congying, a senior algorithm engineer at Didi International, found that many newcomers memorize advanced model theory by rote when they first enter the industry, but cannot find their way or grasp the key points when applying it. As the saying goes, the good steel is not used on the blade's edge, and no real business gain is achieved. It would be great to have a technical book that guides newcomers from entry to mastery and from theory to practice. Such a book would save companies the cost of training newcomers and also leave newcomers room to learn and grow on their own.

With this intention, the author spent nearly a year of spare time reviewing and summarizing how the author and colleagues grew from complete beginners into qualified algorithm engineers, together with their project experience. The resulting method of combining theory with practice is written up in the book "Advanced Practice of Machine Learning: Computational Advertising, Supply and Demand Forecasting, Intelligent Marketing, and Dynamic Pricing". The author hopes this experience can truly help readers who are interested in machine learning algorithms.
In this article, we excerpt part of the book to briefly introduce causal inference, an emerging branch of machine learning that has attracted wide attention.

The JD.com link for the book: https://item.jd.com/14256304.html

Causal Inference

Causal inference is a branch of machine learning that has emerged in recent years. It mainly addresses "which came first, the chicken or the egg" questions. The main difference between causal inference and correlation is this: causal inference tries to infer the impact of variable X on the result Y by changing X, while correlation describes how variables trend together. For example, suppose two variables X and Y are correlated. If Y increases as X increases, X and Y are positively correlated; if Y decreases as X increases, they are negatively correlated. Causality and correlation are therefore essentially different. To help readers understand, here is an example.
A study shows that people who eat breakfast weigh less than those who do not, so "experts" conclude that eating breakfast helps you lose weight. In fact, the relationship between eating breakfast and lower weight may be correlation, not causation. People who eat breakfast may also have a healthy lifestyle overall, with regular meals, regular exercise, and adequate sleep, and it is this lifestyle that ultimately leads to their lower weight. Figure 1 shows the confounding factor in this example, describing the relationship between a healthy lifestyle, eating breakfast, and lower weight.

Figure 1. Confounding factors in causal inference
Clearly, people with a healthy lifestyle tend to eat breakfast, and a healthy lifestyle also leads to lower weight, so a healthy lifestyle is a common cause of both. Because of this common cause, we cannot conclude that eating breakfast causes lower weight, and the conclusion of the "experts" is hasty. The data show only a correlation between eating breakfast and lower weight, not causation, and a common cause that obstructs causal inference in this way is called a confounding factor. As shown on the right side of Figure 1, the main tasks of causal inference are eliminating confounding factors, finding the causal relationship between two variables, and quantifying how much a change in an independent variable X changes the dependent variable Y.
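The breakfast example can be made concrete with a small simulation. The sketch below is only an illustration under assumed numbers: the function names, the probabilities of eating breakfast, and the weight distributions are all hypothetical, and breakfast is deliberately given no effect on weight. Even so, a naive comparison shows breakfast eaters weighing less, and the gap disappears once the confounder is held fixed.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Simulate (healthy, breakfast, weight) rows where breakfast has NO effect on weight."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        healthy = rng.random() < 0.5
        # Hypothetical rates: healthy people are more likely to eat breakfast.
        breakfast = rng.random() < (0.8 if healthy else 0.3)
        # Weight depends only on lifestyle, never on breakfast.
        weight = rng.gauss(65.0 if healthy else 75.0, 5.0)
        rows.append((healthy, breakfast, weight))
    return rows

def diff_in_means(rows):
    """Mean weight of breakfast eaters minus mean weight of non-eaters."""
    eaters = [w for _, b, w in rows if b]
    skippers = [w for _, b, w in rows if not b]
    return mean(eaters) - mean(skippers)

rows = simulate()
naive = diff_in_means(rows)                                   # confounded comparison
within_healthy = diff_in_means([r for r in rows if r[0]])     # confounder fixed: healthy
within_unhealthy = diff_in_means([r for r in rows if not r[0]])  # confounder fixed: not healthy

print(f"naive difference:           {naive:.2f} kg")
print(f"within healthy group:       {within_healthy:.2f} kg")
print(f"within non-healthy group:   {within_unhealthy:.2f} kg")
```

The naive difference is clearly negative, while the two within-group differences are close to zero, which is exactly the pattern a confounding factor produces.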

The Past and Present of Causal Inference

Any account of the history of causal inference in statistics and machine learning has to mention two great figures. One is Donald Rubin, who proposed the famous RCM (Rubin Causal Model, equivalent to the potential outcomes framework) in 1978. The other is Judea Pearl, who proposed the causal diagram framework in 1995. In October 2021, Joshua D. Angrist and Guido W. Imbens were among the recipients of the Nobel Prize in Economics for their outstanding contributions to the analysis of causal relationships. Their research builds on the potential outcomes framework proposed by Rubin, which shows Rubin's influence on the field. Another major contribution of Rubin is the PSM (Propensity Score Matching) framework, which addresses confounding factors in observational data. Pearl's causal diagram framework is entirely separate from Rubin's RCM framework and uses directed acyclic graphs to represent the causal relationships between variables visually. Pearl received the 2011 Turing Award for his work on probabilistic and causal reasoning. The two giants thus created two different frameworks in the field. Pearl proved in 2000 that the two frameworks are equivalent, but Rubin disagreed, holding that the potential outcomes framework expresses causal inference problems more clearly. At present, the potential outcomes framework is more commonly used in causal inference than the causal diagram. The following introduces the analytical perspectives of the two frameworks.

1. Potential Outcome Framework

Before introducing the potential outcomes framework, we first state two assumptions needed to describe individual causal effects. To help readers get started quickly, this article considers only binary treatment: each individual either receives the treatment or does not, and each of the two cases has a corresponding potential outcome.

Insert image description here

However, in the real world, an individual i either receives the treatment or does not. It is impossible to both receive and not receive the treatment at the same time. Therefore the individual causal effect is not identifiable: the observed data for an individual contain only the outcome under the treatment that individual actually received.
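In the notation commonly used for the potential outcomes framework, this can be written compactly. The symbols below are an assumption, chosen to match the T = 1 / T = 0 convention used later in this article: T_i is the treatment indicator of individual i, and Y_i(1), Y_i(0) are the potential outcomes with and without treatment.

```latex
\tau_i = Y_i(1) - Y_i(0)
\qquad\text{(individual causal effect)}

Y_i^{\mathrm{obs}} = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)
\qquad\text{(observed outcome)}
```

Because T_i is either 0 or 1, only one of the two terms in the observed outcome is ever non-missing, so the difference that defines the individual effect can never be computed directly from data.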

How can we conduct causal inference when we know that the causal effect of an individual cannot be identified? An effective solution is to move the identification of causal effects from the individual to the population, which is how the concept of the average treatment effect (ATE) came into being. The average causal effect no longer compares the causal effects of individuals but compares the potential outcomes of two groups under different treatments. Apart from receiving different treatments, the two groups must be homogeneous in their attributes. Only then is the calculated average causal effect unbiased. The randomized controlled trial (RCT) is the basic experimental method for ensuring that the two groups are comparable. All units are randomly divided into an experimental group (Treatment Group) and a control group (Control Group), with T=1 for the experimental group and T=0 for the control group. The formula for the average causal effect is then as follows:

$$\mathrm{ATE} = E[Y(1) - Y(0)] = E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0]$$

Here Y(1) and Y(0) are, respectively, the outcome of the experimental group under treatment and the outcome of the control group without treatment. This completes the basic theory of causal inference under the potential outcomes framework, which can be summarized in the following two points.

  1. Randomized controlled trials ensure homogeneity of groups.
  2. Shift from unassessable individual causal effects to estimating overall average causal effects.
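The two points above can be sketched in a few lines of simulation. This is a minimal illustration under assumed values, not a method from the book: `run_rct`, the baseline distribution, and the true effect of 2.0 are all hypothetical. Each unit has both potential outcomes, random assignment decides which one is observed, and the difference in group means recovers the average effect.

```python
import random
from statistics import mean

def run_rct(n=20000, true_effect=2.0, seed=1):
    """Estimate the ATE by difference in means under random assignment."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)   # unit-level heterogeneity
        y0 = baseline                     # potential outcome without treatment
        y1 = baseline + true_effect       # potential outcome with treatment
        t = rng.random() < 0.5            # random assignment, independent of baseline
        # Only one potential outcome is observed for each unit.
        (treated if t else control).append(y1 if t else y0)
    return mean(treated) - mean(control)

ate_hat = run_rct()
print(f"estimated ATE: {ate_hat:.2f} (true effect in this simulation: 2.0)")
```

Because assignment does not depend on the baseline, the two groups are homogeneous on average, and the estimate lands close to the true effect even though no individual effect was ever observed.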

Are randomized controlled trials a cure-all? Not quite. Consider this question: we want to evaluate the causal effect of anti-cancer drug A on cancer patients. Is a randomized controlled trial still suitable here? The answer is obviously no. First, cancer is a serious disease, and for humanitarian reasons it is impossible to randomly assign patients to a control group that receives no anti-cancer drug. Second, even if dedicated cancer patients agreed to take part, the long experimental period and high cost in a medical setting are major drawbacks of randomized controlled trials. This example shows that not every real-life scenario is suitable for a randomized controlled trial, so researchers try to approximate the effect of one by processing observational data. The most famous such method is Propensity Score Matching (PSM), proposed by Rubin.
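The idea behind propensity score matching can be sketched on simulated observational data. Everything below is a simplified, hypothetical illustration and not the book's implementation: the data generator, the single discrete covariate `z`, and the function names are assumptions. The propensity score is estimated as the treated share within each value of `z` (real applications usually fit a model such as logistic regression over many covariates), and each treated unit is matched with replacement to a control unit with the nearest score.

```python
import random
from collections import defaultdict
from statistics import mean

def simulate(n=6000, seed=2):
    """Observational data: z confounds treatment t and outcome y; the true effect is 2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.choice([0, 1, 2])
        t = int(rng.random() < (0.2, 0.5, 0.8)[z])   # higher z -> more likely treated
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)
        data.append((z, t, y))
    return data

def naive_difference(data):
    """Difference in mean outcome between treated and control units, ignoring z."""
    return mean(y for _, t, y in data if t == 1) - mean(y for _, t, y in data if t == 0)

def psm_effect(data, seed=3):
    """Average effect on the treated via nearest-score matching with replacement."""
    rng = random.Random(seed)
    # Step 1: propensity score e(z) = P(T=1 | z), estimated as the treated share per stratum.
    by_z = defaultdict(list)
    for z, t, _ in data:
        by_z[z].append(t)
    score = {z: mean(ts) for z, ts in by_z.items()}
    # Step 2: index control outcomes by their propensity score.
    controls = defaultdict(list)
    for z, t, y in data:
        if t == 0:
            controls[score[z]].append(y)
    # Step 3: match each treated unit to a control unit with the nearest score.
    diffs = []
    for z, t, y in data:
        if t == 1:
            nearest = min(controls, key=lambda s: abs(s - score[z]))
            diffs.append(y - rng.choice(controls[nearest]))
    return mean(diffs)

data = simulate()
naive = naive_difference(data)
matched = psm_effect(data)
print(f"naive difference: {naive:.2f}")
print(f"matched estimate: {matched:.2f} (true effect in this simulation: 2.0)")
```

In this simulation the naive difference is inflated because treated units tend to have larger `z`, while the matched estimate is close to the true effect, since matched pairs share the same probability of being treated.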

2. Structural Causal Model (SCM)

The structural causal model describes the causal relationships between variables on the basis of a graph structure. Before introducing SCM, let's first look at Bayesian networks. A Bayesian network is a probabilistic graphical model based on a Directed Acyclic Graph (DAG). By itself it cannot express causality; it expresses the correlation between variables. However, the directed acyclic graph of a Bayesian network is the structural basis of the structural causal model, and its probability calculations are the basis of inference in the structural causal model.
A directed acyclic graph consists of nodes and directed edges. The node at the tail of a directed edge is the parent node, and the node the edge points to is the child node. Given its parent nodes, each node in a DAG is conditionally independent of its non-descendant nodes. By the chain rule of probability and this conditional independence, the joint probability distribution of all nodes in a directed acyclic graph can be expressed as:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa_i)$$

Here $pa_i$ denotes the set of all parent nodes pointing to $x_i$. To help readers better understand the joint distribution expression in a directed acyclic graph, a specific DAG example is given in Figure 2.


Figure 2. Example of directed acyclic graph

According to the conditional independence of directed acyclic graphs and the formula of joint probability distribution, the joint distribution of Figure 2 can be expressed as:

Insert image description here

Each directed acyclic graph produces a unique factorization of the joint distribution, but a joint distribution does not necessarily correspond to only one directed acyclic graph. For example, the joint probability distribution P(X, Y) may correspond to the graph X → Y or to the graph Y → X, and the causal relationships expressed by these two graph structures are exactly opposite. This is why a Bayesian network by itself cannot serve as a causal model. To turn a DAG into a causal graph that can express causal relationships, the do operator needs to be introduced. The do operator expresses an intervention: $do(x_i = x_i')$ means that all directed edges pointing to the node $x_i$ are cut off and the node is assigned the constant value $x_i'$. After the do operator intervenes, the joint probability distribution of the DAG changes and takes the following form:

$$P(x_1, \ldots, x_n \mid do(x_i = x_i')) = \begin{cases} \prod_{j \neq i} P(x_j \mid pa_j), & x_i = x_i' \\ 0, & x_i \neq x_i' \end{cases}$$

Still taking Figure 2 as an example, suppose the do operator intervenes on one node of the graph. The joint probability distribution of the DAG after the intervention is then expressed as:

Insert image description here

To sum up, a DAG equipped with the do operator can express causal relationships, and its average causal effect formula is as follows:

$$\mathrm{ATE} = E[Y \mid do(T = 1)] - E[Y \mid do(T = 0)]$$
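The difference between observing and intervening can be checked numerically on a tiny graph. The sketch below does not use Figure 2. It assumes a hypothetical three-node DAG with binary variables, Z → X, Z → Y and X → Y, where X plays the role of the treatment, and the probability tables are made up for illustration. It computes the joint distribution from the factorization over parents, then removes the factor of the intervened node to obtain the distribution under the do operator.

```python
from itertools import product

# Hypothetical conditional probability tables for the DAG Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}                                        # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}                               # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.5, (1, 1): 0.7}  # P(Y = 1 | X = x, Z = z)

def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(z, x, y):
    """Observational joint: P(z) * P(x | z) * P(y | x, z)."""
    return p_z[z] * bern(p_x1_given_z[z], x) * bern(p_y1_given_xz[(x, z)], y)

def joint_do_x(z, x, y, x_set):
    """Joint under do(X = x_set): the factor P(x | z) is removed from the product."""
    if x != x_set:
        return 0.0
    return p_z[z] * bern(p_y1_given_xz[(x, z)], y)

def p_y1_do(x_set):
    """P(Y = 1 | do(X = x_set)), summing the intervened joint over z and x."""
    return sum(joint_do_x(z, x, 1, x_set) for z, x in product((0, 1), repeat=2))

def p_y1_given(x_obs):
    """P(Y = 1 | X = x_obs) from the observational joint."""
    num = sum(joint(z, x_obs, 1) for z in (0, 1))
    den = sum(joint(z, x_obs, y) for z, y in product((0, 1), repeat=2))
    return num / den

total = sum(joint(z, x, y) for z, x, y in product((0, 1), repeat=3))
ate = p_y1_do(1) - p_y1_do(0)            # causal effect of X on Y
naive = p_y1_given(1) - p_y1_given(0)    # observational difference, confounded by Z
print(f"joint sums to {total:.3f}, ATE = {ate:.3f}, observational difference = {naive:.3f}")
```

With these tables the interventional difference is smaller than the observational one, because conditioning on X also carries information about Z, whereas the intervention cuts the edge from Z into X.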

With the do operator, the DAG gains the soul of causal inference, but a new problem arises: not every practical problem comes with an explicit graph structure. In most real situations, the graph structure is unknown and not all variables can be observed. To address this, Pearl proposed the backdoor criterion. Before introducing it, let's first look at the concept of d-separation.

The full name of d-separation is directional separation. It is a method for determining whether variables are independent. In causal diagrams, there are three common path structures, as shown in Figure 3:


Figure 3. Three path structures of cause-and-effect diagrams: chain (A → B → C), fork (A ← B → C), and inverse fork (A → B ← C)

Among the chain, fork, and inverse fork structures in Figure 3, A and C in the inverse fork structure are naturally independent of each other, and B is called a collider. In a chain or fork structure, conditioning on B blocks the association between A and C, making A and C independent of each other. D-separation is the operation of blocking different path structures so that variables become independent. The specific d-separation rules are summarized as follows.

  1. When two arrows on a path point to the same variable, that variable is called a collider, and the path is blocked by the collider.
  2. If a path contains a non-collider, the path can be blocked by conditioning on that non-collider.
  3. Conditioning on a collider does not block the path. On the contrary, it opens the path.

Note that conditioning on a variable means fixing the value of that variable. For example, conditioning on an age variable means fixing age at a given value, such as 0 or 1.
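The three rules can be turned into a short check for a single path. The sketch below is a simplified illustration with hypothetical function names: it represents a graph as a set of (parent, child) pairs and applies only the rules listed above. Full d-separation also treats conditioning on a descendant of a collider as opening the path, which is omitted here.

```python
def is_collider(prev_node, node, next_node, edges):
    """A middle node is a collider when both neighbours on the path point into it."""
    return (prev_node, node) in edges and (next_node, node) in edges

def path_is_blocked(path, edges, conditioned):
    """Apply the three rules to one path; `edges` is a set of (parent, child) pairs."""
    for prev_node, node, next_node in zip(path, path[1:], path[2:]):
        if is_collider(prev_node, node, next_node, edges):
            if node not in conditioned:   # rules 1 and 3: a collider blocks unless conditioned on
                return True
        elif node in conditioned:         # rule 2: a conditioned non-collider blocks
            return True
    return False

# The three structures from Figure 3, all along the path A - B - C.
chain = {("A", "B"), ("B", "C")}
fork = {("B", "A"), ("B", "C")}
inverse_fork = {("A", "B"), ("C", "B")}
path = ["A", "B", "C"]

for name, edges in [("chain", chain), ("fork", fork), ("inverse fork", inverse_fork)]:
    print(name,
          "| blocked with no conditioning:", path_is_blocked(path, edges, set()),
          "| blocked when conditioning on B:", path_is_blocked(path, edges, {"B"}))
```

The chain and fork are open by default and blocked by conditioning on B, while the inverse fork behaves the other way around.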

Now that we understand how d-separation blocks paths by conditioning on variables to make variables independent, we can combine it with the backdoor criterion to eliminate confounding factors and conduct causal inference even when the full structure of the causal diagram is unknown. Before stating the backdoor criterion, we need the concepts of backdoor paths and frontdoor paths. A backdoor path from variable X to variable Y is a path that connects X to Y but whose first arrow does not start from X. Correspondingly, a frontdoor path is a path that connects X to Y and whose first arrow starts from X. If d-separation blocks all backdoor paths between X and Y, then we consider the causal relationship from X to Y identifiable, and the factors conditioned on to block the backdoor paths are called confounding factors. With the backdoor criterion, we do not need to observe all variables. We only need to find which variables, when conditioned on, block the backdoor paths, so that the causal relationship between X and Y can be identified.
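Once such a set of variables is found, the effect of an intervention can be computed from observational quantities alone. The standard backdoor adjustment formula is shown below as a sketch. The symbol Z is an assumption introduced here for the conditioning set, which must block every backdoor path from X to Y and contain no descendant of X.

```latex
P(Y = y \mid do(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \, P(Z = z)
```

In the breakfast example, Z would be the healthy lifestyle: the effect of breakfast on weight is obtained by comparing eaters and non-eaters within each lifestyle group and then averaging over the lifestyle distribution.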

3. Summary

Whether we use the potential outcomes framework or the structural causal model, causal inference is mainly the process of inferring the outcome Y from the cause X. To ensure that there are no confounding factors between cause X and outcome Y, the ideal approach is a randomized controlled trial. When conditions do not allow a randomized controlled trial, the effect of confounding factors on cause X can be eliminated by processing observational data.


In addition to high-quality content tailored for developers, this book has also been recognized and recommended by many professionals.

  • Xiao Yanghua, Professor, School of Computer Science and Technology, Fudan University, Head of Knowledge Factory Laboratory

Starting from practical application scenarios, the author introduces the theories and technologies of machine learning in a simple and easy-to-understand manner. The book has detailed cases, ample code, and strong operability. It is a very useful reference for both novices and practitioners of machine learning.

  • Liu Hongyan, Professor, Department of Management Science and Engineering, Tsinghua University

When you open this book, you will not feel that you are reading a textbook, nor will you feel the tedium of writing code. Instead, it feels like solving practical problems face to face with a familiar old friend, who simplifies the complex and helps readers quickly build a knowledge system and master machine learning technology.

  • Fu Xiangling, professor and doctoral supervisor at School of Computer Science, Beijing University of Posts and Telecommunications

This book draws on the author's theoretical grounding from a master's degree at Beijing University of Posts and Telecommunications and on practical experience gained at an Internet company. It has a complete theoretical system and real Internet project cases, making it a technical bible that helps readers quickly grow into excellent algorithm engineers.

  • Lu Quan, Senior Director of Alibaba

I have known the authors of this book for many years and admire their contributions to the application of machine learning algorithms in the industry. Now the authors have unreservedly summarized their years of accumulated practical experience, providing a valuable reference and guide for young people seeking continuous improvement in the field of machine learning.

  • Qin Zhiwei, Chief Scientist, Lyft

This book has a good balance and coverage from basic algorithm theory to core business cases in the industry. Even experienced algorithm engineers will definitely feel that they have benefited a lot from the expansion of their ideas after reading this book carefully. It is an indispensable practical reference book on machine learning algorithms on an engineer's bookshelf.

  • Wang Liang, senior algorithm expert at Alibaba

After decades of development, machine learning technology has left the laboratory and become deeply integrated with more and more application scenarios, continually driving progress in search, advertising, recommendation, and other fields. We believe that machine learning will become a fundamental tool for every industry in the future. This book combines machine learning development tools with industry application cases, allowing readers to get started with practice quickly and master this technology.

  • Xia Zhen, CEO of Hangzhou Array Technology Co., Ltd.

Machine learning technology is developing rapidly, and many teaching materials are slightly out of date for people working in industry. This book is one of the few reference books on the market that thoroughly explains the application of machine learning in several major industries and is rich in cases. The author draws on a strong theoretical foundation and many years of practical industry experience to write it. It is recommended as a must-have for industry personnel!

  • Pujie UBiX CTO

Machine learning technology is the core of artificial intelligence and has been widely used in various industrial fields. This book starts from the basic theory of machine learning and ends with four core practical scenarios in the industry. On the one hand, it helps readers master the workflow of machine learning in industry. On the other hand, by working through real-world scenarios, it helps readers improve their practical machine learning skills. I sincerely recommend this book to readers who want to learn about or work in the field of machine learning.

  • Wei Wei, Associate Professor, University of Illinois at Urbana-Champaign

This book combines theory with practice, systematically introducing the background, models, and application practices of machine learning in different business scenarios. The design of the cases in the book is very inspiring. Depending on the reader's background, this book can be used as an introductory manual for practical operations or as a preparatory material for advanced theoretical study.


Origin blog.csdn.net/weixin_46626339/article/details/134601103