Causal Science Weekly Issue 5: OOD Generalization

To help everyone keep up with the latest research and news in causal science, our causal science community team has compiled this fifth issue of "Causal Science Weekly", covering recent papers and developments worth attention. The theme of this issue is "OOD Generalization". In addition, in the "Recent Community Activities" column, we introduce highlights of the talk "Stable Learning: Discovering the Common Foundation of Causal Inference and Machine Learning" delivered by Cui Peng, associate professor at Tsinghua University, at the NeurIPS 2020 China pre-lecture meeting.

Authors of this issue: Yan Hedong, Xu Xiongrui, Chen Tianhao, Yang Ercha, Gong Heyang, Zhang Tianjian, Fang Wenyi,  Guo Ruocheng

1. Introduction to OOD Generalization

Out-of-distribution (OOD) generalization is a kind of systematic generalization. Many people's attention to OOD problems began with Bengio. Traditionally, OOD has sometimes been equated with novelty detection or outlier detection, but this article does not discuss OOD detection at all.

Figure 1: Bengio's current number one research interest

Bengio has said that traditional machine learning is largely based on the assumption of independent and identically distributed (IID) data, but in many real scenarios the data we care about are precisely the data that appear very rarely. This means we need to pay more attention to OOD data, that is, to distributions that appear rarely in the data, which in turn requires new data assumptions in our machine learning algorithms. In particular, from the perspective of an agent, it is important to consider the factors that drive changes in the data distribution and the composability of different distributions.

Figure 2: From IID to OOD (Bengio)

Meta-learning (learning to learn) is one way to achieve fast transfer of machine-learned models to OOD settings. Data becomes OOD because behavior changes, for example through interventions on user behavior in the data. The knowledge representations of meta-learning, e.g., the causal structure between variables, can effectively help OOD generalization. The challenge here is how to learn causal knowledge when the intervened variables are unknown.

As a concrete direction in combining causality with machine learning, here are 6 Causal + OOD papers recommended by Guo Ruocheng, a PhD candidate at Arizona State University:

  1. Peters, Jonas, Peter Bühlmann, and Nicolai Meinshausen. "Causal inference using invariant prediction: identification and confidence intervals." arXiv preprint arXiv:1501.01332 (2015).

  2. Rothenhäusler, Dominik, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. "Anchor regression: heterogeneous data meets causality." arXiv preprint arXiv:1801.06229 (2018).

  3. Rojas-Carulla, Mateo, Bernhard Schölkopf, Richard Turner, and Jonas Peters. "Invariant models for causal transfer learning." The Journal of Machine Learning Research 19, no. 1 (2018): 1309-1342.

  4. Arjovsky, Martin, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. "Invariant risk minimization." arXiv preprint arXiv:1907.02893 (2019).

  5. Krueger, David, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Remi Le Priol, and Aaron Courville. "Out-of-distribution generalization via risk extrapolation (REx)." arXiv preprint arXiv:2003.00688 (2020).

  6. Ahuja, Kartik, Karthikeyan Shanmugam, Kush Varshney, and Amit Dhurandhar. "Invariant risk minimization games." ICML 2020.

2. Cui Peng talks about OOD 

Cui Peng is an associate professor in the Department of Computer Science and Technology at Tsinghua University. His main research directions include stable learning, OOD generalization, fairness, and counterfactual prediction. After reading and translating the papers above, we invited Cui Peng to share his impressions, summarized as follows:

1) Bengio has chosen to use causality to attack the OOD problem and obtain systematic generalization. The stable learning/prediction work of Cui Peng's team is a concrete line of Causal + OOD research.

2) Cui Peng emphasized that for OOD problems no assumptions should be made about the test distribution, so strictly speaking, paper 3 above may not fall within the OOD category.

3) The first two papers incorporate information from causal graphs to solve the OOD problem, while the last three are purely based on representation learning. Stable learning takes both causal implications and learning frameworks into account, and is a fusion of these two lines of thought.

4) Data sets that support OOD research are very important at this early stage. Cui Peng recommends his team's recent work, the NICO data set, to researchers interested in OOD problems; this data set was introduced in the previous issue.

For papers on Stable Learning, please see our earlier issue: "Causal Science Weekly" Issue 3: Causality Empowers Stable Learning.

Next is our summary translation of the 6 Causal + OOD papers, with commentary and interpretation by Gong Heyang of the causal science community.


3. Paper translation and interpretation

3.1 Methods with causal graphs

The papers in this part assume, in one sense or another, that the causal structure is known.

Peters, Jonas, Peter Bühlmann, and Nicolai Meinshausen. "Causal inference using invariant prediction: identification and confidence intervals." arXiv preprint arXiv:1501.01332 (2015).

Abstract translation: What is the difference between a causal and a non-causal model when it comes to prediction? Suppose we intervene on the predictor variables or change the entire environment: the predictions of a causal model will, in general, work as well under interventions as on observational data. In contrast, if we actively intervene on variables, the predictions of a non-causal model can be very wrong. Here, we propose to exploit this invariance of predictions under a causal model for causal inference: given different experimental settings (e.g., various interventions), we collect all models that do show invariance in their predictive accuracy across settings and interventions. The causal model will, with high probability, be among these models. This approach yields valid confidence intervals for causal relationships in fairly general scenarios. We examine the example of structural equation models in more detail and provide sufficient assumptions under which the set of causal predictors becomes identifiable. We further investigate the robustness of our approach under model misspecification and discuss possible extensions. We study the empirical properties on a variety of data sets, including large-scale gene perturbation experiments.

Translator: Yan Hedong 

Gong Heyang’s interpretation:

  • If we intervene in some variables or change the entire environment, the prediction effect of the causal model will still be good, but the prediction effect of the non-causal model will not necessarily be good.

  • This article exploits the prediction-invariance property of models; a causal model is likely to possess this invariance.

  • The main contributions are new methods, new concepts, and new theories.
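To make the invariance idea concrete, here is a minimal sketch on a toy linear problem (our own illustration, not the paper's implementation; the function name and the choice of statistical tests are assumptions): for every candidate subset of predictors, fit a pooled regression and check whether the residuals look identically distributed across environments; subsets that pass are retained, and their intersection estimates the causal predictors.

```python
import itertools
import numpy as np
from scipy import stats

def invariant_subsets(X, y, env, alpha=0.05):
    """For each subset S of predictors, fit pooled OLS of y on X[:, S] and
    test whether residuals look identically distributed across environments
    (equal means via ANOVA, equal variances via Levene's test).
    Returns every subset that passes both tests at level alpha."""
    accepted = []
    for r in range(X.shape[1] + 1):
        for S in itertools.combinations(range(X.shape[1]), r):
            if S:
                idx = list(S)
                beta, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
                resid = y - X[:, idx] @ beta
            else:
                resid = y - y.mean()
            groups = [resid[env == e] for e in np.unique(env)]
            if min(stats.f_oneway(*groups).pvalue,
                   stats.levene(*groups).pvalue) > alpha:
                accepted.append(set(S))
    return accepted

# Toy SEM: x0 -> y -> x2; environments intervene on x0 and x2 but never on y.
rng = np.random.default_rng(0)
n = 2000
env = rng.integers(0, 2, n)                 # two environments
x0 = rng.normal(0, 1 + env, n)              # intervened-on cause of y
y = 2.0 * x0 + rng.normal(0, 1, n)          # invariant mechanism for y
x2 = y + rng.normal(0, 1 + 2 * env, n)      # effect of y (anti-causal feature)
accepted = invariant_subsets(np.column_stack([x0, x2]), y, env)
print("invariant subsets:", accepted)       # expect only {0} to pass
print("causal predictors:", set.intersection(*accepted) if accepted else set())
```

Only the subset containing the true cause x0 leaves residuals whose distribution is stable across environments, which is exactly the invariance signature the paper exploits.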

Rothenhäusler, Dominik, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. "Anchor regression: heterogeneous data meets causality." arXiv preprint arXiv:1801.06229 (2018).

Translation summary: We consider the problem of predicting a response variable from a set of covariates on a data set whose distribution differs from that of the training set. Causal variables are optimal in terms of predictive accuracy if, in the new distribution, either many variables are affected by interventions, or only a few variables are affected but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters may be too conservative and perform poorly on the above task. This motivates anchor regression, a method that uses exogenous variables to solve a relaxation of the causal min-max problem through a modification of the least-squares loss. We prove predictive guarantees for the estimator: specifically, its predictions are distributionally robust under linear shifts of the distribution, and remain valid even when the instrumental-variables assumptions are no longer satisfied. We find that if anchor regression and least squares give the same answer ("anchor stability"), then the ordinary least squares parameters are stable under certain distribution changes. Anchor regression is shown empirically to improve replicability and to guard against distributional shifts.

Translator: Xu Xiongrui

Gong Heyang’s interpretation:

The anchor variable can either be used to encode heterogeneity “within” a data set or heterogeneity “between” data sets. 
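Here is a minimal sketch of the anchor regression estimator under our reading of the paper (the helper names are our own): the modified least-squares loss ||(Id - P_A)(y - Xb)||^2 + gamma * ||P_A (y - Xb)||^2, with P_A the projection onto the anchors A, is solved by running OLS on data transformed by W = Id + (sqrt(gamma) - 1) P_A; gamma = 1 recovers OLS, and large gamma pushes toward the IV/2SLS solution.

```python
import numpy as np

def anchor_regression(X, y, A, gamma):
    """Anchor regression: OLS on data transformed by
    W = Id + (sqrt(gamma) - 1) * P_A, where P_A projects onto span(A)."""
    def transform(M):
        # P_A @ M computed as A @ pinv(A) @ M, without the n x n projection
        proj = A @ np.linalg.lstsq(A, M, rcond=None)[0]
        return M + (np.sqrt(gamma) - 1.0) * proj
    Xt, yt = transform(X), transform(y)
    beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return beta

# Toy example: hidden confounder h, exogenous anchor a; true effect of x is 1.5.
rng = np.random.default_rng(1)
n = 5000
a = rng.normal(size=(n, 1))                   # anchor variable
h = rng.normal(size=n)                        # hidden confounder
x = a[:, 0] + h + rng.normal(size=n)
y = 1.5 * x + 2.0 * h + rng.normal(size=n)

for gamma in (1.0, 10.0, 100.0):
    beta = anchor_regression(x.reshape(-1, 1), y, a, gamma)
    print(f"gamma={gamma:6.1f}  beta={beta[0]:.3f}")  # moves from ~2.17 (OLS) toward 1.5
```

The interpolation between OLS and the instrumental-variables solution is the trade-off the paper studies: intermediate gamma buys robustness against bounded distribution shifts without paying the full conservativeness of the causal parameter.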

Rojas-Carulla, Mateo, Bernhard Schölkopf, Richard Turner, and Jonas Peters. "Invariant models for causal transfer learning." The Journal of Machine Learning Research 19, no. 1 (2018): 1309-1342.

Abstract translation: Transfer learning methods try to combine knowledge from several related tasks or domains to improve performance on a test task. Inspired by causal methods, we relax the usual covariate shift assumption and assume that it holds only for a subset of the predictors: given this subset, the conditional distribution of the target variable is invariant across all tasks. We show how this assumption can be motivated from a causal perspective. We focus on the domain generalization problem, in which no examples from the test task are observed. We show that in an adversarial setting, using this subset for prediction is optimal for domain generalization; we further provide examples in which the tasks are sufficiently diverse that the estimator outperforms pooling the data, even on average. We also introduce a practical method that automatically infers the above subset, and provide the corresponding code. We present results of this method on synthetic data sets and a gene deletion data set.

Translator: Chen Tianhao


Gong Heyang’s interpretation:

The main contents include:

  • The problem assumes that the conditional distribution of the target given a certain subset of the variables remains invariant across tasks (a relaxation of the covariate shift assumption).

  • In adversarial settings, predicting with this subset is provably optimal, and the subset can be inferred automatically (see the sketch below).
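A toy sketch of the automatic subset inference (our own simplification, not the paper's procedure; function names and the tolerance are assumptions): for each candidate subset, fit a per-task regression and keep the subsets whose coefficients and residual variances agree across tasks, as a crude proxy for invariance of P(y | X_S).

```python
import itertools
import numpy as np

def invariant_subset(tasks, tol=0.1):
    """tasks: list of (X, y) pairs, one per training task.
    For each candidate subset S, fit per-task OLS and keep S if the fitted
    coefficients and residual variances agree across tasks (within tol)."""
    d = tasks[0][0].shape[1]
    accepted = []
    for r in range(1, d + 1):
        for S in itertools.combinations(range(d), r):
            idx = list(S)
            betas, rvars = [], []
            for X, y in tasks:
                b, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
                betas.append(b)
                rvars.append(np.var(y - X[:, idx] @ b))
            if np.ptp(betas, axis=0).max() < tol and np.ptp(rvars) < tol:
                accepted.append(S)
    return accepted

# Toy: x0 causes y; x2 is an effect of y whose noise scale changes per task.
rng = np.random.default_rng(2)
def make_task(noise, n=4000):
    x0 = rng.normal(0, 1, n)
    y = 2.0 * x0 + rng.normal(0, 1, n)
    x2 = y + rng.normal(0, noise, n)
    return np.column_stack([x0, x2]), y

print(invariant_subset([make_task(0.5), make_task(3.0)]))  # expect [(0,)]
```

In the domain generalization setting, one would then predict on the unseen task using only the accepted subset, which is exactly the adversarially optimal choice the paper proves.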

3.2 Methods without causal graphs

This part is based on representation learning.

Arjovsky, Martin, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. "Invariant risk minimization." arXiv preprint arXiv:1907.02893 (2019).

Abstract translation: We introduce a learning paradigm for learning invariant correlations across multiple training data distributions, called invariant risk minimization (IRM). To achieve this goal, IRM learns a data representation such that the optimal classifier on top of that representation is the same across all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structure of the data-generating mechanism and enable generalization in the OOD case.

Translator Zhang Tianjian’s note:

The multiple training data distributions here mean that the joint distribution differs across environments. The experimenter knows the environment from which each data point was taken.

Gong Heyang’s interpretation:

  • Basic idea: spurious correlations do not appear to be stable properties across environments.

  • The main contribution of the article is a new OOD paradigm for multiple training environments.

    • Image pixels are not causal variables, so the causal representation has to be learned automatically.

    • Propose the IRM principle: To learn invariances..., find a representation such that....

  • The math behind the IRM is as follows:
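The formulas appeared as an image in the original post; reconstructed from Arjovsky et al. (2019), the IRM program and its practical relaxation IRMv1 read:

```latex
% IRM: learn a representation \Phi whose optimal classifier w is shared
% by every training environment e, with per-environment risk R^e.
\min_{\Phi,\,w}\ \sum_{e\in\mathcal{E}_{tr}} R^e(w\circ\Phi)
\quad\text{s.t.}\quad
w\in\arg\min_{\bar{w}} R^e(\bar{w}\circ\Phi)\ \ \text{for all } e\in\mathcal{E}_{tr}

% IRMv1: practical relaxation with a fixed scalar "dummy" classifier w = 1.0
% and a gradient-norm penalty weighted by \lambda.
\min_{\Phi}\ \sum_{e\in\mathcal{E}_{tr}}
  R^e(\Phi)\;+\;\lambda\,\bigl\|\nabla_{w\,\mid\,w=1.0}\,R^e(w\cdot\Phi)\bigr\|^2
```

In code, the per-environment penalty is only a few lines; this sketch follows the idea of the paper's released implementation (for binary classification with logits):

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared gradient of one environment's risk w.r.t. a fixed scalar
    'dummy classifier' w = 1.0 (the IRMv1 regularizer)."""
    w = torch.tensor(1.0, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * w, y)
    grad = torch.autograd.grad(risk, [w], create_graph=True)[0]
    return grad.pow(2).sum()
```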

Krueger, David, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Remi Le Priol, and Aaron Courville. "Out-of-distribution generalization via risk extrapolation (REx)." arXiv preprint arXiv:2003.00688 (2020).

Abstract translation: Generalizing to data outside the training distribution is a current challenge for machine learning. A weaker form of out-of-distribution (OoD) generalization is the ability to interpolate successfully between multiple observed distributions. One way to achieve this is robust optimization, which minimizes the worst case over convex combinations of the training distributions. A stronger form of OoD generalization, however, is the ability to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce risk extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness of REx on various OoD generalization tasks.

Translator: Fang Wenyi

Gong Heyang’s interpretation:

  • We propose a new principle, REx, similar to IRM, to deal with "spurious" features that are predictive during training but not at test time, such as the color of handwritten digits.

  • The basic math is as follows:

Risks from different domains are combined linearly, with combination coefficients allowed to be negative; extrapolation is achieved by optimizing the worst case within this combination (MM-REx). Variance REx (V-REx) instead regularizes with the variance of the training risks, which is empirically more stable than MM-REx.
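The formulas were an image in the original post; reconstructed from Krueger et al. (2020), with m training risks R_e:

```latex
% MM-REx: robust optimization over affine combinations of training risks,
% with coefficients allowed to go negative down to \lambda_{\min} (extrapolation).
\min_{\theta}\ \max_{\substack{\sum_e \lambda_e = 1\\ \lambda_e \ge \lambda_{\min}}}
  \ \sum_{e=1}^{m} \lambda_e\, R_e(\theta)

% V-REx: penalize the variance of the training risks instead.
\min_{\theta}\ \sum_{e=1}^{m} R_e(\theta)
  \;+\; \beta\,\mathrm{Var}\bigl(\{R_1(\theta),\dots,R_m(\theta)\}\bigr)
```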

Ahuja, Kartik, Karthikeyan Shanmugam, Kush Varshney, and Amit Dhurandhar. "Invariant risk minimization games." ICML 2020.

Abstract translation: The standard risk minimization paradigm of machine learning becomes brittle when the test distribution differs from the training distribution because of spurious correlations. Training on data from many environments and searching for invariant predictors reduces the effect of spurious features by focusing the model on features that are causally related to the outcome. This work poses such invariant risk minimization as finding the Nash equilibrium of an ensemble game played among the environments. The authors propose a simple training algorithm based on best-response dynamics; in experiments it achieves similar or even better empirical accuracy, with much lower variance, than the challenging bi-level optimization of Arjovsky et al. (2019). The key theoretical contribution shows that, for any finite number of environments, even with nonlinear classifiers and transformations, the set of Nash equilibria of the proposed game is equal to the set of invariant predictors. As a result, the method retains the generalization guarantees over a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks (GANs).

Translator: Yang Ercha

Gong Heyang’s interpretation:

  • A variation of IRM formulated as a game whose Nash equilibria are exactly the invariant predictors (a schematic sketch follows below).
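As a schematic illustration only (our own simplification; names, hyperparameters, and the fixed featurizer are assumptions, and the paper fixes details we gloss over), the best-response dynamics can be sketched as follows: each environment owns a linear head, the shared predictor averages the heads, and players take turns lowering their own environment's risk.

```python
import torch

def best_response_dynamics(envs, featurizer, rounds=50, lr=0.1, inner=10):
    """Schematic best-response dynamics for the ensemble game:
    environment e owns a linear head w_e; the shared predictor uses the
    average of all heads; each player in turn minimizes its own risk.
    envs: list of (x, y) float tensors; featurizer: shared representation."""
    d = featurizer(envs[0][0]).shape[1]
    heads = [torch.zeros(d, requires_grad=True) for _ in envs]
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(rounds):
        for e, (x, y) in enumerate(envs):        # player e best-responds
            opt = torch.optim.SGD([heads[e]], lr=lr)
            for _ in range(inner):               # a few inner steps per turn
                opt.zero_grad()
                logits = featurizer(x) @ torch.stack(heads).mean(dim=0)
                loss_fn(logits, y).backward()
                opt.step()
    return heads
```

The point of the construction is that no single player can profit by tuning its head to a spurious, environment-specific feature, so at equilibrium the averaged predictor relies only on what is invariant.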

4. Recent community activities

At the NeurIPS 2020 China pre-lecture meeting held by the Zhiyuan Community on November 27, Cui Peng, a Zhiyuan young scientist and associate professor in the Department of Computer Science and Technology at Tsinghua University, delivered a talk titled "Stable Learning: Discovering the Common Foundation of Causal Inference and Machine Learning". In the talk, Cui Peng said, "We will discuss how to view causal inference from the perspective of machine learning."

In this talk, Cui Peng shared his thoughts on how to combine causality with machine learning, based on his research group's work in recent years.

Since 2016, Cui Peng's team has conducted in-depth research on how to combine causal inference with machine learning, eventually forming the research direction of "stable learning". From a macro perspective, stable learning aims to find the common ground between causal inference and machine learning in order to address a range of unsolved problems.

In his talk, Cui Peng first introduced the current risks of artificial intelligence, namely lack of interpretability and instability, and pointed out that reliance on correlation statistics is an important cause of these risks. Machine learning combined with causal inference can overcome these two shortcomings and achieve stable learning. It is worth mentioning that, from a causal perspective, interpretability and stability are internally related: optimizing the stability of a model can also improve its interpretability.

Cui Peng then introduced how to achieve stable learning through the idea of confounder balancing, and pointed out that it comes with theoretical guarantees. The experimental results also show: "The greater the difference between the training and test environments, the greater the performance improvement of the causal method over correlation-based methods." This demonstrates the advantages of causal inference in reducing the risks of machine learning and overcoming the defects of correlation statistics, and its potential to lead the next stage of machine learning's development.

Introduction to the Causal Science Community: a vertical academic discussion community for the field of causal science, jointly promoted by the Zhiyuan Community and the Jizhi Club. Its purpose is to promote communication and cooperation between causal science professionals and interested enthusiasts, to advance academic research in causal science and the construction and implementation of its industrial ecosystem, and to nurture a new generation of academic experts and industrial innovators in the field of causal science.


The causal science community welcomes you to join!

 

Vision of the Causal Science Community: Answering causal questions is an urgent need in many fields. Causal inference is already used in a number of different fields (such as AI and statistics), but the languages and models they use differ, making it difficult for scientists in these fields to communicate. We therefore hope to build a community and organize a large number of academic activities, so that researchers can master the core ideas of statistics, become proficient with current AI technologies (such as building deep probabilistic models with Pytorch/Pyro), and promote exchange and the collision of ideas between researchers in different fields, so that causal inference across fields can share a common paradigm and even common engineering practice standards, driving the rapid development of the newly forming causal science. Humans, with their capacity for causal reasoning, have worked closely together to create a powerful civilization. We hope that in the future, causal reasoning will be integrated into every discipline, and in particular tightly integrated with, and improving, AI. We look forward to countless agents (Causal AI) able to climb the ladder of causation, working together with humans to build the next generation of human civilization!

 

If you have the appropriate mathematical foundation and artificial intelligence research experience, have both the curiosity of a scientist and the mindset of an engineer, and hope to take part in the "causal revolution", teach machines causal thinking, and contribute to causal science, please join our WeChat group by scanning the community assistant's QR code below (please include the note "causal science").


To read previous issues of "Causal Science Weekly", please click the links below:

The first issue of "Causal Science Weekly": The causal community sincerely invites you to join and create a common paradigm for causal reasoning

The second issue of "Causal Science Weekly": How to resolve confounding bias?

The third issue of "Causal Science Weekly": Causality Empowers Stable Learning

The fourth issue of "Causal Science Weekly": Causality Empowers Recommendation Systems
