Paper Reading: "Inferring causation from time series in Earth system sciences"

Research Background

  • The basic routine of modern scientific research is to observe the target system and to intervene in it experimentally under different controlled conditions.
  • However, for complex systems such as the climate system, controlled experiments are largely impossible, so useful causal information has to be extracted from observational data instead.
  • Causal inference methods aim to discover and quantify the causal interdependencies within a system.
  • Causality can be divided into two categories: type causality and actual causality. The first reasons from cause to effect (looking forward in time) and reflects interventional thinking: changing the cause changes the effect. The second reasons from effect back to cause (looking backward in time) and reflects counterfactual thinking: assume the cause had been different or had not occurred, and ask whether the outcome would change.
  • Three levels of causality (from lowest to highest)
    • Association (statistical models): statistical correlations defined purely by the data; most machine-learning systems operate at this level.
    • Intervention (graph-based Bayesian network models): what outcome will an intervention or action lead to? A typical question is "What will happen if we double the price?"
    • Counterfactual (structural causal models): a typical question is "If I had acted differently in the past, what would the outcome be now?"
    • If questions at a higher level can be answered, those at the lower levels follow as a matter of course, but not vice versa.

Causality is better than correlation

(Figure omitted: an illustration of why causal knowledge goes beyond correlation.)

Related methods

Granger causality

The basic idea: test whether omitting the time series X increases the error in forecasting the time series Y. Its shortcoming is that it only captures temporal precedence and predictability between events and does not establish true cause and effect; in essence it tests a kind of predictability. For example, swallows flying low tends to precede rain, so Granger causality concludes that low-flying swallows help predict rain, not that they cause it.
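
As an illustration only (not part of the paper), a bivariate Granger test can be run with the grangercausalitytests function from statsmodels; the toy coupling, lag, and coefficients below are made up.

```python
# Minimal sketch: statsmodels' grangercausalitytests fits autoregressive models
# of Y with and without lags of X and F-tests whether the X lags help.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 500
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):
    # y depends on its own past and on x lagged by two steps
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 2] + 0.1 * rng.standard_normal()

# column order is (effect, candidate cause): does x Granger-cause y?
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=4, verbose=False)  # verbose is deprecated in newer statsmodels
for lag, (tests, _) in results.items():
    print(f"lag {lag}: F-test p-value = {tests['ssr_ftest'][1]:.4f}")
```

Note that a purely bivariate test like this can report spurious links when a third process drives both series, which is exactly the limitation discussed above.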


The above is the classic formulation based on a linear autoregressive model. There are also extensions based on information theory, such as transfer entropy, a causal measure for nonlinear time series built on conditional distributions that can be shown to be equivalent to Granger causality in the linear-Gaussian case, as well as multivariate Granger causality tests that extend the approach to higher-dimensional sets of variables.
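
As a rough illustration of the information-theoretic variant, the sketch below estimates a binned transfer entropy TE(X→Y) = I(Y_t ; X_{t−lag} | Y_{t−lag}); the binning scheme, lag, and bin count are assumptions, and serious applications would use a dedicated estimator.

```python
# Hand-rolled, binned transfer-entropy sketch (illustrative estimator choices).
import numpy as np

def transfer_entropy(x, y, bins=8, lag=1):
    """Estimate TE(X -> Y) = I(Y_t ; X_{t-lag} | Y_{t-lag}) from binned counts."""
    xp, yp, yf = x[:-lag], y[:-lag], y[lag:]          # past of X, past of Y, future of Y

    def digitize(v):
        edges = np.histogram_bin_edges(v, bins=bins)
        return np.clip(np.digitize(v, edges[1:-1]), 0, bins - 1)

    xp, yp, yf = map(digitize, (xp, yp, yf))

    joint = np.zeros((bins, bins, bins))              # p(y_f, y_p, x_p)
    np.add.at(joint, (yf, yp, xp), 1.0)
    joint /= joint.sum()
    p_yp = joint.sum(axis=(0, 2))                     # p(y_p)
    p_yf_yp = joint.sum(axis=2)                       # p(y_f, y_p)
    p_yp_xp = joint.sum(axis=0)                       # p(y_p, x_p)

    te = 0.0
    for i, j, k in zip(*np.nonzero(joint)):
        te += joint[i, j, k] * np.log(
            joint[i, j, k] * p_yp[j] / (p_yf_yp[i, j] * p_yp_xp[j, k])
        )
    return te

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 1) + 0.5 * rng.standard_normal(2000)   # y lags x by one step
print(transfer_entropy(x, y), transfer_entropy(y, x))  # TE(x -> y) should be larger
```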

Nonlinear state-space methods

Basic idea: test whether cause and effect always appear together, under the assumption that the underlying dynamics are nonlinear, deterministic, and non-random, and build a causal network from multiple time series. Takens' theorem and phase-space reconstruction are generally used; the typical method is the convergent cross-mapping (CCM) algorithm.


If X can be predicted from the system reconstructed by time-delay embedding of Y, then X causally influences Y (information about the driver X is encoded in the dynamics of the driven variable Y).
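
Here is a minimal sketch of the cross-mapping step under simple assumptions (fixed embedding dimension and delay, the full series used as the library); the embedding parameters and toy system are illustrative, and a full CCM analysis would also check that the skill converges as the library length grows.

```python
import numpy as np

def delay_embed(series, dim=3, tau=1):
    """Time-delay embedding: row t is [y_t, y_{t-tau}, ..., y_{t-(dim-1)*tau}]."""
    n = len(series) - (dim - 1) * tau
    return np.column_stack(
        [series[(dim - 1 - k) * tau : (dim - 1 - k) * tau + n] for k in range(dim)]
    )

def ccm_skill(x, y, dim=3, tau=1):
    """Correlation between X and its cross-map estimate from the Y shadow manifold."""
    My = delay_embed(y, dim, tau)              # reconstructed attractor of Y
    x_target = x[(dim - 1) * tau:]             # align X with the embedded points
    n = len(My)
    estimates = np.empty(n)
    for t in range(n):
        d = np.linalg.norm(My - My[t], axis=1)
        d[t] = np.inf                          # exclude the point itself
        nbrs = np.argsort(d)[: dim + 1]        # dim+1 nearest neighbours
        w = np.exp(-d[nbrs] / max(d[nbrs][0], 1e-12))
        estimates[t] = np.sum(w * x_target[nbrs]) / np.sum(w)
    return np.corrcoef(x_target, estimates)[0, 1]

# Toy coupled logistic maps in which x drives y but not vice versa.
T = 1000
x = np.empty(T); y = np.empty(T)
x[0], y[0] = 0.4, 0.2
for t in range(T - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
print(ccm_skill(x, y), ccm_skill(y, x))        # cross-mapping X from M_Y should score higher
```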

Causal network

Research motivation: state-space methods require the time series to be deterministic and chaotic, so they cannot handle stochastic systems well. Graphical models based on the causal Markov condition can be applied to such problems.

  • Peter-Clark (PC) algorithm

    The PC algorithm starts from a fully connected graph and iteratively removes edges between nodes. PC can indicate which variables are causes of which other variables, but not how strong the causal effect is. Algorithms of the PC family generally cannot recover a single DAG (directed acyclic graph) from observational data, because several DAGs can encode the same conditional-independence information. If removing edges from the full graph is replaced by adding edges to an empty graph, the greedy equivalence search algorithm is obtained. Adding or deleting edges requires conditional-independence tests, which can be implemented with machine-learning and information-theoretic methods; a minimal sketch of the skeleton phase is given after this list.


  • PCMCI algorithm

    This method adds the momentary conditional independence (MCI) test on top of the PC condition-selection step. It is designed for complex systems with many nonlinear interdependencies, causal effects at long time lags, and causal relationships that appear only in certain regimes, and it controls both false detections and missed detections, giving it stronger power to detect causal links. A usage sketch with the authors' tigramite package follows this list.

  • Fast Causal Inference (FCI) algorithm

    This method does not require the assumption of causal sufficiency; that is, it does not require that all relevant (common) drivers have been observed.
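
As referenced in the PC item above, here is a minimal sketch of the skeleton-discovery phase only, assuming roughly Gaussian data and a partial-correlation conditional-independence test; the edge-orientation rules of the full PC algorithm are omitted.

```python
import itertools
import numpy as np
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.05):
    """Partial-correlation CI test: is X_i independent of X_j given X_cond?"""
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.pinv(np.cov(sub, rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha                     # True => treat as independent

def pc_skeleton(data, alpha=0.05):
    """Skeleton phase of PC: start fully connected and remove edges whose endpoints
    are conditionally independent given subsets of neighbouring variables."""
    d = data.shape[1]
    adj = {i: set(range(d)) - {i} for i in range(d)}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(d)):
        for i in range(d):
            for j in list(adj[i]):
                for cond in itertools.combinations(adj[i] - {j}, level):
                    if ci_test(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return adj

# Toy chain X0 -> X1 -> X2: the edge 0-2 should disappear once we condition on X1.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(2000)
x1 = 0.8 * x0 + rng.standard_normal(2000)
x2 = 0.8 * x1 + rng.standard_normal(2000)
print(pc_skeleton(np.column_stack([x0, x1, x2])))
```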
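
For PCMCI itself, the paper's authors maintain the tigramite package; the sketch below shows one assumed way to call it on toy data (the coupling, parameter values, and exact import paths are assumptions and may differ between tigramite versions).

```python
# Minimal PCMCI sketch using tigramite (pip install tigramite); data are made up.
import numpy as np
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr  # newer versions: tigramite.independence_tests.parcorr

rng = np.random.default_rng(0)
T = 500
data = rng.standard_normal((T, 3))
for t in range(2, T):
    data[t, 1] += 0.6 * data[t - 1, 0]   # X0 -> X1 at lag 1
    data[t, 2] += 0.6 * data[t - 2, 1]   # X1 -> X2 at lag 2

dataframe = pp.DataFrame(data, var_names=["X0", "X1", "X2"])
pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
results = pcmci.run_pcmci(tau_max=3, pc_alpha=0.05)
pcmci.print_significant_links(p_matrix=results["p_matrix"],
                              val_matrix=results["val_matrix"],
                              alpha_level=0.01)
```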

Structural causal model framework

Research motivation: Granger-type methods require a time lag between cause and effect, so they have difficulty with dependencies that are (nearly) contemporaneous. Causal network algorithms can handle this, but they require the causal graph to satisfy the Markov condition. Building on Bayesian networks, the SCM framework replaces conditional probabilities with functional (structural) equations, which makes it easier to determine the direction of causation.

A structural causal model (SCM) consists of three parts: a graphical model, structural equations, and counterfactual/interventional logic. The graphical model is the language for expressing causal knowledge; counterfactual and interventional logic make precise the questions one wants to answer; and the structural equations tie the two together with concrete semantics.
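
As a hedged illustration of these ingredients, the toy SCM below (Z → X → Y and Z → Y, with made-up coefficients) encodes each variable as a structural equation of its parents plus noise; an intervention do(X = x) simply replaces the equation for X, which is what separates the interventional quantity from the confounded observational association.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n=10_000, do_x=None):
    """Draw from a toy SCM with graph Z -> X -> Y and Z -> Y (Z confounds X and Y)."""
    z = rng.standard_normal(n)
    if do_x is None:
        x = 0.8 * z + rng.standard_normal(n)        # structural equation for X
    else:
        x = np.full(n, float(do_x))                 # intervention do(X = do_x)
    y = 1.5 * x + 0.5 * z + rng.standard_normal(n)  # structural equation for Y
    return z, x, y

# Level 1 (association): regression slope of Y on X is confounded by Z.
_, x_obs, y_obs = sample_scm()
slope_obs = np.polyfit(x_obs, y_obs, 1)[0]

# Level 2 (intervention): contrast E[Y | do(X=1)] - E[Y | do(X=0)].
effect_do = sample_scm(do_x=1.0)[2].mean() - sample_scm(do_x=0.0)[2].mean()
print(round(slope_obs, 2), round(effect_do, 2))
```

With these made-up numbers the observational slope comes out around 1.7 while the interventional contrast recovers the structural coefficient of roughly 1.5, illustrating why level-2 questions cannot be answered from associations alone.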


Comparison of advantages and disadvantages

(Figure omitted: comparison of the advantages and disadvantages of the methods discussed above.)

Some challenges in research

  • Time series representing the relevant sub-processes must be extracted from the overall gridded data, e.g. by spatial averaging or dimensionality reduction.
  • When reconstructing causal relationships, interactions may occur on both small and large time scales.
  • The distributions of the variables may be non-Gaussian.
  • How to characterize the resulting network; node degree and similar measures are commonly used.
  • The data themselves are high-dimensional, strongly interdependent, and often have small sample sizes.

Research and improvement directions

  • Each method has its own strengths and weaknesses, so several methods can be combined.
  • Filtering during preprocessing can remove seasonal components from the variables and make it easier to find causal relationships on different scales.
  • Although black-box machine-learning models such as deep networks cannot be used directly for causal inference, they can be used to extract features.
  • To some extent, causal inference can be framed as a classification problem.
  • Study how to generate synthetic data with known causal structure in order to benchmark the accuracy of the methods.
  • Use physical models to impose constraints on causal inference methods and obtain more accurate results.

Source: blog.csdn.net/jining11/article/details/108853654