Introduction to Explainable Artificial Intelligence - Reading Notes (2)

Chapter 2 Bayesian Methods

Model joint probability distributions of multiple random variables, characterizing uncertainty and correlation in data and models.

2.1 Bayesian Networks

Bayesian networks are an important class of probabilistic graphical models; the main problems they address are representation, inference, and learning.

Key elements: directed acyclic graph G, probability distribution p

Representation

Random variables $X = (X_1, X_2, \ldots, X_d)$; $\pi_k$ denotes the set of parent nodes of $X_k$, and $X_{\pi_k}$ the corresponding set of random variables.

The joint probability distribution factorizes into a product of local factors:
$p(X) = \prod_{i=1}^{d} p(X_i \mid X_{\pi_i})$
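As a concrete illustration (my own toy example, not taken from the book), consider a four-node network with edges $X_1 \to X_2$, $X_1 \to X_3$ and $X_2, X_3 \to X_4$; the factorization then reads
$p(X_1, X_2, X_3, X_4) = p(X_1)\, p(X_2 \mid X_1)\, p(X_3 \mid X_1)\, p(X_4 \mid X_2, X_3)$,
so every factor conditions only on a node's parents.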
Conditional independence: A and B are independent given C, i.e. $p(A, B \mid C) = p(A \mid C)\, p(B \mid C)$.

Three basic structures that encode conditional independence: fork (common parent), chain, and collider (v-structure).
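Spelled out (standard results, added here for reference), with the independencies they imply:

  • Fork $A \leftarrow C \rightarrow B$: $p(A, B, C) = p(C)\, p(A \mid C)\, p(B \mid C)$, so $A \perp B \mid C$.
  • Chain $A \rightarrow C \rightarrow B$: $p(A, B, C) = p(A)\, p(C \mid A)\, p(B \mid C)$, so $A \perp B \mid C$.
  • Collider $A \rightarrow C \leftarrow B$: $p(A, B, C) = p(A)\, p(B)\, p(C \mid A, B)$, so $A \perp B$ marginally, but observing C couples them.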

Inference

  1. Likelihood: having observed a value e (the evidence) of some variables, compute the probability of that observation. (e.g., the probability P(A=1, D=1) that both the mouse and eagle populations are doing well)
  2. Conditional probability: given the evidence e, compute the posterior distribution of an unobserved variable. (e.g., given that the mouse population is doing well, how is the eagle population doing: P(A | D=1))
  3. Maximum a posteriori value: given the evidence e, find the most probable value of the unobserved variable. (Same setting as above: the most likely state of the eagle population, argmax_a p(A=a | D=1); see the numeric sketch after this list)
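A minimal numeric sketch of all three tasks by brute-force enumeration. The two-node structure D → A and all probabilities below are made up purely for illustration; only the variable names (mouse D, eagle A) come from the notes, and the book's ecology example is larger.

```python
# Toy two-node Bayesian network D -> A (mouse population D, eagle population A).
# Structure and all numbers are invented for illustration.

p_D = {1: 0.6, 0: 0.4}                       # prior p(D)
p_A_given_D = {1: {1: 0.8, 0: 0.2},          # p(A | D=1)
               0: {1: 0.3, 0: 0.7}}          # p(A | D=0)

def joint(a, d):
    """p(A=a, D=d) via the factorization p(D) * p(A | D)."""
    return p_D[d] * p_A_given_D[d][a]

# 1. Likelihood of the evidence: P(A=1, D=1)
likelihood = joint(1, 1)

# 2. Posterior over the unobserved variable: P(A | D=1)
p_evidence = sum(joint(a, 1) for a in (0, 1))             # P(D=1)
posterior = {a: joint(a, 1) / p_evidence for a in (0, 1)}

# 3. Maximum a posteriori value: argmax_a P(A=a | D=1)
a_map = max(posterior, key=posterior.get)

print(likelihood, posterior, a_map)
```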

Variable elimination: an exact inference method.
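For example (my own illustration), eliminating $B$ from the chain $A \to B \to C$:
$p(A, C) = \sum_{B} p(A)\, p(B \mid A)\, p(C \mid B) = p(A) \sum_{B} p(B \mid A)\, p(C \mid B)$
Pushing the summation inside the product so that each sum touches only a few factors is what keeps exact inference tractable on sparse graphs.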

Approximate inference methods trade exactness for speed. There are two main categories: sampling-based methods such as Markov chain Monte Carlo (MCMC), and variational inference, which searches a family of tractable distributions for the one closest to the true posterior and uses it as the approximation.

Learning Bayesian Networks

  1. Parameter learning: assuming the Bayesian network structure is given, estimate the optimal parameters or the probability distribution over them.

    Point estimation: the criterion is a statistical divergence; maximum likelihood estimation is equivalent to minimizing the KL divergence between the empirical data distribution and the model.

    Fully Bayesian approach: treat the model parameters as global random variables with a prior, apply Bayes' rule to estimate the posterior distribution over the parameters, and average predictions over all models (both views are written out as formulas after this list).

  2. Structure learning: learn the graph structure itself from data.
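For reference, the two estimation views in item 1 can be written out explicitly (standard identities, not specific to the book):
$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{n} \log p(x_n \mid \theta) = \arg\min_{\theta} \mathrm{KL}\!\left(\hat{p}_{\mathrm{data}} \,\|\, p_{\theta}\right)$
$p(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)\, p(\theta), \qquad p(x^{\ast} \mid \mathcal{D}) = \int p(x^{\ast} \mid \theta)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta$
The first line is the point-estimate view; the second is the fully Bayesian view, which averages predictions over the whole posterior.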

Bayesian Program Learning

Few-shot learning: given only a handful of examples, how can a suitable model be learned that still makes good predictions?

Bayesian Program Learning (BPL) is an interpretable hierarchical Bayesian model:

  1. Representation: a symbol (type) level, where BPL samples basic units to build sub-parts, parts, and the relations between parts, up to whole characters, plus an entity (token) level, where a given template is instantiated and written out step by step.

  2. Inference: given an image, BPL infers the posterior probability distribution over the corresponding parts, sub-parts, and relations (a random walk starting from the upper-left corner samples the possible parses, yielding an approximate posterior).

  3. Learning: two levels. Ordinary learning trains on many different characters and infers the posterior distribution over the parameters; learning how to learn transfers previous experience to new data.

2.2 Bayesian Deep Learning

A cross-fusion of Bayesian learning and deep learning:

  • Deep generative models: use the fitting capacity of neural networks to describe complex relationships among variables in probabilistic modeling, yielding a more expressive probabilistic model.
  • Bayesian neural networks: use Bayesian inference to characterize model uncertainty in deep learning by replacing the point-estimate weights with probability distributions.

Deep generative models

Variational autoencoders (VAE) and generative adversarial networks (GAN). Both fit the data-generating process with neural networks, which makes the generation itself hard to interpret; the interpretable part is therefore expressed with a Bayesian network, and the neural network fits the rest.
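As a reminder of what the neural network fits in such a hybrid, the standard VAE objective (a general identity, not specific to this book) maximizes the evidence lower bound
$\log p_{\theta}(x) \ge \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right] - \mathrm{KL}\!\left(q_{\phi}(z \mid x) \,\|\, p(z)\right),$
where the prior $p(z)$ (and any graph structure placed over $z$) is the interpretable Bayesian-network part, while the decoder $p_{\theta}(x \mid z)$ and encoder $q_{\phi}(z \mid x)$ are neural networks that fit the rest.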

Example: Graphical-GAN, a graphical generative adversarial network, can automatically learn interpretable features without semantic annotations.

Bayesian neural network

Dropout can be viewed as approximate Bayesian inference for deep learning.

MC dropout samples different random versions of the same network as draws from an approximate posterior; averaging them estimates the mean prediction, and their spread estimates the uncertainty of the prediction.
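A minimal MC-dropout sketch in PyTorch, assuming a generic regression network; the architecture, dropout rate, and number of forward passes are arbitrary choices made for illustration.

```python
# Minimal MC-dropout sketch (PyTorch assumed available); architecture, dropout
# rate, and the number of forward passes are arbitrary illustrative choices.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Dropout(p=0.5),
                      nn.Linear(32, 1))

x = torch.randn(8, 4)    # dummy batch of 8 inputs with 4 features each
model.train()            # keep dropout stochastic at prediction time

with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # 100 stochastic passes

mean_prediction = samples.mean(dim=0)   # approximate predictive mean
uncertainty = samples.std(dim=0)        # spread = predictive uncertainty
print(mean_prediction.shape, uncertainty.shape)
```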

From Bayesian Networks to Interpretable Causal Models

The causal model additionally considers variables outside the model, and the connections (directed edges) depict causal relationships.

Origin blog.csdn.net/weixin_44546100/article/details/127751640