Huashu Reading Notes (2): Probability and Information Theory

Index of all notes: "Deep Learning" (the Flower Book) Reading Notes Summary

"Deep Learning" PDF free download: "Deep Learning"

1. Probability

When probability is directly related to the rate at which events occur, it is called frequentist probability; when it represents a degree of belief (a level of certainty), it is called Bayesian probability.

2. Random Variables

Random variables can be discrete or continuous.

3. Probability Distribution

A probability distribution describes how likely a random variable, or a set of random variables, is to take on each of its possible states. The way we describe a probability distribution depends on whether the variable is discrete or continuous.

The probability distribution of a discrete variable can be described by a probability mass function (PMF); when the object of study is a continuous random variable, we describe its distribution with a probability density function (PDF).
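As a quick illustration (an added sketch, not part of the original notes), the key practical difference is that a PMF value is itself a probability, while a PDF must be integrated over an interval to give one:

```python
# A minimal sketch using scipy.stats: PMF vs. PDF.
from scipy.stats import binom, norm

# Discrete: the PMF assigns a probability to each individual state.
print(binom.pmf(3, n=10, p=0.5))        # P(X = 3) for X ~ Binomial(10, 0.5)

# Continuous: the PDF is a density, not a probability; integrate it
# (here via the CDF) to get the probability of an interval.
print(norm.pdf(0.0))                    # density of N(0, 1) at x = 0
print(norm.cdf(1.0) - norm.cdf(-1.0))   # P(-1 <= X <= 1), about 0.683
```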

4. Marginal Probability

Sometimes we know the joint probability distribution over a set of variables but want to know the distribution over just a subset of them. The probability distribution defined over such a subset is called the marginal probability distribution.
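A minimal sketch of marginalization for discrete variables (the joint table below is made up for illustration):

```python
import numpy as np

# Hypothetical joint distribution P(x, y) over two binary variables;
# rows are indexed by x, columns by y.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

# Sum rule: marginalize by summing out the unwanted variable.
p_x = joint.sum(axis=1)   # P(x) = sum_y P(x, y)  ->  [0.3, 0.7]
p_y = joint.sum(axis=0)   # P(y) = sum_x P(x, y)  ->  [0.4, 0.6]
print(p_x, p_y)
```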

5. Conditional Probability

In many cases, we are interested in the probability of some event given that some other event has happened; this is called conditional probability.
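Concretely, the conditional probability that y = y given x = x is defined in terms of the joint and marginal distributions (valid only when P(x = x) > 0):

P(y | x) = P(y, x) / P(x)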

6. The Chain Rule of Conditional Probability

P(a, b, c) = P(a | b, c) P(b, c) = P(a | b, c) P(b | c) P(c)
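A small numeric check of this factorization on a made-up joint table (an added sketch, not from the book):

```python
import numpy as np

# Hypothetical joint distribution P(a, b, c) over three binary variables.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

a, b, c = 1, 0, 1
p_c = joint.sum(axis=(0, 1))[c]           # P(c)
p_bc = joint.sum(axis=0)[b, c]            # P(b, c)
p_a_given_bc = joint[a, b, c] / p_bc      # P(a | b, c)
p_b_given_c = p_bc / p_c                  # P(b | c)

# Chain rule: P(a, b, c) = P(a | b, c) P(b | c) P(c)
print(joint[a, b, c], p_a_given_bc * p_b_given_c * p_c)  # the two values match
```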

7. Independence and Conditional Independence

Two random variables x and y are independent of each other if their probability distribution can be expressed as the product of two factors, one involving only x and the other involving only y:

x ⊥ y, i.e. p(x, y) = p(x) p(y)

If the conditional probability distribution over x and y factorizes in this way for every value of z, then x and y are conditionally independent given the random variable z:

x ⊥ y | z, i.e. p(x, y | z) = p(x | z) p(y | z)
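A sketch of checking independence numerically: for discrete variables, x ⊥ y exactly when the joint table equals the outer product of its marginals (tables below are made up):

```python
import numpy as np

# Build an independent joint as the outer product of two marginals.
p_x = np.array([0.3, 0.7])
p_y = np.array([0.6, 0.4])
joint = np.outer(p_x, p_y)               # P(x, y) = P(x) P(y)

marg_x = joint.sum(axis=1)
marg_y = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_x, marg_y)))  # True: x and y independent
```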

8. Expectation, Variance and Covariance

Covariance gives, in some sense, the strength of the linear relationship between two variables as well as the scale of those variables:

Cov(f(x), g(y)) = E[ (f(x) − E[f(x)]) (g(y) − E[g(y)]) ]
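A minimal sketch estimating this covariance from samples, taking f and g to be the identity so it reduces to Cov(x, y) (assumed example, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)   # y depends linearly on x

# Sample estimate of Cov(x, y) = E[(x - E[x])(y - E[y])]
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)                # close to 2.0
print(np.cov(x, y)[0, 1])    # NumPy's estimate (ddof=1), for comparison
```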

9. Commonly Used Probability Distributions

  1. Bernoulli distribution
  2. Multinoulli distribution (categorical distribution)
  3. Normal distribution (Gaussian distribution)
  4. Exponential distribution
  5. Laplace distribution
  6. Dirac distribution and empirical distribution
  7. Mixture distributions (e.g., the Gaussian mixture model, GMM; see the sketch after this list)
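A minimal sketch of ancestral sampling from a two-component Gaussian mixture (all parameters below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-component GMM: mixture weights, component means and stds.
weights = np.array([0.3, 0.7])
means   = np.array([-2.0, 3.0])
stds    = np.array([0.5, 1.0])

# First draw the component identity (a Multinoulli variable),
# then sample from the selected Gaussian.
n = 10_000
comp = rng.choice(2, size=n, p=weights)
samples = rng.normal(means[comp], stds[comp])
print(samples.mean())   # close to 0.3 * (-2.0) + 0.7 * 3.0 = 1.5
```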

10. Useful Properties of Commonly Used Functions

logistic sigmoid function: σ(x) = 1 / (1 + exp(−x))

softplus function: ζ(x) = log(1 + exp(x))

Some common properties of these functions (shown in the original post as an image):
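Since the image does not survive here, a few of the standard identities from the book, for reference:

1 − σ(x) = σ(−x)
d/dx σ(x) = σ(x) (1 − σ(x))
log σ(x) = −ζ(−x)
d/dx ζ(x) = σ(x)
ζ(x) − ζ(−x) = x
σ⁻¹(x) = log( x / (1 − x) ) for x ∈ (0, 1)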

11. Bayes' Rule

P(x | y) = P(x) P(y | x) / P(y)
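A sketch of Bayes' rule in code, using a made-up diagnostic-test example (all numbers hypothetical):

```python
# Hypothetical numbers for a disease test.
p_x = 0.01              # prior P(x): probability of having the disease
p_y_given_x = 0.95      # likelihood P(y | x): positive test given disease
p_y_given_not_x = 0.05  # false-positive rate

# P(y) from the sum rule, then the posterior P(x | y) from Bayes' rule.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)
p_x_given_y = p_x * p_y_given_x / p_y
print(p_x_given_y)   # about 0.161: a positive test still leaves x unlikely
```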

12. Technical Details of Continuous Variables

13. Information Theory

Information theory is mainly concerned with quantifying how much information a signal contains.

We use the Kullback-Leibler (KL) divergence to measure the difference between two distributions:

D_KL(P || Q) = E_{x∼P}[ log( P(x) / Q(x) ) ] = E_{x∼P}[ log P(x) − log Q(x) ]

The KL divergence has many useful properties, the most important being that it is non-negative. The KL divergence is 0 if and only if P and Q are the same distribution (in the case of discrete variables) or equal "almost everywhere" (in the case of continuous variables).
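A minimal sketch computing the KL divergence between two discrete distributions, illustrating that it is non-negative, zero only for identical distributions, and not symmetric:

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x)), discrete case."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p) - np.log(q))))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(kl(p, q), kl(q, p))   # both non-negative, and unequal: KL is asymmetric
print(kl(p, p))             # 0.0: zero iff the two distributions coincide
```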

14. Structured Probabilistic Models

There are two main kinds of structured probabilistic models: directed and undirected. Being directed or undirected is not a property of a probability distribution itself; it is a property of a particular description of the distribution, and any probability distribution can be described in either way.
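As a standard illustration (not specific to these notes): a directed model with graph a → b → c corresponds, via the chain rule, to the factorization p(a, b, c) = p(a) p(b | a) p(c | b), while an undirected model describes structure with unnormalized factors φ over cliques of the graph, divided by a normalizing constant Z.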

Next chapter: Huashu Reading Notes (3): Numerical Computation

Origin: blog.csdn.net/qq_41485273/article/details/112706884