Summary of all notes: "Deep Learning" (the Flower Book) reading notes
"Deep Learning" PDF free download: "Deep Learning"
1. Probability
When probability is understood as being directly related to the frequency with which events occur, it is called frequentist probability; when it represents a degree of certainty (belief), it is called Bayesian probability.
2. Random variables
Random variables can be discrete or continuous.
3. Probability distribution
A probability distribution describes how likely a random variable, or a set of random variables, is to take on each of its possible states. The way we describe it depends on whether the variables are discrete or continuous.
The probability distribution of a discrete variable is described by a probability mass function (PMF); when the object of study is a continuous random variable, we use a probability density function (PDF) to describe its distribution.
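The PMF/PDF distinction can be made concrete with a minimal sketch using only the standard library; the Bernoulli PMF and Gaussian PDF below are illustrative choices, not code from the book:

```python
import math

def bernoulli_pmf(x, phi):
    """PMF of a Bernoulli variable: P(x=1) = phi, P(x=0) = 1 - phi."""
    return phi if x == 1 else 1.0 - phi

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x -- a density, not a probability."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# A PMF sums to 1 over the discrete states; a PDF instead integrates to 1.
total = bernoulli_pmf(0, 0.3) + bernoulli_pmf(1, 0.3)
```

Note that a density can exceed 1 at a point (e.g. a narrow Gaussian); only its integral is constrained.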
4. Marginal probability
Sometimes we know the joint probability distribution of a set of variables but want the probability distribution of only a subset of them. The distribution defined over such a subset is called the marginal probability distribution.
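Marginalization is just the sum rule applied to a joint table. A small sketch with made-up numbers (the joint values are illustrative, not from the book):

```python
# Joint distribution P(x, y) over x in {0, 1} and y in {0, 1, 2},
# stored as a nested list: joint[x][y].
joint = [
    [0.10, 0.20, 0.10],  # P(x=0, y=*)
    [0.25, 0.15, 0.20],  # P(x=1, y=*)
]

# Marginal P(x) = sum over y of P(x, y)  (the "sum rule")
p_x = [sum(row) for row in joint]

# Marginal P(y) = sum over x of P(x, y)
p_y = [sum(joint[i][j] for i in range(2)) for j in range(3)]
```

Both marginals sum to 1 because the joint does.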
5. Conditional probability
In many cases we are interested in the probability of an event given that some other event has already occurred; this is called a conditional probability, written P(y | x) = P(y, x) / P(x).
6. The chain rule of conditional probability
P(a, b, c) = P(a | b, c) P(b, c) = P(a | b, c) P(b | c) P(c)
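The chain rule can be verified numerically on an arbitrary joint distribution; a sketch assuming three binary variables, built from random (then normalized) table entries:

```python
import itertools
import random

random.seed(0)
# A random joint distribution over three binary variables (a, b, c).
raw = {abc: random.random() for abc in itertools.product([0, 1], repeat=3)}
Z = sum(raw.values())
P = {abc: v / Z for abc, v in raw.items()}

def marginal(*keep):
    """Marginal over the variables at the given positions (0=a, 1=b, 2=c)."""
    out = {}
    for abc, p in P.items():
        key = tuple(abc[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

P_bc = marginal(1, 2)
P_c = marginal(2)

def chain(a, b, c):
    """Recompose P(a,b,c) as P(a|b,c) * P(b|c) * P(c)."""
    p_a_given_bc = P[(a, b, c)] / P_bc[(b, c)]
    p_b_given_c = P_bc[(b, c)] / P_c[(c,)]
    return p_a_given_bc * p_b_given_c * P_c[(c,)]
```

For every state, the factorized product recovers the original joint probability exactly.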
7. Independence and conditional independence
Two random variables x and y are independent if their joint distribution can be expressed as a product of two factors, one involving only x and the other involving only y. This is written x ⊥ y.
They are conditionally independent given a random variable z if the conditional distribution over x and y factorizes in this way for every value of z. This is written x ⊥ y | z.
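Conditional independence can be checked by constructing a joint that factorizes by design and confirming P(x, y | z) = P(x | z) P(y | z); the conditional tables below are illustrative assumptions:

```python
# Build a joint P(x, y, z) in which x and y are conditionally
# independent given z, then verify the factorization numerically.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_x_given_z[z][x]
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_y_given_z[z][y]

P = {(x, y, z): p_x_given_z[z][x] * p_y_given_z[z][y] * p_z[z]
     for x in (0, 1) for y in (0, 1) for z in (0, 1)}

def cond_joint(x, y, z):
    """P(x, y | z) computed from the joint table."""
    pz = sum(p for (a, b, c), p in P.items() if c == z)
    return P[(x, y, z)] / pz
```

For each value of z, the conditional joint equals the product of the two conditional marginals, which is exactly the definition of x ⊥ y | z.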
8. Expectation, variance and covariance
Covariance gives, in a sense, both the strength of the linear relationship between two variables and the scale of those variables:
Cov(f(x), g(y)) = E[(f(x) - E[f(x)]) (g(y) - E[g(y)])]
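The definition translates directly into code for a discrete sample space; the outcomes and probabilities below are a made-up example where y is a linear function of x:

```python
def expectation(values, probs):
    """E[V] for a discrete variable given aligned value/probability lists."""
    return sum(v * p for v, p in zip(values, probs))

def covariance(xs, ys, probs):
    """Cov(X, Y) = E[(X - E[X]) (Y - E[Y])] over a shared sample space."""
    ex = expectation(xs, probs)
    ey = expectation(ys, probs)
    return sum(p * (x - ex) * (y - ey) for x, y, p in zip(xs, ys, probs))

# Perfectly linearly related outcomes (y = 2x) give positive covariance.
xs = [0, 1, 2]
ys = [0, 2, 4]
probs = [1 / 3, 1 / 3, 1 / 3]
cov = covariance(xs, ys, probs)
```

Note that covariance mixes correlation strength with scale: doubling ys would double cov without the variables being any "more related", which is why correlation (covariance normalized by the standard deviations) is often preferred.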
9. Commonly used probability distributions
- Bernoulli distribution
- Multinoulli distribution (also called the categorical distribution)
- Normal distribution or Gaussian distribution
- Exponential distribution
- Laplace distribution
- Dirac distribution or empirical distribution
- Mixture distribution (e.g., the Gaussian mixture model, GMM)
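A mixture distribution is sampled by ancestral sampling: first draw the component identity from the mixing weights, then draw from that component. A sketch for a two-component 1-D GMM with made-up parameters:

```python
import random

random.seed(42)

# Illustrative mixture parameters (assumptions, not from the book).
weights = [0.3, 0.7]    # mixing coefficients P(c = i); must sum to 1
means = [-2.0, 3.0]     # per-component Gaussian means
stds = [0.5, 1.0]       # per-component standard deviations

def sample_gmm():
    """Draw one sample: pick a component, then sample its Gaussian."""
    c = random.choices([0, 1], weights=weights)[0]
    return random.gauss(means[c], stds[c])

samples = [sample_gmm() for _ in range(10000)]
```

The sample mean should be close to the weighted mean of the components, 0.3 * (-2) + 0.7 * 3 = 1.5.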
10. Useful properties of commonly used functions
logistic sigmoid function: σ(x) = 1 / (1 + exp(-x))
softplus function: ζ(x) = log(1 + exp(x))
Some common properties:
- dσ(x)/dx = σ(x)(1 - σ(x))
- 1 - σ(x) = σ(-x)
- log σ(x) = -ζ(-x)
- dζ(x)/dx = σ(x)
- ζ(x) - ζ(-x) = x
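These identities are easy to check numerically; a minimal sketch using only the standard library (the derivative is checked by a central finite difference):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Softplus: log(1 + exp(x)); log1p improves accuracy for small exp(x)."""
    return math.log1p(math.exp(x))

x = 1.7   # arbitrary test point
h = 1e-6  # finite-difference step
num_deriv = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
```

Checks worth running: 1 - σ(x) = σ(-x), the finite-difference derivative matches σ(x)(1 - σ(x)), and ζ(x) - ζ(-x) = x.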
11. Bayes' rule
P(x | y) = P(x) P(y | x) / P(y)
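Bayes' rule is often illustrated with a diagnostic test; the numbers below are hypothetical, and P(y) in the denominator is computed with the sum rule over both cases of x:

```python
# Hypothetical example: x = disease present, y = positive test.
p_x = 0.01             # prior P(x)
p_y_given_x = 0.95     # sensitivity P(y | x)
p_y_given_not_x = 0.05 # false-positive rate P(y | not x)

# P(y) = P(y | x) P(x) + P(y | not x) P(not x)  (sum rule)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' rule: P(x | y) = P(x) P(y | x) / P(y)
p_x_given_y = p_x * p_y_given_x / p_y
```

With a rare condition, even an accurate test yields a surprisingly small posterior (here about 0.16), because the false positives from the large healthy population dominate.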
12. Technical details of continuous variables
13. Information Theory
Information theory mainly studies how to quantify the amount of information a signal contains.
We use the Kullback-Leibler (KL) divergence to measure how different two distributions are:
D_KL(P || Q) = E_{x∼P}[log (P(x) / Q(x))] = E_{x∼P}[log P(x) - log Q(x)]
The KL divergence has many useful properties, the most important being that it is non-negative. It is 0 if and only if P and Q are the same distribution (in the case of discrete variables) or equal "almost everywhere" (in the case of continuous variables).
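For discrete distributions the expectation becomes a sum; a minimal sketch (the two example distributions are made up) that also shows the KL divergence is not symmetric, so it is not a true distance:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as aligned lists.

    Terms with p_i = 0 contribute 0 by the convention 0 * log(0/q) = 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.1, 0.4, 0.5]
q = [0.3, 0.3, 0.4]

d_pq = kl_divergence(p, q)  # D_KL(P || Q)
d_qp = kl_divergence(q, p)  # D_KL(Q || P) -- generally different
d_pp = kl_divergence(p, p)  # zero: a distribution vs. itself
```

Non-negativity (Gibbs' inequality) and the zero-iff-equal property hold, but d_pq != d_qp in general, which is why the direction of the divergence matters when fitting one distribution to another.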
14. Structured probabilistic models
There are two main kinds of structured probabilistic models: directed and undirected. Being directed or undirected is not a property of a probability distribution itself; it is a property of a particular description of the distribution, and any distribution can be described in both ways.
Next chapter: Huashu reading notes (3) - Numerical computation