Discrete
Bernoulli distribution
pmf
f_X(x) = P(X = x) =\left\{\begin{aligned}(1-p)^{1-x}p^x & \quad \text{for } x = 0 \text{ or } 1\\ 0 & \quad\text{otherwise}\end{aligned}\right.
expectation
E(X) = p
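A quick numerical check of the Bernoulli pmf and mean (a minimal sketch; scipy and the value p = 0.3 are illustrative choices, not part of the original notes):

```python
from scipy import stats

p = 0.3  # illustrative success probability
X = stats.bernoulli(p)

# pmf matches (1-p)^(1-x) * p^x for x = 0, 1
assert abs(X.pmf(0) - (1 - p)) < 1e-12
assert abs(X.pmf(1) - p) < 1e-12

# expectation E(X) = p
assert abs(X.mean() - p) < 1e-12
```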
Binomial distribution
pmf
f_X(k) = P(X = k) =\left\{\begin{aligned}C_n^k p^k(1-p)^{n-k} & \quad \text{for } k = 0, 1, \dots, n\\ 0 & \quad\text{otherwise}\end{aligned}\right.
expectation
E(X) = np
variance
var(X) = np(1-p)
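The pmf, mean, and variance can be verified against scipy (a minimal sketch; n = 10, p = 0.4, k = 3 are illustrative values):

```python
from scipy import stats
from scipy.special import comb

n, p, k = 10, 0.4, 3  # illustrative values
X = stats.binom(n, p)

# pmf matches C(n, k) p^k (1-p)^(n-k)
assert abs(X.pmf(k) - comb(n, k) * p**k * (1 - p)**(n - k)) < 1e-12

# E(X) = np and var(X) = np(1-p)
assert abs(X.mean() - n * p) < 1e-12
assert abs(X.var() - n * p * (1 - p)) < 1e-12
```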
Geometric distribution
pmf
f_X(k) = P(X = k) =\left\{\begin{aligned}p(1-p)^{k-1} & \quad \text{for } k = 1, 2, 3, \dots\\ 0 & \quad\text{otherwise}\end{aligned}\right.
expectation
E(X) = \frac{1}{p}
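A sanity check using scipy, whose geometric distribution also counts trials k = 1, 2, 3, ... (p = 0.25 is an illustrative choice):

```python
from scipy import stats

p = 0.25  # illustrative success probability
X = stats.geom(p)  # scipy's geom counts trials, k = 1, 2, 3, ...

# pmf matches p(1-p)^(k-1)
for k in range(1, 6):
    assert abs(X.pmf(k) - p * (1 - p)**(k - 1)) < 1e-12

# E(X) = 1/p
assert abs(X.mean() - 1 / p) < 1e-12
```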
Negative binomial distribution
The negative binomial distribution arises as a generalization of the geometric distribution.
Suppose that a sequence of independent trials, each with probability of success p, is performed until there are r successes in all. If the r-th success occurs on trial k, then the first k - 1 trials must contain exactly r - 1 successes, so the probability can be written as
p \cdot C_{k-1}^{r-1} p^{r-1}(1-p)^{(k-1)-(r-1)}
pmf
f_X(k) = P(X = k) =\left\{\begin{aligned}C_{k-1}^{r-1}p^r(1-p)^{k-r} & \quad \text{for } k = r, r+1, r+2, \dots\\ 0 & \quad\text{otherwise}\end{aligned}\right.
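This pmf can be evaluated directly and compared with scipy's nbinom, which counts failures rather than total trials (a minimal sketch; r = 3, p = 0.4 and the helper negbin_pmf are illustrative, not from the original notes):

```python
from scipy import stats
from scipy.special import comb

r, p = 3, 0.4  # illustrative: stop at the 3rd success

def negbin_pmf(k, r, p):
    """P(X = k): the r-th success occurs on trial k (k >= r)."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

# scipy's nbinom counts failures before the r-th success,
# so total trial count k corresponds to k - r failures
for k in range(r, r + 6):
    assert abs(negbin_pmf(k, r, p) - stats.nbinom(r, p).pmf(k - r)) < 1e-12
```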
Hypergeometric distribution
Suppose that an urn contains n balls, of which r are black and n - r are white. Let X denote the number of black balls drawn when taking m balls without replacement.
pmf
f_X(k) = P(X = k) =\left\{\begin{aligned}\frac{C_r^kC_{n-r}^{m-k}}{C_n^m} & \quad 0\le k \le r\\ 0 & \quad\text{otherwise}\end{aligned}\right.
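A check against scipy's hypergeom, taking care with its parameter order (a minimal sketch; the urn sizes n = 20, r = 7, m = 5 are illustrative):

```python
from scipy import stats
from scipy.special import comb

n, r, m = 20, 7, 5  # illustrative: 20 balls, 7 black, draw 5
# scipy's parameter order: population size M, successes n, draws N
X = stats.hypergeom(M=n, n=r, N=m)

for k in range(0, m + 1):
    expected = comb(r, k) * comb(n - r, m - k) / comb(n, m)
    assert abs(X.pmf(k) - expected) < 1e-12
```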
Poisson distribution
The Poisson distribution can be derived as the limit of a binomial distribution as the number of trials approaches infinity and the probability of success on each trial approaches zero in such a way that np = \lambda; here \lambda can be interpreted as the expected number of successes.
pmf
P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad k = 0, 1, 2, \dots
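The limit can be illustrated numerically by holding np = \lambda fixed while n grows (a minimal sketch; \lambda = 2 and k = 3 are illustrative):

```python
from scipy import stats

lam, k = 2.0, 3  # illustrative: fixed lambda, evaluate at k = 3

# As n grows with np = lambda fixed, Binomial(n, lambda/n) -> Poisson(lambda)
for n in (10, 100, 10_000):
    print(n, stats.binom(n, lam / n).pmf(k))
print("limit", stats.poisson(lam).pmf(k))
# the binomial pmf values approach the Poisson value as n increases
```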
Continuous
Uniform distribution
A uniform r.v. on the interval [a, b] is a model for what we mean when we say “choose a number at random between a and b”.
pdf
f_X(x) = \left\{\begin{aligned}\frac{1}{b-a} & \quad a\le x \le b\\ 0 & \quad\text{otherwise}\end{aligned}\right.
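A small check of the density and mean (a minimal sketch; the endpoints a = 2, b = 5 are illustrative, and scipy parameterizes the uniform by loc and width):

```python
from scipy import stats

a, b = 2.0, 5.0  # illustrative endpoints
X = stats.uniform(loc=a, scale=b - a)  # scipy uses loc = a, scale = b - a

# density is 1/(b-a) on [a, b] and 0 outside
assert abs(X.pdf(3.0) - 1 / (b - a)) < 1e-12
assert X.pdf(6.0) == 0.0

# E(X) = (a+b)/2
assert abs(X.mean() - (a + b) / 2) < 1e-12
```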
Exponential distribution
The exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace x by t.
pdf
f_X(x) = \left\{\begin{aligned}\lambda e^{-\lambda x} & \quad x\ge 0\\ 0 & \quad\text{otherwise}\end{aligned}\right.
cdf (easily obtained by integrating the pdf)
F_X(x) = \left\{\begin{aligned}1-e^{-\lambda x} & \quad x\ge 0\\ 0 & \quad\text{otherwise}\end{aligned}\right.
expectation
E(X) = \frac{1}{\lambda}
variance
var(X) = \frac{1}{\lambda^2}
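These moments (note they are 1/\lambda and 1/\lambda^2, not \lambda and \lambda^2) can be confirmed with scipy, which parameterizes the exponential by scale = 1/\lambda (a minimal sketch; \lambda = 0.5 is illustrative):

```python
import math
from scipy import stats

lam = 0.5  # illustrative rate
X = stats.expon(scale=1 / lam)  # scipy uses scale = 1/lambda

# cdf matches 1 - e^{-lambda x} for x >= 0
assert abs(X.cdf(2.0) - (1 - math.exp(-lam * 2.0))) < 1e-12

# E(X) = 1/lambda and var(X) = 1/lambda^2
assert abs(X.mean() - 1 / lam) < 1e-12
assert abs(X.var() - 1 / lam**2) < 1e-12
```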
property
Let X, Y be independent Poisson r.v.s with parameters \theta_1, \theta_2; then
X + Y \sim Poisson(\theta_1 + \theta_2)
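This additivity property can be seen in simulation (a minimal sketch; \theta_1 = 1.5, \theta_2 = 2.5 and the sample size are illustrative):

```python
import numpy as np
from scipy import stats

theta1, theta2 = 1.5, 2.5  # illustrative parameters
rng = np.random.default_rng(0)

# simulate X + Y and compare its empirical pmf with Poisson(theta1 + theta2)
s = rng.poisson(theta1, 100_000) + rng.poisson(theta2, 100_000)
for k in range(8):
    print(k, np.mean(s == k), stats.poisson(theta1 + theta2).pmf(k))
# the two columns agree up to Monte Carlo error
```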
Gamma distribution
pdf
g(t) = \left\{\begin{aligned}\frac{\lambda^\alpha}{\Gamma(\alpha)}t^{\alpha-1}e^{-\lambda t} & \quad t\ge 0\\ 0 & \quad\text{otherwise}\end{aligned}\right.
where \Gamma(x) = \int_0^\infty u^{x-1}e^{-u}du, \quad x > 0 is the gamma function.
expectation
E(X) = \frac{\alpha}{\lambda}
variance
Var(X) = \frac{\alpha}{\lambda^2}
Property
Note that if \alpha = 1, the gamma density coincides with the exponential density.
derivation
\because \Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}dx
\therefore \text{letting } x = \lambda t \to \Gamma(\alpha) = \lambda^\alpha \int_0^\infty t^{\alpha-1}e^{-\lambda t}dt
\therefore \frac{1}{\Gamma(\alpha)}\lambda^\alpha \int_0^\infty t^{\alpha-1}e^{-\lambda t}dt = 1
\therefore g(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}t^{\alpha-1}e^{-\lambda t} is a properly normalized density.
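A numerical sanity check of this normalization (a minimal sketch; \alpha = 2.5 and \lambda = 1.5 are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma

alpha, lam = 2.5, 1.5  # illustrative shape and rate

def g(t):
    # gamma density: lambda^alpha / Gamma(alpha) * t^(alpha-1) * e^(-lambda t)
    return lam**alpha / Gamma(alpha) * t**(alpha - 1) * np.exp(-lam * t)

total, _ = quad(g, 0, np.inf)
assert abs(total - 1) < 1e-8  # the density integrates to 1
```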
\alpha is called a shape parameter for the gamma density; varying \alpha changes the shape of the density.
\lambda is called a scale parameter; varying \lambda corresponds to changing the units of measurement and does not affect the shape of the density.
How to understand the gamma distribution? For integer \alpha, it models the waiting time until the \alpha-th event of a Poisson process with rate \lambda, i.e. a sum of \alpha independent exponential waiting times; see the sketch below.
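A sketch of both facts (the \alpha = 1 coincidence with the exponential, and the waiting-time interpretation); the parameter values and sample size are illustrative:

```python
import numpy as np
from scipy import stats

lam = 2.0  # illustrative rate
t = np.linspace(0.01, 5, 50)

# alpha = 1: the gamma density coincides with the exponential density
assert np.allclose(stats.gamma(a=1, scale=1 / lam).pdf(t),
                   stats.expon(scale=1 / lam).pdf(t))

# waiting-time intuition: a sum of alpha = 3 independent Exp(lambda)
# waiting times is distributed Gamma(3, lambda)
rng = np.random.default_rng(0)
waits = rng.exponential(1 / lam, size=(100_000, 3)).sum(axis=1)
print(waits.mean(), stats.gamma(a=3, scale=1 / lam).mean())  # both ~ 3/lambda
```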
Normal distribution
pdf
f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty
\mu is the mean and \sigma is the standard deviation.
If X \sim N(\mu, \sigma^2) and Y = aX + b, then
Y \sim N(a\mu+b, a^2\sigma^2)
In particular, if X \sim N(\mu, \sigma^2), then
Z = \frac{X-\mu}{\sigma} \sim N(0,1)
If X and Y are jointly (bivariate) normal with correlation \rho, then
aX+bY \sim N(a\mu_X+b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho \sigma_X\sigma_Y)
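The mean and variance formulas can be verified by simulating correlated normals (a minimal sketch; all parameter values are illustrative):

```python
import numpy as np

mu_x, mu_y = 1.0, -2.0            # illustrative means
sd_x, sd_y, rho = 2.0, 1.0, 0.3   # illustrative sds and correlation
a, b = 3.0, -1.0

cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
rng = np.random.default_rng(0)
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
z = a * xy[:, 0] + b * xy[:, 1]

print(z.mean(), a * mu_x + b * mu_y)            # sample mean vs a*mu_X + b*mu_Y
print(z.var(), a**2 * sd_x**2 + b**2 * sd_y**2
               + 2 * a * b * rho * sd_x * sd_y)  # sample var vs the formula
```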
property
If X, Y are independent N(0,1) r.v.s, then U = \frac{X}{Y} is a Cauchy r.v. (lec3)
f_U(u) = \frac{1}{\pi (u^2+1)}
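A simulation check of this ratio property (a minimal sketch; the sample size and evaluation points are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# ratio of two independent standard normals
u = rng.standard_normal(200_000) / rng.standard_normal(200_000)

# compare the empirical cdf of X/Y with the standard Cauchy cdf
for q in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(q, np.mean(u <= q), stats.cauchy.cdf(q))
# the empirical and theoretical values agree up to Monte Carlo error
```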
Exponential family
A family of pdfs or pmfs is called an exponential family if it can be expressed as:
p(x,\theta) = H(x)\exp(\theta^T \phi(x) - A(\theta))
H(x) is the base measure, \phi(x) is the vector of sufficient statistics, and A(\theta) is the log-partition function that normalizes the density.
This form is very helpful for modeling heterogeneous data in the era of big data.
The Bernoulli, Gaussian, Binomial, Poisson, Exponential, Weibull, Laplace, Gamma, Beta, Multinomial, and Wishart distributions are all exponential families (the Binomial and Multinomial with a fixed number of trials, and the Weibull with a fixed shape parameter).
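For example, the Bernoulli pmf fits this form with H(x) = 1, \phi(x) = x, natural parameter \theta = \log\frac{p}{1-p}, and A(\theta) = \log(1 + e^\theta). A minimal numerical check (p = 0.3 is an illustrative value):

```python
import math

p = 0.3  # illustrative Bernoulli parameter
theta = math.log(p / (1 - p))      # natural parameter
A = math.log(1 + math.exp(theta))  # log-partition function A(theta)

# with H(x) = 1 and phi(x) = x: p(x, theta) = exp(theta * x - A(theta))
for x in (0, 1):
    ef_form = math.exp(theta * x - A)
    direct = (1 - p)**(1 - x) * p**x
    assert abs(ef_form - direct) < 1e-12
```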
Property
E(X) = E(E(X|Y))
Var(X) = E(Var(X|Y)) + Var(E(X|Y))
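Both identities can be checked on a hierarchical model (a minimal sketch; the choice Y ~ Gamma(2, 1) with X | Y ~ Poisson(Y) is illustrative, in which case E(X|Y) = Y and Var(X|Y) = Y):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# illustrative hierarchical model: Y ~ Gamma(2, 1), X | Y ~ Poisson(Y)
y = rng.gamma(shape=2.0, scale=1.0, size=n)
x = rng.poisson(y)

# E(X) = E(E(X|Y)) = E(Y)
print(x.mean(), y.mean())

# Var(X) = E(Var(X|Y)) + Var(E(X|Y)) = E(Y) + Var(Y)
print(x.var(), y.mean() + y.var())
```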