12 probability distributions every machine learning practitioner should know (with Python code implementations)

Machine learning has its own mathematical foundations: we use calculus to handle infinitesimal changes in functions and to compute those changes; we use linear algebra to carry out the computations; and we use probability theory and statistics to model uncertainty. Probability theory occupies a special place here: a model's predictions, its learning process, and its learning objectives can all be understood from a probabilistic point of view.

 

At the same time, on a finer-grained level, what we really need to understand are the probability distributions of random variables. In this article, the project author walks through all the statistical distributions you need to know and provides a code implementation for each of them.

 

Project address: https://github.com/graykode/distribution-is-all-you-need

 

 

Let us first take an overall look at the probability distributions covered:

 

 

Interestingly, all of these distributions are connected to one another. For example, repeating a Bernoulli trial several times gives the binomial distribution, and extending it to more than two classes gives the multinomial distribution. Note that Conjugate indicates that two probability distributions are conjugate to each other; Multi-Class indicates that the random variable takes more than 2 values; N Times indicates that we also consider the prior probability P(X).

 

In Bayesian theory, if the posterior distribution p(θ|x) and the prior distribution p(θ) belong to the same family of probability distributions, then the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function. For example, with a Beta(α, β) prior on the success probability of a binomial likelihood with k successes in n trials, the posterior is Beta(α + k, β + n − k), which is again a Beta distribution.

 

To study probability distributions in depth, the project author recommends Bishop's Pattern Recognition and Machine Learning. Of course, working through a textbook such as Probability Theory and Mathematical Statistics is also an excellent option.

 

Probability distributions and their characteristics

 

1. Uniform distribution (continuous)

 

A uniform distribution is one in which the random variable takes values in a closed interval [a, b] and every value in the interval is equally likely, i.e. the probability density is constant over the interval.
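As a minimal sketch (the function name uniform_pdf and the interval below are illustrative, not taken from the original project), the density can be written directly in NumPy:

import numpy as np

def uniform_pdf(x, a=0.0, b=1.0):
    # density is 1 / (b - a) inside [a, b] and 0 outside
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)

x = np.linspace(-0.5, 1.5, 201)
print(uniform_pdf(x).max())  # 1.0 on the unit interval [0, 1]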

 

 

2. Bernoulli distribution (discrete)

 

The Bernoulli distribution does not consider the prior probability P(X); it is the distribution of a single binary random variable. It is controlled by a single parameter φ ∈ [0, 1], which gives the probability that the random variable equals 1. The cross-entropy loss we use for binary classification has the same form as the negative logarithm of the Bernoulli distribution.
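A minimal sketch of the Bernoulli PMF and of the binary cross-entropy mentioned above (the function names are illustrative, not the project's own code):

import numpy as np

def bernoulli_pmf(x, phi):
    # P(X = x) = phi^x * (1 - phi)^(1 - x), for x in {0, 1}
    return phi ** x * (1 - phi) ** (1 - x)

def binary_cross_entropy(y_true, phi):
    # average negative log-likelihood of the Bernoulli distribution
    return -np.mean(y_true * np.log(phi) + (1 - y_true) * np.log(1 - phi))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.8, 0.6])
print(bernoulli_pmf(1, 0.7))       # 0.7
print(binary_cross_entropy(y, p))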

 

 

3. Binomial distribution (discrete)

 

The binomial distribution is a concept introduced by Bernoulli. It describes n repeated, independent Bernoulli trials, where each trial has only two possible outcomes and the two outcomes are mutually exclusive.
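A small sketch of the binomial PMF, plus sampling with NumPy (the helper binomial_pmf is illustrative and assumes Python 3.8+ for math.comb):

import numpy as np
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(3, n=10, p=0.5))             # P(exactly 3 heads in 10 fair flips)
print(np.random.binomial(n=10, p=0.5, size=5))  # 5 samples of the same experiment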

 

 

4. Multi-Bernoulli distribution (discrete)

 

The Multi-Bernoulli distribution, also known as the categorical distribution (Categorical distribution), has more than 2 categories. The cross-entropy loss has the same form as the negative logarithm of this distribution.
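A minimal sketch of the categorical PMF and the matching cross-entropy, assuming a one-hot encoded observation (the names are illustrative):

import numpy as np

def categorical_pmf(x_onehot, p):
    # x_onehot is a one-hot vector; p is a probability vector summing to 1
    return np.prod(p ** x_onehot)

def cross_entropy(x_onehot, p):
    # negative log of the categorical likelihood
    return -np.sum(x_onehot * np.log(p))

p = np.array([0.2, 0.5, 0.3])
x = np.array([0, 1, 0])        # the observed outcome is class 1
print(categorical_pmf(x, p))   # 0.5
print(cross_entropy(x, p))     # -log(0.5)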

 

5. Multinomial distribution (discrete)

 

The categorical distribution is a special case of the multinomial distribution (Multinomial distribution); its relationship to the multinomial distribution mirrors the relationship between the Bernoulli distribution and the binomial distribution.
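A sketch of the multinomial PMF, with NumPy sampling for comparison (multinomial_pmf is an illustrative helper, not from the project):

import numpy as np
from math import factorial

def multinomial_pmf(counts, p):
    # counts[i] = how many times class i occurred; sum(counts) = n trials
    n = sum(counts)
    coef = factorial(n) / np.prod([factorial(c) for c in counts])
    return coef * np.prod(np.array(p, dtype=float) ** np.array(counts))

print(multinomial_pmf([2, 3, 5], p=[0.2, 0.3, 0.5]))   # 10 draws over 3 classes
print(np.random.multinomial(10, [0.2, 0.3, 0.5]))      # one sampled count vector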

 

 

6. Beta distribution (continuous)

 

The Beta distribution (Beta Distribution), which serves as the conjugate prior of the Bernoulli and binomial distributions, is a family of continuous probability distributions defined on the interval (0, 1). The uniform distribution is a special case of the Beta distribution, namely the one with alpha = 1 and beta = 1.
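A minimal Beta density written with math.gamma (the helper beta_pdf is illustrative); setting alpha = beta = 1 reproduces the uniform density:

import numpy as np
from math import gamma as gamma_fn

def beta_pdf(x, alpha, beta):
    # normalising constant B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta)
    B = gamma_fn(alpha) * gamma_fn(beta) / gamma_fn(alpha + beta)
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / B

x = np.linspace(0.01, 0.99, 5)
print(beta_pdf(x, 1, 1))      # all 1.0: the uniform density on (0, 1)
print(beta_pdf(0.5, 2, 2))    # 1.5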

 

 

7. Dirichlet distribution (continuous)

 

The Dirichlet distribution (Dirichlet distribution) is a family of high-dimensional continuous probability distributions over the reals whose support (Support) is the standard simplex; it is the higher-dimensional generalisation of the Beta distribution. In Bayesian inference, the Dirichlet distribution is used as the conjugate prior of the multinomial distribution, and in machine learning it is used to build Dirichlet mixture models.
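A short sketch using NumPy's built-in sampler; the concentration parameters below are illustrative:

import numpy as np

# the Dirichlet generalises the Beta distribution to probability vectors;
# every sample lies on the simplex, i.e. its entries are >= 0 and sum to 1
alpha = [1.0, 2.0, 3.0]
samples = np.random.dirichlet(alpha, size=5)
print(samples)
print(samples.sum(axis=1))    # each row sums to 1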

 

 

8. Gamma distribution (continuous)

 

The Gamma distribution is a common continuous distribution in statistics; the exponential distribution, the chi-squared distribution and the Erlang distribution are all special cases of it. If X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), then X / (X + Y) follows the Beta(a, b) distribution.
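The Gamma-Beta relationship above can be checked numerically with a quick Monte Carlo sketch (the parameters and sample sizes are illustrative):

import numpy as np

a, b = 2.0, 3.0
x = np.random.gamma(shape=a, scale=1.0, size=100_000)
y = np.random.gamma(shape=b, scale=1.0, size=100_000)
ratio = x / (x + y)                          # should follow Beta(a, b)

print(ratio.mean())                          # close to a / (a + b) = 0.4
print(np.random.beta(a, b, 100_000).mean())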

 

 

9. Exponential distribution (continuous)

 

The exponential distribution can be used to model the time intervals between independent random events, for example the intervals between passengers entering an airport or between calls arriving at a call centre. When the shape parameter alpha equals 1, the Gamma distribution reduces to the exponential distribution; in other words, the exponential distribution is a special case of the Gamma distribution. The plotting code at the end of this article uses this distribution as its example.

 

 

10. Gaussian distribution (continuous)

 

The Gaussian, or normal, distribution is one of the most important distributions and is used widely across machine learning models. For example, we initialise weights with a Gaussian distribution, we normalise hidden vectors with respect to a Gaussian distribution, and so on.

 

 

When the mean of the normal distribution is 0 and the variance is 1, it is the standard normal distribution, which is the one we encounter most often.
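A minimal sketch of the Gaussian density and of the Gaussian weight initialisation mentioned above (the 0.01 standard deviation and the weight shape are illustrative):

import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # N(x; mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

print(gaussian_pdf(0.0))                                 # peak of the standard normal, ~0.3989
w = np.random.normal(loc=0.0, scale=0.01, size=(3, 4))   # Gaussian weight initialisation
print(w.shape)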

 

11. Chi-squared distribution (continuous)

 

Briefly, the chi-squared (Chi-squared) distribution with k degrees of freedom can be understood as the distribution of the sum of squares of k independent standard normal random variables. The chi-squared distribution is a special case of the Gamma distribution and is one of the most widely used distributions in statistical inference, for example in hypothesis testing and in constructing confidence intervals.
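The "sum of squares of k standard normals" definition can be verified directly by sampling (k and the sample size are illustrative):

import numpy as np

k = 3                                            # degrees of freedom
z = np.random.standard_normal(size=(100_000, k))
chi2_samples = np.sum(z ** 2, axis=1)            # sum of k squared standard normals

print(chi2_samples.mean())                       # close to k = 3 (mean of chi-squared is k)
print(np.random.chisquare(df=k, size=100_000).mean())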

 

 

12. Student's t-distribution (continuous)

 

Student's t-distribution (Student t-distribution) is used to estimate, from a sample, the mean of a normally distributed population whose variance is unknown. The t-distribution is a symmetric, bell-shaped distribution, much like the normal distribution, but with heavier tails, which means the t-distribution is more likely to produce samples far from the mean.
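A sketch of the classical construction Z / sqrt(V / nu), where Z is standard normal and V is chi-squared with nu degrees of freedom (nu and the sample size are illustrative):

import numpy as np

nu = 5
z = np.random.standard_normal(100_000)
v = np.random.chisquare(df=nu, size=100_000)
t_samples = z / np.sqrt(v / nu)                # follows a t-distribution with nu degrees of freedom

print(t_samples.var())                         # close to nu / (nu - 2) = 5/3 > 1: heavier tails than normal
print(np.random.standard_t(df=nu, size=100_000).var())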

 

 

Code implementations of the distributions

 

The project uses NumPy to construct and plot every distribution described above, and the corresponding code can be found in the original project. The following shows how to construct and plot the exponential distribution: we simply define the probability density function and then plot it.

 

import numpy as np
from matplotlib import pyplot as plt

def exponential(x, lamb):
    # probability density f(x) = lambda * exp(-lambda * x) for x >= 0
    y = lamb * np.exp(-lamb * x)
    return x, y, np.mean(y), np.std(y)

for lamb in [0.5, 1.0, 1.5]:
    x = np.arange(0, 20, 0.01, dtype=np.float64)  # np.float is deprecated; use np.float64
    x, y, u, s = exponential(x, lamb=lamb)
    plt.plot(x, y, label=r'$\mu=%.2f,\ \sigma=%.2f,\ \lambda=%.1f$' % (u, s, lamb))
plt.legend()
plt.savefig('graph/exponential.png')  # assumes a 'graph/' directory exists
plt.show()

 

Original article: blog.csdn.net/sinat_26811377/article/details/104616633