Logistic regression (LR) typically uses the Sigmoid function to represent the predicted probability. Why can the Sigmoid function be used in LR?
First, LR makes only one assumption: the features of the two classes follow Gaussian distributions with unequal means but equal variances. Why assume a Gaussian distribution? On one hand, the Gaussian distribution is easy to understand; on the other hand, from the viewpoint of information theory, when only the mean and variance are known, the Gaussian distribution is the maximum-entropy distribution. A maximum-entropy distribution spreads the risk evenly, much like binary search, which probes the middle point at each step precisely so that the risk is split evenly.
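As a quick numeric illustration of the maximum-entropy claim (a sketch, not from the original text): among distributions with the same variance, the Gaussian has higher differential entropy than, say, a variance-matched uniform distribution. The variable names below are illustrative.

```python
import math

sigma = 2.0  # assumed standard deviation for the comparison

# Differential entropy of a Gaussian with variance sigma^2:
#   0.5 * ln(2 * pi * e * sigma^2)
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# A uniform distribution on [a, b] has variance (b - a)^2 / 12,
# so matching the variance requires width b - a = sigma * sqrt(12);
# its differential entropy is ln(b - a).
h_uniform = math.log(sigma * math.sqrt(12))

print(f"Gaussian entropy: {h_gauss:.4f}")
print(f"Uniform entropy:  {h_uniform:.4f}")
assert h_gauss > h_uniform  # the Gaussian wins
```

The same comparison holds for any other variance-matched distribution; the Gaussian is the maximizer among all of them.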
Start from the definition of "risk":

$$R(\hat{y}=0 \mid x) = \lambda_{00} P(y=0 \mid x) + \lambda_{01} P(y=1 \mid x)$$
$$R(\hat{y}=1 \mid x) = \lambda_{10} P(y=0 \mid x) + \lambda_{11} P(y=1 \mid x)$$

In the formulas, $R(\hat{y}=0 \mid x)$ is the risk of predicting a sample as 0, $R(\hat{y}=1 \mid x)$ is the risk of predicting it as 1, and $\lambda_{ij}$ is the risk incurred when the prediction is $i$ and the actual label is $j$.

The LR algorithm assumes that a correct prediction incurs no risk, i.e., $\lambda_{00} = \lambda_{11} = 0$. In addition, predicting 1 when the label is 0 and predicting 0 when the label is 1 are assumed to incur the same risk, so $\lambda_{01}$ and $\lambda_{10}$ are unified into a single constant $\lambda$.

The "risk" above then reduces to:

$$R(\hat{y}=0 \mid x) = \lambda P(y=1 \mid x)$$
$$R(\hat{y}=1 \mid x) = \lambda P(y=0 \mid x)$$
For each sample, the predicted class should be the one that minimizes the risk; that is, compare the two conditional probabilities and assign the sample to the class with the larger posterior probability.
For example, predict $\hat{y}=1$ when:

$$\frac{P(y=1 \mid x)}{P(y=0 \mid x)} > 1$$
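The risk-minimizing decision rule above can be sketched as follows; `p1` (standing for $P(y=1 \mid x)$) and `lam` (the shared misclassification cost $\lambda$) are illustrative names, not from the original text.

```python
def predict(p1: float, lam: float = 1.0) -> int:
    """Return the class whose prediction minimizes the risk."""
    risk_predict_0 = lam * p1          # R(y_hat=0 | x) = lambda * P(y=1|x)
    risk_predict_1 = lam * (1 - p1)    # R(y_hat=1 | x) = lambda * P(y=0|x)
    return 0 if risk_predict_0 < risk_predict_1 else 1

print(predict(0.8))  # posterior favors class 1 -> 1
print(predict(0.3))  # posterior favors class 0 -> 0
```

Note that because $\lambda$ multiplies both risks, it cancels out of the comparison: the rule reduces to picking the larger posterior, exactly as stated above.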
Taking the logarithm of this ratio and applying Bayes' theorem gives:

$$\ln \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \ln \frac{P(x \mid y=1) P(y=1)}{P(x \mid y=0) P(y=0)} = \ln \frac{P(x \mid y=1)}{P(x \mid y=0)} + \ln \frac{P(y=1)}{P(y=0)}$$
Because $P(y=1)$ and $P(y=0)$ are constants, $\ln \frac{P(y=1)}{P(y=0)}$ can be replaced by a constant $C$. Substituting the Gaussian densities (means $\mu_1$ and $\mu_0$, shared variance $\sigma^2$):

$$\ln \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = -\frac{(x-\mu_1)^2}{2\sigma^2} + \frac{(x-\mu_0)^2}{2\sigma^2} + C = \frac{\mu_1-\mu_0}{\sigma^2} x + \frac{\mu_0^2-\mu_1^2}{2\sigma^2} + C = \theta x + b$$

which is linear in $x$. Exponentiating both sides and using $P(y=0 \mid x) = 1 - P(y=1 \mid x)$ gives:

$$\frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)} = e^{\theta x + b} \quad\Longrightarrow\quad P(y=1 \mid x) = \frac{1}{1+e^{-(\theta x + b)}}$$
In summary, this is why the LR algorithm can use the Sigmoid function to represent $P(y=1 \mid x)$.