Issues in handling probabilities

  1. Why do we work with log p(x) instead of p(x) itself when dealing with probabilities?
    There are two reasons. (1) During optimization, a density such as p(x) = e^{-x^2} has a badly scaled gradient: once x is even moderately large, p(x) and its gradient -2x e^{-x^2} are both close to 0, so the learning rate needed to make progress tends to infinity. The log behaves much better: log p(x) = -x^2 has gradient -2x, which makes it easy to choose a suitable step size for x. (2) When many probabilities are multiplied together, the product of many small p(x) values is itself extremely close to zero. For example, an event with probability 0.1 observed 8 times in a row has joint probability 0.1^8 = 10^{-8}, and with more factors the product soon falls outside the range a float can represent and underflows to 0. Taking the sum of the logs instead amounts to storing only the exponent, which is far easier for the computer to represent.
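Both reasons can be checked numerically. The following is a minimal sketch in plain Python (double precision); the function names and the choice of 400 factors are illustrative assumptions, not part of the original post.

```python
import math


def grad_p(x):
    """Gradient of p(x) = exp(-x^2) with respect to x."""
    return -2.0 * x * math.exp(-x * x)


def grad_log_p(x):
    """Gradient of log p(x) = -x^2 with respect to x."""
    return -2.0 * x


def product_of_probs(probs):
    """Multiply the probabilities directly; may underflow to 0.0."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def sum_of_log_probs(probs):
    """Sum the log-probabilities; stays well inside the float range."""
    return sum(math.log(p) for p in probs)


if __name__ == "__main__":
    # Reason (1): at x = 10 the gradient of p is vanishingly small,
    # while the gradient of log p is still a usable -2x.
    print("grad p(10)     =", grad_p(10.0))
    print("grad log p(10) =", grad_log_p(10.0))

    # Reason (2): 400 factors of 0.1 underflow a double to exactly 0.0,
    # but the corresponding log-sum is an ordinary negative number.
    probs = [0.1] * 400
    print("product     =", product_of_probs(probs))
    print("sum of logs =", sum_of_log_probs(probs))
```

With single-precision floats the product underflows after far fewer factors, which is why likelihoods are normally accumulated as log-likelihoods.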
Origin blog.csdn.net/wang_jun_whu/article/details/104043043