Model Stability

An algorithm engineer's responsibility is not only to propose an algorithm, but to propose a stable one.

1. Computational Stability

Computational stability refers specifically to the robustness (Robustness) of a model's numerical computation; readers with a computer science background will certainly find this familiar. A simple example: if we store a floating-point (float) value in an integer (int) variable, we lose precision. Machine learning usually involves a large amount of computation, and since a computer's arithmetic precision is limited, we often have to round (Rounding), approximating irrational numbers with floating-point numbers. This process inevitably produces many tiny errors, and as the rounding errors accumulate they can eventually cause the model to go wrong or fail altogether. Let us look at several common computational stability risks in machine learning.

1.1. Underflow and Overflow

As the names suggest, overflow and underflow mean that a value exceeds what its container can represent. In machine learning we work heavily with probabilities (Probability), whose values lie in the interval [0, 1], which greatly increases the chance of underflow.

As a simple example, we often need to multiply many probabilities together. Suppose each probability is P_i = 0.01:

P = P_i^{10} = 0.00000000000000000001 = 10^{-20}

As we can see, multiplying only ten probabilities of 1% already yields an extremely small number. In machine learning we often multiply hundreds of such factors, and in situations like this the computer can no longer distinguish such a tiny number from 0. Underflow of this kind can directly cause the model to fail.
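
As a quick, minimal Python sketch of this effect (the 500 factors here are an arbitrary choice, just enough to force underflow):

```python
import sys

p = 0.01        # each individual probability
n = 500         # number of factors; chosen only so the product underflows

product = 1.0
for _ in range(n):
    product *= p        # repeated multiplication of small probabilities

print(product)            # 0.0 -- the true value, 10**-1000, has underflowed
print(sys.float_info.min) # smallest positive normalized double, about 2.2e-308
```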

Similarly, overflow is easy to run into as well. Just imagine multiplying many large numbers together; the result can easily exceed the computer's upper limit. The maximum value on a 64-bit machine is not as large as we might imagine:

L_{upper} = 2^{63}-1 = 9,223,372,036,854,775,807

Therefore, in practice we avoid multiplying many probabilities directly and instead take their logarithm (Log). For example:

\ln\left(\prod_{i=1}^n P(x_i)\right) = \ln(P(x_1)) + \ln(P(x_2)) + \dots + \ln(P(x_n)) = \sum_{i=1}^n \ln(P(x_i))

Taking logarithms transforms multiplication into addition and thus avoids the underflow and overflow that might otherwise occur. The logarithm also has many beautiful mathematical properties: it is monotonically increasing, it fits naturally into probabilistic models, it can make optimization problems convex, and so on.
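
A minimal sketch of this log trick, reusing the 500 small probabilities from the earlier example: the naive product underflows to 0.0, while the sum of logarithms stays well within floating-point range.

```python
import math

probs = [0.01] * 500

naive = math.prod(probs)                    # underflows to 0.0
log_prob = sum(math.log(p) for p in probs)  # stays representable

print(naive)     # 0.0
print(log_prob)  # about -2302.6, i.e. ln(10**-1000), no underflow
```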

1.2. Smoothing and Zero Probabilities

Similar to underflow and overflow, in machine learning we often find that a long product contains a factor of 0, which makes the whole computation meaningless. Take Naive Bayes as an example:

P(y=c \mid \bm{x}) \propto P(y=c)\prod_{i=1}^{d}P(x_i \mid c)

We judge the probability that a sample point belongs to class c by the product of the class prior P(y=c) and the conditional probabilities P(x_i|c) of each feature x_i given class c, as in the equation above. But as long as any single P(x_i|c)=0 or P(y=c)=0, the whole product becomes zero. However, such a 0 is usually not a true zero probability; more often it simply means that the event never appeared in our training data.

In a sense, this is also a kind of computational instability. The common remedy is Laplace smoothing (Laplace Smoothing), which corrects this instability. Simply put, it artificially adds one occurrence to each possible value so that no estimated probability is zero.

The probability of a particular feature value within a class is then corrected as follows:

Lap(P(x_i|c)) = \frac{|D_{c,x_i}|+1}{|D_c|+N_i}

Here |D_c| is the number of training samples belonging to class c, |D_{c,x_i}| is the number of those samples whose i-th feature takes the value x_i, and N_i is the number of possible values of the i-th feature. After this smoothing, none of the factors in our product is zero. Similar approaches are often used in natural language processing (NLP) as well; N-gram language models, for example, usually require smoothing, and interested readers can explore those techniques on their own.
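
A minimal Python sketch of the Laplace correction above; the counts are made up purely for illustration.

```python
def laplace_smoothed(count_c_xi, count_c, n_values):
    """Laplace-smoothed estimate of P(x_i | c).

    count_c_xi: |D_{c,x_i}|, class-c samples whose i-th feature equals x_i
    count_c:    |D_c|, total number of class-c samples
    n_values:   N_i, number of possible values of the i-th feature
    """
    return (count_c_xi + 1) / (count_c + n_values)

# A feature value never seen within class c no longer gets probability 0
print(laplace_smoothed(0, 50, 3))    # 1/53  ~= 0.019 instead of 0.0
print(laplace_smoothed(20, 50, 3))   # 21/53 ~= 0.396 instead of 20/50 = 0.4
```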

1.3. Algorithmic Stability and Perturbation

In machine learning and statistical learning models, we often need to consider the stability of the algorithm, that is, its robustness to perturbations of the data. A model's generalization error is determined by its bias (Bias) and variance (Variance), and high variance is the main culprit behind instability.

Simply put, if a small change in an algorithm's input produces a huge change in its output, we say the algorithm is unstable. The algorithm here is not limited to the machine learning algorithm itself; it also includes any algorithms involved in the intermediate steps. A few concrete examples:

  • Matrix inversion (Inverting a Matrix) is an unstable process, so we often choose to avoid explicitly inverting matrices. Interested readers can look into the reasons in more depth.
  • Another interesting example is batch learning (Batch Learning) in neural networks, that is, training the network on batches of training data instead of one example at a time. We need to be careful when choosing the batch size (Batch Size) and the corresponding learning rate (Learning Rate); the wrong combination makes the learning process unstable. When we learn with small batches, the high variance (High Variance) of small samples makes the estimated gradients (Gradient) inaccurate, so we should use a small learning rate to avoid taking steps that are too big. Conversely, when we choose a larger batch size, we can safely use a larger learning rate.
  • The very nature of decision trees (Decision Tree) makes them unstable models. Small changes in the training data can change the structure of the tree, which is why we always put a question mark over the trustworthiness of a single decision tree. To address this instability, researchers invented ensemble learning (Ensemble Learning); among its methods, Bagging improves stability by reducing variance (see the sketch after this list).
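
A minimal sketch of that last point (assuming scikit-learn and NumPy are available; the synthetic dataset, 20 refits, and 50 estimators are arbitrary choices): refit each model on bootstrap perturbations of the training data and compare how much its predictions jump around. The bagged ensemble should show noticeably lower prediction variance than a single tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; sizes are arbitrary and chosen only for illustration
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

def prediction_variance(make_model, n_rounds=20):
    """Refit a model on perturbed (bootstrap-resampled) training sets and
    measure how much its predictions on a fixed test set vary."""
    rng = np.random.default_rng(0)
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_train), len(X_train))   # perturbed data
        preds.append(make_model().fit(X_train[idx], y_train[idx]).predict(X_test))
    return np.mean(np.var(np.array(preds), axis=0))          # variance across refits

print("single tree :", prediction_variance(DecisionTreeClassifier))
print("bagged trees:", prediction_variance(
    lambda: BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)))
```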

By contrast, we can also point to models that are comparatively stable; the most common examples are the support vector machine (SVM) and the models derived from it, which is one of the reasons SVM was so popular at the beginning of this century.

2. Data Stability

Strictly speaking, data stability usually refers specifically to the stability of time series (Time Series). The data discussed here are meant in a broader sense, not only time series. Fundamentally, the stability of data depends on its variance (Variance).

2.1. Independent and Identically Distributed Data and Generalization Ability

The generalization ability of a machine learning model refers to how well it fits new samples. For a model to achieve good generalization, its training data should be independent and identically distributed samples drawn from the parent distribution. Let us use a little statistics...

Suppose we have a population (Population) consisting of the positive integers from 1 to 100:

\bm{D} = \{1, 2, 3, \ldots, 98, 99, 100\}

Suppose we obtain three samples from \bm{D}:

  • \bm{D}_1 = \{1, 4, 16, 25, 36, 49, 64, 81\}
  • \bm{D}_2 = \{10, 20, 30, 40, 50, 60, 70, 80, 90\}
  • \bm{D}_3 = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}

We notice that the first sample seems to consist of perfect squares, the second of multiples of ten, and the third of consecutive integers less than 10. Looking at these samples, we can safely guess that a learning model cannot obtain good generalization ability from any of these three data sets... because they are not independent and identically distributed samples.

Readers may then ask what counts as an independent and identically distributed sample. First of all:

  1. We want the sample not to be deliberately selected, for example intentionally picking a batch of perfect squares
  2. We want the data to be drawn from the same distribution, rather than a few points picked from each of several different distributions...

So how do we ensure that our training data are stable enough? I have a few suggestions that may sound like truisms:

  1. The more training data, the better... This reduces the randomness in the data and lowers its variance (Variance).
  2. Make sure the training data, the parent distribution, and the data you will predict on come from the same distribution. For example, you cannot use the average IQ of statisticians to predict the average IQ of biologists; that would not be fair... As for which side it is unfair to, that is left for the reader to ponder.

Thus the basic premise of data stability is that the data are independent and identically distributed, and the more of it the better. Stable data ensure that the model's empirical risk (Empirical Risk) is approximately equal to its generalization risk (Generalization Risk).
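
A minimal Python sketch of the idea, reusing the 1-to-100 population above (the sample size 10 is arbitrary, and the random sample is drawn without replacement, which is close enough to i.i.d. for a small sample from a large population): a deliberately chosen sample like D_3 badly misrepresents the population, while a uniform random sample tracks it much better.

```python
import random
import statistics

random.seed(0)
population = list(range(1, 101))            # the parent D = {1, ..., 100}

biased_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # like D_3 above
iid_sample = random.sample(population, 10)     # uniform random draw

print("population mean:", statistics.mean(population))     # 50.5
print("biased mean    :", statistics.mean(biased_sample))  # 5
print("random mean    :", statistics.mean(iid_sample))     # typically near 50
```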

2.2. The New Normal: Class Imbalance

More and more machine learning problems involve imbalanced data distributions. Imbalance here can mean many things, such as a large disparity between the numbers of positive and negative examples in a binary classification problem. Note, however, that if the parent distribution itself is imbalanced, you should not force the data to look balanced through sampling; that violates the principle of independent and identically distributed sampling!

Faced with naturally imbalanced data, we have many techniques at our disposal; the more common rebalancing approaches include:

  • Oversampling (Over-Sampling): reuse the examples of the minority class.
  • Undersampling (Down-Sampling): selectively discard a portion of the examples from the majority class.

In situations like this, ensemble learning often performs very well, thanks to its ability to effectively reduce variance (Variance). Readers should note that both oversampling and undersampling bring their own problems: oversampling easily leads to overfitting, while undersampling in effect wastes data.
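
A minimal Python sketch of the two strategies in the list above (the 900/100 class counts are hypothetical):

```python
import random

random.seed(0)
majority = ["neg"] * 900      # hypothetical majority class
minority = ["pos"] * 100      # hypothetical minority class

# Oversampling: reuse minority examples (draw with replacement) until
# both classes have the same size
oversampled = majority + random.choices(minority, k=len(majority))

# Undersampling: discard most of the majority class
undersampled = random.sample(majority, len(minority)) + minority

print(len(oversampled), oversampled.count("pos"))    # 1800 900
print(len(undersampled), undersampled.count("neg"))  # 200 100
```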

Therefore, imbalance often brings stability problems, and at the root of it the cause is, once again, excessive variance.

3. Performance Stability - "A Defense of Theory"

Evaluating the stability (Stability) of a machine learning model is fundamentally different from evaluating its performance (Performance); we cannot judge whether a model is stable simply from metrics like accuracy. As a simple example, suppose a model performs particularly well for a while and then rather poorly for a while; would we dare to use such a model in real production? To put it plainly, stability is again determined by the variance (Variance) of the data.

Some readers will say: can't we use cross-validation (Cross-Validation) to assess the stability of an algorithm or model? Yes, that is the right idea, but its biggest problem is that cross-validation is too slow. Whether five-fold (5-fold) or ten-fold (10-fold), it requires a long time and repeated computation. Life is precious; not a second should be wasted!
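
For what it is worth, here is a minimal sketch of that idea (assuming scikit-learn; the dataset and fold count are arbitrary): the spread of the fold scores gives a rough, if expensive, picture of stability.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Ten folds means fitting the model ten times -- informative but slow
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("mean accuracy:", scores.mean())
print("score spread :", scores.std())   # a large spread hints at instability
```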

So we generally turn to computational learning theory (Computational Learning Theory), sometimes also called statistical learning theory (Statistical Learning Theory), to analyze algorithms. Two frameworks are introduced here for reference:

  • The probably approximately correct framework (Probably Approximately Correct, PAC). The PAC framework mainly answers one question: can a learning algorithm, in polynomial time, approximately learn a concept from samples \bm{x} while guaranteeing that the error stays within a certain range?
  • The mistake bound framework (Mistake Bound Framework, MBF). MBF answers a question from another angle: how many mistakes will a learning model make during training before it learns the correct concept?

Given the length of this article and the depth and breadth of these concepts, the author will expand on them in a dedicated future article. But computational learning theory points out a direction for quantifying the stability of learning models, and it also eases a long-standing prejudice from the statistics community: that machine learning lacks a theoretical foundation.

Readers who only intend to practice machine learning rather than do research in the field do not need to dig too deeply into what PAC really is; its practical usefulness is limited, and it requires a great deal of probability theory.

4. A Few Reflections

The purpose of this article is not to list every stability problem, nor to make you extremely suspicious and skeptical of everything. I simply want to use it to show that machine learning is interdisciplinary: it requires you not only to understand floating-point precision in computers to prevent overflow, but also to understand the statistics behind data sampling.

From this perspective, readers with a computer science background should broaden their horizons: many other fields are closely related to machine learning. And friends with a statistics or mathematics background should not treat the computer as a mere calculation tool: many of the problems you run into are, plainly speaking, computational problems. Beyond stability, however, exploring the unknown is what innovation is about. So relax the boundaries of "stability" and keep exploring the boundaries of truth.

Origin www.cnblogs.com/limingqi/p/12046480.html