Using probability theory and mathematical statistics to compare the performance of C++ functions (applying statistical methods to a function timing and evaluation program)

Finished Code: https://github.com/user4898426/CppTiming

I. Introducing the problem:

A very common requirement is to compare two functions that achieve the same result with different algorithms and decide which is better, or simply to explore a feature of the programming language by checking which of two simple pieces of code is faster. The obvious approach is to run each function enough times (say, a million times) and then simply compare which took more time and which took less.
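As a rough illustration of this naive approach, here is a minimal sketch using std::chrono from the standard library. The helper names total_seconds and naively_faster are made up for this example and are not taken from the repository linked above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: total wall-clock seconds spent calling f() `reps` times.
template <typename F>
double total_seconds(F&& f, long reps)
{
    assert(reps > 0);
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < reps; ++i)
    {
        f();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Naive verdict: true if f1's total time is smaller than f2's.
// No statistics are involved, so a tiny gap cannot be told apart from noise.
template <typename F1, typename F2>
bool naively_faster(F1&& f1, F2&& f2, long reps)
{
    return total_seconds(f1, reps) < total_seconds(f2, reps);
}
```

A call such as naively_faster(foo1, foo2, 1000000) gives a yes/no answer, but says nothing about how much that answer can be trusted.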

For some subtler questions, however, the two timings may differ so little that intuition is likely to mistake the gap for measurement error. For example, when the difference between the two is only 0.2%, intuition tells us to ignore the gap, to attribute it to error, or to conclude that it is too small to matter.

But can we reach a more confident conclusion? Consider this scenario: the gap is 0.2%, but over a thousand tests function A is consistently 0.2% faster than function B. Clearly, although the gap is very small, we can safely say that with high probability function A is better than function B, if only marginally.

In this situation the simple, intuitive method breaks down, while probability theory and mathematical statistics provide a powerful tool for the problem. This article attempts to apply those methods to it. My knowledge is limited, so if you find any errors or omissions, please let me know.

II. Theoretical basis:

Normal distribution, t distribution, hypothesis testing

III. Specific ideas:

Method one: two normal populations model

1. Basic assumptions:

Let the timing results of functions f_{1}, f_{2} be X_{1}, X_{2} respectively.

  1. For a single function, assume the timing error follows a normal distribution centered at 0, i.e., \Delta \sim N\left( 0, \sigma^{2} \right)
  2. Assume the function has a constant true running time \mu, so that the measured value is X = \mu + \Delta, and therefore X \sim N\left( \mu, \sigma^{2} \right)
  3. Assume the timing results of the two functions have equal variance, i.e., \left( X_{1} \sim N\left( \mu_{1}, \sigma_{1}^{2} \right) \right) \wedge \left( X_{2} \sim N\left( \mu_{2}, \sigma_{2}^{2} \right) \right) \wedge \left( \sigma_{1} = \sigma_{2} \right)

Since both functions are tested on the same machine, there is reason to believe Assumption 3 holds. Besides, without Assumption 3 I would not know how to do the calculation, ha ha.
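Under these assumptions, the sample mean of repeated timings estimates \mu and the sample variance estimates \sigma^{2}. A minimal sketch of the two estimators follows. The function names are chosen for this example and are not the library's API.

```cpp
#include <cassert>
#include <vector>

// Sample mean of the timing results; an estimate of mu.
double sample_mean(const std::vector<double>& x)
{
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x)
    {
        sum += v;
    }
    return sum / static_cast<double>(x.size());
}

// Unbiased sample variance (divides by n - 1); an estimate of sigma^2.
double sample_variance(const std::vector<double>& x)
{
    assert(x.size() >= 2);
    const double m = sample_mean(x);
    double ss = 0.0;
    for (double v : x)
    {
        ss += (v - m) * (v - m);
    }
    return ss / static_cast<double>(x.size() - 1);
}
```

These are the quantities \bar{X} and S^{2} that the test statistics in the following sections are built from.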

2. Hypothesis testing:

Since the question under study is "which of f_{1}, f_{2} is better" rather than "is there a difference between them", I chose a one-sided test. In a real experiment there is usually a presumption that one function is faster than the other (for example, after one round of measurement we can always presume that the function with the smaller mean is faster). Without loss of generality, assume here that f_{1} is faster than f_{2}.

For significance level \alpha:

  • Null hypothesis: f_{1} is not faster than f_{2}, i.e., H_{0} : \mu_{1} \geq \mu_{2}
  • Alternative hypothesis: f_{1} is faster than f_{2}, i.e., H_{1} : \mu_{1} < \mu_{2}

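Take the test statistic t = \frac{\left( \bar{X}_{1} - \bar{X}_{2} \right) - \left( \mu_{1} - \mu_{2} \right)}{S \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}}, where S = \sqrt{\frac{\left( n_{1} - 1 \right) S_{1}^{2} + \left( n_{2} - 1 \right) S_{2}^{2}}{n_{1} + n_{2} - 2}}, S_{1}^{2} and S_{2}^{2} are the two sample variances, n_{1} and n_{2} are the two sample sizes, and \mu_{1} - \mu_{2} is set to 0, the boundary of H_{0}.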

Then t follows a t distribution, i.e., t \sim t\left( n_{1} + n_{2} - 2 \right)

Rejection region: t \leq -t_{\alpha}\left( n_{1} + n_{2} - 2 \right)

So, for the computer to decide automatically, it only needs to compute this t and the corresponding quantile of the t distribution. A qualitative conclusion can then be drawn.
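The decision rule of Method one can be sketched as follows. This is an illustration and not the code from the repository. The names SampleStats, describe, pooled_t and f1_significantly_faster are hypothetical. Because the C++ standard library has no t-distribution quantile function, the critical value t_alpha is assumed to be supplied by the caller, for example from a t table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean, unbiased variance and size of one sample of timings.
struct SampleStats
{
    double mean;
    double var;
    std::size_t n;
};

SampleStats describe(const std::vector<double>& x)
{
    assert(x.size() >= 2);
    double sum = 0.0;
    for (double v : x) sum += v;
    const double mean = sum / static_cast<double>(x.size());
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    return {mean, ss / static_cast<double>(x.size() - 1), x.size()};
}

// Pooled two-sample t statistic, evaluated at the H0 boundary mu1 - mu2 = 0.
// If every measurement is identical the pooled S is 0 and the result is not finite.
double pooled_t(const std::vector<double>& x1, const std::vector<double>& x2)
{
    const SampleStats a = describe(x1);
    const SampleStats b = describe(x2);
    const double df = static_cast<double>(a.n + b.n - 2);
    const double s = std::sqrt(((a.n - 1) * a.var + (b.n - 1) * b.var) / df);
    return (a.mean - b.mean) / (s * std::sqrt(1.0 / a.n + 1.0 / b.n));
}

// Reject H0 (conclude that f1 is faster) when t <= -t_alpha(n1 + n2 - 2).
// t_alpha is the caller-supplied upper alpha quantile of the t distribution.
bool f1_significantly_faster(const std::vector<double>& x1,
                             const std::vector<double>& x2,
                             double t_alpha)
{
    return pooled_t(x1, x2) <= -t_alpha;
}
```

For two samples of five timings each, the degrees of freedom are 8, and the tabulated one-sided critical value at \alpha = 0.05 is about 1.860.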

Method two: paired test model

When I came up with this method I surprised myself, but I could not find a flaw in it. Its reliability still needs to be examined by experts.

1. Basic assumptions:

Let the timing results of functions f_{1}, f_{2} be X_{1}, X_{2} respectively.

  1. For a single function, assume the timing error follows a normal distribution centered at 0, i.e., \Delta \sim N\left( 0, \sigma^{2} \right)
  2. Assume the function has a constant true running time \mu, so that the measured value is X = \mu + \Delta, and therefore X \sim N\left( \mu, \sigma^{2} \right)

2. Transforming the problem:

From the basic assumptions, \left( X_{1} \sim N\left( \mu_{1}, \sigma_{1}^{2} \right) \right) \wedge \left( X_{2} \sim N\left( \mu_{2}, \sigma_{2}^{2} \right) \right)

Take Y = X_{1} - X_{2}. If X_{1} and X_{2} are independent, then Y \sim N\left( \mu_{1} - \mu_{2}, \sigma_{1}^{2} + \sigma_{2}^{2} \right), that is, Y is also normally distributed.

Clearly, the observed value of Y is y = x_{1} - x_{2}

3. Hypothesis testing:

A one-sided test is used here as well, for the same reason as in Method one. Without loss of generality, assume f_{1} is faster than f_{2}.

For significance level \alpha:

  • Null hypothesis: f_{1} is not faster than f_{2}, i.e., H_{0} : \mu_{1} \geq \mu_{2} \Leftrightarrow \mu_{1} - \mu_{2} \geq 0
  • Alternative hypothesis: f_{1} is faster than f_{2}, i.e., H_{1} : \mu_{1} < \mu_{2} \Leftrightarrow \mu_{1} - \mu_{2} < 0

At this point, the problem becomes a test of the mean of a single normal population Y.

Take the test statistic t = \frac{\bar{Y} - 0}{S / \sqrt{n}}, where S is the sample standard deviation of Y

Then t \sim t\left( n - 1 \right)

Rejection region: t \leq -t_{\alpha}\left( n - 1 \right)

Similarly, once this t and the corresponding quantile of the t distribution are determined, a qualitative conclusion can be drawn.
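Method two can be sketched in the same spirit. Again this is an illustration under stated assumptions and not the library's code. The names paired_t and f1_significantly_faster_paired are hypothetical, the two samples are assumed to have the same length n so that differences can be formed pairwise, and the critical value t_alpha is assumed to come from a t table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// t statistic for the differences y_i = x1_i - x2_i, testing H0: mu1 - mu2 >= 0.
// If all differences are identical, S is 0 and the result is not finite.
double paired_t(const std::vector<double>& x1, const std::vector<double>& x2)
{
    assert(x1.size() == x2.size() && x1.size() >= 2);
    const std::size_t n = x1.size();

    // Mean of the differences.
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        sum += x1[i] - x2[i];
    }
    const double ybar = sum / static_cast<double>(n);

    // Sample standard deviation of the differences (divides by n - 1).
    double ss = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const double d = (x1[i] - x2[i]) - ybar;
        ss += d * d;
    }
    const double s = std::sqrt(ss / static_cast<double>(n - 1));

    return ybar / (s / std::sqrt(static_cast<double>(n)));
}

// Reject H0 (conclude that f1 is faster) when t <= -t_alpha(n - 1).
bool f1_significantly_faster_paired(const std::vector<double>& x1,
                                    const std::vector<double>& x2,
                                    double t_alpha)
{
    return paired_t(x1, x2) <= -t_alpha;
}
```

With four pairs the degrees of freedom are 3, and the tabulated one-sided critical value at \alpha = 0.05 is about 2.353.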

IV. Comparing the two methods

For the source code of the comparison, see the attachment.

The basic idea is to artificially construct two functions of different speed and then check them with both methods.

The results of ten thousand tests are as follows:

test time: 10000
test 10000 times
method1:        9620 right
method2:        8071 right

As can be seen, Method one is about 96% accurate and Method two about 80%, so Method one is somewhat more reliable.
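An experiment of this kind can also be sketched as a pure simulation. The code below is not the author's comparison program, and its counts are not expected to match the figures above. Instead of timing real functions, it draws "timings" from two normal distributions with known means (an assumption of this sketch) and counts how often Method one correctly reports that f1 is faster. The names sim_pooled_t and count_correct and all parameter values are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Pooled two-sample t statistic (Method one) at the H0 boundary mu1 = mu2.
double sim_pooled_t(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() >= 2 && b.size() >= 2);
    auto mean = [](const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s / static_cast<double>(v.size());
    };
    const double ma = mean(a);
    const double mb = mean(b);
    // ssa equals (n1 - 1) * S1^2 and ssb equals (n2 - 1) * S2^2.
    double ssa = 0.0;
    double ssb = 0.0;
    for (double x : a) ssa += (x - ma) * (x - ma);
    for (double x : b) ssb += (x - mb) * (x - mb);
    const double df = static_cast<double>(a.size() + b.size() - 2);
    const double s = std::sqrt((ssa + ssb) / df);
    return (ma - mb) / (s * std::sqrt(1.0 / a.size() + 1.0 / b.size()));
}

// Runs `trials` simulated experiments with n timings per function and returns
// how many of them reject H0, i.e. report that f1 is faster.
// With mu1 < mu2 by construction, each rejection is a correct verdict.
int count_correct(int trials, std::size_t n, double mu1, double mu2,
                  double sigma, double t_alpha, unsigned seed)
{
    std::mt19937 rng(seed);
    std::normal_distribution<double> d1(mu1, sigma);
    std::normal_distribution<double> d2(mu2, sigma);
    std::vector<double> a(n);
    std::vector<double> b(n);
    int correct = 0;
    for (int t = 0; t < trials; ++t)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            a[i] = d1(rng);
            b[i] = d2(rng);
        }
        if (sim_pooled_t(a, b) <= -t_alpha)
        {
            ++correct;
        }
    }
    return correct;
}
```

For example, count_correct(1000, 10, 1.00, 1.10, 0.05, 1.734, 42u) simulates a 10% gap with 10 timings per function, where 1.734 is the tabulated one-sided critical value for \alpha = 0.05 and 18 degrees of freedom. The exact count depends on the standard library's implementation of std::normal_distribution.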

V. Attachment:

Following the ideas above, I have written a complete set of source code that depends only on the standard library (let's call it a library). Everything is implemented, from the most basic high-precision timing functions to printing the final report. It is meant both for my own use and for sharing. For details, see the GitHub repository home page:

https://github.com/user4898426/CppTiming

Here is an example of a test and the printed results:

#include "timing.h"

void foo1()
{
	int a = 0;
	for (int i = 0; i < 10000; ++i)
	{
		a += i;
	}
}

void foo2()
{
	int a = 0;
	for (int i = 0; i < 11000; ++i)
	{
		a += i;
	}
}

int main()
{
	auto_compare(foo1, foo2);
}

Results are as follows:

+-------------- Comparision Report -------------+
|                                               |
|   Sample Records:                             |
|     index      f1(002515DC)    f2(002515E1)   |
|       0      1.7869e-05      1.93277e-05      |
|       1      1.7869e-05      1.93277e-05      |
|       2      1.7869e-05      1.93277e-05      |
|       3      1.85983e-05     1.8963e-05       |
|       4      1.75043e-05     1.93277e-05      |
|                                               |
+-----------------------------------------------+
|                                               |
|   Analysis:                                   |
|     mean:    1.79419e-05     1.92547e-05      |
|     vari:    1.59584e-13     2.65973e-14      |
|     socm:    1.27667e-13     2.12778e-14      |
|     ival(0.95):                               |
|              [1.87249e-05,    [1.95744e-05,   |
|                1.71589e-05]     1.89351e-05]  |
|                                               |
+-----------------------------------------------+
|                                               |
|   Significance Level : 0.05                   |
|   Faster By:     7.317%                       |
|   Time Saved:    -6.818%                      |
|   Conclusion:                                 |
|          ******************************       |
|          * f1 is significantly faster *       |
|          ******************************       |
|                                               |
+-----------------------------------------------+
|                                               |
|   Method II :                                 |
|          ******************************       |
|          * f1 is significantly faster *       |
|          ******************************       |
|                                               |
+-----------------------------------------------+

 


Origin blog.csdn.net/u014132143/article/details/99701986