Finished Code: https://github.com/user4898426/CppTiming
I. Introducing the problem:
Comparing which of two functions that compute the same result with different algorithms is written better, or simply exploring a property of a programming language by checking which of two pieces of code is faster, is a very common requirement. The obvious, easiest approach is to run each function enough times (say, a million) and then simply compare which took more time and which took less.
But there is a subtler issue: when the two measured times differ only slightly, intuition is likely to dismiss the difference as measurement error. For example, if the gap between the two is only 0.2%, intuition tells us to ignore it, classify it as noise, or conclude it is too small to produce any qualitative difference.
But can we do better? Consider this scenario: the gap is only 0.2%, yet across a thousand test runs function A is consistently 0.2% faster than function B. Clearly, although the gap is very small, we can conclude with high probability that function A really is faster than function B, even if only marginally.
So at this point the simple, intuitive method becomes ineffective, while probability theory and mathematical statistics provide powerful tools for the problem. This article attempts to apply those methods to it; what I have learned is limited, so if there are any errors or omissions, please let me know.
II. Theoretical basis:
The normal distribution, the t distribution, and hypothesis testing.
III. The specific approach:
Method one: two-normal-population model
1. Basic assumptions:
Let the functions be $f_1$ and $f_2$, with timing results $X_1, X_2, \dots, X_{n_1}$ and $Y_1, Y_2, \dots, Y_{n_2}$ respectively.
- For a single function, assume the timing error $\varepsilon$ follows a normal distribution centered at 0, i.e. $\varepsilon \sim N(0, \sigma^2)$.
- Assume each function has a constant true running time $\mu$, so that a measured value is $X = \mu + \varepsilon$, and therefore $X \sim N(\mu, \sigma^2)$.
- Assume the variances of the two functions' timing results are equal, i.e. $\sigma_1^2 = \sigma_2^2$.
Since both functions are tested on the same machine, assumption 3 is reasonable to believe; besides, without assumption 3 I wouldn't know how to proceed, ha ha.
2. Hypothesis testing:
Since the question under study is "which function is better," not "is there a difference," I chose a one-sided test. In a real experiment, one typically already suspects that one function is faster than the other (e.g., after a preliminary measurement we can always hypothesize that the function with the smaller mean is the faster one); here, suppose we hypothesize that $f_1$ is faster than $f_2$.
For significance level $\alpha$:
- Null hypothesis $H_0$: $f_1$ is not faster than $f_2$, i.e. $\mu_1 \ge \mu_2$
- Alternative hypothesis $H_1$: $f_1$ is faster than $f_2$, i.e. $\mu_1 < \mu_2$
Take the test statistic $t = \dfrac{\bar{X} - \bar{Y}}{S_w \sqrt{1/n_1 + 1/n_2}}$, where $S_w^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$.
Then, when $\mu_1 = \mu_2$, $t$ follows a t distribution, i.e. $t \sim t(n_1 + n_2 - 2)$.
Rejection region: $t \le -t_\alpha(n_1 + n_2 - 2)$.
So, to let the computer make the decision automatically, we only need to find this quantile of the t distribution, and the qualitative conclusion follows.
Method two: paired-test model
When I came up with this method I surprised myself, but I could not find a flaw in it; its reliability needs further expert scrutiny.
1. Basic assumptions:
Let the functions be $f_1$ and $f_2$, with paired timing results $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ respectively.
- For a single function, assume the timing error $\varepsilon$ follows a normal distribution centered at 0, i.e. $\varepsilon \sim N(0, \sigma^2)$.
- Assume each function has a constant true running time $\mu$, so that a measured value is $X = \mu + \varepsilon$, and therefore $X \sim N(\mu, \sigma^2)$.
2. Transforming the problem:
From the basic assumptions, $X_i \sim N(\mu_1, \sigma_1^2)$ and $Y_i \sim N(\mu_2, \sigma_2^2)$, with the $X_i$ independent of the $Y_i$.
Now take $D_i = X_i - Y_i$; then $D_i$ is also normally distributed, namely $D_i \sim N(\mu_1 - \mu_2,\ \sigma_1^2 + \sigma_2^2)$. Writing $\mu_D = \mu_1 - \mu_2$ and $\sigma_D^2 = \sigma_1^2 + \sigma_2^2$, we have $D_i \sim N(\mu_D, \sigma_D^2)$.
Evidently, the observed values are $d_i = x_i - y_i$.
3. Hypothesis testing:
Again a one-sided test is used; by the same reasoning as in method one, suppose we hypothesize that $f_1$ is faster than $f_2$.
For significance level $\alpha$:
- Null hypothesis $H_0$: $f_1$ is not faster than $f_2$, i.e. $\mu_D \ge 0$
- Alternative hypothesis $H_1$: $f_1$ is faster than $f_2$, i.e. $\mu_D < 0$
At this point, the problem becomes a test of the mean of a single normal population.
Take the test statistic $t = \dfrac{\bar{D}}{S_D / \sqrt{n}}$, where $S_D$ is the sample standard deviation of the differences $D_i$.
Then, when $\mu_D = 0$, $t \sim t(n - 1)$.
Rejection region: $t \le -t_\alpha(n - 1)$.
Similarly, once this quantile of the t distribution is found, the computer can make the qualitative conclusion automatically.
IV. Comparing the two methods:
The comparison source code is included in the attachment.
The basic idea: artificially construct two functions whose speeds differ by a known amount, then check whether each method correctly identifies the faster one.
The results of 10,000 test runs are as follows:
test time: 10000
test 10000 times
method1: 9620 right
method2: 8071 right
As can be seen, method one is correct about 96% of the time and method two about 80%; method one is somewhat more reliable.
V. Attachment:
Following the ideas above, I have written a complete library based on the standard library; let's call it the timing library. Everything from high-precision timing of functions to printing the final report has been implemented. It was written for personal use, but it is also shared here; for details, see the GitHub repository home page.
Here is an example of a test and its printed results:
#include "timing.h"

void foo1()
{
    int a = 0;
    for (int i = 0; i < 10000; ++i)
    {
        a += i;
    }
}

void foo2()
{
    int a = 0;
    for (int i = 0; i < 11000; ++i)
    {
        a += i;
    }
}

int main()
{
    auto_compare(foo1, foo2);
}
The results are as follows:
+-------------- Comparision Report -------------+
| |
| Sample Records: |
| index f1(002515DC) f2(002515E1) |
| 0 1.7869e-05 1.93277e-05 |
| 1 1.7869e-05 1.93277e-05 |
| 2 1.7869e-05 1.93277e-05 |
| 3 1.85983e-05 1.8963e-05 |
| 4 1.75043e-05 1.93277e-05 |
| |
+-----------------------------------------------+
| |
| Analysis: |
| mean: 1.79419e-05 1.92547e-05 |
| vari: 1.59584e-13 2.65973e-14 |
| socm: 1.27667e-13 2.12778e-14 |
| ival(0.95): |
| [1.87249e-05, [1.95744e-05, |
| 1.71589e-05] 1.89351e-05] |
| |
+-----------------------------------------------+
| |
| Significance Level : 0.05 |
| Faster By: 7.317% |
| Time Saved: -6.818% |
| Conclusion: |
| ****************************** |
| * f1 is significantly faster * |
| ****************************** |
| |
+-----------------------------------------------+
| |
| Method II : |
| ****************************** |
| * f1 is significantly faster * |
| ****************************** |
| |
+-----------------------------------------------+