CSI-V: Measuring Program Execution Time

Foreword

People often ask, "How fast does program X run on machine Y?", and the usual answer is a rough description from which the execution time can be estimated, for example: the program produces a certain amount of output in N minutes, and from that we infer its running performance. Few people pay attention to the exact time a program takes, unless we need to know whether an optimization actually improved performance, or we want to compare the execution efficiency of two similar algorithms. In those cases we may need to collect a lot of timing data to compute the CPE of the running program (see CSI-VI) for observation and comparison, and that is when we need to understand how to measure the execution time of a program.

First of all, it must be said that a computer cannot measure the absolute running time of a program. There are many reasons for this. The most obvious is that programs execute concurrently on a computer, so switching between processes introduces a certain amount of error; factors such as caches and processor branch prediction also play a part. So do not expect the execution time of a program to come out exactly the same every time it runs. In this article we look at two mechanisms for recording the passage of time: one is based on a low-frequency timer that periodically interrupts the processor; the other is based on a counter that is incremented by 1 every clock cycle.

1. Time flow on a computer system

A computer operates on two time scales:

At the microscopic level, it executes instructions at a rate of one or more per clock cycle, where a clock cycle takes only about 1 ns (nanosecond), determined by the machine's clock frequency.

At the macroscopic level, on the scale we commonly talk about: a graphics display refreshes every 33 ms, a disk typically takes about 10 ms to initiate a transfer, and the processor allocates each task a time slice of about 5~20 ms. At such speeds the user feels that the tasks are being performed simultaneously, because a person cannot perceive a period of time shorter than about 100 ms.

2. Process scheduling and timer interrupts

The computer has an external timer that periodically sends an interrupt signal to the processor. The time between these interrupt signals is called the interval time. When a timer interrupt occurs, the operating system's scheduler can choose either to let the currently executing process continue or to switch to another process. The interval must be short enough to sustain the illusion that multiple tasks run simultaneously, yet not so short that the frequent switching itself degrades performance. A typical timer interval is 1~10 ms.

3. Measuring time by interval counting

The operating system also uses the timer to keep track of the cumulative time used by each process, and this information provides an imprecise measure of program execution time. Consider the figure below:

[Figure: timeline of two processes, A and B, alternating between user mode, kernel mode, and inactive periods, with each timer interrupt marked by an incrementing count.]

Looking at this figure, we must first be clear about one fact: over the course of a process's execution, there are periods when the program is active (executing its instructions) and periods when it is inactive (waiting to be scheduled by the operating system), and the active periods make up a large fraction of the total execution time.

The operating system maintains counts of the user time and system time used by each process. When a timer interrupt occurs, the operating system determines which process is active and adds the timer interval to one of that process's counts: if the process was executing in kernel mode, its system time is incremented; otherwise its user time is incremented. In the figure, the user and system times of process A are labeled Au and As, those of process B are labeled Bu and Bs, and each timer interrupt is marked with an incrementing count value.

To obtain a process's execution time, we can call the library function times to read the process's timers. Its declaration is as follows:

struct tms
{
    clock_t tms_utime;  /* user time */
    clock_t tms_stime;  /* system time */
    clock_t tms_cutime; /* cumulative user time of terminated and reaped child processes */
    clock_t tms_cstime; /* cumulative system time of terminated and reaped child processes */
};
clock_t times(struct tms *buf);

These time measurements are expressed in units of clock ticks. The defined constant CLK_TCK specifies the number of clock ticks per second. The clock type clock_t is usually defined as a long integer. times returns the total number of clock ticks that have elapsed since the system started, so we can calculate the total time between two different points in a program execution by calling times twice and calculating the difference between the two return values.
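For instance, here is a minimal sketch on a POSIX system (the work function is just a hypothetical placeholder; on modern systems the ticks-per-second value is usually obtained with sysconf(_SC_CLK_TCK) rather than the CLK_TCK constant):

#include <stdio.h>
#include <unistd.h>      /* sysconf */
#include <sys/times.h>   /* times, struct tms */

/* hypothetical piece of work to be timed */
static void work(void)
{
    volatile long sum = 0;
    long i;
    for (i = 0; i < 10000000; i++)
        sum += i;
}

int main(void)
{
    struct tms t0, t1;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);   /* clock ticks per second */
    clock_t wall0 = times(&t0);

    work();

    clock_t wall1 = times(&t1);
    printf("wall time:   %.3f s\n", (double)(wall1 - wall0) / ticks_per_sec);
    printf("user time:   %.3f s\n", (double)(t1.tms_utime - t0.tms_utime) / ticks_per_sec);
    printf("system time: %.3f s\n", (double)(t1.tms_stime - t0.tms_stime) / ticks_per_sec);
    return 0;
}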

Similar to times, on the WIN32 platform the API GetTickCount also returns a count of time elapsed since the system was started.

Its definition in MSDN is as follows:

DWORD GetTickCount(void);

A call to GetTickCount returns the number of milliseconds (ms) that have elapsed since the system was started. There are other similar APIs, such as timeGetSystemTime and timeGetTime; see MSDN for details.
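As a rough sketch (the Sleep call stands in for the code section being measured), GetTickCount can bracket a section of code, and the difference of the two readings is the elapsed time in milliseconds:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD t0 = GetTickCount();

    Sleep(500);   /* placeholder for the section to be measured */

    DWORD elapsed = GetTickCount() - t0;   /* unsigned subtraction handles wrap-around */
    printf("elapsed: %lu ms\n", (unsigned long)elapsed);
    return 0;
}

Note that the resolution of GetTickCount is limited by the timer interrupt interval, typically in the range of 10~16 ms, so it is not suitable for very short code sections.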

Another function defined by the ANSI C standard is clock:

clock_t clock(void);

Its return value, like that of times, is declared as type clock_t, but the two functions express time in different units. To convert the time reported by clock into seconds, it must be divided by the constant CLOCKS_PER_SEC.
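A minimal sketch follows (the loop is a placeholder; note that according to the C standard, clock reports the processor time used by the program rather than wall-clock time):

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start = clock();

    /* placeholder for the section to be measured */
    double x = 0;
    int i;
    for (i = 0; i < 50000000; i++)
        x += i * 0.5;

    clock_t end = clock();
    printf("CPU time: %.3f s (x=%g)\n", (double)(end - start) / CLOCKS_PER_SEC, x);
    return 0;
}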

4. Cycle counters

To provide greater precision for timing measurements, many processors also contain a timer that runs at the clock-cycle level. This timer is a special register that is incremented by 1 on every clock cycle, and its value can be read with special machine instructions.

The IA32 cycle counter

In the IA32 architecture, the cycle counter is a 64-bit unsigned number. On a processor with a 1 GHz clock, it takes about 570 years for this counter to wrap around from 2^64−1 back to 0. The IA32 counter is accessed with the rdtsc (read time stamp counter) instruction. This instruction takes no arguments; it sets register %edx to the high-order 32 bits of the counter and register %eax to the low-order 32 bits. Based on this we can provide a C interface for measuring a program's execution time by counting cycles.

Here, following the code in the book, I give a measurement interface for Windows. Note that not all processors provide such a counter, so this code is dependent on the hardware platform.

#include <stdio.h>
#include <windows.h>

static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;

/* Use the rdtsc instruction to place the high (edx) and low (eax)
   32 bits of the cycle counter into the given variables */
void access_counter(unsigned *hi, unsigned *lo)
{
    __asm
    {
        CPUID            // serialize the processor before reading the counter
        RDTSC            // read time stamp counter into edx:eax
        mov ebx, hi
        mov [ebx], edx
        mov ebx, lo
        mov [ebx], eax
    }
}

/* Record the current value of the cycle counter as the starting point */
void start_counter()
{
    access_counter(&cyc_hi, &cyc_lo);
}

/* Return the number of clock cycles elapsed since the last call to start_counter */
double get_counter()
{
    unsigned ncyc_hi, ncyc_lo;
    unsigned hi, lo, borrow;
    double result;
    access_counter(&ncyc_hi, &ncyc_lo);
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;                      /* 1 if the low-order subtraction wrapped around */
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double)hi * (1 << 30) * 4 + lo;   /* hi * 2^32 + lo, without integer overflow */
    if (result < 0) {
        fprintf(stderr, "Error: counter returns neg value %.0f\n", result);
    }
    return result;
}

/* Estimate the processor clock rate in GHz */
double ghz(int verbose, int sleeptime)
{
    double rate;
    start_counter();
    Sleep(sleeptime);                           /* note: the Windows API Sleep takes its argument in ms */
    rate = get_counter() / (1e6 * sleeptime);   /* cycles / (sleeptime ms * 1e6) = GHz */
    if (verbose)
        printf("processor clock rate = %.1f GHz\n", rate);
    return rate;
}

Of course, with the C interface above, we can also use it to measure a section of our own program:

int main()
{
    start_counter();
    int sum = 0, i;
    for (i = 0; i < 20000; i++) {
        sum += i;
    }
    printf("cycle counter: %.1f\n", get_counter());
    getchar();
    return 0;
}

In fact, the Windows API provides a similar way to obtain a high-resolution time value. Its official declaration is as follows:

BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);

This function returns the current value of the high-resolution performance counter. The LARGE_INTEGER structure is defined as follows:

typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;
        LONG HighPart;
    };
    LONGLONG QuadPart;
} LARGE_INTEGER, *PLARGE_INTEGER;

Structurally, the value involved is also a 64-bit counter, so we can surmise that this API also measures time by counting cycles. A related API is QueryPerformanceFrequency:

BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency);

This function returns the number of counts per second of the high-resolution performance counter. If the hardware does not support it, the call fails.

Note, however, that on a multiprocessor machine the thread running the test program should be bound to one CPU to keep it from migrating between CPUs. On a multi-CPU system, the cycle counts read by the RDTSC instruction on different CPUs may differ, and computing with cycle counts taken from two different CPUs yields a meaningless value. We therefore use SetThreadAffinityMask to bind a thread to a CPU; see MSDN for the details, and the sketch below.
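Putting these pieces together, a sketch along the following lines (the Sleep call is a placeholder for the code being measured, and the thread is pinned to CPU 0 as just described) might look like this:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;

    /* pin this thread to CPU 0 so both counter reads come from the same core */
    SetThreadAffinityMask(GetCurrentThread(), 1);

    QueryPerformanceFrequency(&freq);   /* counts per second; fails if unsupported */
    QueryPerformanceCounter(&t0);

    Sleep(100);   /* placeholder for the section to be measured */

    QueryPerformanceCounter(&t1);
    printf("elapsed: %.6f s\n",
           (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);
    return 0;
}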

Linux also provides a high-resolution timing function, clock_gettime, whose prototype is:

int clock_gettime(clockid_t clk_id, struct timespec *tp);

clockid_t specifies the type of clock to use, and the timespec structure is:

struct timespec
{
    time_t tv_sec;
    long   tv_nsec;
};

Note: this function provides nanosecond-level precision.
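A minimal sketch (CLOCK_MONOTONIC is chosen here so that adjustments to the system clock do not distort the interval; the loop is a placeholder, and older glibc versions may require linking with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* placeholder for the section to be measured */
    volatile long sum = 0;
    long i;
    for (i = 0; i < 10000000; i++)
        sum += i;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("elapsed: %.9f s\n", elapsed);
    return 0;
}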

The effect of context switching

We might assume that measuring a program's execution time with the cycle counter will come very close to the program's actual running time, but in fact that is not guaranteed. Imagine that between the two calls to the counter routines some other process gets to run, or that the machine is heavily loaded and the code section takes especially long to run; the measured time can then differ greatly from the time the code actually requires. The extra time comes from process switches in between: the heavier the load and the more frequent the switching, the larger the discrepancy in the measured execution time.

The effects of caches and other factors

We mentioned earlier that caches and branch prediction also affect timing to some extent. Here is a brief explanation.

The first time a program runs, the data it needs has not yet been loaded into the caches (the L1 and L2 caches, also called on-chip and off-chip caches), so the program loads its data from main memory. If the program is run again shortly afterwards, the data is loaded from the cache instead, and loading from the cache is roughly 10 times faster than loading from main memory, which skews the measured result. As for branch prediction, this involves the microprocessor hardware: most modern processors can execute instructions speculatively, that is, they predict the outcome of a branch in advance and keep executing if the prediction is correct; otherwise there is a certain time penalty. In any case, the effect of both caching and branch prediction on our measurements of execution time is relatively small.
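To see the cache effect concretely, one can time the same routine twice in a row using the start_counter/get_counter interface from Section 4; on most machines the second measurement comes out noticeably smaller because the data is already in the cache. A rough sketch (sum_array and the array size are hypothetical, and the counter routines above are assumed to be available):

#include <stdio.h>

#define N (1 << 18)
static int data[N];

/* hypothetical routine whose first run must fetch data from main memory */
long sum_array(void)
{
    long sum = 0;
    int i;
    for (i = 0; i < N; i++)
        sum += data[i];
    return sum;
}

int main(void)
{
    start_counter();
    sum_array();
    printf("cold run: %.0f cycles\n", get_counter());

    start_counter();
    sum_array();
    printf("warm run: %.0f cycles\n", get_counter());
    return 0;
}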

5. The K-best measurement scheme

The sources of error described above always tend to overestimate the true execution time, but there is still a way to obtain a reliable measurement. The idea is to execute a procedure repeatedly and keep a record of the K fastest times. If we find that the spread ε among these measurements is small, then it is reasonable to take the fastest measured value as the procedure's true execution time.

This is the "K-best" scheme. It requires setting three parameters:

K: the number of measurements we require to lie within some small range of the fastest value.

ε: how close the measurements must be. That is, if the measured values in ascending order are labeled v1, v2, v3, ..., vk, then we require (1 + ε)·v1 >= vk.

M: the maximum number of measurements to take before we give up.

The scheme maintains an array of the K fastest times in sorted order. For each new measurement, it checks whether the value is faster than the value currently in position K of the array; if so, it replaces element K and then performs a series of swaps between adjacent array elements to move the new value to its proper position. This continues until the error criterion is satisfied, at which point we say the measurements have "converged", or until we exceed the limit M, in which case we say the measurements failed to converge.
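A sketch of the scheme might look like the following, where K, EPSILON, and MAXSAMPLES are placeholder parameter values, and measure_once is assumed to run the code under test once and return its measured time (for example, using the cycle counter above):

#include <stdio.h>

#define K 3            /* number of measurements that must be close to the fastest */
#define EPSILON 0.01   /* "close" means within (1 + EPSILON) of the fastest value */
#define MAXSAMPLES 300 /* give up after this many measurements */

/* values[0..K-1] holds the K fastest times seen so far, in ascending order */
static double values[K];
static int samplecount = 0;

/* Insert a new measurement, keeping the array sorted in ascending order */
static void add_sample(double val)
{
    int pos = (samplecount < K) ? samplecount : K - 1;
    if (samplecount < K || val < values[K - 1]) {
        values[pos] = val;
        /* bubble the new value toward the front to keep ascending order */
        while (pos > 0 && values[pos - 1] > values[pos]) {
            double tmp = values[pos - 1];
            values[pos - 1] = values[pos];
            values[pos] = tmp;
            pos--;
        }
    }
    samplecount++;
}

/* Have the K fastest measurements converged to within (1 + EPSILON)? */
static int has_converged(void)
{
    return samplecount >= K && (1 + EPSILON) * values[0] >= values[K - 1];
}

/* Measure repeatedly until convergence or MAXSAMPLES; return the fastest time */
double k_best_time(double (*measure_once)(void))
{
    samplecount = 0;
    while (!has_converged() && samplecount < MAXSAMPLES)
        add_sample(measure_once());
    if (!has_converged())
        fprintf(stderr, "warning: measurements did not converge\n");
    return values[0];
}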

6. Measurement based on the gettimeofday function

The library function gettimeofday queries the system clock to determine the current date and time. It is defined as follows:

struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};
int gettimeofday(struct timeval *tv, struct timezone *tz);

The function writes the time into a structure passed in by the caller, containing one field measured in seconds and one measured in microseconds; the second argument, a time zone pointer, is normally passed as NULL. The seconds field holds the total number of seconds that have elapsed since January 1, 1970.

Based on this function we can likewise write routines that provide a timing interface:

static struct timeval tstart;

void start_timer()
{
    gettimeofday(&tstart, NULL);
}

double get_timer()
{
    struct timeval tfinish;
    long sec, usec;
    gettimeofday(&tfinish, NULL);
    sec = tfinish.tv_sec - tstart.tv_sec;
    usec = tfinish.tv_usec - tstart.tv_usec;
    return sec + 1e-6 * usec;
}
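It is used in the same way as the cycle-counter interface earlier, for example:

int main()
{
    start_timer();
    int sum = 0, i;
    for (i = 0; i < 20000; i++) {
        sum += i;
    }
    printf("elapsed time: %f s\n", get_timer());
    return 0;
}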

This covers most of the common methods for measuring a program's execution time. We can choose an appropriate one according to our needs and the error we can accept; for more detail, see the chapter on measuring program execution time in Computer Systems: A Programmer's Perspective (深入理解计算机系统).
