A svdcmp() bug exposed by NaN

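svdcmp() is the SVD routine provided in the book Numerical Recipes in C. I hit a pitfall when using it a while ago, so I am recording it here for future reference.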

It turns out this was raised long ago on the NR in C forum, in the thread "BUG: svdcmp() out-of-bounds array access". Part of it is excerpted below:

BUG: svdcmp() out-of-bounds array access
edward
04-10-2006, 05:48 PM
Hi,
I’ve encountered a bug in svdcmp() (NR in C) which I’ve searched on the web and no one seems to have published a fix for.
In the loop for “Diagonalization of the bidiagonal form”, we have the following code:

flag=1;
for (l=k;l>=1;l--) {
	/* Test for splitting. */
	nm=l-1; /* Note that rv1[1] is always zero. */
	if ((double)(fabs(rv1[l])+anorm) == anorm) {
		flag=0;
		break;
	}
	if ((double)(fabs(w[nm])+anorm) == anorm) break;
}

Consider the case where for some reason, the if conditions are never true (either due to precision issues, or NaNs in the variables). Then when l == 1, nm becomes 0 and the access into w[] becomes out-of-bounds.
I tried doing a local fix where I change that line to

if (nm > 0 && (float)(fabsf(w[nm])+anorm) == anorm) break;

However, this doesn’t work as code further down the line assumes that l > 1. In an attempt to fix this I’ve inserted this code prior to the w[nm] access

if (l == 1) {
	// Added to avoid crash
	flag = 0;
	break;
}

That seems to avoid all the out-of-bounds accesses (thereby stopping potential crashing). However, does anyone know what the right fix is? I’ve since added some precautions to ensure that the matrix doesn’t have NaNs but in general, I think we should have a proper fix to make this code robust. Any ideas?
Thanks,
-Edward
Saul Teukolsky
04-12-2006, 12:12 PM
Dear Edward,

I think the reason why I hit it was because I had NaNs then (as previously mentioned). In that case, the if conditions will fail even when rv1[1] is 0 since anorm in my case is likely NaN.
Unfortunately, I’m at home right now and can’t double-check. I will try it again tomorrow and report back. Thanks for taking the time to reply!
Cheers,
-Edward
edward
04-13-2006, 09:09 AM
Yep, it’s because anorm was NAN. So let’s suppose I want to make it robust in face of NAN’s (robust in the sense of not crashing). What’s the right approach?
Thanks,
-Edward
Saul Teukolsky
04-13-2006, 05:50 PM
Hi Edward,
I think the “fixes” you inserted are as good as anything. Once the code has NaN’s, the results aren’t meaningful anyway.
Saul Teukolsky
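
Edward mentions adding precautions so that the matrix contains no NaNs before it reaches svdcmp(). A minimal sketch of such a pre-check is given here. The function name matrix_is_finite and the flat, 0-based, row-major layout are assumptions made for illustration (NR itself uses 1-based float** matrices), and the check also rejects infinities, which goes slightly beyond what the thread describes.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of Numerical Recipes.
   Returns 1 if all m*n entries of the matrix are finite
   (neither NaN nor +/-infinity), and 0 otherwise.
   The matrix is assumed to be stored as a flat row-major array. */
int matrix_is_finite(const float *a, size_t m, size_t n)
{
    size_t i;
    for (i = 0; i < m * n; i++) {
        if (!isfinite(a[i]))
            return 0;   /* found a NaN or an infinity */
    }
    return 1;
}
```

A caller would run this check first and skip the decomposition when it returns 0, since, as Saul Teukolsky notes, the results are not meaningful once NaNs are present.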

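Because this routine uses 1-based indexing, the access to w[nm] is illegal when l == 1 and nm == 0. So how can the loop get as far as that access with l == 1?
Look at this piece of code: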

	if ((float)(fabsf(rv1[l])+anorm) == anorm) {
		flag=0;
		break;
	}

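Since rv1[1] is always 0, this if condition is expected to hold at l == 1 at the latest, so the loop exits in time and never reaches the subsequent access to w[nm]. Logically this is a perfectly reasonable approach, but it overlooks the possibility that a value is NaN!
What if anorm is Not a Number?
A quick check shows that when anorm is NaN, the equality test is exactly where things go wrong: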

	float anorm = NAN;
	if ((anorm + 0) == anorm)
		cout << "true" << endl;
	else
		cout << "false" << endl; // The output is false!

In other words, when anorm is NaN, anorm == anorm evaluates to false (and anorm != anorm evaluates to true). Ha.
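
To see what this does to the splitting loop, the sketch below replays only the two tests from that loop and reports the smallest index of w[] that the loop would read. The function lowest_w_index_read is a hypothetical helper written for this note, not part of svdcmp(); it keeps the 1-based convention (index 0 of each array is unused) and stops before actually touching w[0].

```c
#include <math.h>

/* Hypothetical helper, not part of Numerical Recipes.
   rv1[1..k] and w[1..k] are 1-based; element 0 is never dereferenced.
   Returns the smallest index nm for which the loop would read w[nm].
   A result of 0 means the original code would access w[0], which is
   out of bounds. A result of k+1 means w[] would not be read at all. */
int lowest_w_index_read(const float *rv1, const float *w, int k, float anorm)
{
    int l, nm;
    int lowest = k + 1;                 /* sentinel: w[] not read yet */
    for (l = k; l >= 1; l--) {
        nm = l - 1;
        /* First test: expected to succeed at l == 1 because rv1[1] == 0. */
        if ((float)(fabsf(rv1[l]) + anorm) == anorm)
            break;
        /* The original code would now read w[nm]. */
        if (nm < lowest)
            lowest = nm;
        if (nm == 0)
            break;                      /* report it instead of reading w[0] */
        if ((float)(fabsf(w[nm]) + anorm) == anorm)
            break;
    }
    return lowest;
}
```

With rv1 = {unused, 0, 1, 1}, w = {unused, 2, 3, 4} and k = 3, a finite anorm such as 10 makes the loop stop at l == 1 via the first test, so the lowest index read is 1. With anorm = NAN neither test can ever succeed, and the function returns 0, which corresponds to the out-of-bounds access reported in the forum thread.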

To finish, here is some related material on IEEE 754.

IEEE 754 describes comparison as follows:

Four mutually exclusive relations are possible: less than, equal, greater than, and unordered. The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.
IEEE 754
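
The four relations in the quoted passage can be observed from C using the comparison macros that C99 provides in <math.h> (isunordered, isless, isgreater). The following is only a small sketch; the enum and the function name compare_ieee are names made up for this illustration.

```c
#include <math.h>

/* Illustrative names, not from any library. */
enum relation { REL_LESS, REL_EQUAL, REL_GREATER, REL_UNORDERED };

/* Classifies a pair of doubles into one of the four mutually
   exclusive IEEE 754 relations. */
enum relation compare_ieee(double x, double y)
{
    if (isunordered(x, y))      /* true when at least one operand is NaN */
        return REL_UNORDERED;
    if (isless(x, y))
        return REL_LESS;
    if (isgreater(x, y))
        return REL_GREATER;
    return REL_EQUAL;           /* ordered, and neither less nor greater */
}
```

Calling compare_ieee(NAN, NAN) yields REL_UNORDERED, which is why a test of the form (x + anorm) == anorm can never succeed once anorm is NaN.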

BTW, as for why IEEE 754 was designed this way, there is a Stack Overflow discussion titled "What is the rationale for all comparisons returning false for IEEE754 NaN values?"

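Other references:
[1] Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic, by the great Kahan.
It mentions a pitfall: some over-aggressive optimizations treat the value of an expression such as x == x as always 1, which is wrong for NaN.
[2] What Every Computer Scientist Should Know About Floating-Point Arithmetic, an article on floating-point arithmetic from Oracle.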

Reposted from: blog.csdn.net/u013213111/article/details/121483822