AMD fTPM RNG bug makes Linus Torvalds dissatisfied

Linus Torvalds recently expressed dissatisfaction on the kernel mailing list with AMD's fTPM hardware random number generator and suggested disabling it because of the problems it has caused for the kernel on Ryzen systems.


The random number generator in AMD's fTPM has reportedly been causing stuttering, first for Windows users and later for Linux users. A fix has been released and backported to earlier kernels, but some thorny issues with AMD's fTPM RNG remain unresolved, and some users still report stuttering.

A new bug report last week claimed that using the fTPM on some AMD platforms can still cause stuttering. The fTPM firmware version in that report is 0x3005700020005, and it is the first time the problem has been reported on the Rembrandt platform; the existing kernel patches did not help.

This drew an angry response from Linus Torvalds, who wrote on the mailing list:

Let's disable the stupid fTPM hwrnd.

Maybe use it at startup to "gather entropy from different sources", but obviously it shouldn't be used at runtime.

Why would anyone use this crap, when any machine on which it has supposedly been fixed (which it obviously hasn't) also has the CPU rdrand instruction, which doesn't have this problem?

If you don't trust the CPU rdrand implementation (which has also had bugs - see clear_rdrand_cpuid_bit() and x86_init_rdrand()), why trust the fTPM version, which causes more problems? So I don't see any downside to saying "that fTPM thing doesn't work". Even if it does work in the end, there are alternatives that are no worse.


and added:

So, [problems with RDRAND] sounds unlikely, but who knows... Microcode can apparently do anything, at least the initial fTPM issues seem to be due to the BIOS doing some really crazy things like SPI flash access.

I can easily imagine the BIOS fTPM code using some absolutely horrible global "EFI synchronization" lock or something, causing some completely unrelated random issues.

For example, I wouldn't be surprised if it wasn't the fTPM hwrnd code itself that decided to read some random number from SPI, but it was just serializing with something else involved in the BIOS. Not all BIOS guys are known for their fully parallel scalable code...

I'd be very surprised if CPU microcode could do anything similar. And it's not impossible - HP has messed with timestamp counters with SMI, and I can imagine them - or someone else - doing the same with rdrand.

But compared to "EFI BIOS uses a one big lock approach", it does sound unlikely.

So rdrand (and especially rdseed) can be quite slow, but I think we're talking hundreds of CPU cycles (maybe thousands). Quite different from the stuttering reports we've seen from fTPM.
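For readers who want to check whether their own system is currently drawing from a TPM-backed hardware RNG, here is a minimal Python sketch. It assumes the kernel's hw_random framework exposes its state under /sys/class/misc/hw_random, and it assumes that TPM-backed sources carry "tpm" in their name (for example "tpm-rng-0"). The helper names read_hwrng_state and looks_like_tpm are hypothetical and are not part of any kernel or library API.

```python
from pathlib import Path

# Usual sysfs location of the kernel hw_random framework (assumed present).
HW_RANDOM_DIR = Path("/sys/class/misc/hw_random")


def read_hwrng_state(base=HW_RANDOM_DIR):
    """Return (current, available) hwrng names, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "rng_current").read_text().strip() or None
        available = (base / "rng_available").read_text().split()
    except OSError:
        # Directory or files are missing, e.g. no hw_random support.
        return None, []
    return current, available


def looks_like_tpm(name):
    # Assumption: TPM-backed sources have "tpm" in their name.
    return name is not None and "tpm" in name.lower()


if __name__ == "__main__":
    current, available = read_hwrng_state()
    print("current hwrng:", current)
    print("available hwrngs:", ", ".join(available) or "(none)")
    if looks_like_tpm(current):
        print("The active hwrng appears to be TPM-backed.")
```

This only reports which source is selected; it does not measure latency or confirm that the fTPM is the cause of any stuttering.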

Hopefully with the extra pressure from Torvalds there will be some additional clarity and fixes to address the AMD fTPM issue under Linux.

 


Origin blog.csdn.net/weixin_43223083/article/details/132296176