gravitas_deficiency
It’s not fear mongering. It’s a real area-denial weapons system that the PRC initially deployed in 1991, upgrading the terminal weapon to a hypersonic glide body in 2015. Lob a few of those at a CVN and it’s probably going to the bottom of the ocean.
If the Biden admin was the first time they seriously looked at and understood that threat… well, that’s a bit insane, because the PRC has had that capability since the mid-2010s, and it’s only gotten more effective.
Yeah, it’s pretty clear the regime is trying to provoke a response. They’re doing it domestically, too.
Lol
Lmao, even
Gotta rizz ‘em with the ‘tism
Grandpa: I don’t have autism wtf are you talking about
Also Grandpa: I’ve meticulously organized my workshop and have a massive array of jars with lids fastened to the bottom side of my shelves; they are organized, in order, according to type, outer diameter, inner diameter, and finally, metallurgy.
MTBF is absolutely not six years if you’re running your H100 nodes at peak load and heat soaking the shit out of them. ML workloads are especially hard on GPU RAM, and sustained heat load on that particular component is known to degrade both its performance and its integrity.
As to Meta’s (or MS’s, or OpenAI’s, or whoever’s) doc on MTBF: I don’t really trust them on that, because they’re a big player in the “AI” bubble, so of course they’d want to give the impression that the hardware in their data centers still has a bunch of useful life left. That’s a direct impact on their balance sheet. If they can misrepresent extremely expensive components that they have a shitload of as still being worth a lot, instead of essentially being salvage/parts only, I would absolutely expect them to do that. Especially in the regulatory environment in which we now exist.
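To put rough numbers on the balance-sheet point, here’s a toy straight-line depreciation comparison. Every figure is a made-up assumption for illustration (the $30k unit cost, the 100k fleet size, and both lifetime claims) — not Meta’s actual numbers:

```python
# Toy sketch: how the assumed useful life of a GPU fleet changes its
# remaining book value. All numbers are hypothetical.
UNIT_COST = 30_000   # assumed accelerator price, USD
FLEET = 100_000      # assumed fleet size

def straight_line_book_value(age_years: float, useful_life_years: float) -> float:
    """Remaining book value per unit under straight-line depreciation."""
    remaining = max(0.0, 1 - age_years / useful_life_years)
    return UNIT_COST * remaining

age = 3.0
optimistic = straight_line_book_value(age, 6.0) * FLEET   # claimed 6-year life
pessimistic = straight_line_book_value(age, 3.0) * FLEET  # heat-soaked 3-year life

print(f"book value assuming 6y life: ${optimistic / 1e9:.1f}B")
print(f"book value assuming 3y life: ${pessimistic / 1e9:.1f}B")
```

Same fleet, same age — claiming a six-year life instead of three keeps billions of dollars of book value on paper, which is exactly the incentive problem.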
The problem is that the depreciation/obsolescence/lifetime cycles of GPUs are WAY more rapid than anyone in the “AI” circlejerk bubble is willing to admit. Aside from the generational upgrades that you tend to see in GPUs, which make older models far less valuable as investments, server hardware simply cannot function at peak load indefinitely - and running GPUs at peak load constantly MASSIVELY shortens the MTBF.
TL;DR: the way GPUs are used in ML applications means they tend to cook themselves WAY quicker than the GPU in your gaming machine or console - as in, they often have a couple of years of lifetime, max, and that failure rate is a bell curve.
I’m pretty sure that ship has sailed already
And there we go. Knew they’d try it sooner or later.
🌽🌽🌽
🌽~🌽~🌽^🌽^🌽~🌽~🌽