this post was submitted on 22 May 2025
The meme doesn't make sense. An SRAM cache of that size would be so slow that you would most likely save clock cycles by reading directly from RAM and not having a cache at all...
Slow? Not necessarily.
The main issue with that much memory is the data routing and the physical locality of the memory. Assuming you (somehow) could shrink down the distance from the cache to the registers and had wide enough data and request lines, you could get data from such a cache in ~4 cycles (assuming L1 and a hit).
What slows down memory for L2 is the wider address space and slower residence checks. L3 gets a bit slower because of even wider address spaces, but it also has to deal with concurrency issues since it's shared among cores. It also ends up being slower because it physically has to be further away from the cores due to its size.
If you ever look at a CPU die, you'll see that L1 caches are generally tiny and embedded right into the center of each core. L2 tends to be bolted onto the sides of the physical cores. And L3 tends to take up the largest amount of silicon real estate on a CPU package. All of this contributes to the increasing fetch latency at each level, along with the fact that you have to check the closest levels first (an L3 hit, for example, means that the CPU checked L1 and L2 and missed in both, which takes time, so an L3 access will always cost at least the L1 + L2 times).
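To make the "L3 is at least L1 + L2" point concrete, here is a minimal Python sketch of a strictly sequential lookup. The 4-cycle L1 figure is the one mentioned above; the L2 and L3 numbers and the names `LOOKUP_CYCLES` and `hit_latency` are made up for illustration, and real CPUs can overlap some of these checks, so treat it as a toy model rather than a measurement.

```python
# Hypothetical per-level lookup costs in cycles, listed closest level first.
# Only the L1 value comes from the thread; L2 and L3 are illustrative guesses.
LOOKUP_CYCLES = {"L1": 4, "L2": 10, "L3": 30}


def hit_latency(level: str) -> int:
    """Cycles until data arrives when the first hit is at `level`.

    Assumes each closer level is checked (and misses) before the next one
    is tried, so the costs simply add up.
    """
    names = list(LOOKUP_CYCLES)  # dicts preserve insertion order
    last = names.index(level)
    return sum(LOOKUP_CYCLES[name] for name in names[: last + 1])


for name in LOOKUP_CYCLES:
    print(name, hit_latency(name))
```

With these assumed numbers the script prints 4, 14 and 44 cycles for L1, L2 and L3 hits: every miss on the way adds its own check time to the total.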
I agree. When evaluating cache access latency, it is important to consider the entire read path rather than just the intrinsic access time of a single SRAM cell. Much of the latency arises from all the supporting operations required for a functioning cache, such as tag lookups, address decoding, and bitline traversal. As you pointed out, implementing an 8 GB SRAM cache on-die using current manufacturing technology would be extremely impractical. The physical size would lead to substantial wire delays and increased complexity in the indexing and associativity circuits. As a result, the access latency of such a large on-chip cache could actually exceed that of off-chip DRAM, which would defeat the main purpose of having on-die caches in the first place.
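As a rough illustration of how the indexing side grows, here is a small Python sketch that computes the number of sets and the index bits for a set-associative cache. The 64-byte line size, the 8-way associativity and the function name `cache_geometry` are assumptions chosen for the example, not figures from any specific CPU.

```python
def cache_geometry(size_bytes: int, line_bytes: int = 64, ways: int = 8):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, line size and associativity are all powers of two.
    """
    sets = size_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits


KIB = 1024
GIB = 1024 ** 3

print(cache_geometry(32 * KIB))  # a typical L1-sized cache
print(cache_geometry(8 * GIB))   # the 8 GB cache discussed here
```

Under these assumptions the 32 KiB cache has 64 sets (6 index bits), while the 8 GiB one has 16,777,216 sets (24 index bits). The decoder has to select among all of those sets, and the selected data has to travel across a much larger array, which is where the extra latency comes from.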