this post was submitted on 27 May 2025
1950 points (99.5% liked)

Programmer Humor

jcg@halubilo.social 1 point 2 days ago (1 child)

I think the main barriers are context length and the data just not really existing. On context, what matters is useful context: GPT-4o has "128k context", but it's mostly sensitive to the beginning and end of the context and blurry in the middle, which is consistent with other LLMs. On data, how many large-scale, well-written, well-maintained projects are really out there? Orders of magnitude fewer than there are examples of "how to split a string in bash" or "how to set up validation in spring boot". We might "get there", but it'll take a whole lot of well-written projects first, written by real humans, maybe with the help of AI here and there. Unless, that is, we build it with the ability to somehow learn and understand faster than humans.

MudMan@fedia.io 1 point 2 days ago

I don't know, some of these guys have access to a LOT of code, and to even more debate about what those good codebases entail.

I think the other issue is more relevant. Even 128K tokens is not enough for something really big, and the memory and processing costs for that do skyrocket. People are trying to work around it with draft models and summarization models: they try to pick out the relevant parts of a codebase in one pass and then base their code generation just on that, and... I don't think that's going to work reliably at scale. The more chances you give a language model to lose its goddamn mind and start making crap up unsupervised, the more work it's going to be to take what it spits out and shape it into something reasonable.
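
To make the "pick out the relevant parts in one pass" idea concrete, here's a rough toy sketch. It is not how any real tool does it: the word-overlap scoring, the crude tokenizer, and the names `tokens`, `select_context`, and the three example files are all made up for illustration. It just ranks files by how many words they share with the task and packs them into a fixed token budget.

```python
import re

def tokens(text):
    """Crude stand-in for a real tokenizer: lowercase identifier-like words."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())

def select_context(task, files, budget):
    """Greedy one-pass selection (hypothetical): rank files by word overlap
    with the task, then add them in rank order while they fit the budget."""
    wanted = set(tokens(task))
    scored = []
    for path, source in files.items():
        toks = tokens(source)
        overlap = len(wanted & set(toks))
        scored.append((overlap, len(toks), path))
    # Highest overlap first; ties go to the smaller file, then to the path.
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    chosen, used = [], 0
    for overlap, size, path in scored:
        # Skip files with no overlap and files that would blow the budget.
        if overlap == 0 or used + size > budget:
            continue
        chosen.append(path)
        used += size
    return chosen, used

# Made-up three-file "codebase" for the example.
files = {
    "auth.py": "def login(user, password): return check_password(user, password)",
    "billing.py": "def charge(card, amount): return gateway.charge(card, amount)",
    "utils.py": "def check_password(user, password): return hash(password) == user.password_hash",
}
task = "fix the login password check"

small = select_context(task, files, budget=10)
large = select_context(task, files, budget=20)
```

With the small budget only `auth.py` makes the cut, and `utils.py`, which holds the actual password check, gets dropped, so whatever generates code next has to guess at it. Bump the budget to 20 and both files fit. That's the failure mode in miniature: the selection pass decides what the model never gets to see.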