[–] ErmahgherdDavid@lemmy.dbzer0.com 4 points 1 day ago (1 children)

Unlike the dotcom bubble, another big aspect of this one is the unit cost of running the models.

Traditional web applications scale really well. The incremental cost of adding a new user to your app is basically nothing, fractions of a cent. With LLMs, scaling is linear. Each machine can only handle a few hundred users, and those machines are expensive to run:

Big beefy GPUs with a large amount of VRAM are required for inference as well as training. Your typical home gaming GPU might have 16 GB of VRAM, or 32 GB if you go high-end and spend $2500 on it (just the GPU, not the whole PC). Frontier models need something like 128 GB of VRAM to run, and GPUs manufactured for data centre use cost a lot more: a state-of-the-art Nvidia H200 costs $32k.

The servers that can host one of these big frontier models cost, at best, $20 an hour to run and can only handle a handful of user requests at a time, so you need to scale linearly as your subscriber count increases. If you're charging $20 a month for access to your model, you are burning a user's entire monthly subscription every hour for each of these monster servers you have turned on. And that's generous: it assumes you're not paying the "on-demand" price of $60/hr.
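To make that arithmetic concrete, here's a rough back-of-the-envelope sketch in Python. It only uses the figures above; the number of concurrent users per server is an illustrative assumption, not a measurement.

```python
# Back-of-the-envelope economics using the figures above.
# concurrent_users_per_server is an illustrative assumption, not a measurement.
server_cost_per_hour = 20.0        # reserved price for a frontier-capable GPU server ($/hr)
on_demand_cost_per_hour = 60.0     # on-demand price for the same server ($/hr)
hours_per_month = 24 * 30          # ~720 hours if the server runs around the clock
subscription_price = 20.0          # typical monthly subscription ($/user/month)
concurrent_users_per_server = 10   # the "handful" of users one server can serve at once

monthly_server_cost = server_cost_per_hour * hours_per_month                   # ~$14,400/month
monthly_revenue_per_server = concurrent_users_per_server * subscription_price  # ~$200/month

print(f"Cost per server per month:    ${monthly_server_cost:,.0f}")
print(f"Revenue per server per month: ${monthly_revenue_per_server:,.0f}")
print(f"Subscriptions burned per server-hour: {server_cost_per_hour / subscription_price:.0f}")
```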

Sam Altman famously said OpenAI are losing money on their $200/mo subscriptions.

If/when there is a market correction, a huge factor in how much interest survives (as with the internet after the dotcom crash) will be whether the quality of output from these models justifies the true, unsubsidized price of running them. I do think local models powered by things like llama.cpp and Ollama, which can run on high-end gaming rigs and MacBooks, might be a possible direction. For now, though, you can't get the same quality out of these small, local LLMs as you can from the state-of-the-art hosted models.
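For a sense of what the local route looks like in practice, here's a minimal sketch using llama.cpp's Python bindings. The model path is a placeholder: any quantized GGUF file you've downloaded would do, and the parameters are just illustrative defaults.

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder for any quantized GGUF model you have downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # quantized weights that fit in consumer VRAM/RAM
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload every layer to the GPU if it fits
)

output = llm(
    "Explain in two sentences why LLM inference costs scale linearly with users:",
    max_tokens=200,
)
print(output["choices"][0]["text"])
```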

[–] definitemaybe@lemmy.ca 2 points 23 hours ago* (last edited 20 hours ago) (1 children)

Re: your last paragraph:

I think the future is likely going to be more task-specific, targeted models. I don't have the research handy, but small, targeted LLMs can outperform massive LLMs at a tiny fraction of the compute costs to both train and run the model, and can be run on much more modest hardware to boot.

Like, an LLM that is targeted only at:

  • teaching writing and reading skills
  • teaching English writing to English Language Learners
  • writing business emails and documents
  • writing/editing only resumes and cover letters
  • summarizing text
  • summarizing fiction texts
  • writing & analyzing poetry
  • analyzing poetry only (not even writing poetry)
  • a counselor
  • an ADHD counselor
  • a depression counselor

The more specific the model, the smaller it can be while still doing the targeted task(s) "well".
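As a sketch of how cheap the targeted end of that spectrum already is, here's a distilled summarization model running through Hugging Face's pipeline API (the model name is just one commonly available distilled summarizer, picked for illustration):

```python
# Sketch: a small task-specific model instead of a giant general-purpose LLM.
# Requires Hugging Face transformers (pip install transformers torch); the model
# name is one commonly available distilled summarizer, chosen purely for illustration.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

text = (
    "Traditional web applications scale cheaply because serving one more user costs "
    "almost nothing, whereas LLM inference needs expensive GPU servers whose cost "
    "grows roughly linearly with the number of active users."
)

print(summarizer(text, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])
```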

[–] ErmahgherdDavid@lemmy.dbzer0.com 2 points 20 hours ago

Yeah I agree. Small models are the way. You can also use LoRA/QLoRA adapters to "fine-tune" the same big model for specific tasks and swap the use case in real time. This is what Apple does with Apple Intelligence. You can outperform a big general LLM with an SLM if you have a nice specific use case and some data (which you can synthesise in some cases).
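For what the adapter approach looks like in code, here's a minimal sketch using Hugging Face's peft library. The base model name and the LoRA hyperparameters are illustrative placeholders, not recommendations.

```python
# Minimal LoRA fine-tuning setup sketch with Hugging Face peft
# (pip install transformers peft). Base model and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder small base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension of the adapter
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights get trained
# ...train on the task-specific data, then save just the small adapter:
# model.save_pretrained("adapters/business-email")
```

Swapping the use case at runtime then just means loading a different adapter over the same frozen base weights, which is much cheaper than keeping several full-size models in memory.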