this post was submitted on 21 Nov 2025
824 points (98.2% liked)

Technology

you are viewing a single comment's thread
[–] khepri@lemmy.world 3 points 13 hours ago (2 children)

It's why I trust my random unauditable Chinese matrix soup over my random unauditable American matrix soup, frankly.

[–] DandomRude@lemmy.world 6 points 13 hours ago (2 children)

You mean Deepseek on a local device?

[–] brucethemoose@lemmy.world 2 points 6 hours ago* (last edited 6 hours ago) (1 children)

Most aren't really running Deepseek locally. What ollama advertises (and basically lies about) are the now-obsolete Qwen 2.5 distillations.

...I mean, some are, but it's exclusively lunatics with EPYC homelab servers, heh. And they are not using ollama.

[–] DandomRude@lemmy.world 2 points 5 hours ago (2 children)

Thx for clarifying.

I once tried a community version from huggingface (distilled), which worked quite well even on modest hardware. But that was a while ago. Unfortunately, I haven't had much time to look into this stuff lately, but I want to check it out again at some point.

[–] brucethemoose@lemmy.world 2 points 4 hours ago

You can run GLM Air on pretty much any gaming desktop with 48GB+ of RAM. Check out ubergarm's ik_llama.cpp quants on Huggingface; that’s state of the art right now.
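
If you want to try that, grabbing one of those quants is mostly just a Hugging Face download. A minimal sketch with huggingface_hub; the repo id and quant pattern are placeholders you'd swap for whichever of ubergarm's repos and sizes actually fits your RAM:

```python
# Minimal sketch: fetch a GGUF quant from Hugging Face, then point ik_llama.cpp at it.
# The repo id and filename pattern below are placeholders; check ubergarm's actual
# repo names and quant types on Hugging Face before running.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ubergarm/GLM-4.5-Air-GGUF",   # assumption: pick the repo for the model you want
    allow_patterns=["*IQ4_KSS*"],          # assumption: only pull the quant size that fits your RAM
)
print(f"Model files downloaded to: {local_dir}")
# From here you'd launch ik_llama.cpp's llama-server with -m pointing at the .gguf file.
```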

[–] brucethemoose@lemmy.world 2 points 4 hours ago* (last edited 4 hours ago) (1 children)

Also, I’m a quant cooker myself. Say the word, and I can upload an IK quant more specifically tailored for whatever your hardware/aim is.

[–] DandomRude@lemmy.world 1 points 3 hours ago (1 children)

Thank you! I might get back to you on that sometime.

[–] brucethemoose@lemmy.world 2 points 2 hours ago

Do it!

Feel free to spam me if I don’t answer at first. I’m not ignoring you; Lemmy fails to send me reply notifications, sometimes.

[–] khepri@lemmy.world 1 points 13 hours ago (1 children)

Naw, I mean more that the kind of people who would uncritically take everything a chatbot says at face value are probably better off staying in ChatGPT's little curated garden anyway. Cause people like that are going to get grifted by whatever comes along first no matter what, and a lot of those grifts are a lot more dangerous to the rest of us than a bot that won't talk great replacement with you.

[–] DandomRude@lemmy.world 1 points 12 hours ago (1 children)

Ahh, thank you; I had misunderstood that, since Deepseek is (more or less) an open-source LLM from China that can also be run and fine-tuned on your own hardware.

[–] ranzispa@mander.xyz 1 points 11 hours ago (3 children)

Do you have a cluster with 10 A100s lying around? Because that's what it takes to run Deepseek. It is open source, but it is far from accessible to run on your own hardware.

[–] khepri@lemmy.world 1 points 4 hours ago* (last edited 4 hours ago)

I run quantized versions of deepseek that are usable enough for chat, and it's on a home setup that is so old and slow by today's standards I won't even mention the specs lol. Let's just say the rig is from 2018 and it wasn't near the best even back then.

[–] DandomRude@lemmy.world 1 points 11 hours ago (1 children)

Yes, that's true. It is resource-intensive, but unlike other capable LLMs, running it yourself is at least possible: not for most private individuals, given the requirements, but for companies with the necessary budget.

[–] FauxLiving@lemmy.world 5 points 11 hours ago (1 children)

They're overestimating the costs. 4x H100 and 512GB DDR4 will run the full DeepSeek-R1 model; that's about $100k of GPUs and $7k of RAM. It's not something you're going to have in your homelab (for a few years at least), but it's well within the budget of a hobbyist group or a moderately sized local business.

Since it's an open-weights model, people have created quantized and distilled versions of it. Quantization stores each weight in fewer bits, and distillation transfers the behavior into smaller models with far fewer parameters, so the RAM requirements end up a lot lower.
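
For a rough sense of scale, here's the back-of-envelope weight-memory math, assuming DeepSeek-R1's roughly 671B total parameters (real GGUF files add some overhead for scales, embeddings, and KV cache on top of this):

```python
# Back-of-envelope memory for ~671B total parameters at different precisions.
# Treat these as floor estimates; actual quant files are somewhat larger.
params = 671e9

for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4), ("~1.6-bit (extreme)", 1.6)]:
    gb = params * bits / 8 / 1e9
    print(f"{name:>20}: ~{gb:,.0f} GB of weights")

# Prints roughly 1,342 / 671 / 336 / 134 GB, which is why the aggressive quants
# are the only ones that fit in a single server's RAM.
```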

You can run distilled and quantized versions of DeepSeek-R1 locally. I'm running deepseek-r1-0528-qwen3-8b on a machine with an NVIDIA 3080 12GB and 64GB RAM. Unless you pay for an AI service and are using their flagship models, it's pretty indistinguishable from the full model.

If you're coding or doing other tasks that push the AI, it'll stumble more often, but for a 'ChatGPT'-style interaction you couldn't tell the difference between it and ChatGPT.
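
For anyone wanting to reproduce that kind of setup, a minimal sketch with llama-cpp-python; the repo id and filename are placeholders, and any GGUF build of the distill should load the same way:

```python
# Minimal sketch: chat with a quantized DeepSeek-R1 distill via llama-cpp-python.
# Repo id and filename are placeholders; pick whichever GGUF repo/quant you trust.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="bartowski/DeepSeek-R1-0528-Qwen3-8B-GGUF",  # assumption: example GGUF repo
    filename="*Q4_K_M.gguf",                             # assumption: a 4-bit quant fits in 12 GB VRAM
    n_gpu_layers=-1,   # offload all layers to the GPU; lower this if you run out of VRAM
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize why quantized local models are useful."}]
)
print(out["choices"][0]["message"]["content"])
```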

[–] brucethemoose@lemmy.world 1 points 6 hours ago* (last edited 6 hours ago) (1 children)

You should be running hybrid inference of GLM Air with a setup like that. Qwen 8B is kinda obsolete.

I dunno what kind of speeds you absolutely need, but I bet you could get at least 12 tokens/s.

[–] FauxLiving@lemmy.world 1 points 7 minutes ago

Thanks for the recommendation, I'll look into GLM Air, I haven't looked into the current state of the art for self-hosting in a while.

I just use this model to translate natural language into JSON commands for my home automation system. I probably don't need a reasoning model, but it doesn't need to be super quick. A typical query uses very few tokens (like 3-4 keys in JSON).
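
The pattern is simple enough to sketch. Assuming a local OpenAI-compatible endpoint (llama-server's default port) and a made-up device schema, it's basically:

```python
# Sketch of the natural-language -> JSON pattern against a local OpenAI-compatible
# endpoint (llama-server, LM Studio, etc.). The URL, schema, and device names are
# assumptions for illustration, not anyone's actual setup.
import json
import requests

SYSTEM = (
    "Translate the user's request into JSON with exactly these keys: "
    '"device", "action", "value". Reply with JSON only, no extra text.'
)

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",   # assumption: default llama-server port
    json={
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Dim the living room lights to 30 percent"},
        ],
        "temperature": 0,
    },
    timeout=60,
)
reply = resp.json()["choices"][0]["message"]["content"]
# A reasoning model may wrap its answer in <think>...</think>; strip that before parsing.
command = json.loads(reply)
print(command)  # e.g. {"device": "living_room_lights", "action": "dim", "value": 30}
```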

The next project will be some kind of agent. A 'go and Google this and summarize the results' agent at first. I haven't messed around much with MCP Servers or Agents (other than for coding). The image models I'm using are probably pretty dated too; they're all variants of SDXL, and I stopped messing with ComfyUI before video generation was possible locally, so I gotta grab another few hundred GB of models.

It's a lot to keep up with.😮‍💨

[–] brucethemoose@lemmy.world 1 points 6 hours ago* (last edited 6 hours ago)

That's not strictly true.

I have a Ryzen 7800 gaming desktop, RTX 3090, and 128GB DDR5. Nothing that unreasonable. And I can run the full GLM 4.6 with quite acceptable token divergence compared to the unquantized model; see: https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF

If I had an EPYC/Threadripper homelab, I could run Deepseek the same way.

[–] RaoulDook@lemmy.world 2 points 6 hours ago (1 children)

Trusting any of that shit is the problem.

[–] khepri@lemmy.world 1 points 4 hours ago

There you go. Any of these things is just another datapoint. You need many datapoints to decide if the information you're getting is valuable and valid.