this post was submitted on 16 Jun 2025

112 points (95.9% liked)

Selfhosted

46671 readers

774 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
No spam posting.
Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).
No trolling.

Resources:

selfh.st Newsletter and index of selfhosted software and apps
awesome-selfhosted software
awesome-sysadmin resources
Self-Hosted Podcast from Jupiter Broadcasting

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago

MODERATORS

HybridSarcasm@lemmy.world

HybridSarcasm@lemmy.hybridsarcasm.xyz

112

What is a self-hosted small LLM actually good for (<= 3B) (lemmy.world)

submitted 2 weeks ago* (last edited 2 weeks ago) by catty@lemmy.world to c/selfhosted@lemmy.world

65 comments fedilink hide all child comments

I've tried coding and every one I've tried fails unless really, really basic small functions like what you learn as a newbie compared to say 4o mini that can spit out more sensible stuff that works.

I've tried explanations and they just regurgitate sentences that can be irrelevant, wrong, or get stuck in a loop.

So. what can I actually use a small LLM for? Which ones? I ask because I have an old laptop and the GPU can't really handle anything above 4B in a timely manner. 8B is about 1 t/s!

top 50 comments

sorted by: hot top controversial new old

[–] HelloRoot@lemy.lol 39 points 2 weeks ago* (last edited 2 weeks ago)

Sorry, I am just gonne dump you some links from my bookmarks that were related and interesting to read, cause I am traveling and have to get up in a minute, but I've been interested in this topic for a while. All of the links discuss at least some usecases. For some reason microsoft is really into tiny models and made big breakthroughs there.

https://reddit.com/r/LocalLLaMA/comments/1cdrw7p/what_are_the_potential_uses_of_small_less_than_3b/

https://github.com/microsoft/BitNet

https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/

https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090

[–] iii@mander.xyz 25 points 2 weeks ago (1 children)

Converting free text to standardized forms such as json

[–] MadMadBunny@lemmy.ca 5 points 2 weeks ago (1 children)

Oh—do you happen to have any recommendations for that?

[–] iii@mander.xyz 16 points 2 weeks ago

DeepSeek-R1-Distill-Qwen-1.5B

[–] some_guy@lemmy.sdf.org 19 points 2 weeks ago (1 children)

I installed Llama. I've not found any use for it. I mean, I've asked it for a recipe because recipe websites suck, but that's about it.

[–] GreenKnight23@lemmy.world 43 points 2 weeks ago

you can do a lot with it.

I heated my office with it this past winter.

[–] entwine413@lemm.ee 15 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

I've integrated mine into Home Assistant, which makes it easier to use their voice commands.

I haven't done a ton with it yet besides set it up, though, since I'm still getting proxmox configured on my gaming rig.

[–] Passerby6497@lemmy.world 3 points 2 weeks ago (1 children)

What are you using for voice integration? I really don't want to buy and assemble their solution if I don't have to

[–] entwine413@lemm.ee 3 points 2 weeks ago (1 children)

I just use the companion app for now. But I am designing a HAL9000 system for my home.

[–] shnizmuffin@lemmy.inbutts.lol 2 points 2 weeks ago (1 children)

[ A DIM SCREEN WITH ORANGE TEXT ]

Objective: optimize electrical bill during off hours.

... USER STATUS: UNCONSCIOUS 
... LIGHTING SYSTEM: DISABLED
... AUDIO/VISUAL SYSTEM: DISABLED 
... CLIMATE SYSTEM: ECO MODE ENABLED
... SURVEILLANCE SYSTEM: ENABLED 
... DOOR LOCKS: ENGAGED
... CELLULAR DATA: DISABLED
... WIRELESS ACCESS POINTS: DISABLED
... SMOKE ALARMS: DISABLED
... CO2 ALARMS: DISABLED
... FURNACE: SET TO DIAGNOSTIC MODE
... FURNACE_PILOT: DISABLED
... FURNACE_GAS: ENABLED

WARN: Furnace gas has been enabled without a Furnace pilot. Please consult the user manual to ensure proper installation procedure.

... FURNACE: POWERED OFF

Objective realized. Entering low power mode.

[ Cut to OP, motionless in bed ]

[–] entwine413@lemm.ee 3 points 2 weeks ago (1 children)

Luckily my entire neighborhood doesn't have gas and I have a heat pump.

But rest assured, I'm designing the system with 20% less mental illness

[–] shnizmuffin@lemmy.inbutts.lol 3 points 2 weeks ago (1 children)

All systems need a little mental illness.

[–] entwine413@lemm.ee 3 points 2 weeks ago

It's what keeps things fun! I don't want a system that I don't have to troubleshoot every once in a while.

[–] MTK@lemmy.world 13 points 2 weeks ago (1 children)

Have you tried RAG? I believe that they are actually pretty good for searching and compiling content from RAG.

So in theory you could have it connect to all of you local documents and use it for quick questions. Or maybe connected to your signal/whatsapp/sms chat history to ask questions about past conversations

[–] catty@lemmy.world 5 points 2 weeks ago (1 children)

No, what is it? How do I try it?

[–] MTK@lemmy.world 14 points 2 weeks ago (1 children)

RAG is basically like telling an LLM "look here for more info before you answer" so it can check out local documents to give an answer that is more relevant to you.

You just search "open web ui rag" and find plenty kf explanations and tutorials

[–] iii@mander.xyz 4 points 2 weeks ago* (last edited 2 weeks ago)

I think RAG will be surpassed by LLMs in a loop with tool calling (aka agents), with search being one of the tools.

[–] ikidd@lemmy.world 10 points 2 weeks ago (2 children)

It'll work for quick bash scripts and one-off things like that. But there's not usually enough context window unless you're using a 24G GPU or such.

[–] catty@lemmy.world 4 points 2 weeks ago

Yeah shell scripts are one of those things that you never remember how to do something and have to always look it up!

[–] smayonak@lemmy.world 3 points 2 weeks ago (1 children)

Snippets are a great use.

I use StableCode on my phone as a programming tutor for learning Python. It is outstanding in both speed and in accuracy for this task. I have it generate definitions which I copy and paste into Anki the flashcard app. Whenever I'm on a bus or airplane I just start studying. Wish that it could also quiz me interactively.

[–] catty@lemmy.world 6 points 2 weeks ago (1 children)

Please be very careful. The python code it'll spit out will most likely be outdated, not work as well as it should (the code isn't "thought out" as if a human did it.

If you want to learn, dive it, set yourself tasks, get stuck, and f around.

[–] smayonak@lemmy.world 4 points 2 weeks ago

I know what you mean. All the code generated with ai was loaded with problems. Specifically it kept forcing my api keys into the code without using environmental variables. But for basic coding concepts it has so far been perfect. even a 3b model seemingly generates great definitions

[–] Mordikan@kbin.earth 9 points 2 weeks ago (1 children)

I've used smollm2:135m for projects in DBeaver building larger queries. The box it runs on is Intel HD 530 graphics with an old i5-6500T processor. Doesn't seem to really stress the CPU.

UPDATE: I apologize to the downvoter for not masochistically wanting to build a 1000 line bulk insert statement by hand.

[–] HiTekRedNek@lemmy.world 2 points 2 weeks ago (1 children)

How, exactly, do you have Intel HD graphics, found on Intel APUs, on a Ryzen AMD system?

[–] Mordikan@kbin.earth 2 points 2 weeks ago

Sorry, I was trying to find parts for my daughter's machine while doing this (cheap Minecraft build). I corrected my comment.

[–] swelter_spark@reddthat.com 8 points 2 weeks ago

7b is the smallest I've found useful. I'd try a smaller quant before going lower, if I had super small vram.

[–] irmadlad@lemmy.world 8 points 2 weeks ago

As cool and neato as I find AI to be, I haven't really found a good use case for it in the selfhosting/homelabbing arena. Most of my equipment is ancient and lacking the GPU necessary to drive that bus.

[–] RickyRigatoni@retrolemmy.com 5 points 2 weeks ago (1 children)

I have it roleplay scenarios with me and sometimes I verbally abuse it for fun.

[–] wise_pancake@lemmy.ca 1 points 1 week ago (1 children)

Weirdly I'm polite to all LLMs, but Gemini sets me off and I end up yelling at it.

[–] RickyRigatoni@retrolemmy.com 2 points 1 week ago

it's just so pushy and hard to remove. it's asking for abuse.

[–] CrayonDevourer@lemmy.world 4 points 2 weeks ago* (last edited 2 weeks ago) (3 children)

Currently I've been using a local AI (a couple different kinds) to first - take the audio from a Twitch stream; so that I have context about the conversation, convert it to text, and then use a second AI; an LLM fed the first AIs translation + twitch chat and store 'facts' about specific users so that they can be referenced quickly for a streamer who has ADHD in order to be more personable.

That way, the guy can ask User X how their mothers surgery went. Or he can remember that User K has a birthday coming up. Or remember that User G's son just got a PS5 for Christmas, and wants a specific game.

It allows him to be more personable because he has issues remembering details about his users. It's still kind of a big alpha test at the moment, because we don't know the best way to display the 'data', but it functions as an aid.

[–] shnizmuffin@lemmy.inbutts.lol 9 points 2 weeks ago (19 children)

Hey, you're treating that data with the respect it demands, right? And you definitely collected consent from those chat participants before you Hoover'd up their [re-reads example] extremely Personal Identification Information AND Personal Health Information, right? Because if you didn't, you're in violation of a bunch of laws and the Twitch TOS.

load more comments (19 replies)

[–] catty@lemmy.world 1 points 2 weeks ago (4 children)

Surely none of that uses a small LLM <= 3B?

load more comments (4 replies)

[–] Hadowenkiroast@piefed.social 1 points 2 weeks ago (1 children)

sounds like salesforce for a twitch setting. cool use case, must make fun moments when he mentions such things.

[–] jlow@discuss.tchncs.de 4 points 2 weeks ago (1 children)

Esp. if the LLM just hallucinates 50% of the "facts" a about the users 👌

[–] CrayonDevourer@lemmy.world 3 points 2 weeks ago* (last edited 2 weeks ago)

That hasn't been a problem at all for the 200+ users it's tracking so far for about 4 months.

I don't know a human that could ever keep up with this kind of thing. People just think he's super personable, but in reality he's not. He's just got a really cool tool to use.

He's managed some really good numbers because being that personal with people brings them back and keeps them chatting. He'll be pushing for partner after streaming for only a year and he's just some guy I found playing Wild Hearts with 0 viewers one day... :P

[–] ragingHungryPanda@lemmy.zip 4 points 2 weeks ago (2 children)

I've run a few models that I could on my GPU. I don't think the smaller models are really good enough. They can do stuff, sure, but to get anything out of it, I think you need the larger models.

They can be used for basic things, though. There are coder specific models you can look at. Deepseek and qwen coder are some popular ones

load more comments (2 replies)

[–] hendrik@palaver.p3x.de 3 points 2 weeks ago* (last edited 2 weeks ago)

I think that's a size where it's a bit more than a good autocomplete. Could be part of a chain for retrieval augmented generation. Maybe some specific tasks. And there are small machine learning models that can do translation or sentiment analysis, though I don't think those are your regular LLM chatbots... And well, you can ask basic questions and write dialogue. Something like "What is an Alpaca?" will work. But they don't have much knowledge under 8B parameters and they regularly struggle to apply their knowledge to a given task at smaller sizes. At least that's my experience. They've become way better at smaller sizes during the last year or so. But they're very limited.

I'm not sure what you intend to do. If you have some specific thing you'd like an LLM to do, you need to pick the correct one. If you don't have any use-case... just run an arbitrary one and tinker around?

load more comments