Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
- Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.
- No spam posting.
- Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.
- Don't duplicate the full text of your blog or github here. Just post the link for folks to click.
- Submission headline should match the article title (don't cherry-pick information from the title to fit your agenda).
- No trolling.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues with the community? Report them using the report flag.
Questions? DM the mods!
Thanks, had not heard of this before! From skimming the link, it seems that the integration with HASS mostly focuses on providing Wyoming endpoints (STT, TTS, wake word), right? (Un)fortunately, that's the part that's already working really well 😄
However, the idea of just writing a stand-alone application with Ollama-compatible endpoints, but not actually putting an LLM behind it is genius, I had not thought about that. That could really simplify stuff if I decide to write a custom intent handler. So, yeah, thanks for the link!!
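For illustration, a rough sketch of what such a "fake Ollama" could look like, using Flask; the endpoint paths and response fields follow the Ollama REST API, but whether /api/tags and /api/chat are all the HASS integration actually needs is an assumption to verify:

```python
# Rough sketch: a "fake Ollama" that routes chat requests to a custom intent
# handler instead of an LLM. Assumption: /api/tags and /api/chat are enough
# for the client (e.g. the HASS Ollama integration) talking to it.
from datetime import datetime, timezone

from flask import Flask, jsonify, request

app = Flask(__name__)


def handle_intent(text: str) -> str:
    # Placeholder for your own intent matching / handling logic.
    return f"(would handle intent for: {text!r})"


@app.get("/api/tags")
def tags():
    # Advertise a single fake "model" so the client has something to select.
    return jsonify({"models": [{"name": "intent-router:latest"}]})


@app.post("/api/chat")
def chat():
    body = request.get_json(force=True)
    user_text = body["messages"][-1]["content"]
    return jsonify({
        "model": body.get("model", "intent-router:latest"),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "message": {"role": "assistant", "content": handle_intent(user_text)},
        "done": True,
    })


if __name__ == "__main__":
    app.run(port=11434)  # Ollama's default port
```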
Hmm. I had pretty much the same experience, and wondered about having multiple conversation agents for specific tasks - but didn't get around to trying that out. Currently, I am using it without an LLM, albeit with GPU-accelerated whisper (and other custom CV tasks for camera feeds). This gives me fairly accurate STT, and I have defined a plethora of variable sentences for hassil (the intent matcher), so I often get the correct match. There is the option for optional words and or-alternatives, for instance:
```yaml
sentences:
  - (start|begin|fire) [the] [one] vacuum clean(er|ing) [robot] [session]
```
So this would match "start vacuum cleaner", but also "fire one vacuum cleaning session".
Of course, this is a substantial effort initially, but once configured and debugged (punctuation is poison!) it works pretty well. As an aside, using the ATOM Echo satellites gave me a lot of errors, simply because the microphones are bad. With a better-quality satellite device (the Voice Preview) the success rate is much higher, almost flawless.
That all said, if you find a better intent matcher or another solution, please do report back, as I am very interested in an easier solution that does not require me to think of all possible sentences ahead of time.
ML engineer here. My intuition says you won’t get better accuracy than with sentence template matching, provided your matching rules are free of contradictions. Of course, the downside is you need to remember (and teach others) the precise phrasing to trigger a certain intent. Refining your matching rules is probably a good task for a coding agent.
Back in the pre-LLM days, we used simpler statistical models for intent classification. These were way smaller and could easily run on CPU. Check out random forests or SVMs that take bags of words as input. You do need enough examples to train them on, though.
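A minimal sketch of that kind of classifier with scikit-learn; the phrases and intent labels are made up, and in practice you'd want far more training examples per intent:

```python
# Sketch of a classic bag-of-words intent classifier (scikit-learn assumed
# installed). Training phrases and intent labels below are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "turn off all lights", "switch off the lights", "lights off please",
    "start the vacuum cleaner", "begin a vacuum cleaning session",
    "what is the temperature in the living room", "how warm is it inside",
]
train_intents = [
    "lights_off", "lights_off", "lights_off",
    "vacuum_start", "vacuum_start",
    "temperature_query", "temperature_query",
]

# Bag-of-words (with bigrams) feeding a linear SVM.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_intents)

print(clf.predict(["please turn the lights off"]))  # expected: ['lights_off']
```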
With an LLM you can reframe the problem as getting the model to generate the right ‘tool’ call. Most intents are a form of relation extraction: there’s an ‘action’ (verb) and one or more participants (subject, object, etc.). You could imagine a single tool definition (call it ‘SpeakerIntent’) that outputs the intent type (from an enum) as well as the arguments involved. Then you can link that to the final intent with some post-processing. There’s a 100M version of gemma3 that’s apparently not bad at tool calling.
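As a rough sketch of that single-tool idea, using the ollama Python client; the schema, intent enum, and model name are all illustrative assumptions, and whatever small model you pick has to support tool calling:

```python
# Sketch: one "SpeakerIntent" tool whose schema constrains the intent type to
# an enum; the model fills in the arguments, and post-processing maps the tool
# call onto the final HASS intent. Names and enum values are made up.
import ollama

speaker_intent_tool = {
    "type": "function",
    "function": {
        "name": "SpeakerIntent",
        "description": "Extract the speaker's smart-home intent and its arguments.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {
                    "type": "string",
                    "enum": ["lights_on", "lights_off", "vacuum_start", "temperature_query"],
                },
                "target": {
                    "type": "string",
                    "description": "Entity or area the intent applies to, if any",
                },
            },
            "required": ["intent"],
        },
    },
}

response = ollama.chat(
    model="some-small-model",  # placeholder: substitute the small model you want to test
    messages=[{"role": "user", "content": "turn off all lights"}],
    tools=[speaker_intent_tool],
)
# Inspect the returned tool call, then map it onto the final intent.
print(response["message"])
```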
Thanks for your input! The problem with the LLM approach for me is mostly that I have so many entities that having HASS expose them all (or even just the subset I really, really want) is already enough to slow everything to a crawl, and to get bad results from all models I've tried. I'll give the model you mentioned another shot, though.
However, I really don't want to use an LLM for this. It seems brittle and like overkill at the same time. As you said, intent classification is a wee bit older than LLMs.
Unfortunately, the sentence template matching approach alone isn't sufficient, because quite frequently the STT is imperfect. With Home Assistant, for example, the intent "turn off all lights" is currently not understood if the STT produces "turn off all light". And sure, you can extend the template for that. But what about
- turn of all lights
- turn off wall lights
- turnip off all lights
- off all lights
- off all fights
- ...
A human would go "huh? oh, sure, I'll turn off all lights". An LLM might as well. But a fuzzy matching / closest Levenshtein distance approach should be more than sufficient for this, too.
Basically, I generally like the sentence template approach used by HASS, but it just needs that little bit of additional robustness against imperfections.
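A minimal sketch of that fuzzy pre-matching idea using only the standard library (difflib's SequenceMatcher ratio rather than true Levenshtein distance; the sentence list and cutoff are made up):

```python
# Sketch: snap imperfect STT output to the closest known sentence before
# handing it to the intent matcher. Known sentences and cutoff are illustrative.
import difflib

KNOWN_SENTENCES = [
    "turn off all lights",
    "turn on all lights",
    "start the vacuum cleaner",
]


def snap_to_known(stt_text: str, cutoff: float = 0.75) -> str | None:
    # Returns the best-matching known sentence, or None if nothing is close enough.
    matches = difflib.get_close_matches(stt_text.lower(), KNOWN_SENTENCES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for garbled in ["turn of all lights", "off all fights", "turnip off all lights"]:
    print(garbled, "->", snap_to_known(garbled))
```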
From my understanding of word embeddings (as used by LLMs), you could skip the LLM and directly compare the similarity of what the STT outputs to each task or phrase in a list you have prepared. You'd need to test it out a few times to see what threshold works, but even testing against dozens of phrases should be much faster than spinning up an LLM - and it should be fully deterministic.
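Something like this sketch with sentence-transformers, for instance; the model name and threshold are just common starting points, not a recommendation:

```python
# Sketch: embed the STT output and compare it against pre-embedded known
# phrases, no LLM involved. Assumes the sentence-transformers package; the
# model name, phrase list, and threshold are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

known_phrases = ["turn off all lights", "start the vacuum cleaner", "what is the temperature"]
known_embeddings = model.encode(known_phrases, convert_to_tensor=True)  # precompute once


def best_match(stt_text: str, threshold: float = 0.6):
    # Returns (phrase, score) for the closest known phrase, or None below the threshold.
    query = model.encode(stt_text, convert_to_tensor=True)
    scores = util.cos_sim(query, known_embeddings)[0]
    best = scores.argmax().item()
    return (known_phrases[best], scores[best].item()) if scores[best] >= threshold else None


print(best_match("turn of all light"))
```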
Yep, that's the idea! This post basically boils down to "does this exist for HASS already, or do I need to implement it?" and the answer, unfortunately, seems to be the latter.