Anubis is an elegant solution to the AI bot scraper problem; I just wish the solution to everything wasn't spending compute everywhere. In a world where we need to rethink how we generate and consume energy, even on clients, this is a stupid use of computing power.
It also doesn’t function without JavaScript. If you’re security or privacy conscious, chances are not zero that you have JS disabled, in which case this presents a roadblock.
On the flip side of things, if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.
No hate on Anubis. Quite the opposite, really. It just sucks that we need it.
There's a challenge option that doesn't require JavaScript. IMO the responsibility lies with site owners to configure it properly, though you could argue it's not the default, I guess.
https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
From the docs on the Meta Refresh method:
Meta Refresh (No JavaScript)
The metarefresh challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.
To use it in your Anubis configuration:
```yaml
# Generic catchall rule
- name: generic-browser
  user_agent_regex: >-
    Mozilla|Opera
  action: CHALLENGE
  challenge:
    difficulty: 1 # Number of seconds to wait before refreshing the page
    algorithm: metarefresh # Specify a non-JS challenge method
```
This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.
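For anyone wondering how a challenge can work with no JavaScript at all, here is a minimal sketch of the general idea, assuming a design along the lines the docs describe: the server sends a page whose `<meta http-equiv="refresh">` tag waits `difficulty` seconds and then follows a URL carrying a signed, timestamped token, and the server only accepts the token once at least that many seconds have passed. The `/check` path, the token layout, the HMAC signing, and all function names are hypothetical illustrations, not Anubis's actual implementation.

```python
import hashlib
import hmac
import html

SECRET = b"hypothetical-server-secret"  # assumption: a server-side signing key


def make_token(issued_at: int) -> str:
    """Sign the issue timestamp so a client cannot forge an older one."""
    mac = hmac.new(SECRET, str(issued_at).encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}.{mac}"


def challenge_page(difficulty: int, now: int) -> str:
    """Return a JS-free page that refreshes to /check after `difficulty` seconds."""
    url = html.escape(f"/check?token={make_token(now)}")
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{difficulty}; url={url}">'
        "</head><body>Checking your browser...</body></html>"
    )


def verify_token(token: str, difficulty: int, now: int) -> bool:
    """Accept only an authentic token that is at least `difficulty` seconds old."""
    try:
        issued_str, mac = token.split(".", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET, issued_str.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return False  # forged or tampered token
    return now - issued_at >= difficulty
```

The sketch also shows why the docs warn about false positives: any client that honors the refresh and waits, including headless Chrome, passes.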
This is news to me! Thanks for enlightening me!
if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.
I'm with you here. I come from an older time on the Internet. I'm not much of a creator, but I do have websites, and unlike many self-hosters, I think that in the spirit of the Internet they should be open to the public as a matter of principle, not cowering away for my own private use behind some encrypted VPN. I want them to be shared. Sometimes that means taking a hammering. That's fine. It's not going to end the world if one goes down or goes away, and I try not to make a habit of being so irritating that anyone would have much legitimate reason to target me.
I don't like any of these sorts of protections that put the burden onto legitimate users. I get that's the reality we live in, but I reject that reality and substitute my own. I understand that some people need to be able to block that sort of traffic to limit and justify the very real costs of providing services for free on the Internet, and Anubis does that job. But I'm not one of those people. It has yet to cost me a cent above what I had already decided to pay, and until it does, I have the freedom to adhere to my principles on this.
To paraphrase another great movie: why should any legitimate user be inconvenienced when the bots are the ones who suck? I refuse to punish the wrong party.
Scarcity is what powers this type of challenge: you have to prove you spent a certain amount of electricity in exchange for access to the site, and because electricity isn't free, this imposes a dollar cost on bots.
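To make "prove you spent electricity" concrete, here is a minimal hashcash-style sketch. It assumes a scheme similar in spirit to Anubis's JavaScript challenge: the client searches for a nonce such that the SHA-256 of the challenge string plus the nonce starts with `difficulty` zero hex digits, and the server checks the answer with a single hash. The function names and the exact string concatenation are illustrative assumptions, not Anubis's wire format.

```python
import hashlib
from itertools import count


def hash_hex(challenge: str, nonce: int) -> str:
    """SHA-256 of the challenge with the nonce appended, as a hex string."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until the hash has enough leading zeros."""
    target = "0" * difficulty
    for nonce in count():
        if hash_hex(challenge, nonce).startswith(target):
            return nonce
    raise AssertionError("unreachable: count() never ends")


def verify_work(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return hash_hex(challenge, nonce).startswith("0" * difficulty)
```

The asymmetry is the point of the design: solving takes about 16**difficulty hashes on average, while verifying takes one.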
You could skip the detour through hashes/electricity and do something with a proof-of-stake cryptocurrency, and just pay for access. The site owner actually gets compensated instead of burning dead dinosaurs.
Obviously there are practical roadblocks to this today that a JavaScript proof-of-work challenge doesn't face, but longer term...
The cost here only really impacts regular users, too. The type of users you actually want to block have budgets that easily cover the compute needed anyway.
I think maybe they wouldn't, if they are trying to scale their operations to scan through millions of sites and your site is just one of them.
Yeah, exactly. A regular user isn't going to notice an extra few cents on their electricity bill (boiling water costs more), but a data centre certainly will when you scale up.
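The scaling argument is easy to put into numbers. The sketch below assumes a leading-zero-hex-digit challenge (an assumption about the challenge format), under which the expected number of hashes per solve is 16**difficulty; the site counts and re-solve frequencies are hypothetical inputs, not measurements.

```python
def expected_hashes(difficulty: int) -> int:
    """Each hex digit is zero with probability 1/16, so d leading zeros take ~16**d tries."""
    return 16 ** difficulty


def total_hashes(difficulty: int, sites: int, solves_per_site: int = 1) -> int:
    """Expected work for a crawler solving `solves_per_site` challenges on each of `sites` sites."""
    return expected_hashes(difficulty) * sites * solves_per_site


# A visitor reading one site solves once; a crawler covering a million sites
# does a million times that work, before counting any re-solves.
print(expected_hashes(4))          # 65536
print(total_hashes(4, 1_000_000))  # 65536000000
```

Translating hashes into money depends on hardware and electricity prices, which are left out here on purpose.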
I appreciate a simple piece of software that does exactly what it’s supposed to do.
The front page of the website is excellent. It describes what it does and its feature set in quick, simple terms.
I can't tell you how many times I've gone to a website for some open-source software and had no idea what it was or how it was trying to do it. They often dive deep into the 300 different ways of installing it and tell you what the current version is and what features it has over the last version, but they just assume you know the basics.
I've repeatedly stated this before: Proof of Work bot-management is only Proof of JavaScript bot-management. It is trivial for a headless browser to bypass. Proof of JavaScript does work and will stop the vast majority of bot traffic. That's how Anubis actually works. You don't need to punish actual users by abusing their CPU. PoW is a far higher cost on your actual users than on the bots.
Last I checked, Anubis has a JavaScript-less strategy called "Meta Refresh". It first serves you a blank HTML page with a <meta> tag instructing the browser to refresh and load the real page. I highly advise using the Meta Refresh strategy. It should be the default.
I'm glad someone is finally making an open source and self hostable bot management solution. And I don't give a shit about the cat-girls, nor should you. But Techaro admitted they had little idea what they were doing when they started and went for the "nuclear option". Fuck Proof of Work. It was a Dead On Arrival idea decades ago. Techaro should strip it from Anubis.
I haven't caught up with what's new with Anubis, but if they want to get stricter bot-management, they should check for actual graphics acceleration.
Something that hasn't been mentioned much in discussions about Anubis is that it has a graded tier system for how sketchy a client is, changing the kind of challenge based on a weighted priority system.
The default bot policies it ships with pass squeaky-clean regular clients straight through; only slightly weighted clients/IPs get the metarefresh challenge, and it's only at a moderate-suspicion level that the JavaScript Proof of Work kicks in. The bot policy and weight triggers for these levels, the challenge action, and the duration of a client's validity are all configurable.
It seems to me that the sites that heavy-hand the proof of work for every client, with validity that only lasts 5 minutes, are the ones giving Anubis a bad rap. The default bot policy settings Anubis comes with don't trigger PoW on the regular Firefox Android clients I've tried, including hardened IronFox. Meanwhile, other sites show the finger wag on every connection no matter what.
It's understandable why some choose strict policies, but they give the impression this is the only way it should be done, which is overkill. I'm glad there are config options to mitigate the impact on normal user experience.
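The graded idea described above can be sketched roughly like this: each matching rule adds to (or subtracts from) a client's suspicion weight, and the total picks the action. The rule names, weight values, and thresholds are hypothetical placeholders, not Anubis's shipped defaults; the real ones live in Anubis's bot policy configuration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    user_agent: str
    headers: dict[str, str]


@dataclass
class WeightRule:
    name: str
    matches: Callable[[Request], bool]
    weight: int  # positive = more suspicious, negative = more trusted


# Hypothetical rules, for illustration only.
RULES = [
    WeightRule("browser-like UA", lambda r: "Mozilla" in r.user_agent, 5),
    WeightRule("no Accept-Language", lambda r: "Accept-Language" not in r.headers, 10),
    WeightRule("headless hint", lambda r: "HeadlessChrome" in r.user_agent, 20),
]


def score(req: Request) -> int:
    """Sum the weights of every rule the request matches."""
    return sum(rule.weight for rule in RULES if rule.matches(req))


def choose_action(weight: int) -> str:
    """Map a total weight to a tier; these thresholds are made-up examples."""
    if weight < 10:
        return "ALLOW"
    if weight < 20:
        return "CHALLENGE:metarefresh"
    return "CHALLENGE:proof-of-work"
```

Under these made-up numbers, an ordinary browser sending `Accept-Language` scores 5 and passes straight through, which mirrors the "squeaky-clean clients pass" behavior described in the comment.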
I like the quirky SPH character
Counterpoint: Anubis is not awesome: https://lock.cmpxchg8b.com/anubis.html
thank you! this needed to be said.
- This post is a bit critical of a small well-intentioned project, so I felt obliged to email the maintainer to discuss it before posting it online. I didn’t hear back.
i used to watch the dev on mastodon; they seemed pretty radicalized on killing AI, and anyone who uses it (kidding!!). i'm not even surprised you didn't hear back.
great take on the software, and as far as i can tell, playwright still works/completes the unit of work. at scale anubis still seems to work if you have popular content, but it hasn't stopped me using claude code + virtual browsers.
i'm not actively testing it though. i'm probably very wrong about a few things, but i know anubis isn't hindering my personal scraping. it does fuck up perplexity and chatgpt bots, which is fun to see.
good luck Blue team!
the dev […] seemed pretty radicalized on killing AI
As one should, to lead a similar project.
What use cases does perplexity do that Claude doesn't for you?
For clarity: I didn’t write the article, it’s just a good reference.
I don't mind Anubis but the challenge page shouldn't really load an image. It's wasting extra bandwidth for nothing.
Just parse the challenge and move on.
AFAIK, you can set it up to have no image, or a different one.
It's a palette of 10 colours. I would guess it uses an indexed colorspace, reducing the size to a minimum.
edit: 28 KB on disk
An HTTP GET request is a few hundred bytes. The response is 28 KB. That's roughly 100x, or 280x against a bare 100-byte request. If a large botnet wanted to launch a denial-of-service attack on an Anubis-protected site, requesting that image could be enough.
Ideally, Anubis should serve as little data as possible until the PoW is completed. Caching the PoW algorithm (and the image) on a CDN would also mitigate the issue.
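How bad the amplification looks depends heavily on the request size assumed. A quick sketch of the ratio, using the ~28 KB image size mentioned above; the request sizes are illustrative assumptions, and 1 KB is counted as 1000 bytes.

```python
def amplification(response_bytes: int, request_bytes: int) -> float:
    """Bytes the server sends back per byte the client sends."""
    return response_bytes / request_bytes


IMAGE_BYTES = 28_000  # the ~28 KB challenge image

for request_bytes in (100, 300):
    print(request_bytes, round(amplification(IMAGE_BYTES, request_bytes), 1))
# 100 280.0
# 300 93.3
```

So 280x corresponds to a bare ~100-byte request, while a few-hundred-byte request gives closer to 100x; either way the server sends far more than it receives.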
The whole point of Anubis is to not have to go through a CDN to withstand scraping botnets.
I don't really understand what I am seeing here, so I have to ask -- are these Security issues a concern?
https://github.com/TecharoHQ/anubis/security
I have a server running a few tiny web sites, so I am considering this, but I'm always concerned about the possibility that adding more things to it could make it less secure, versus more. Thanks for any thoughts.
all of the issues listed are closed so any recent version is fine.
also, you probably don't need to deploy this unless you have a problem with bots.
Security issues are always a concern; the question is how much. Looking at them, they seem at most to be ways to circumvent the Anubis redirect system and get to your page using very specific exploits. They are marked as low to moderate priority, and I do not see anything that implies system-level access, which is the big concern. Obviously do what you feel is best, but IMO it's not worth sweating about. The nice thing about open source projects is that anyone can look through and fix them; if this gets more popular, you can expect bug bounties and professional pen-testing submissions.
When I visit sites on my cellphone, Anubis often doesn't let me through.
I've never had any issues on my phone using Fennec or Firefox. I don't have many addons installed apart from uBlock Origin. I wouldn't be surprised if some privacy addons cause issues with Anubis though.
Yeah, my setup is almost like yours; I'm also on Firefox with uBlock, and the only difference is that I'm also using Privacy Badger.
I have a script that watches apache or caddy logs for poison link hits and a set of bot user agents, adding IPs to an ipset blacklist, blocking with iptables. I should polish it up for others to try. My list of unique IPs is well over 10k in just a few days.
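A minimal sketch of the log-parsing half of such a script, assuming Apache-style "combined" log lines (Caddy writes JSON access logs by default, so it would need a different parser). The trap path prefix, the `ai-blacklist` set name, and the choice of user-agent substrings are hypothetical placeholders. Instead of touching the firewall directly, it only builds `ipset add` commands for review; the set would have to exist already (e.g. `ipset create ai-blacklist hash:ip`) and be referenced by an iptables rule such as `iptables -I INPUT -m set --match-set ai-blacklist src -j DROP`.

```python
import re

# Combined format: ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

POISON_PREFIXES = ("/trap/",)  # hypothetical poison-link paths
BOT_UA_SUBSTRINGS = ("GPTBot", "CCBot", "Bytespider")  # example crawler UA tokens
SET_NAME = "ai-blacklist"  # hypothetical ipset name


def offending_ips(lines) -> set[str]:
    """Return IPs that hit a poison path or sent a listed bot user agent."""
    ips = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip malformed or non-combined lines
        hit_trap = m["path"].startswith(POISON_PREFIXES)
        is_bot = any(s in m["ua"] for s in BOT_UA_SUBSTRINGS)
        if hit_trap or is_bot:
            ips.add(m["ip"])
    return ips


def ipset_commands(ips) -> list[str]:
    """Build (but do not run) one `ipset add` command per IP."""
    return [f"ipset add {SET_NAME} {ip}" for ip in sorted(ips)]
```

Usage might look like `print("\n".join(ipset_commands(offending_ips(open(log_path)))))`, with `log_path` pointing at your access log; piping the reviewed output to a shell applies it.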
git repos seem to be real bait for these damn AI scrapers.
"Anubis has risen, Wendell"
"Are you Jane's addiction"?
I use it with OpenBSD’s relayd and I find it amazing how little maintenance it needs.
It's a great service. I hate the character.
You know, the thing is that they know the character is a problem/annoyance; that's how they grease the wheel on selling subscription access to a commercial version with different branding.
https://anubis.techaro.lol/docs/admin/botstopper/
Pricing, from the site:
Commercial support and an unbranded version
If you want to use Anubis but organizational policies prevent you from using the branding that the open source project ships, we offer a commercial version of Anubis named BotStopper. BotStopper builds off of the open source core of Anubis and offers organizations more control over the branding, including but not limited to:
- Custom images for different states of the challenge process (in process, success, failure)
- Custom CSS and fonts
- Custom titles for the challenge and error pages
- "Anubis" replaced with "BotStopper" across the UI
- A private bug tracker for issues
In the near future this will expand to:
- A private challenge implementation that does advanced fingerprinting to check if the client is a genuine browser or not
- Advanced fingerprinting via Thoth-based advanced checks
In order to sign up for BotStopper, please do one of the following:
- Sign up on GitHub Sponsors at the $50 per month tier or higher
- Email sales@techaro.lol with your requirements for invoicing, please note that custom invoicing will cost more than using GitHub Sponsors for understandable overhead reasons
I have to respect the play tbh, it's clever. Absolutely the kind of greasy shit play that Julian from Trailer Park Boys would do if he were an open source developer.
I wish more projects did stuff like this.
It just feels silly and unprofessional while being seriously useful. Exactly my flavour of software, makes the web feel less corporate.
You can customize the images if you want: https://anubis.techaro.lol/docs/admin/botstopper#customizing-images
At the time of commenting, this post is 8h old. I read all the top comments, many of them critical of Anubis.
I run a small website and don't have problems with bots. Of course I know what a DDOS is - maybe that's the only use case where something like Anubis would help, instead of the strictly server-side solution I deploy?
I use CrowdSec (it seems to work with caddy btw). It took a little setting up, but it does the job.
Am I missing something here? Why wouldn't that be enough? Why do I need to heckle my visitors?
Despite all that I still had a problem with bots knocking on my ports spamming my logs.
By the time Anubis gets to work, the knocking already happened so I don't really understand this argument.
If the system is set up to reject certain types of requests, these are microsecond transactions that do no harm (DDoS excepted).
Thanks for this! I'm going to set this up for myself.

