Anubis is usually installed in such a case.
I need to look into it, thanks!
Are you using anything to defend against bots?
I have nothing against bots per se, they help to spread the word about my open source code which I want to share with others.
It's just unfortunate that Forgejo fills up the hard drive to such an extent and doesn't quite let you disable this archive feature.
You should limit the amount of storage available to a single service.
Also, set up Anubis or restrict access
Yeah, I really need to figure out how to do quotas per service.
If you have a Linux server, you can try partitioning your drive using LVM. You can prevent services from consuming all disk space by giving each one their own logical volume.
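A minimal sketch of that approach, assuming an existing volume group (the name "vg0", the size, and the mount point are placeholders, adjust to your setup):

```shell
# Carve out a dedicated 10 GB logical volume for the service
sudo lvcreate -L 10G -n forgejo vg0

# Put a filesystem on it and mount it where the service keeps its data
sudo mkfs.ext4 /dev/vg0/forgejo
sudo mkdir -p /srv/forgejo
sudo mount /dev/vg0/forgejo /srv/forgejo

# Persist the mount across reboots
echo '/dev/vg0/forgejo /srv/forgejo ext4 defaults 0 2' | sudo tee -a /etc/fstab
```

If the service fills its volume, writes fail inside /srv/forgejo but the rest of the system keeps running, which is the whole point of the isolation.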
I already have LVM but I was using it to combine drives. But it's not a bad idea, if I can't do it with Docker, at least that would be a different solution.
Yeah, I put no protection in front of mine, and after noticing bots were scanning code and grabbing emails, I'm using Anubis for now, still looking at other alternatives.
I've searched the docs a bit and found this setting: https://forgejo.org/docs/latest/admin/config-cheat-sheet/#quota-subjects-list
It seems to be partially for your case, though I don't see artifacts, but you could limit all of forgejo to like 5GB and probably be good.
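If I'm reading the linked cheat sheet right, the setting would go in app.ini, roughly like this (section and key names are my reading of the Forgejo docs, so double-check them against the cheat sheet before relying on this):

```ini
[quota]
ENABLED = true

[quota.default]
; cap the default quota group at ~5 GiB total
TOTAL = 5368709120
```

As you note, whether this catches the generated repo archives depends on what `size:all` actually covers.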
Hm, I'm afraid none of them really seems to cover the repo-archives case, therefore I'm afraid size:all doesn't include the repo archives either.
But I'm running it in a container, perhaps I can limit the size the container gets assigned.
It kinda seems like it. Docker apparently does have this functionality as seen here: https://stackoverflow.com/questions/40494536/how-to-specify-the-size-of-a-shared-docker-volume/40499023#40499023
You could try limiting it to 5 GB using the Forgejo settings and 7 GB using Docker, and then just watch how big it gets.
Hm, but this only works on tmpfs, which is in-memory. It seems I could have done it with XFS too: https://fabianlee.org/2020/01/13/linux-using-xfs-project-quotas-to-limit-capacity-within-a-subdirectory/ but I used ext4 out of habit.
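For anyone on XFS, the project-quota approach from that article boils down to something like this (project ID, paths, and the 5 GB limit are placeholders, and the filesystem must be mounted with the pquota option):

```shell
# /etc/fstab entry needs the pquota mount option, e.g.:
#   /dev/vg0/data  /srv  xfs  defaults,pquota  0 2

# Register project 42 for the directory you want to cap
echo "42:/srv/forgejo" | sudo tee -a /etc/projects
echo "forgejo:42"      | sudo tee -a /etc/projid

# Initialize the project and set a 5 GB hard block limit
sudo xfs_quota -x -c 'project -s forgejo' /srv
sudo xfs_quota -x -c 'limit -p bhard=5g forgejo' /srv

# Inspect current usage against the limit
sudo xfs_quota -x -c 'report -p' /srv
```

Unlike a separate LVM volume, this caps a single subdirectory without repartitioning, but it only works on XFS.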
I "fixed" the problem of those fucking bots by blocking everyone except my country.
Sadly that's not the solution to my problem. The whole point of open-sourcing for me is to make it accessible to as many people as possible.
Does it not require an account for that? If it doesn't, I would open a feature request, since otherwise it creates a denial-of-service vector.
It does not, because that feature is usually used for scripts to download some specific release archive, etc. and other git hosting solutions do the same.
A few days late, but I have a pretty similar usecase to you on https://forgejo.ellis.link/. My solution is go-away, https://git.gammaspectra.live/git/go-away, which just sits as a reverse proxy in between traefik and Forgejo. I haven't enabled fancy stuff like TLS fingerprinting. It's been effective enough at killing the bots downloading archives and DDoSing the server from residential IPs. My config is based on the example Forgejo config, but with a few tweaks. Too long to post here, though, so message me if you need access
For now I feel disabling archives and my simple list of bots to drop in Nginx seems to work very well, it doesn't create the archives anymore and the load went down also on the server.
For now I asked ChatGPT to help me implement a simple return 403 on bot user agents. I looked into my logs and collected the bot names I saw. I know it won't hold forever, but for now it's quite nice: I just added this file to /etc/nginx/conf.d/block_bots.conf and it gets run before all the vhosts and rejects all bots. The rest just goes normally to the vhosts. This way I don't need to implement it in each vhost separately.
➜ jeena@Abraham conf.d cat block_bots.conf
# /etc/nginx/conf.d/block_bots.conf

# 1️⃣ Map user agents to $bad_bot
map $http_user_agent $bad_bot {
    default 0;
    ~*SemrushBot 1;
    ~*AhrefsBot 1;
    ~*PetalBot 1;
    ~*YisouSpider 1;
    ~*Amazonbot 1;
    ~*VelenPublicWebCrawler 1;
    ~*DataForSeoBot 1;
    ~*Expanse,\ a\ Palo\ Alto\ Networks\ company 1;
    ~*BacklinksExtendedBot 1;
    ~*ClaudeBot 1;
    ~*OAI-SearchBot 1;
    ~*GPTBot 1;
    ~*meta-externalagent 1;
}

# 2️⃣ Global default server to block bad bots
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;

    # dummy SSL cert for HTTPS
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;

    # block bad bots
    if ($bad_bot) {
        return 403;
    }

    # close connection for anything else hitting default server
    return 444;
}
I used Cloudflare's captcha equivalent and my bots dropped to zero.
But then how do people who search for code like yours find your open source code, if not through a search engine, which uses an indexing bot?
Cloudflare usually blocks 'unknown' bots, which are basically bots that aren't search crawlers. Also I've got Cloudflare setup to challenge requests for .zip, .tar.gz, or .bundle files, so that it doesn't affect anyone unless they download from their browser.
There's also probably a way to configure something similar in Anubis, if you don't like a middleman snooping your requests.
Script for monitoring disk space in Linux
The script below is designed to monitor disk space usage on a specified server partition. Configurable parameters include the maximum allowable percentage of disk space usage (MAX), the e-mail address to receive alerts (EMAIL), and the target partition (PARTITION).
The script uses the df command to collect disk usage information and sends an email alert if the current usage exceeds the specified threshold.
#!/bin/bash
# Script: ./df_guard.sh
# Set the maximum allowed disk space usage percentage
MAX=90
# Set the email address to receive alerts
EMAIL=user@example.com
# Set the partition to monitor (change accordingly, e.g., /dev/sda1)
PARTITION=/dev/sda1
# Get the current disk usage percentage and related information
USAGE_INFO=$(df -h "$PARTITION" | awk 'NR==2 {print $5, $1, $2, $3, $4}' | tr '\n' ' ')
USAGE=$(echo "$USAGE_INFO" | awk '{print int($1)}') # Remove the percentage sign
if [ "$USAGE" -gt "$MAX" ]; then
# Send an email alert with detailed disk usage information
echo -e "Warning: Disk space usage on $PARTITION is $USAGE%.\n\nDisk Usage Information:\n$USAGE_INFO" | \
mail -s "Disk Space Alert on $HOSTNAME" "$EMAIL"
fi
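The extraction logic can be sanity-checked against canned df output without touching a real disk (the sample numbers are made up):

```shell
#!/bin/sh
# Fake two-line "df -h" output: header row plus one data row
SAMPLE='Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   46G  4.0G  92% /'

# Same extraction as the script: column 5 is Use%, int() drops the % sign
USAGE=$(echo "$SAMPLE" | awk 'NR==2 {print $5}' | awk '{print int($1)}')
echo "$USAGE"   # → 92
```

With MAX=90 this sample would trip the alert, since 92 > 90.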
Installation
sudo install -m 0755 df_guard.sh /usr/local/bin/df_guard.sh
(install -m 0755 already makes the script executable, so a separate chmod +x is not needed.)
Launch examples
- Every 15 minutes.
In crontab (root)
*/15 * * * * /usr/local/bin/df_guard.sh
I have monitoring of it, but it happened during night when I was sleeping.
Actually I saw a lot of forgejo action on the server yesterday but didn't think it would go so fast.