this post was submitted on 25 Jun 2025
151 points (97.5% liked)
Technology
71866 readers
4343 users here now
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Because of the vast amount of data needed, there will be no competitive viable open source solution if half the data is kept in a walled garden.
This is about open weights vs closed weights.
They haven't dewalled the garden yet. The copyright infringement part of the case will continue.
I agree that we need open-source and emancipate ourselves. The main issue I see is: The entire approach doesn't work. I'd like to give the internet as an example. It's meant to be very open, connect everyone and enable them to share information freely. It is set up to be a level playing field... Now look what that leads to. Trillion dollar mega-corporations, privacy issues everywhere and big data silos. That's what the approach promotes. I agree with the goal. But in my opinion the approach will turn out to lead to less open source and more control by rich companies. And that's not what we want.
Plus nobody even opens the walled gardes. Last time I looked, Reddit wanted money for data. Other big platforms aren't open either. And there's kind of a small war going on with the scrapers and crawlers and anti-measures. So it's not as if it's open as of now.
A lot of our laws are indeed obsolete. I think the best solution would be to force copy left licenses on anything using public created data.
But I'll take the wild west we have now with no walls then any kind of copyright dystopia. Reddit did successfully sell it's data to Google for 60 million. Right now, you can legally scrape anything you want off reddit, it is an open garden in every sense of the word (even if they dont like it). It's a lot more legal then using pirated books, but Google still bet 60 million that copyright laws would swing broadly in their favor.
I think it's very foolhardy to even hint at a pro copyright stance right now. There is a very real chance of AI getting monopolized and this is how they will do it.
I agree a copyright dystopia wouldn't be any good. Just mind that wild west or law of the jungle is the "right of the strongest". You're advantaging big companies and disadvantaging smaller players or people with ethics or who are more open/transparent.
And I don't think legality with web scraping is the biggest issue. Sure I maybe could do it if it were possible. But I'm occasionally doing some weird stuff and most services have countermeasures in place. In reality I just can't scrape Reddit. Lot's of bots and crawlers just don't work any more. I'm getting rate limited left and right from all big platforms. Lots of things require an account these days, and services are quick banning me for "suspicious activity". It's barely possible to download Youtube videos these days. So, no. I can't. While Google can just pay for it and have the data.
Also Reddit isn't really the benevolent underdog here. They're a big company as well. And they're not selling their data... They're selling their user's data. They're mainly monetizing other people's creations.