this post was submitted on 15 Dec 2025
717 points (98.5% liked)

Technology

77090 readers
4318 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] Grimy@lemmy.world 1 points 2 days ago (1 children)

That being said, sabotaging all future endeavors would likely just result in a soft monopoly for the current players, who are already in a position to cherry pick what they add. I wouldn't be surprised if certain companies are already poisoning the well to stop their competitors tbh.

[–] supersquirrel@sopuli.xyz 9 points 2 days ago* (last edited 2 days ago) (1 children)

In the realm of LLMs sabotage is multilayered, multidimensional and not something that can easily be identified quickly in a dataset. There will be no easy place to draw some line of "data is contaminated after this point and only established AIs are now trustable" as every dataset is going to require continual updating to stay relevant.

I am not suggesting we need to sabotage all future endeavors for creating valid datasets for LLMs either, far from it, I am saying sabotage the ones that are stealing and using things you have made and written without your consent.

[–] Grimy@lemmy.world 3 points 2 days ago* (last edited 2 days ago) (1 children)

I just think the big players aren't touching personal blogs or social media anymore and only use specific vetted sources, or have other strategies in place to counter it. Anthropic is the one that told everyone how to do it, I can't imagine them doing that if it could affect them.

[–] supersquirrel@sopuli.xyz 5 points 2 days ago* (last edited 2 days ago)

Sure, but personal blogs, esoteric smaller websites and social media are where all the actual valuable information and human interaction happens and despite the awful reputation of them it is in fact traditional news media and associated websites/sources that have never been less trustable or useless despite the large role they still play.

If companies fail to integrate the actual valuable parts to the internet in their scraping, the product they create will fail to be valuable past a certain point shrugs. If you cut out the periphery of the internet paradoxically what you accomplish is to cut out the essential core out of the internet.