this post was submitted on 15 Mar 2025
414 points (98.6% liked)

  • All analyzed AI chatbot apps collect some form of user data. The average number of collected types of data is 11 out of a possible 35 for the analyzed apps. 40% of the apps collect users' locations. Additionally, 30% of these apps track user data. Tracking refers to linking user or device data collected from the app with third-party data for targeted advertising or advertising measurement purposes or sharing it with a data broker.
  • Google Gemini collects the most information, gathering 22 out of 35 possible data types. This includes precise location data, which only Gemini, Copilot, and Perplexity collect. Gemini also collects a significant amount of data across various other categories, such as contact info (name, email address, phone number, etc.), user content, contacts (such as a list of contacts in the user’s phone), search history, browsing history, and several other types of data. This extensive data collection may be seen as excessive and intrusive by those concerned about data privacy and security.
  • ChatGPT collects 10 types of data, such as contact information, user content, identifiers, usage data, and diagnostics, while avoiding tracking data or using third-party advertising within the app. While ChatGPT collects chat history, it is possible to use temporary chats, which auto-delete all data after 30 days, or to request the removal of personal data from training sets. Overall, ChatGPT collects slightly fewer types of data than some other analyzed apps, but users should still review the privacy policy to understand how this data is used and protected.
  • Copilot, Poe, and Jasper are the three apps that collect data used to track you. This data could be sold to data brokers or used to display targeted advertisements in your app. While Copilot and Poe only collect device IDs, Jasper collects device IDs, product interaction data, advertising data, and other usage data, which refers to “any other data about user activity in the app”.
  • DeepSeek's data collection practices stand comfortably in the middle ground among other AI chatbot apps. DeepSeek collects 11 unique types of data, such as user input, including chat history, and claims to retain information for as long as necessary, storing it on servers located in the People's Republic of China.
  • Don't let your guard down, as chats stored on servers are always at risk of being breached. According to The Hacker News, DeepSeek has already experienced a breach where more than 1 million records of chat history, API keys, and other information were leaked. It is generally a good idea to be mindful of the information provided.
top 32 comments
[–] tiramichu@lemm.ee 71 points 2 days ago (1 children)

The one I self-host shares nothing with nobody :)

[–] Deceptichum@quokk.au 25 points 2 days ago

Yeah online services or apps are not worth it.

[–] GasMaskedLunatic@lemmy.dbzer0.com 44 points 2 days ago (3 children)

Only 30%? The only surprise here is that the number isn't higher.

[–] Telorand@reddthat.com 22 points 2 days ago

It only counts if you get caught, after all

[–] Ulrich@feddit.org 5 points 2 days ago

I'm surprised Copilot isn't higher.

[–] unskilled5117@feddit.org 4 points 1 day ago

They just used the self-reported labels on Apple's App Store for this "study"; who knows what a company "forgot" to put in there.

[–] TheGrandNagus@lemmy.world 33 points 2 days ago (2 children)

Running locally is the answer.

[–] 474D@lemmy.world 15 points 2 days ago (1 children)

And it's shockingly easy nowadays, you can have one up and running in 5-10 minutes

[–] haulyard@lemmy.world 6 points 2 days ago (2 children)

Any recommendations? Preferably a docker image.

[–] yamper@lemmy.world 13 points 2 days ago (2 children)

you can find docker images for ollama and open-webui pretty much anywhere.

[–] haulyard@lemmy.world 4 points 2 days ago

Nice, thanks. Although my Synology may not be as happy. Will test them out.

[–] ChilledPeppers@lemmy.world 2 points 1 day ago

What are the minimum specs? I doubt my Pentium server will run it, but still good to know.

[–] 474D@lemmy.world 2 points 1 day ago

Personally I just use LM Studio: all you do is search for the model, and it downloads and installs it for you, ez as can be

[–] sum_yung_gai@lemm.ee 2 points 1 day ago (1 children)

What model do you run? Aren't the vram requirements pretty rough for self hosting?

[–] lemminator@lemmy.today 1 points 1 day ago

I'm not who you replied to, but I've run a bunch of models through https://github.com/Mozilla-Ocho/llamafile and https://ollama.com/. I'm lucky enough to have a decent video card (24GB), but if you are willing to be a bit more patient, or run a smaller model, you'll probably be able to get it going.
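
For anyone wondering what talking to a self-hosted model looks like once it's running, here is a minimal sketch, assuming an Ollama server on its default local port (11434) and a model you have already pulled. The model name `llama3.2` and the helper names `build_generate_request` and `ask` are placeholders for illustration, not anything from the thread; swap in whatever you actually run.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
MODEL = "llama3.2"  # assumption: replace with a model you have pulled


def build_generate_request(prompt, model=MODEL, base_url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # stream=False asks for one JSON object instead of a chunked stream
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `ask("Simplify this massive function")` should return the model's reply as a string, and the request only ever goes to localhost.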

[–] TK420@lemmy.world 28 points 2 days ago (1 children)
[–] Daelsky@lemmy.ca 9 points 2 days ago (1 children)

Or if you need an AI, self host it.

[–] TK420@lemmy.world 19 points 2 days ago (1 children)

On that note: self-host as much as possible and run FOSS on everything!

[–] Daelsky@lemmy.ca 8 points 2 days ago

For sure. Control your own data. Always.

The other 70% are just storing that data to sell at a later date when they need another income stream to give hungry VC investors.

[–] filister@lemmy.world 17 points 2 days ago (2 children)

Use Mistral, support European AI.

[–] DonPiano@feddit.org 8 points 1 day ago

Use none, support none

[–] rippersnapper@lemm.ee 1 points 1 day ago

You can also use ChatGPT without login.

[–] knighthawk0811@lemmy.ml 17 points 2 days ago
[–] abies_exarchia@lemm.ee 11 points 2 days ago

Actually surprised ChatGPT is below average

[–] setsubyou@lemmy.world 10 points 1 day ago

I’m not sure how much sense it makes to complain that an AI chat bot collects so many categories of data and then highlight “user input”, which it obviously needs to function? Like how is something like DeepSeek the “middle ground” if that’s what the author thinks is the biggest problem with it? When I look at DeepSeek on the app store, it does list at least “coarse location”, so why not highlight that? DeepSeek can’t answer my questions about e.g. “restaurants nearby”, unlike e.g. ChatGPT, which comes up with a map. So that’s what I would be interested in, what DeepSeek uses my location for.

Although just in principle this kind of analysis rarely finds surprises.

If you can enter text or click on things in an online app, obviously it collects user input.

If it can refer back to previous answers, obviously it retains chat history.

If it can process pictures, obviously it collects photos if you upload any.

If it can be interacted with using voice, obviously it collects audio.

If it can answer questions about things near you, obviously it will use location data.

If there are IAPs, it better not forget that you bought those, too.

And so on.

[–] WheelcharArtist@lemmy.world 9 points 2 days ago
[–] KoalaUnknown@lemmy.world 8 points 1 day ago

Leave it to Google to be #1 in user data collection.

[–] CodingCarpenter@lemm.ee 3 points 2 days ago

Have fun reading all my "simplify this massive function" requests I guess?

[–] doodledup@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

Not surprised.

[–] Ledericas@lemm.ee 1 points 1 day ago

Makes sense, Gemini has been heavily pushed on Google's phones and in other media recently

[–] WrenFeathers@lemmy.world 1 points 1 day ago

It’s a good thing I’ve never used any of them and have no intention of ever changing my mind on this.