this post was submitted on 09 Sep 2025
46 points (91.1% liked)

Ask Lemmy


I feel like there are probably some ad-based search engines that are privacy- and service-oriented, but in general, even for those, there remains a misalignment problem. Hence, if I don't want to be a product now or in the future, what good search engines are there that I can pay for?

[–] DaGeek247@fedia.io 8 points 5 days ago (1 children)

The issue is that the internet is too large to index.

It's really not. At least, not yet. Size is a large part of why it isn't done, but it's not the only reason and, I'd argue, not even the main one.

A complete crawl of the internet with metadata in 2025 is only 424 TiB. For comparison, my $1,000 home setup can handle about a tenth of that (in storage, at least). The hardware to maintain a single database of the internet with metadata could easily cost under $100,000.
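As a back-of-the-envelope check on those numbers, here's a rough sketch. It only uses the figures above and assumes storage cost scales linearly from the home setup; the three-copy replica count is a made-up illustration, not anything a real search engine is known to use.

```python
# Back-of-the-envelope storage cost, using only the figures from the comment.
CRAWL_TIB = 424            # claimed size of a full 2025 crawl with metadata
HOME_SETUP_USD = 1_000     # cost of the home setup
HOME_SETUP_FRACTION = 0.1  # it holds roughly a tenth of the crawl


def storage_cost_usd(replicas: int = 1) -> float:
    """Cost to hold `replicas` copies of the crawl.

    Assumes cost scales linearly with capacity, which ignores servers,
    networking, power, and bulk discounts.
    """
    home_capacity_tib = CRAWL_TIB * HOME_SETUP_FRACTION
    usd_per_tib = HOME_SETUP_USD / home_capacity_tib
    return CRAWL_TIB * replicas * usd_per_tib


single_copy = storage_cost_usd()
three_copies = storage_cost_usd(replicas=3)  # replica count is an assumption
print(f"one copy: ${single_copy:,.0f}, three copies: ${three_copies:,.0f}")
```

At that rate, raw storage for one copy comes to roughly $10,000, so even a few redundant copies leave a lot of room under $100,000 for everything that isn't disks.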

Dave, your comment about it costing a billion to run Bing or Google might be true, but it is completely unrelated to the realities of running a small search engine and has everything to do with the fact that they are Google and Microsoft products respectively.

The real issue isn't the physical size of the internet; it's much more likely to be the complexity of making a search algorithm that can compete with the $75 billion SEO market that exists to break search engines.

[–] j4k3@piefed.world 2 points 5 days ago (1 children)

My original comment was made in good faith, but from sketchy long-term memory of stuff I've come across. It seems like it was in a Lex Fridman or similar podcast at some point in the last 3-10 years. I may have conflated or misunderstood things, as I am not experienced with such complexity. I seem to recall it coming up around the time several astronomers were speaking publicly about issues with processing large amounts of data and soliciting solutions. I just recall wondering why search started to suck around 2017, and putting the pieces together when I heard this. Now, in retrospect, it seems many of the changes were also adversarial toward rival AI training after the Transformers paper. At least, looking at how search results are salted now, the way images are selected for search is absolutely adversarial for AI training datasets... but that is all I know, and it should be taken as friendly neighborhood water cooler talk, always with the best of intentions.

[–] DaGeek247@fedia.io 2 points 5 days ago

I think most startup search engines use Google/Bing because it's free or way cheaper than running their own database, not because running one is impossible. It also likely sidesteps a lot of the SEO bullshit, simply because Google/Bing have more experience working around it.

So like, short term and at small size, it's cheaper and straight up easier to piggyback off of the big two companies rather than manage your own data set. Long term, if you get popular enough to be noticed, I expect that the SEO business would wreck any self-hosting search engine startup's results pretty regularly.