this post was submitted on 06 Nov 2025
57 points (100.0% liked)

Ask Lemmy


Choices have slowly been running out when it comes to effective search engines. It seems inevitable that an open-source search engine project independent of big tech will be needed.

Some of my own tricks are:

  • Use the blacklist plugin to block sites from search.
  • Search for forum sites and communities instead of specific queries. (Wikipedia has a list of forums that might be useful)
  • For technical questions, favor Q&A websites like Stack Exchange.
  • YouTube videos often offer better information than results from search engines. (Use search engines instead of YT search)
  • Look for blogs and journals that specialize in the topic you're searching for.
  • Use boolean search when possible.
  • Self-host and customize your own metadata search engine. Create a graph network linking websites based on subject/topic. You may not be able to query specific questions, but you can discover sites that you otherwise can't in traditional search. This is a great way to discover hidden gems! (Example: https://internet-map.net/)
  • (Difficult) Self-host and scrape sites across the web in order to create your own query-able database. This would be the most effective way to search the internet and would be completely independent of potential enshittification and censorship. The cost, however, is quite high both in terms of hardware and time. Kiwix offers a way to download websites for offline use (e.g. Wikipedia, Stack Exchange). This is a good starting point for building your own custom search engine.
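
To make the last two ideas a bit more concrete, here is a minimal sketch of a query-able index with boolean search. It assumes the pages have already been scraped into plain text; the `pages` dict, the example.org URLs, and the function names are all hypothetical, and a real setup would need crawling, storage, and ranking on top of this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it.

    `pages` is a dict of URL -> plain text (assumed to be scraped already).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = set().union(*index.values())  # start from every indexed URL
    for word in all_of:
        hits &= index.get(word.lower(), set())
    if any_of:
        hits &= set().union(*(index.get(w.lower(), set()) for w in any_of))
    for word in none_of:
        hits -= index.get(word.lower(), set())
    return sorted(hits)


if __name__ == "__main__":
    # Made-up pages standing in for scraped content.
    pages = {
        "https://example.org/a": "Self-hosted search engine with SearXNG",
        "https://example.org/b": "Offline Wikipedia with Kiwix",
        "https://example.org/c": "Search engine ads and sponsored results",
    }
    index = build_index(pages)
    print(search(index, all_of=["search", "engine"], none_of=["ads"]))
```

An inverted index like this only answers "which pages contain these words"; it does no ranking, so it is closer to a personal lookup table than a full search engine.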

I would love to hear the tips and tricks you use! I hope this post helps others in more efficiently finding information on the internet!

top 20 comments
[–] the_abecedarian@piefed.social 9 points 19 hours ago

Use resources available through your library's website

[–] paequ2@lemmy.today 6 points 18 hours ago (3 children)
[–] Kissaki@feddit.org 9 points 18 hours ago

Help, I'm in a loop - between this ask Lemmy and this comment

[–] INHALE_VEGETABLES@aussie.zone 4 points 13 hours ago

You know how unhinged our grasp on reality is, right?

[–] remon@ani.social 1 points 13 hours ago

Well, OP just said "efficiently" ... nothing about the quality. So you are technically correct.

SearXNG with Brave, DuckDuckGo, Google, Mullvad Leta, Mullvad Leta Brave and Qwant as the search engines. The law of large numbers makes it quite useful.
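
One way to picture why combining several engines helps is rank fusion. The sketch below uses reciprocal rank fusion, which is a swapped-in, generic technique and not necessarily how SearXNG scores results; the engine names and result lists are made up for illustration.

```python
from collections import defaultdict


def fuse_rankings(rankings, k=60):
    """Merge ranked result lists with reciprocal rank fusion.

    `rankings` maps an engine name to its ordered list of result URLs.
    Each result earns 1 / (k + rank) per engine that returned it, so results
    that several engines agree on rise to the top.
    """
    scores = defaultdict(float)
    for results in rankings.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical results from three engines.
    rankings = {
        "engine_a": ["x", "y", "z"],
        "engine_b": ["y", "x"],
        "engine_c": ["y", "w"],
    }
    print(fuse_rankings(rankings))
```

A result that only one engine pushes (say, an SEO-heavy page) gets a single small contribution, while results returned by many engines accumulate score.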

[–] WhatsHerBucket@lemmy.world 5 points 18 hours ago (1 children)

There are some paid options that are pretty good (I’m thinking Kagi).

Easy, but one obvious downside.

[–] porcoesphino@mander.xyz 1 points 14 hours ago* (last edited 14 hours ago) (3 children)

Does Kagi let you add a domain to a denylist (like a new, well-SEOed site that's genAI with inaccuracies you've noticed), or positively bias search results (like saying you know you want Wikipedia entries high in the list)?

[–] Tywele@lemmy.dbzer0.com 6 points 13 hours ago* (last edited 13 hours ago)

It does. You can outright block domains, rank them higher or lower and I think even pin them to the top.

[–] baggachipz@sh.itjust.works 3 points 12 hours ago

It’s one of their best features. No ads is the best one, since that also means you get real results and no “sponsored” bullshit. They also have AI slop filters.

[–] evasive_chimpanzee@lemmy.world 2 points 8 hours ago

For whatever reason, Wikipedia seems to have been really pushed down the page on search engines specifically for medical information. It's a shame because I can acquire the surface level of information (which is all I really ever need) way faster from Wikipedia than the other sites that come to the top of the list (Mayo Clinic, Johns Hopkins, Cleveland Clinic, govt sites).

I really shouldn't complain about it too much, cause they could be pushing pseudoscience blogs.

[–] e0qdk@reddthat.com 5 points 17 hours ago

If you're interested in building a new general purpose search engine, it probably makes the most sense to start with Common Crawl's data set and augment it rather than starting from scratch.

[–] Tollana1234567@lemmy.today 4 points 17 hours ago* (last edited 17 hours ago)

For research in STEM, look for sites like ResearchGate and others that host peer-reviewed papers. Articles, magazines, and blogs are not good sources unless they cite the research paper and link you to the proper site, and it's important not to take the research out of context, which might lull people into pseudoscientific beliefs. Some people jump the gun on these sites, which are basically articles, often using dumbed-down wording. Universities and colleges often have access to most, if not all, of the library of papers that are usually behind publishers' paywalls; if you can somehow get access to those, go for it.

[–] Truscape@lemmy.blahaj.zone 4 points 16 hours ago* (last edited 16 hours ago)

Utilizing books from a shadow library like Anna's Archive (you can use Wikipedia to find the right domains), you can read prior written material for academic subjects, relevant books on various subjects from the pre-internet era, and so forth. Some users from newer fields (such as 3D printing/CAD) are going as far as to upload their PDF works onto Anna's for distributed access.

[–] theneverfox@pawb.social 3 points 8 hours ago

Go back to 2022 and run your search then

[–] porcoesphino@mander.xyz 2 points 14 hours ago (2 children)

Deny list plugins!?? I'd been looking for a search engine with that built in. It seems so obvious. I didn't even think to look up a plugin. I had been writing keyword searches for browsers that manually added the query params for particularly frustrating results.

[–] Tywele@lemmy.dbzer0.com 3 points 13 hours ago

Kagi has that feature built in, though it is a paid search engine.

[–] evasive_chimpanzee@lemmy.world 3 points 9 hours ago* (last edited 8 hours ago)

Just found uBlacklist.

Now to find something for whitelist searches (basically I only ever want recipes or medical information from a small list of sites).

Edit: DuckDuckGo has the capability built in, too
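
For anyone who wants the allowlist/denylist behaviour without a plugin, here is a rough sketch of the "manually add query params" approach mentioned above. It assumes the engine honours `site:`, `-site:` and `OR` in the query and takes the query in a `q` URL parameter; operator support varies between engines, so treat the function names and domains as placeholders.

```python
from urllib.parse import urlencode


def build_query(terms, allow=(), deny=()):
    """Append site: filters to a plain search string.

    allow: only show results from these domains (joined with OR).
    deny:  hide results from these domains (each prefixed with -site:).
    """
    parts = [terms]
    if allow:
        parts.append(" OR ".join(f"site:{domain}" for domain in allow))
    parts.extend(f"-site:{domain}" for domain in deny)
    return " ".join(parts)


def search_url(query, base="https://duckduckgo.com/"):
    """Build a search URL; assumes the engine reads the query from `q`."""
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    # Placeholder domains; swap in the sites you trust or want to hide.
    recipes = build_query("lentil soup", allow=["example.org", "example.net"])
    print(search_url(recipes))
    print(search_url(build_query("python sort list", deny=["example.com"])))
```

The same query strings can be saved as browser keyword searches, which is roughly what the plugin-free workaround amounts to.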

[–] ieatpwns@lemmy.world 1 points 7 hours ago

I know you came here for answers, but how would one start making their own metadata search engine? You got any guides to point me towards? I hate Google so much I’m willing to learn to make my own search engine.

[–] bluemoon@piefed.social 1 points 14 hours ago* (last edited 13 hours ago)
  • StartPage, Mojeek, SearXNG, YaCy
  • hyperlink surfing "extranets", as you would Wikimedia, Wikipedia, Internet Archive, Fediverse posts etc.
  • webscrapers like Monolith etc. for offline PIR and just as you say convenience of having it all there

i look forward to reading what you come up with, because i am still kinda at the theoretical stage with keeping such a knowledgebase.

edit: i keep thinking a plaintext document of information is way simpler to deal with than webpages. at what point is information posted online preserved in its "original" form? just dumping this FediThread into a plaintext file or a folder of plaintext files with names being 'hierarchy•postID•username' or something so it is presented self-organized.

OP is ¤, 1st rank comments are ¤a ¤b ¤c and 2nd rank comments attached to comment ¤a are ¤a-a ¤a-b ¤a-c and 3rd rank comments attached to ¤a-c are ¤a-c-a ¤a-c-b ¤a-c-c so on. this then lists itself in a self-organized way, given all ASCII & unicode characters are provided in order. not just a-Z... because that would limit size of posts to take on.
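
a rough sketch of the key scheme described above, under some assumptions: the `Comment` class and `assign_keys` function are made up for illustration, and this version only uses a-z, so it stops at 26 replies per comment instead of the full character range suggested.

```python
from dataclasses import dataclass, field
from string import ascii_lowercase


@dataclass
class Comment:
    author: str
    replies: list["Comment"] = field(default_factory=list)


def assign_keys(post, root="¤"):
    """Return (key, author) pairs for a thread in depth-first order.

    The post gets the root key, its direct replies get root + letter, and
    deeper replies append "-letter" to their parent's key.
    """
    keyed = [(root, post.author)]

    def walk(node, prefix, depth):
        if len(node.replies) > len(ascii_lowercase):
            raise ValueError("this sketch supports at most 26 replies per comment")
        for letter, reply in zip(ascii_lowercase, node.replies):
            key = prefix + letter if depth == 0 else f"{prefix}-{letter}"
            keyed.append((key, reply.author))
            walk(reply, key, depth + 1)

    walk(post, root, 0)
    return keyed


if __name__ == "__main__":
    # Tiny made-up thread: one post, two top-level comments, two nested replies.
    thread = Comment("op", [
        Comment("alice", [Comment("bob"), Comment("carol")]),
        Comment("dave"),
    ])
    for key, author in assign_keys(thread):
        print(key, author)
```

sorting these keys as plain strings keeps each reply right under its parent, because "-" sorts before the letters. if the key is glued to other fields with "•" for a filename, sort on the key part alone, since "•" sorts after "-" and a reply's filename would otherwise land ahead of its parent's.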

of course more difficult and complicated solutions like selfhosting webservers and managing ports and databases exist... not that i grasp the necessity for so many services.