this post was submitted on 14 Mar 2025

Oops, something went wrong! (i.imgflip.com)

submitted 3 months ago* (last edited 3 months ago) by perishthethought@lemm.ee to c/mildlyinfuriating@lemmy.world


This is a rant about how so many apps on many different platforms (TVs, mobile devices, computers, etc...) have decided to not actually show detailed errors any more. Instead, we get something along the lines of:

Oops, something went wrong. Please try again later

.... and then, well, we get to figure out what just happened and what in the world we need to do about it. And good luck with that, since you have no idea what just failed.

Why, software developers?!? Why have you forsaken us?

EDIT 24 hours later: I feel like I need to clarify a few things:

I've worked for 8 software companies over 30+ years. I know why putting a DB error into the message users see is a bad idea. I know that makes me uncommon, but I still want more info from these messages.

You all are answering as if there are only two ways this can work: (a) what we have now (which is useless), and (b) a detailed error listing showing a full stack trace. I think the developers could meet me half-way.

What I want is either (a) "Something went wrong on the server, you can't fix it, but we will" or (b) "Something on your end didn't work. Check your network or restart the app or do something differently and then try the same thing again". And if they're blocking me because I'm using a VPN, fucking say so (but that's a whole separate thing...)

Some apps do provide enough info so I have a clue what I should do next, and I appreciate the effort they put into helping me. I think what I am really ranting about is I want more developers to take the time to do this instead of reporting all errors with "Oops, try again". (If the error is in their server, why should I try again?) Give me a hint as to the problem, so I have something to go on.
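A minimal sketch of the half-way point described above, assuming the app can already tell which side failed. The `Fault` categories, the `user_message` function, and the message wording are all hypothetical and not taken from any real app.

```python
from enum import Enum


class Fault(Enum):
    """Hypothetical failure categories; a real app would define its own."""
    SERVER = "server"    # something broke on the provider's side
    CLIENT = "client"    # something broke on the user's device or network
    BLOCKED = "blocked"  # the request was refused on purpose (e.g. a VPN policy)


# One short, non-technical message per category. No stack traces, no host names.
MESSAGES = {
    Fault.SERVER: "Something went wrong on our end. You can't fix it, but we will.",
    Fault.CLIENT: "Something on your end didn't work. Check your network or restart the app, then try again.",
    Fault.BLOCKED: "We can't serve this request from your current network.",
}


def user_message(fault: Fault) -> str:
    """Return the user-facing text for a fault category."""
    return MESSAGES[fault]


print(user_message(Fault.SERVER))
```

The point of the sketch is only that the category, not the internal detail, is what reaches the user.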

Cheers y'all. Still love you my techy brothers and sisters.


[–] hperrin@lemmy.ca 19 points 3 months ago* (last edited 3 months ago) (2 children)

What are you planning to do with information about the error? It’s not like these places have customer support. Usually it’s something like a caching layer failing, and there’s literally nothing you can do about that.

Edit after reading your edit:

I still don’t see why you want more information here. Those kinds of errors are almost always server side errors, and reporting more detailed information won’t help you.

You asked “why should I try again?” Answering this would almost always be unhelpful to the vast majority of users. “Try again later, because one of our cache layer hosts was down, and by the time you try again, it’ll have been taken out of the load balancer rotation, so you likely won’t hit that host on your next try.”

It would also cause more confusion with a not-insignificant portion of your users. Users start to misunderstand copy when sentences start to exceed eight words. “Something went wrong. Try again later.” That’s understandable by 100% of people according to that study.

Even saying “there’s nothing you can do about it” will probably be taken negatively by a certain portion of users. Not saying it’s incorrect, just that when you write an error message (or any microcopy for that matter), you should avoid sounding negative to the user. “Something went wrong. Try again later” conveys all the information the user needs in a way that won’t be misinterpreted.

[–] Nouveau_Burnswick@lemmy.world 15 points 3 months ago (2 children)

If it's an error code I've worked around before, apply same troubleshooting.

If it's a new error code, search the error code to see how other people solved it.

If no one else has solved the error code, try analogous troubleshooting, post results online with the error code name, successful or not.

[–] perishthethought@lemm.ee 3 points 3 months ago (1 children)

I agree with Nouveau_Burnswick here.

And to add: to @hperrin@lemmy.ca , are you not also a user of software and do you not see room for improvement in many apps? That's where I am rn: I just want them to try harder to communicate a tiny bit more info when things go so wrong that a message has to be displayed on my screen. Telling me "There's nothing you can do to fix the problem" would be a big help, for instance. Make sense?

[–] hperrin@lemmy.ca 5 points 3 months ago (1 children)

I am a developer of software. I can guarantee you that what you’re asking for would make my job harder, because I’ve done it, and it has made my job harder. If an error is transient (like, a caching layer error, a db connection error, an external API error, an endpoint connectivity error, etc), giving the user an error code will make it more likely that they’ll file a useless bug report or support ticket. The errors are all logged internally, and we can see when there is a spike in the error count. There’s no reason to give the user an error code, because there’s nothing helpful that the user can do with it, and there’s a lot of unhelpful things a user can do with it.

There are times where a message to the user is appropriate, like if they made a mistake with their input. But there are so many things that could go wrong that the user can’t do anything about. You’re not going to work around your DB shard going down, and a replica will replace it in a few seconds anyway, so giving you an error code does more harm than good. Telling you to try again later is exactly what I would tell you if you filed a support ticket. I don’t want to deal with useless support tickets, and you don’t want to deal with useless error messages.

Modern software stacks are big, complex systems with lots of failure points. We monitor them, and we can tell when you see these errors. If we chose to not show you a specific error code/message, there’s almost definitely a good reason.
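To make "we can see when there is a spike in the error count" concrete, here is a minimal sketch of a sliding-window error counter. The class name, window length, and threshold are assumptions for illustration; real monitoring stacks are far more involved.

```python
from collections import deque


class ErrorRateMonitor:
    """Counts errors in a sliding time window and flags a spike.

    Hypothetical sketch: the default window and threshold are made-up numbers.
    """

    def __init__(self, window_seconds: float = 60.0, spike_threshold: int = 100):
        self.window_seconds = window_seconds
        self.spike_threshold = spike_threshold
        self._timestamps = deque()

    def record(self, now: float) -> None:
        """Log one error that happened at time `now` (in seconds)."""
        self._timestamps.append(now)
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop errors that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()

    def is_spiking(self, now: float) -> bool:
        """True when the error count inside the window reaches the threshold."""
        self._evict(now)
        return len(self._timestamps) >= self.spike_threshold


monitor = ErrorRateMonitor(window_seconds=60, spike_threshold=3)
for t in (0.0, 1.0, 2.0):
    monitor.record(t)
print(monitor.is_spiking(2.0))    # three errors are inside the window
print(monitor.is_spiking(120.0))  # all three have aged out by now
```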

[–] Cryophilia@lemmy.world -2 points 3 months ago (1 children)

So what you're saying is that your code is garbage and you're hiding it from users because it's too much work to fix it.

[–] hperrin@lemmy.ca 3 points 3 months ago* (last edited 3 months ago) (1 children)

What I’m saying is that error messages can be helpful or harmful. Knowing that and how to tell the difference is what makes you an expert. Just firing off any information to the user without thinking about it is what makes you a novice, and will eventually get you fired. We’re talking about systems with millions of daily users. If you cause 2,000 unnecessary support tickets or forum posts every day because you don’t know when to send what information to the user, you won’t get very far in tech.

[–] Cryophilia@lemmy.world 1 points 3 months ago (1 children)

If you have 2000 daily people getting error messages, your code is garbage rofl

And if your company would rather you avoid those tickets by not giving out error codes, your company is also garbage. Which to be fair, is a lot of tech companies.

[–] hperrin@lemmy.ca 1 points 3 months ago* (last edited 3 months ago) (1 children)

I feel like you really don’t understand how big tech works. There’s not some single server running every service perfectly. There are tons of different layers and services running on thousands or hundreds of thousands of hosts.

Let’s say you make a request to something like Facebook. Say you’re liking a post. Here’s what happens:

That request goes in through a PoP (point of presence). These are sometimes called edge servers or edge gateways, but at Facebook we called them PoPs. This is a server that’s physically close to you that’s used to terminate the TLS connection. It doesn’t have any user data. Its job is to take your encrypted request, decrypt it, then pass it on to Facebook’s regional data center on their internal network.

The request enters a webby. These are usually called frontend servers, but again, at Facebook we called them webbies. This is a server that runs the monolithic Facebook web app. Again, it doesn’t have any user data. Its job is to take your request and orchestrate actions on deeper services to fulfill that request.

First it’s going to check a local memory cache server for sitevars. These control system level switches, like AB tests, and whether certain services are brought down. That server returns the sitevars and the webby proceeds, now knowing which logic paths to take.

For a like, which is a write request between your user account and a post, it will create two DB entries (you likes post, post liked by you). It needs to first get the data from the caching layer, so it will make two requests to TAO (Facebook’s caching layer), one for your account, and one for the post.

TAO runs in the same regional data center, and if it doesn’t have the two data objects cached, it will request them from the regional db shards.

These regional db shards also run in the same data center, and they’ll return the data.

TAO returns the data back to the webby.

The webby (after doing some permission checks, which probably hit TAO again) now creates the two relationships, likes and liked by, referencing the two data objects, you and the post. TAO is a write-through cache, so the webby sends the writes to TAO.

TAO now needs to send the requests to the db primary shards, since they are the only ones that can handle writes. Your primary shard and the post’s primary shard are probably in different data centers, so TAO now passes the writes to the regional data centers for each primary shard.

A host running TAO in each regional data center for each primary shard now passes the write to each shard.

Each primary shard now writes the data to the local disk, and waits for the binary log to be written to the local journal before returning a success message.

The success message is passed from the local TAO host back to the original region’s TAO host.

When that TAO host gets both requests back successfully, it returns a success back to the webby handling your request.

The webby then returns a success to the PoP you’re still connected to.

The PoP then returns a success to the client running on your device.

The client doesn’t notify you of anything, because it already showed you a filled in like button right after you pressed it.

This was how it worked back in 2013 when I worked there. It probably hasn’t changed a whole lot, but this is also an extremely simplified overview (I didn’t even touch on any load balancing systems). That request will probably hit hundreds of services. Some of them can fail and the request could still succeed. But some are required to succeed for your request to be considered successful, like the db write operations. Something like a hardware failure on your primary db shard’s disk can’t be overcome with better code. Nor can a lightning strike taking out the cable connecting your PoP be overcome with better code.
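The write-through part of that flow can be sketched in a few lines. This is not TAO's API; `PrimaryStore`, `WriteThroughCache`, and `like_post` are hypothetical names, a dict stands in for the database, and a single store stands in for what the comment describes as two primary shards.

```python
class PrimaryStore:
    """Stands in for a primary db shard; a dict instead of a real database."""

    def __init__(self):
        self.rows = {}
        self.available = True

    def write(self, key, value):
        if not self.available:
            raise ConnectionError("primary shard unavailable")
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class WriteThroughCache:
    """Every write goes to the store first; the cache is updated only on success."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.read(key)   # cache miss: fall back to the store
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.store.write(key, value)   # may raise; the cache is left untouched then
        self.cache[key] = value


def like_post(cache, user_id, post_id):
    """Create both association rows; return True only if both writes succeed."""
    try:
        cache.put(("likes", user_id, post_id), True)
        cache.put(("liked_by", post_id, user_id), True)
    except ConnectionError:
        return False   # the caller would show "Something went wrong"
    return True


store = PrimaryStore()
cache = WriteThroughCache(store)
print(like_post(cache, "you", "post42"))   # store is up
store.available = False
print(like_post(cache, "you", "post43"))   # store is down
```

The sketch ignores the case where the first write lands and the second fails; a real system has to handle that, and no amount of client-side detail would let a user fix it.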

These systems are absolutely massive, and there are failures you wouldn’t even think of. When I worked at FB, we had an entire data center go down because the humidity got just high enough that the capacitors in each host’s power supplies all failed in a matter of a few minutes. Thousands of users probably got error messages that day, but the automatic failover systems moved all the traffic to a new region and promoted new primary db shards within about ten minutes. The fact that losing an entire data center was mitigated in about ten minutes is actually really impressive. You might think it’s still garbage code, since users got error messages, but I know enough about these systems to be very impressed by that.

If you know a better way to make a system like this that works for billions of users across the planet, you should write a paper and submit it to a local conference. If they approve you for a talk, you can present your designs to an audience there. If the audience is really receptive, your designs could make a big impact in the tech sector. That’s basically what the highest level engineers at these big tech companies do when they design these multi-billion user systems, so it’s definitely possible for you to do it too.

[–] Cryophilia@lemmy.world 1 points 3 months ago (1 children)

All I'm saying is that the vast majority of "oops" issues happen before step one. Client-side issues. For those, give an error code. All the stuff you talked about, there's little to nothing users can do. And yeah, it could definitely be done better, but it would require abandoning the "ooh shiny new thing" mentality of tech companies. Updates just to boost resumes, deprecation of anything user friendly. It's an endemic cultural problem.

[–] hperrin@lemmy.ca 1 points 3 months ago (1 children)

Why do you think the vast majority of these messages come from client side issues? I worked as a Site Reliability Engineer at Facebook. We had data on client side errors too. Crash logs are sent to the servers when a client side error happens. There’s not really one source that constitutes a “vast majority” of these error messages, but I can tell you that the plurality of them come from the caching layer.

[–] Cryophilia@lemmy.world 1 points 3 months ago (1 children)

Crash logs are sent to the servers when a client side error happens

How? I mean how is it possible to send a crash log to the server when the problem is that the client can't connect to the server?

You by definition don't see most of these errors except maybe as pings, depending on why it can't connect

[–] hperrin@lemmy.ca 1 points 3 months ago* (last edited 3 months ago)

If it’s a mobile app, the operating system handles crash logs, and reports them to you through your app management portal. Then for connection issues to the host or handled errors, you can store them in your app’s data store, and upload them once the connection is restored.

If it’s a web app, you can save them in local storage through your service worker, then upload them once the connection is restored. If you don’t have a high level error handling function on your web app, that’s an issue with your web app, not your logging infrastructure.

For a network outage error, these aren’t usually reported if the problem is on the client side, since that’s not something we can do anything about. Both mobile apps and service workers can tell if the operating system is disconnected from the network. If it’s an issue connecting to our host (host is unreachable, but network is online), that’s when we’d save the issue and log it later once service is restored.
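The "save it locally, upload once the connection is restored" idea looks roughly like this. `ErrorLogQueue` and the `send` callable are assumptions for illustration; an actual app would persist the queue to disk or browser storage rather than keep it in memory.

```python
import json


class ErrorLogQueue:
    """Buffers error reports locally and uploads them when the host is reachable.

    `send` is a hypothetical callable that uploads one JSON string and raises
    ConnectionError when the host cannot be reached.
    """

    def __init__(self, send):
        self._send = send
        self._pending = []   # a real app would persist this, not keep it in memory

    def report(self, error: dict) -> None:
        """Queue an error, then try to upload everything that is queued."""
        self._pending.append(error)
        self.flush()

    def flush(self) -> int:
        """Upload queued errors in order; stop at the first failure. Returns the count sent."""
        sent = 0
        while self._pending:
            try:
                self._send(json.dumps(self._pending[0]))
            except ConnectionError:
                break            # still unreachable: keep the rest for later
            self._pending.pop(0)
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Fake transport so the sketch runs without a network.
state = {"online": False}
uploaded = []


def fake_send(payload: str) -> None:
    if not state["online"]:
        raise ConnectionError("host unreachable")
    uploaded.append(payload)


queue = ErrorLogQueue(fake_send)
queue.report({"code": "HOST_UNREACHABLE"})
print(queue.pending)     # still queued while the host is unreachable
state["online"] = True
print(queue.flush())     # number of reports uploaded after reconnecting
```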

We can tell when our services go offline, because we have health checks on our hosts. So, technically, we don’t need client side reporting if our hosts are down. But, every place I’ve worked at has had them anyway.

Analytics don’t usually run on the same hosts as services, so if your service goes down, that doesn’t mean your analytics platform is down. I mentioned before how many systems there are in big tech services. Analytics is one of those systems. It’s generally completely separate from user facing services.

[–] hperrin@lemmy.ca 1 points 3 months ago

These kinds of error messages are almost exclusively used for transient errors. You aren’t going to work around a transient error. The best thing you can do (the only thing you can do, really) is to try again later, hence, the message. It’s not helpful to show you a message like “cache-1234.example.com failed to respond within 300 milliseconds”. What are you going to do about that? By the time you submit a support ticket, that host has already been brought back up automatically. So now you’ve just wasted your time and the support staff’s time. The engineers already have a log of that error and a log of whatever error brought down that host, so you’re not telling them anything new by making a support ticket.
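Since the only useful response to a transient error is to try again, a client can do a few retries itself before showing the message at all. A minimal sketch, assuming a hypothetical `TransientError` type and made-up delays; it is not how any particular app does it.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are expected to clear on their own."""


def call_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying on TransientError with exponential backoff.

    Re-raises the last TransientError once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))   # 0.5s, then 1s, then 2s, ...


# Demo: an operation that fails twice and then succeeds.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("cache host did not respond")
    return "ok"


# `delays.append` replaces time.sleep so the demo does not actually wait.
print(call_with_retry(flaky, sleep=delays.append))
print(delays)
```

Only when the retries are exhausted would the user see "Something went wrong. Try again later."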

[–] unhrpetby@sh.itjust.works 1 points 3 months ago (1 children)

By nature of software consisting of a client and a server, there are certainly errors that can be bypassed on the client side.

Server side software does not mean "there are literally no errors that are dependent on client input." That's ridiculous to think, but pervasive in this comment section it seems.

[–] hperrin@lemmy.ca 2 points 3 months ago (1 children)

I don’t know why you think what I said means that. These error messages are never used on data validation issues. At least, I’ve never seen a data validation issue return an error like this, and I would never write an error like this for a data validation issue.

These messages come from 500-series errors. Usually caching layer errors, load balancer layer errors, edge termination layer errors, or db layer errors. In other words, there was probably nothing wrong with the request, it just couldn’t be fulfilled successfully, hence the “try again later” part in a lot of these messages.

[–] unhrpetby@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago) (1 children)

These error messages are never (sic) used on data validation issues.

You are incorrect. I have had issues that were exactly that. Such as a password that was failing to be accepted and then giving generic error responses, which I then had to trial-and-error brute force to find which part of my password they weren't allowing on the backend.

Your stance might become easier to defend if you avoid absolutes.

[–] hperrin@lemmy.ca 1 points 3 months ago* (last edited 3 months ago) (1 children)

Read the next sentence.

It sounds like your problem is not with these errors in general, but with specific software that uses generic messages when not appropriate.

[–] unhrpetby@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago) (1 children)

The error is unnecessarily vague.

If the message is supposed to mean "There is an internal error that is of little use to you, so you can only wait while we fix it. Try again in 10 minutes." Then say that. That tells me a developer made a conscious decision to classify the failure mode as one which I cannot fix. They are explaining to you what type of error they perceive it to be.

Instead we have "Something went wrong. Try again later." which doesn't say that directly. This could just be them designing their systems as though every user is incompetent, and denying you the information to fix the issue yourself.

You wouldn't know, because it doesn't just tell you directly.

[–] hperrin@lemmy.ca 1 points 3 months ago (1 children)

It is intentionally and, I would argue, necessarily vague.

First, there is no time frame for these kinds of errors. If it’s just a cache host that’s down, you could retry right now and the load balancer would probably have taken that host out of rotation already. If it’s a primary db that’s down, that may take 5 minutes. If there’s no replica to promote, it might take 30 minutes. If the whole db layer is down, it might take an hour or two. If an entire release needs to be rolled back, it might take a couple hours. There are just too many scenarios and too many variables to give a useful time frame.

Second, you might appreciate an error message like that, but these error messages aren’t written for you and they’re usually not even written by developers. They’re written by designers and translated into many languages. They need to be concise, easily understood, and not easily construed as derogatory or malicious in any language. They are written for the broadest audience. You are not the broadest audience.

Third, we have to design systems as if every user is incompetent and/or malicious, because many of them are. Let me give you an example. I once got an email from another engineer using an internal system my team wrote. He said, “hey I’m getting this error, can you help?” He attached a screenshot showing an error message that read, “Your auth token has expired. Please refresh the page.” He was a senior engineer.

Fourth, and I cannot stress this enough, there is almost always nothing you can do when you hit an error like this. Any information given to you for the vast majority of these kinds of errors would be entirely useless to you. You cannot promote a db shard yourself. You cannot bring up a cache host yourself. You cannot take a host out of load balancer rotation yourself. The only reason this information could possibly benefit you is to satisfy your curiosity.

[–] unhrpetby@sh.itjust.works 1 points 3 months ago* (last edited 3 months ago) (1 children)

There is no time frame for these kinds of errors

If I were able to isolate the issue to, for example, expired certs, I could absolutely give you a ballpark answer on how long it should take/when it might be back up. It doesn't need to be very precise, but I have accessed websites only to be shown an error with zero idea whether this is a multi-day event or something that will be fixed if I wait five minutes.

...they are written by designers...

Cooperation with a developer would help here.

They are written for the broadest audience

If you write only for a child, your usefulness ceiling is that of what a child could understand. You could have your obvious boilerplate message, and then under that provide more information.

...not easily construed as derogatory or malicious in any language.

I feel as if this is a simple problem to avoid.

We have to design systems as if every user is incompetent...

See the bottom of this post

there is almost nothing you can do when you hit an error like this.

If the company believes so, then write that part in. Otherwise, it isn't stated that such is the case. It would be one more sentence on the boilerplate section.

Overall this has to do with what you are optimizing for. It's clear to me that many businesses believe useless boilerplate error messages are most cost-effective. If you want to be most cost-effective, then cutting corners on the error messages likely saves time with few financial downsides. But it doesn't have to be this way.

Designing systems for the lowest person on the totem pole isn't without downsides. I have used Linux systems that made the bootup hide all log messages. This means that people who can actually fix a broken system using the logs are going to have a harder time, as you just hid away all the moving parts and complexity from the end user. Some machines I wouldn't have been able to fix were it not for the detailed logs.

Or we could talk about privacy. Nearly everyone can use a computer. Great, right!? But how many people actually understand the privacy implications of using a machine that is controlled by a closed-source corporation, or of entering loads of data into that machine? Very few.

You can design a system for idiots. But you don't have to. There are things in life that have prerequisites. If someone comes over to my computer and asks "What's that" on a kernel log output, I'll ask them, "Do you know what a kernel is". If they don't, then I will tell them not to worry about it. My explanations are not for everyone. Neither is my software.

[–] hperrin@lemmy.ca 1 points 3 months ago* (last edited 3 months ago) (1 children)

An expired cert means the browser would show an error message. I can’t send you any message if my cert is expired, because your browser won’t trust the connection.

UX designers have completely different skill sets than software engineers. At a small company, someone might do both roles, but at a company like Google or Microsoft, those are two different job titles. They do work together. In my experience, there’s a general consensus between both high level designers and high level engineers that giving the user useless information in an error message is a bad idea. There’s a reason these messages are similar across lots of companies. It’s because they are the best option for the business. If we need extra details from the user, we’ll have it printed in the console and tell them to open the console. That is incredibly rare, and basically only ever used for a network failure scenario in a service worker.

You can design your software for tech gurus, but you shouldn’t expect Microsoft Teams to be designed for tech gurus. Their customers are the general public (not super tech savvy), so they design for the general public.

You wrote “useless boilerplate error messages” in your comment, and I’m telling you that the useless part cannot be changed. You want useless detailed error messages. Good for you. Write software that gives you useless detailed error messages. Tell everyone about it and see how the general public reacts. I’ve been working in big tech for 17 years, and I am telling you from all of my experience that the general public will react poorly.

You’re upset that the information needed to fix the issue is not given to you, but you aren’t the one who needs that information. You’re not going to fix the issue. That information absolutely is provided to the people who need it, the engineers. In your metaphor of the Linux user not seeing the boot logs, you are not the Linux user. You don’t have access to the systems that need fixing, so what good would showing you the error log do? Again, the only benefit you would get from that is satisfying your curiosity. Tell me, how are you going to remove a downed host from a load balancer rotation at Google? Even if you had the ability to do that, you still don’t have permission.

Software devs need to make a choice. When we include details, people complain and post useless bug reports and forum posts. When we don’t include details, a much smaller number of people complain, and generally we don’t get useless bug reports and forum posts about it. Which one would you choose?

PS: the reason you feel that derogatory or abusive/malicious language in many different languages is easy to avoid is that you’re not a high level UX engineer. Fun fact, ChatGPT, when pronounced in French, sounds incredibly similar to “chat, j’ai pété” which translates to “cat, I farted”. Or, how about sending a “fatal error” message to a nurse?

[–] unhrpetby@sh.itjust.works 1 points 3 months ago (1 children)

expired cert...

Yes. Bad example. Pick any number of other examples. You can probably put a useful time range.

Best option for the business

Already commented on that. They believe it to be so, I don't agree with that choice.

You can design your software for tech gurus...

It doesn't have to be either or. Error messages can have a baseline of mild computer knowledge, and stretch up to people who know what they are doing. You can cater to both.

Useless boiler plate error message

It doesn't have to be utterly useless. Just because you can't fix anything from where you are doesn't mean you can't benefit. If the error is deemed unfixable for customers, give a timeframe of when it should be fixed and the intended course of action (what should they do if it's not back up soon and they need it to be up). Useless is a choice, but it's also subjective. You may find "Something went wrong. Try again later" as not useless. I deem it so.

you are not going to fix the issue

Unfounded assertion. I have fixed server-client issues before as the client. Let me repeat it: I have fixed server-client issues as the client. There are of course issues I can't fix.

I think our disconnect partly comes from the fact that I am discussing this from a point of view of server operators being fallible. If in theory they always know what is fixable only on the server and never make a mistake in that regard, then we fall back to making a useless error message more useful. But they do make mistakes (or are purposefully hiding information so you don't know how to get around the error). Take the Linux example. It would be very easy to justify that in the same way that companies could justify a useless error message for something which could actually be fixed. How many people are going to look at the initramfs logs and know how to chroot in, edit the initramfs init script, and then rebuild the cpio and shove it in boot? Probably fewer than those that don't.

You could use this as a justification to hide it completely, but also harm those that could fix it, and also harm error reporting as the users machines just don't boot the distro. I disagree with this decision.

PS

If that affected ChatGPT's popularity, I couldn't tell.

So I'll round it all off with this: improve the error messages as a whole. Add contact information, time till likely fix, course of action (try again later is vague crap). The messages feel like an unhelpful wall, the error equivalent of a chatbot responding to your pleas for support. Also, you might not always be correct in whether something is fixable or not. You could add the detailed error information near the bottom; if people don't need it, then no harm. If people do, then it's useful. Not adding it and then it being of use could be worse than adding it and it just never being necessary.
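One way to picture that layered message: the plain summary always comes first and the detail sits at the bottom, shown only on request. The `ErrorReport` class, its fields, and the example address are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorReport:
    """Hypothetical layered error message: plain summary first, detail last."""
    summary: str
    next_step: Optional[str] = None
    contact: Optional[str] = None
    details: Optional[str] = None

    def render(self, show_details: bool = False) -> str:
        """Build the text shown to the user; details appear only when asked for."""
        lines = [self.summary]
        if self.next_step:
            lines.append(self.next_step)
        if self.contact:
            lines.append(f"Contact: {self.contact}")
        if show_details and self.details:
            lines.append(f"Details: {self.details}")
        return "\n".join(lines)


report = ErrorReport(
    summary="Something went wrong on our end.",
    next_step="There is nothing you need to do. Try again in a few minutes.",
    contact="support@example.com",             # placeholder address
    details="upstream timeout (reference 1234)",  # made-up detail string
)
print(report.render())
print(report.render(show_details=True))
```

Whether the extra lines help or just generate support tickets is exactly what the two commenters disagree about; the sketch only shows that both layers can coexist.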

I think this topic is wrung out dry.

[–] hperrin@lemmy.ca 1 points 3 months ago

I think you’re trying very hard to ignore all the negative things I’ve told you users do when you include too much information. Maybe just go get a job at one of these big companies and submit a diff adding this information, then read why your diff gets rejected. I’m literally telling you the reasons big companies do this, and you just refuse to believe me. Maybe you’ll believe them.