This is actually misleading in the other direction: ChatGPT is a particularly energy-intensive model. You can run a GPT-4o-class model on a mid- to high-end consumer GPU, and its environmental impact is then in the same ballpark as gaming.
You can also train a model on a cluster of 3090s or 4090s, which is what people actually do, and per card that's still in the same range as gaming. (And it's more productive than an 8-hour WoW grind while chugging a glass of warmed-up Nutella.)
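For a rough feel of the "same range as gaming" comparison, here's a back-of-envelope sketch. The 350 W figure is an assumption (roughly a 3090's rated board power; a 4090 is rated higher), and the 8-card, one-week training run is made up purely for illustration.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kWh used by a device drawing power_watts for `hours`."""
    return power_watts * hours / 1000.0

# ASSUMPTION: ~350 W per card under load (roughly a 3090's rated board power).
GPU_WATTS = 350.0

# One card, one 8-hour gaming session.
gaming_session_kwh = energy_kwh(GPU_WATTS, hours=8)

# HYPOTHETICAL training run: 8 cards running flat out for 7 days.
cards, days = 8, 7
training_run_kwh = energy_kwh(GPU_WATTS * cards, hours=24 * days)

# Express the training run in "gaming sessions" for comparison.
sessions_equivalent = training_run_kwh / gaming_session_kwh

print(f"gaming session: {gaming_session_kwh:.1f} kWh")
print(f"training run:   {training_run_kwh:.1f} kWh")
print(f"equivalent to {sessions_equivalent:.0f} eight-hour gaming sessions")
```

Per card-hour the draw is the same either way; what changes the total is how many cards run and for how long.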
Models like Google's Gemma (NOT Gemini; these are two completely different things) are insanely power-efficient.
According to https://arxiv.org/abs/2405.21015, the absolute most monstrous, energy-guzzling model tested needed 10 MW of power to train.
Most models need less than that, and non-frontier models can even be trained on gaming hardware with comparatively little energy consumption.
That paper, by the way, reports a 2.4x year-over-year increase in the cost of training frontier models, BUT it doesn't mention DeepSeek, which rocked the Western AI world with a comparatively low training cost (about 2.7 M GPU hours in total).
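To turn a GPU-hour figure like that into energy, you need a per-GPU power draw and a datacenter overhead factor, neither of which is given here. A rough sketch, where the 700 W per GPU and the 1.2 overhead factor (PUE) are my assumptions and not reported numbers:

```python
def training_energy_mwh(gpu_hours: float, watts_per_gpu: float, pue: float = 1.0) -> float:
    """Estimate training energy in MWh from total GPU hours.

    pue is the datacenter overhead multiplier (cooling, power delivery);
    1.0 means no overhead is counted.
    """
    return gpu_hours * watts_per_gpu * pue / 1_000_000.0

GPU_HOURS = 2.7e6        # the GPU-hour figure quoted above
WATTS_PER_GPU = 700.0    # ASSUMPTION: not taken from the paper or from DeepSeek
PUE = 1.2                # ASSUMPTION: illustrative overhead factor

estimate = training_energy_mwh(GPU_HOURS, WATTS_PER_GPU, PUE)
print(f"~{estimate:,.0f} MWh under these assumptions")
```

Swap in whatever wattage and overhead you think is realistic; the estimate scales linearly with both.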
Some companies offset the environmental damage of training with renewables and whatever other bullshit, so the actual day-to-day usage cost matters more than the huge up-front cost. ("Drop by drop is an ocean formed" - Persian proverb)
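The "drop by drop" point can be put as a break-even calculation: how many days of usage it takes before cumulative usage energy matches the one-time training energy. Both numbers below are hypothetical placeholders, not measurements of any real model.

```python
def break_even_days(training_kwh: float, daily_usage_kwh: float) -> float:
    """Days of usage after which cumulative usage energy equals training energy."""
    if daily_usage_kwh <= 0:
        raise ValueError("daily_usage_kwh must be positive")
    return training_kwh / daily_usage_kwh

# HYPOTHETICAL numbers, purely to show the shape of the argument.
TRAINING_KWH = 2_000_000.0     # one-time training energy
DAILY_USAGE_KWH = 50_000.0     # energy spent serving users per day

days = break_even_days(TRAINING_KWH, DAILY_USAGE_KWH)
print(f"usage overtakes training after {days:.0f} days")
```

The heavier the daily usage, the sooner it dwarfs the training run, whatever the training run was offset with.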