this post was submitted on 29 Mar 2025
Technology
There is another aspect to this as well. I could generate Ghibli-style images a few years ago using better image generation models like Stable Diffusion or Midjourney. OpenAI is lagging so far behind in image generation that it is comical at this point. But they get all the media coverage for these things, as if they were inventing something out of thin air.
Most governments ignored the IP issues when other models were already committing these violations. Professionals are not using OpenAI. OpenAI only makes it so that these products reach big audiences. Then they become extremely accessible, with the downside that they are dumbed down and lose a lot of functionality.
This is what billionaires and major corporations are doing now and have been doing for a long time. Do you remember the Titan submersible imploding? What was so incredible is that the founder and CEO of OceanGate was acting like A: no one had ever gone to the Titanic before, and B: submarine travel was somehow a brand new thing that was just being invented by HIM.
This was utter bullshit on so many levels. James Cameron even spoke about how horrendous the CEO's assessment of the situation was, saying that the Titanic site is actually one of the riskier shipwrecks to go down to, which is why it needs to be approached with caution (which OceanGate did not care about). He also said that submarine travel is a very mature science, and that what the idiot CEO was doing wasn't simply a bad idea in general: he believed he could violate the laws of physics.
You can break the laws and rules of society, but you cannot break the laws of physics. If you jump off the top of a skyscraper, no amount of arm flapping will make you fly.
They dropped a new image model last week that uses 4o to contextualize the request, and it's very, very good. However, I believe it's for paid subscribers only right now.
However, as you mentioned, Stable Diffusion and Midjourney probably still have more customizability.
You're the one lagging behind. OpenAI's new image model is on a different level, way ahead of the competition.
How so?
It understands what you're telling it. It can generate images from vague descriptions, combine things from different images just by being told to, modify the result, and understand the context, like knowing that "me" is the person in the image, for example.
Edit: From OpenAI - "4o image generation is an autoregressive model natively embedded within ChatGPT"
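For anyone unsure what "autoregressive" means in that quote, here is a minimal toy sketch of the general idea: each new token is picked based on what has been generated so far, then appended and fed back in. The lookup table below is a hypothetical stand-in for a trained model and only looks at the last token; it says nothing about how 4o actually works internally.

```python
# Hypothetical toy "model": maps the most recent token to the next one.
# A real autoregressive model conditions on the whole prefix and samples
# from a learned probability distribution instead of a fixed table.
NEXT_TOKEN = {
    "<start>": "a",
    "a": "full",
    "full": "glass",
    "glass": "of",
    "of": "wine",
    "wine": "<end>",
}


def generate(max_tokens=10):
    """Generate tokens one at a time, feeding each output back in as input."""
    tokens = ["<start>"]
    while len(tokens) < max_tokens:
        next_token = NEXT_TOKEN[tokens[-1]]  # depends on what was generated so far
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


sentence = " ".join(generate())
```

The point is only the loop shape: generate one piece, append it, and use the result as the input for the next step.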
Okay, so how does that compare to whatever competition you're referencing?
No other model on the market can do anything like that. The closest is the diffusion-based approach, where you could train a LoRA on a person's look or a specific piece of clothing, then generate multiple times and/or use ControlNet to sort of control the output. That's easily hours or days of work, plus it's quite technical to set up and use.
OpenAI's new model is a paradigm shift in both what the model can do and how you use it, and it can effortlessly produce things that were extremely difficult or impossible without complicated procedures and post-processing in Photoshop.
Edit: Some examples. Try to make any of these in any of the existing image generators.
All diffusion and language models are autoregressive. That just means that the output is fed back in as input until the task is complete.
With diffusion models, this means the model is fed an image that is 100% noise and removes some small percentage of the noise; then the denoised image is fed back in and another small percentage is removed. This is repeated until a defined stopping point (usually a set number of passes).
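The loop described above can be sketched roughly like this. It is a toy illustration, not a real diffusion model: `toy_denoise_step` is a hypothetical stand-in that already knows the clean target, whereas a real model uses a trained network to predict the noise to remove, guided by the prompt.

```python
import random


def toy_denoise_step(image, target, fraction):
    """Hypothetical stand-in for a trained denoiser: nudge every pixel a small
    fraction of the way from its current value toward the clean target."""
    return [px + fraction * (t - px) for px, t in zip(image, target)]


def generate(target, steps=50, fraction=0.1, seed=0):
    rng = random.Random(seed)
    # Start from an "image" that is 100% noise.
    image = [rng.uniform(-1.0, 1.0) for _ in target]
    # The stopping point is a fixed number of passes.
    for _ in range(steps):
        # Feed the partially denoised image back in and remove a bit more noise.
        image = toy_denoise_step(image, target, fraction)
    return image


target = [0.0, 0.25, 0.5, 0.75, 1.0]  # made-up "clean image" of five pixels
result = generate(target)
max_error = max(abs(r - t) for r, t in zip(result, target))
```

With these made-up settings each pass removes 10% of the remaining difference, so after 50 passes `max_error` is small but not exactly zero.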
Combining images and using one image to control the generation of another has been available for quite a while. ControlNet and IP-Adapter let you do exactly that: 'Put this coat on this person' or 'Take this picture and do it in this style'. Here's an 11-month-old YouTube video explaining how to do this using open source models and software: https://www.youtube.com/watch?v=gmwZGC8UVHE
It's nice for non-technical people that OpenAI will sell you a subscription to access an agent that can perform these kinds of image generation tasks, but it's not doing anything new in terms of image generation.
I know them, and have used them a bit. I even mentioned them in an earlier comment. The capabilities of OpenAI's new model are on a different level in my experience.
https://www.reddit.com/r/StableDiffusion/comments/1jlj8me/4o_vs_flux/ - read the comments there. That's a community dedicated to running local diffusion models. They're familiar with all the tricks. They're pretty damn impressed too.
I can't help but feel that people here either haven't tried the new OpenAI image model, or have never actually used any of the existing AI image generators before.
I cannot take you seriously with all those Reddit comments.
But then why am I even surprised? You shill for a proprietary AI.
Ah yes, I forgot we live in a post-truth society where reality doesn't matter and only your feelings are important. And since your feelings say AI bad, proprietary bad, and Reddit bad, you don't have to actually think or take reality into consideration.
Truth in this case simply means your ill-informed opinions.
And FYI, I like AIs that are fully open source.
I'm sorry, but what is ill-informed or opinion about it? The fact is it can do things no other image generator can do, open source or not. It can also effortlessly do things that would require a lot of tinkering with ControlNet in ComfyUI, or even making custom LoRAs. It's a multimodal model that handles both image and text as input and output, and does it well. All other useful image generators are diffusion-based, which don't read a prompt in the same way; they are more about weighting patterns based on keywords than any real understanding of the prompt. That's why they struggle with relatively simple things like "a full glass of wine" or "a horse riding an astronaut on the moon". If I'm wrong about this, please prove me wrong. Nothing would make me happier than finding an open source model that can do what OpenAI's new image model can do, really. I already run llama.cpp servers and ComfyUI locally, and I have my own AI server in the basement with a P40 and a 3090. Please, please prove me wrong here.
I love open models, and have been running them locally since the first LLaMA model, but that doesn't mean I willfully ignore what Claude, OpenAI, and Google develop or pretend it doesn't exist. Rather, I want awareness that it does exist, and I want an open source version of it.
It is really sad that the most advanced model can only aspire to make derivative shit for techbro losers.
you know enough about the model for me to immediately distrust your opinion on the matter. why don't you head back to ycombinator or whatever hole you crawled out of