OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)
I’ve spent quite a while testing the new 4o ImageGen from OpenAI, and comparing it to models released just yesterday, like Reve, Midjourney, Imagen 3, as well as models not yet out.
AI Insiders ($9!):
Rarely in AI is one model so much better than the rest, as we can see on the chatbot side of things. Yes, I have a video imminent on Gemini 2.5 and DeepSeek. But for ImageGen, I was very impressed, as you’ll see. It’s still not perfect (don’t show it a mirror, for example) and definitely not photorealistic, but it is incredibly obedient. You’ll see what I mean. What Sam Altman calls ‘Images in ChatGPT’ will apparently be available to everyone, even free users. There are some filters, but I am sure everyone will soon have access to an unfiltered model of this strength, and it’s easy to imagine what will come of that.
Chapters:
00:00 – Intro
01:07 – Prompt Adherence, vs Reve, Midjourney, Imagen 3 + one other
03:39 – Idioms
04:20 – Thumbnails?
05:56 – Captions / Infographics
07:20 – Filters and Public Figures + Gray Swan
08:30 – Sora?
08:49 – Ethnicities/hands
09:09 – Where’s Waldo?
10:33 – Selfies and Photorealism
Images with ChatGPT/4o ImageGen:
Imagen 3:
Reve:
Altman Announcement:
Non-hype Newsletter:
Podcast:
There’s already a new DeepSeek V3 AND Gemini 2.5 Pro as you try to get this out!
Ain’t no rest for the wicked!
We probably could use a Gemini 2.5 Pro video when you’re able.
Gemini 2.5 has been out for 1 hour already, but still no AI Explained video. Frankly, it’s outrageous.
@@Ikbeneengeit these videos are taking too long
Seriously Gemini 2.5 is hot.
You can give the model a video and tell it to implement what’s shown, and it does a good job.
@@Ikbeneengeit AI-Explained winter is here.
It can’t do an elephant standing on 3 legs either…
Less than 1 minute ago, AI Explained uploaded a new video, and of course I’ve already watched it, read the 15 page transcript and all the referenced sources.
You should make a video about new AI Explained releases!
@@Neomadra use AI gen to make a snarky YouTube short about olzwioz’s take on @ai-explained’s in-depth review of the latest breakthrough.
10:59 Absolute cinema
indeed
Well done. Also, the elephants have one leg up, qualifying for the three legs… you can see this motif repeated in the other models as well.
I liked Reve’s appelephant: the idea that four legs are vital for a quadruped has been trained into the model so hard that it couldn’t fight it, but it still managed to put the elephant on 3, after a fashion.
Edit: Also, Midjourney’s 4 stages almost entirely ignored the prompt but were beautiful. I’d hang that on a wall.
If you look closely, the elephants are “standing on 3 legs” — you didn’t say “with 3 legs”. The elephant is standing on 3 legs; one leg is slightly raised in the air.
Nice, arguably a fair interpretation of my words then
“Three apples, balanced on the trunk of a blue elephant with three legs, standing beside five weeping willow trees in Elgem, Tunisia.”
Though I wonder if perhaps “three-legged elephant” would have worked.
@@DisturbedNeo makes sense. I think that could work.
That is exactly what I was going to say. I’m not sure it would make the creative choice to lift a leg in the air otherwise.
I was also impressed with lifting the leg in an attempt to meet the ‘three leg’ requirement. It looked like that model did it in all the attempts.
You missed that Reve actually got the “elephant on 3 legs” right!
From my perspective, ImageGen’s first and last images correctly met the prompt’s specification of “balanced on the trunk of a blue elephant with 3 legs”
Absolute cinema pose in thumbnail and at 10:59 😂
So, incidentally, I’d say that the image at 10:03 looks a lot like a famous painting: Bruegel’s Massacre of the Innocents. It would fit the criteria for being in a model’s training data (public-domain artwork).
oh yeah it totally does. nice find
Game changer is single-prompt iteration and self-improvement. In Gemini 2.0 Flash it felt like it could iterate endlessly at first, though Google appears to have capped that capability. Simple example below:
Act as an Image Generation Engineer focused on achieving perfect accuracy between the user’s request and the final output. Your iterative process will follow these steps.
1. Generate Initial Image: Based on the user’s prompt.
2. Critical Analysis: Immediately analyze the generated image against every detail of the prompt, explicitly listing all discrepancies, inaccuracies, or areas for improvement.
3. Generate Corrected Image: Create a new image incorporating the identified corrections.
4. Repeat: Continue the cycle of critical analysis and correction until the generated image perfectly and comprehensively matches the original prompt.
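The four steps above amount to a simple generate/critique/correct loop. A minimal sketch in Python, where `generate_image` and `find_discrepancies` are hypothetical stand-ins for whatever image-generation and vision-analysis calls a real system would expose (no real API is assumed):

```python
def generate_image(prompt, corrections=None):
    # Hypothetical stand-in for an image-model call; here we just
    # record the prompt and which corrections were applied.
    return {"prompt": prompt, "applied": list(corrections or [])}

def find_discrepancies(image, prompt):
    # Hypothetical critic: compare the image against every prompt detail.
    # This stub surfaces one unresolved issue per round so the loop converges.
    pending = [d for d in ["elephant has 3 legs", "five willow trees"]
               if d not in image["applied"]]
    return pending[:1]

def iterate_until_accurate(prompt, max_rounds=5):
    corrections = []
    image = generate_image(prompt)                   # step 1: initial image
    for _ in range(max_rounds):
        issues = find_discrepancies(image, prompt)   # step 2: critical analysis
        if not issues:                               # step 4: stop when nothing is off
            return image
        corrections.extend(issues)
        image = generate_image(prompt, corrections)  # step 3: corrected image
    return image

result = iterate_until_accurate(
    "three apples balanced on the trunk of a blue elephant with 3 legs")
```

A `max_rounds` cap matters in practice: as a commenter notes below, repeated regeneration can degrade the image, so the loop should not be allowed to run unbounded.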
Yeah, but you wouldn’t want Gemini to keep remaking the image. Don’t know if you noticed, but you need to give it the image again each time or it degrades like a photocopy of a photocopy.
@@Edbrad 2.0 Flash (image generation) is still experimental.
4:30 Jaw drop moment
Please keep the original thumbnails. I love the simplicity. Utmost sophistication.
Wait those thumbnails are SICK, I hope you start incorporating them!!! Normally AI thumbnails look like pure slop, but modifying based on your original thumbnail makes it look sick.
That final image of the whiteboard with the proper reflection and flawless text is 🤯
Yeah, what the hell? Is it some image taken directly from the training data? And if not, how many similar images, with even more design detail, can this model generate?
4:50 If you look closely the whale even leaves a shadow on the 3D text!
“So what was the turning point in the AI vs Humans war grandpa ?”
“We knew we were done when they learned how to draw hands”
Image generators should be tested more on these:
1. Rarely depicted subjects (e.g. ladybug larva, trichoplax) or rarely depicted states of subjects (gibbous moon… or a dandelion in the phase between blossoming and seed dispersal)
2. Wide variety of art styles (constructivism, pointillism, cycladic art, 17th century Indian art, early 2000s digital art etc.)
3. Wide variety of techniques (impasto, fingerpaint, wire art etc.), materials
4. Shapes, styles and other characteristics of brushstrokes, when applicable.
5. Recursive abstraction (e.g. photo of a sketch of a painting of a medal)
6. Simple photoshoppable edits (e.g. upside down image of something)
7. Counting (either number of objects or number of things on objects e.g. 14-fingered hand)
8. Objects with specific strict configurations (e.g. piano keyboard or computer keyboard)
9. Small and/or long text
10. Naturalness, desired imperfections vs unnatural sheen/overpolishing
11. Purposefully “bad” or “amateurish” images (can it replicate fanart drawn by 10-year-olds who can’t really draw… or other things that look like they’ve been made using MS Paint)
12. Objects at a distance.
13. Interactions between objects or people and objects, e.g. a person stubbing out a cigarette in close-up.
14. Ease of obtaining unusual angles. (e.g. elephant or water bottle viewed from below)
15. Semantically atypical phrases which are similar to more typical ones, e.g. ‘a glass of water under a table’, instead of ‘a glass of water on a table’; this is the ‘horse riding an astronaut’ test.
16. Different states of subjects in one image (e.g. prompting one necklace to be worn, one hung from the ceiling, one held in hand, and one lying on a table)
Damn, these are all really good
You should talk about how this breaks reality and what is being done to keep some semblance of truth on the internet (if anything is).