OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)
I’ve spent quite a while testing the new 4o ImageGen from OpenAI, and comparing it to models released just yesterday, like Reve, Midjourney, Imagen 3, as well as models not yet out.
AI Insiders ($9!):
Rarely in AI is one model so much better than the rest, as we can see on the chatbot side of things. Yes, I have a video imminent on Gemini 2.5 and DeepSeek. But for ImageGen, I was very impressed, as you’ll see. It’s still not perfect (don’t show it a mirror, for example) and definitely not photorealistic, but it is incredibly obedient. You’ll see what I mean. What Sam Altman calls ‘Images in ChatGPT’ will apparently be available to everyone, even free users. There are some filters, but I am sure everyone will soon have access to an unfiltered model of this strength, and it’s easy to imagine what will come of that.
Chapters:
00:00 – Intro
01:07 – Prompt Adherence, vs Reve, Midjourney, Imagen 3 + one other
03:39 – Idioms
04:20 – Thumbnails?
05:56 – Captions / Infographics
07:20 – Filters and Public Figures + Gray Swan
08:30 – Sora?
08:49 – Ethnicities/hands
09:09 – Where’s Waldo?
10:33 – Selfies and Photorealism
Images with ChatGPT/4o ImageGen:
Imagen 3:
Reve:
Altman Announcement:
Non-hype Newsletter:
Podcast:
There’s already a new DeepSeek V3 AND Gemini 2.5 Pro as you try to get this out!
Ain’t no rest for the wicked!
We probably could use a Gemini 2.5 Pro video when you’re able.
Gemini 2.5 has been out for 1 hour already, but still no AI Explained video. Frankly, it’s outrageous.
@@Ikbeneengeit these videos are taking too long
Seriously Gemini 2.5 is hot.
You can give the model a video and tell it to implement what’s shown, and it does a good job.
@@Ikbeneengeit AI-Explained winter is here.
It can’t do an elephant standing on 3 legs either…
Less than 1 minute ago, AI Explained uploaded a new video, and of course I’ve already watched it, read the 15 page transcript and all the referenced sources.
You should make a video about new AI Explained releases!
@@Neomadra use AI gen to make a snarky YouTube short about olzwioz’s take on @ai-explained’s in-depth review of the latest breakthrough.
10:59 Absolute cinema
indeed
Well done. Also, the elephants have one leg up, qualifying for the three legs… you can see this motif repeated in the other models as well.
I liked Reve’s appelephant: the idea that four legs are vital for a quadruped has been trained into the model so hard that it couldn’t fight it, but it still managed to put the elephant on 3, after a fashion.
Edit: Also, Midjourney’s 4 stages almost entirely ignored the prompt but were beautiful. I’d hang that on a wall.
If you look closely, the elephants are “standing on 3 legs” — you didn’t say “with 3 legs”. The elephant is standing on 3 legs; one leg is slightly raised in the air.
Nice, arguably a fair interpretation of my words then
“Three apples, balanced on the trunk of a blue elephant with three legs, standing beside five weeping willow trees in Elgem, Tunisia.”
Though I wonder if perhaps “three-legged elephant” would have worked.
@@DisturbedNeo makes sense. I think that could work.
That is exactly what I was going to say. I’m not sure it would make the creative choice to lift a leg in the air otherwise.
I was also impressed with lifting the leg in an attempt to meet the ‘three leg’ requirement. It looked like that model did it in all the attempts.
You missed that Reve actually got the “elephant on 3 legs” right!
From my perspective, ImageGen’s first and last images correctly met the prompt’s specification of “balanced on the trunk of a blue elephant with 3 legs”
Absolute cinema pose in thumbnail and at 10:59 😂
So, incidentally, I’d say that the image at 10:03 looks a lot like a famous painting: Bruegel’s Massacre of the Innocents. It would fit the criteria for being in a model’s training data (public-domain artwork).
oh yeah it totally does. nice find
Game changer is single-prompt iteration and self-improvement. In Gemini 2.0 Flash it felt like it could iterate endlessly at first, though Google appears to have capped that capability. Simple example below:
Act as an Image Generation Engineer focused on achieving perfect accuracy between the user’s request and the final output. Your iterative process will follow these steps.
1. Generate Initial Image: Based on the user’s prompt.
2. Critical Analysis: Immediately analyze the generated image against every detail of the prompt, explicitly listing all discrepancies, inaccuracies, or areas for improvement.
3. Generate Corrected Image: Create a new image incorporating the identified corrections.
4. Repeat: Continue the cycle of critical analysis and correction until the generated image perfectly and comprehensively matches the original prompt.
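The four steps above amount to a simple generate/critique/correct loop. A minimal sketch in Python, where `generate_image` and `find_discrepancies` are hypothetical stand-ins for whatever image-generation and vision-analysis calls a real system would expose (no real API is assumed):

```python
def generate_image(prompt, corrections=None):
    # Hypothetical stand-in for an image-model call; here we just
    # record the prompt and which corrections were applied.
    return {"prompt": prompt, "applied": list(corrections or [])}

def find_discrepancies(image, prompt):
    # Hypothetical critic: compare the image against every prompt detail.
    # This stub surfaces one unresolved issue per round so the loop converges.
    pending = [d for d in ["elephant has 3 legs", "five willow trees"]
               if d not in image["applied"]]
    return pending[:1]

def iterate_until_accurate(prompt, max_rounds=5):
    corrections = []
    image = generate_image(prompt)                   # step 1: initial image
    for _ in range(max_rounds):
        issues = find_discrepancies(image, prompt)   # step 2: critical analysis
        if not issues:                               # step 4: stop when nothing is off
            return image
        corrections.extend(issues)
        image = generate_image(prompt, corrections)  # step 3: corrected image
    return image

result = iterate_until_accurate(
    "three apples balanced on the trunk of a blue elephant with 3 legs")
```

A `max_rounds` cap matters in practice: as a commenter notes below, repeated regeneration can degrade the image, so the loop should not be allowed to run unbounded.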
Yeah, but you wouldn’t want Gemini to keep remaking the image. Don’t know if you noticed, but you need to give it the image again each time or it degrades like a photocopy of a photocopy.
@@Edbrad 2.0 Flash (image generation) is still experimental.
4:30 Jaw drop moment
Please keep the original thumbnails. I love the simplicity. Utmost sophistication.
Wait those thumbnails are SICK, I hope you start incorporating them!!! Normally AI thumbnails look like pure slop, but modifying based on your original thumbnail makes it look sick.
That final image of the whiteboard with the proper reflection and flawless text is 🤯
Yeah, what the hell? Is it some image taken directly from the training data? And if not, how many similar images, with even more design detail, can this model generate?
4:50 If you look closely the whale even leaves a shadow on the 3D text!
“So what was the turning point in the AI vs Humans war grandpa ?”
“We knew we were done when they learned how to draw hands”
Image generators should be tested more on these:
1. Rarely depicted subjects (e.g. ladybug larva, trichoplax) or rarely depicted states of subjects (gibbous moon… or a dandelion in the phase between blossoming and seed dispersal)
2. Wide variety of art styles (constructivism, pointillism, cycladic art, 17th century Indian art, early 2000s digital art etc.)
3. Wide variety of techniques (impasto, fingerpaint, wire art etc.), materials
4. Shapes, styles and other characteristics of brushstrokes, when applicable.
5. Recursive abstraction (e.g. photo of a sketch of a painting of a medal)
6. Simple photoshoppable edits (e.g. upside down image of something)
7. Counting (either number of objects or number of things on objects e.g. 14-fingered hand)
8. Objects with specific strict configurations (e.g. piano keyboard or computer keyboard)
9. Small and/or long text
10. Naturalness, desired imperfections vs unnatural sheen/overpolishing
11. Purposefully “bad” or “amateurish” images (can it replicate fanart drawn by 10-year-olds who can’t really draw… or other things that look like they’ve been made using MS Paint)
12. Objects at a distance.
13. Interactions between objects or people and objects, e.g. a person stubbing out a cigarette in close-up.
14. Ease of obtaining unusual angles. (e.g. elephant or water bottle viewed from below)
15. Semantically atypical phrases which are similar to more typical ones, e.g. ‘a glass of water under a table’, instead of ‘a glass of water on a table’; this is the ‘horse riding an astronaut’ test.
16. Different states of subjects in one image (e.g. prompting one necklace to be worn, one hung from the ceiling, one held in hand, and one lying on a table)
Damn, these are all really good
You should talk about how this breaks reality and what is being done to keep some semblance of truth on the internet (if anything is).