• Home
  • AI

OpenAI’s New ImageGen is Unexpectedly Epic … (ft. Reve, Imagen 3, Midjourney etc)

I’ve spent quite a while testing the new 4o ImageGen from OpenAI, and comparing it to models released just yesterday, like Reve, Midjourney, Imagen 3, as well as models not yet out.

AI Insiders ($9!):

Rarely in AI is one model so much better than the rest, as we can see on the chatbot side of things. Yes, I have a video imminent on Gemini 2.5 and DeepSeek. But for ImageGen, I was very impressed, as you’ll see. It’s still not perfect (don’t show it a mirror, for example) and definitely not photorealistic, but it is incredibly obedient. You’ll see what I mean. What Sam Altman calls ‘Images in ChatGPT’ will apparently be available to everyone, even free users. There are some filters, but I am sure everyone will soon have access to an unfiltered model of its strength, and it’s easy to imagine what will come of that.

Chapters:
00:00 – Intro
01:07 – Prompt Adherence, vs Reve, Midjourney, Imagen 3 + one other
03:39 – Idioms
04:20 – Thumbnails?
05:56 – Captions / Infographics
07:20 – Filters and Public Figures + Gray Swan
08:30 – Sora?
08:49 – Ethnicities/hands
09:09 – Where’s Waldo?
10:33 – Selfies and Photorealism

Images with ChatGPT/4o ImageGen:
Imagen 3:
Reve:
Altman Announcement:

Non-hype Newsletter:

Podcast:

Joe Lilli

  • @OmarOmar-eo3pw says:

    already new deepseek v3 AND gemini 2.5 pro as you try to get this out!

  • @cacogenicist says:

    We probably could use a Gemini 2.5 Pro video when you’re able.

  • @olzwolz5353 says:

    Less than 1 minute ago, AI Explained uploaded a new video, and of course I’ve already watched it, read the 15 page transcript and all the referenced sources.

  • @IceCreamMan945 says:

    10:59 Absolute cinema

  • @JohnLewis-old says:

    Well done. Also, the elephants have one leg up, qualifying for the three legs… you can see this motif repeated in the other models as well.

  • @michaelwoodby5261 says:

    I liked Reve’s appelephant because it’s been trained into the model so hard that 4 legs is vital for a quadruped that it couldn’t fight it, but it still managed to put the elephant on 3, after a fashion.
    Edit: Also Midjourney’s 4 stages almost entirely ignored the prompt but was beautiful. I’d hang that on a wall.

  • @Don_Lvon_Creative says:

    If you look closely, the elephants are “standing on 3 legs”. You didn’t say “with 3 legs”: the elephant is standing on 3 legs, with one leg slightly raised in the air.

    • @aiexplained-official says:

      Nice, arguably a fair interpretation of my words then

    • @DisturbedNeo says:

      “Three apples, balanced on the trunk of a blue elephant with three legs, standing beside five weeping willow trees in Elgem, Tunisia.”

      Though I wonder if perhaps “three-legged elephant” would have worked.

    • @juliankohler5086 says:

      @@DisturbedNeo makes sense. I think that could work.

    • @motess5304 says:

    That is exactly what I was going to say. I am not sure it would make the creative choice to lift a leg in the air otherwise.

    • @penguinista says:

      I was also impressed with lifting the leg in an attempt to meet the ‘three leg’ requirement. It looked like that model did it in all the attempts.

  • @EveDe-ug3zv says:

    You missed that Reve actually got the “elephant on 3 legs” right!

  • @levelupai says:

    From my perspective, ImageGen’s first and last images correctly met the prompt’s specification of “balanced on the trunk of a blue elephant with 3 legs”

  • @mrrfyW says:

    Absolute cinema pose in thumbnail and at 10:59 😂

  • @CarletonTorpin says:

    So, incidentally, I’d say that the image at 10:03 looks a lot like a famous painting: Bruegel’s Massacre of the Innocents. It would fit the criteria for being in a model’s training data (public domain artwork).

  • @rousabout7578 says:

    The game changer is single-prompt iteration and self-improvement. In Gemini 2.0 Flash it felt like it could iterate endlessly at first, though Google appears to have capped that capability. Simple example below:

    Act as an Image Generation Engineer focused on achieving perfect accuracy between the user’s request and the final output. Your iterative process will follow these steps.

    1. Generate Initial Image: Based on the user’s prompt.

    2. Critical Analysis: Immediately analyze the generated image against every detail of the prompt, explicitly listing all discrepancies, inaccuracies, or areas for improvement.

    3. Generate Corrected Image: Create a new image incorporating the identified corrections.

    4. Repeat: Continue the cycle of critical analysis and correction until the generated image perfectly and comprehensively matches the original prompt.
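The four steps above amount to a generate→critique→correct loop. A minimal Python sketch of that control flow is below; `generate_image` and `critique` are hypothetical stand-ins for real image-model and vision-model calls (no actual API is named in the comment), stubbed here so the loop itself is runnable.

```python
def generate_image(prompt, feedback=None):
    # Hypothetical: call an image model, optionally conditioning
    # on the critique from the previous round.
    return {"prompt": prompt, "feedback": feedback}

def critique(image, prompt):
    # Hypothetical: a vision model lists discrepancies between the
    # image and the prompt; an empty list means a perfect match.
    # Stubbed to report one issue on the first round, then pass.
    return [] if image["feedback"] else ["elephant has four legs"]

def iterate_until_match(prompt, max_rounds=4):
    # Steps 1-4: generate, analyze, correct, repeat until the
    # critique finds no discrepancies (or we hit max_rounds).
    feedback = None
    for round_no in range(1, max_rounds + 1):
        image = generate_image(prompt, feedback)
        issues = critique(image, prompt)
        if not issues:
            return image, round_no
        feedback = "; ".join(issues)
    return image, max_rounds

image, rounds = iterate_until_match("a blue elephant with 3 legs")
```

With the stubs above, the loop converges on round 2: one critique, one corrected regeneration. A real implementation would also need the photocopy-degradation caveat raised in the replies, i.e. re-supplying the original image each round rather than re-editing the latest output.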

    • @Edbrad says:

      Yea… but you wouldn’t want Gemini to keep remaking the image. Don’t know if you noticed, but you need to give it the image again or it degrades like a photocopy of a photocopy

    • @rousabout7578 says:

      ​@@Edbrad 2.0 Flash (image generation) is still experimental.

  • @drlauch2256 says:

    4:30 Jaw drop moment

  • @jerobarraco says:

    Please keep the original thumbnails. I love the simplicity. Utmost sophistication.

  • @homeyworkey says:

    Wait those thumbnails are SICK, I hope you start incorporating them!!! Normally AI thumbnails look like pure slop, but modifying based on your original thumbnail makes it look sick.

  • @stephen-torrence says:

    That final image of the whiteboard with the proper reflection and flawless text is 🤯

    • @ulob says:

      Yeah, what the hell? Is it some image directly from the training data, and if not, how many similar images, with even more design details, can this model generate?

  • @boas_ says:

    4:50 If you look closely the whale even leaves a shadow on the 3D text!

  • @anonymes2884 says:

    “So what was the turning point in the AI vs Humans war grandpa ?”
    “We knew we were done when they learned how to draw hands”

  • @artman40 says:

    Image generators should be tested more on these:

    1. Rarely depicted subjects (e.g. ladybug larva, trichoplax) or rarely depicted states of subjects (gibbous moon… or a dandelion flower in the phase between blossoming and seed dispersal)
    2. Wide variety of art styles (constructivism, pointillism, cycladic art, 17th century Indian art, early 2000s digital art etc.)
    3. Wide variety of techniques (impasto, fingerpaint, wire art etc.), materials
    4. Shapes, styles and other characteristics of brushstrokes, when applicable.
    5. Recursive abstraction (e.g. photo of a sketch of a painting of a medal)
    6. Simple photoshoppable edits (e.g. upside down image of something)
    7. Counting (either number of objects or number of things on objects e.g. 14-fingered hand)
    8. Objects with specific strict configurations (e.g. piano keyboard or computer keyboard)
    9. Small and/or long text
    10. Naturalness, desired imperfections vs unnatural sheen/overpolishing
    11. Purposefully “bad” or “amateurish” images (can it replicate fanart drawn by 10-year-olds who can’t really draw… or other things that look like they’ve been made using MS Paint)
    12. Objects at a distance.
    13. Interactions between objects or people and objects, e.g. a person stubbing out a cigarette in close-up.
    14. Ease of obtaining unusual angles. (e.g. elephant or water bottle viewed from below)
    15. Semantically atypical phrases which are similar to more typical ones, e.g. ‘a glass of water under a table’, instead of ‘a glass of water on a table’; this is the ‘horse riding an astronaut’ test.
    16. Different states of subjects in one image (e.g. prompting one necklace to be worn, one hung from the ceiling, one held in hand and one lying on a table)
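    A checklist like this is easy to turn into a repeatable test suite. The sketch below is illustrative only: the prompts are my own examples for a few of the categories above, and `generate` is a hypothetical stand-in for any image-model API call, stubbed so the harness runs.

    ```python
    # Illustrative prompts for a few of the 16 categories; not from the video.
    TEST_CASES = {
        "rare subject": "a ladybug larva on a leaf",
        "atypical semantics": "a glass of water under a table",
        "counting": "a hand with 14 fingers",
        "strict configuration": "a close-up of one octave of piano keys",
        "unusual angle": "an elephant photographed from directly below",
    }

    def generate(prompt):
        # Hypothetical image-model call; stubbed with a placeholder result.
        return f"<image for: {prompt}>"

    def run_suite(cases):
        # Run every category once and collect outputs for manual review.
        return {category: generate(prompt) for category, prompt in cases.items()}

    results = run_suite(TEST_CASES)
    ```

    Scoring would still be manual (or via a vision model), but a fixed suite like this makes results comparable across ImageGen, Reve, Midjourney and Imagen 3.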

  • @Raulikien says:

    You should talk about how this breaks reality and what is being done to keep some semblance of truth on the internet (if there’s something)
