One of the more frustrating aspects of working with generative AI on a daily basis is the sheer number of completely unnecessary problems I encounter in AI images in the wild. It’s almost like the Baader-Meinhof phenomenon.[1]
It’s hard to know whether these problems are a result of Kramer photographic-level inattention, technical inexperience, laziness, or a combination thereof. The following are a few classic examples that really stood out to me, made all the funnier given their sources.
A Comical Strip
This image was pulled DIRECTLY from OpenAI’s very own Cookbook, a comprehensive prompting guide for generating high-quality images using the GPT image series. Gpt-Image-2 scored some of the highest marks on our GenAI Image comparison site, so it’s difficult to believe that this sample image was a result of the inherent limitations of the model.
Here is the original prompt:
Create a short vertical comic-style reel with 4 equal-sized panels.
Panel 1: The owner leaves through the front door. The pet is framed in the window behind them, small against the glass, eyes wide, paws pressed high, the house suddenly quiet.
Panel 2: The door clicks shut. Silence breaks. The pet slowly turns toward the empty house, posture shifting, eyes sharp with possibility.
Panel 3: The house transformed. The pet sprawls across the couch like it owns the place, crumbs nearby, sunlight cutting across the room like a spotlight.
Panel 4: The door opens. The pet is seated perfectly by the entrance, alert and composed, as if nothing happened.
Look at this monstrosity: three of the four panels have completely different doors. What is this guy, a door lord (an enigmatic creature featured in an episode of Adventure Time which summons doors into existence through the use of magic keys; see https://en.wikipedia.org/wiki/What_Was_Missing)?
The way the comic handles the door has zero consistency. And look at panel 1, which might be the most egregious of them all. Check the inside and the outside of the house. What is this door even for? It leads from outdoors to… outdoors, like it’s leading into some kind of inner garden area.
TLDR
If you want a detailed breakdown of all the issues in the style of an Usborne “1001 Things to Spot” book, see the image at the bottom of this post.
In all honesty, this image has so many continuity issues that it would’ve been easier to scrap it and generate something new for a cleaner starting point. We decided to touch up the existing image and see how far we could push it anyway.
It still has some pretty major problems, but it’s at least passable now.
Education AI Lab
The following image was attached to none other than Karpathy’s (yes, that Karpathy, one of the founding members of OpenAI) Twitter post about founding an education AI lab.
I just don’t understand how he didn’t take a few minutes to review the image before attaching it, particularly given the proposed nature of the venture. If that image was intended to be representative of the power of AI, I wouldn’t have a lot of faith in the aforementioned company.
Even at the time this image was posted in June 2024, SDXL and basic inpainting workflows for fixing errors like “Robocop Face” were already well established.
By running the image through an SDXL diffusion workflow that uses basic YOLO + ADetailer to generate simple img2img corrective masks, we can easily fix some of the most problematic issues, namely that most people’s faces have the texture of a wax candle, as if they’d been subjected to the same radiation as Senator Robert Kelly in the first X‑Men movie.
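For anyone curious what a "corrective mask" actually is, here is a rough sketch of the step between detection and inpainting. It is not the workflow we ran and it is not ADetailer's code; it only illustrates the general idea of turning face boxes into a padded mask that an img2img/inpainting pass can target. The `Box` type, the function names, and the `min_score` and `pad_ratio` defaults are all assumptions made up for this example, and the detections are hard-coded stand-ins for what a YOLO face detector might return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detection box in pixels: x1/y1 inclusive, x2/y2 exclusive."""
    x1: int
    y1: int
    x2: int
    y2: int
    score: float


def expand_box(box: Box, pad_ratio: float, width: int, height: int) -> Box:
    """Grow a box by pad_ratio of its size on every side, clipped to the image."""
    pad_x = round((box.x2 - box.x1) * pad_ratio)
    pad_y = round((box.y2 - box.y1) * pad_ratio)
    return Box(
        max(0, box.x1 - pad_x),
        max(0, box.y1 - pad_y),
        min(width, box.x2 + pad_x),
        min(height, box.y2 + pad_y),
        box.score,
    )


def build_face_mask(boxes, width, height, min_score=0.3, pad_ratio=0.25):
    """Return a height x width grid: 255 where inpainting should run, 0 elsewhere.

    min_score and pad_ratio are illustrative defaults, not tuned values.
    """
    mask = [[0] * width for _ in range(height)]
    for box in boxes:
        if box.score < min_score:
            continue  # skip low-confidence detections
        padded = expand_box(box, pad_ratio, width, height)
        for y in range(padded.y1, padded.y2):
            row = mask[y]
            for x in range(padded.x1, padded.x2):
                row[x] = 255
    return mask


# Stand-in detections: one confident face, one low-confidence false positive.
detections = [Box(4, 4, 8, 8, 0.9), Box(12, 2, 14, 4, 0.1)]
mask = build_face_mask(detections, width=16, height=12)
for row in mask:
    print("".join("#" if value else "." for value in row))
```

The padding matters more than it looks: regenerating only the tight face box tends to leave a visible seam, so the mask is grown a little to give the diffusion pass some surrounding context to blend into. In a real workflow the mask would also typically be blurred at the edges before being handed to the inpainting step.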
Further Notes
Footnotes
1. The Baader-Meinhof phenomenon (of the Simon Baader Robbins Meinhof & Taft law firm), or frequency illusion, is a cognitive bias where a recently learned or noticed concept seems to appear everywhere everywhere everywhere everywhere everywhere, creating the illusion that it is suddenly more frequent.