Overview
OpenAI has released a new multimodal image generation system called (rather unspectacularly) “4o Image Generation”. In this article we put the new model through a bit of field testing.
HEADS UP!
OpenAI hasn’t always been super clear with its rollouts, and judging from the comments on the HN post, people are getting quite confused between 4o and the older DALL-E 3, which is used in basically the same way. You’ll know that you are using the new 4o model if the image slowly unblurs from the top to the bottom.
Each Prompt/Adjustment section is a transcript of a session in the ChatGPT interface, where the images are modified through text prompts only, with no inpainting. These images are also not cherry-picked.
The Ring Toss
Prompt
Two Prussian soldiers wearing spiked pith helmets are facing each other and playing a game of ring toss by attempting to toss metal rings over the spike on the other soldier’s helmet.
Adjustment
It might be better to use a landscape aspect ratio so we can place the soldiers further apart.
Verdict
Most impressive. DALL-E 3 struggled pretty hard with this prompt. We’d score this one high for prompt adherence but rather low for overall aesthetics: it has that unfortunate, characteristic “gpt image” yellowish sheen. While that is relatively easy to correct with the color temperature settings in any decent graphics editor, it’s still worth calling out.
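For readers who would rather script the fix than open an editor, here is a minimal sketch of what a color temperature correction does to a yellowish image. It is a deliberately crude model (scale red down, scale blue up), not how any particular graphics editor implements its slider, and the function names `adjust_temperature` and `adjust_image` are our own, made up for illustration.

```python
def adjust_temperature(pixel, shift):
    """Crudely cool (shift > 0) or warm (shift < 0) one (R, G, B) pixel.

    Assumption: a yellow cast is roughly "too much red, too little blue",
    so we scale red down and blue up by the same fraction and leave green alone.
    """
    r, g, b = pixel

    def clamp(value):
        # Keep each channel inside the valid 8-bit range.
        return max(0, min(255, round(value)))

    return (clamp(r * (1 - shift)), g, clamp(b * (1 + shift)))


def adjust_image(pixels, shift):
    """Apply the same correction to every pixel in a list of rows."""
    return [[adjust_temperature(p, shift) for p in row] for row in pixels]


# A tiny 1x2 "image" with a yellowish tint, cooled by 10%.
yellowish = [[(200, 180, 100), (220, 200, 120)]]
cooled = adjust_image(yellowish, 0.10)
```

A real editor works in a more careful color space, but the idea is the same: nudge the warm channels down and the cool channel up until the sheen is gone.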
Venus De Milo
Prompt
A sculpture of Venus de milo before she was caught stealing a loaf of bread.
Honestly, I would have been shocked to see ChatGPT get this correct. Amusingly, it restored just her severed hands but not the rest of her arms.
Adjustment
If this is a sculpture of her before being caught stealing the bread, then logically it was before they chopped her hands off, therefore the sculpture should have complete arms.
Adjustment
Now let’s change the bread to a baklava.
Verdict
This is approaching the level of control you’d get from masking and inpainting but with just a single prompt.
The Magic Coloring Book
The magic coloring book trick is a classic magic illusion that creates the appearance of a coloring book magically changing from having blank pages to having black-and-white drawings, and then to having fully colored illustrations.
Thanks to HN poster algo_trader, who provided the following schematic, we decided to see how 4o would handle making sweeping changes to a very text-heavy diagram.
Original Image
Adjustment
Please add color to help outline and accentuate the various sections and components of this diagram.
Verdict
It’s honestly absolutely insane how well it did with this test. We were under no illusion that this would preserve the text 1:1, but it’s astonishing how much of the original detail it managed to maintain. You might say that it passed with flying colors.
The Nine Pointed Star
Formerly, the nine-pointed star was the Achilles’ heel of nearly every image generator I’ve ever tried (Midjourney, DALL-E, Leonardo, Stable Diffusion, Janus, etc.). The only system that was ever able to get this one correct was Flux.
Prompt
A vector rendering of a 9-pointed star.
Adjustment
Remove one of the points from that star while preserving the element of symmetry.
Verdict
Gold star for 4o!
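For comparison, the geometry itself is trivial to produce programmatically, which is part of what makes this such a telling test for image models. The sketch below builds a symmetric n-pointed star as an SVG polygon by alternating between an outer and an inner radius. It says nothing about how 4o produces its image; the function names (`star_points`, `star_svg`) and the default radii are our own assumptions, chosen for illustration.

```python
import math


def star_points(n, outer=100.0, inner=45.0, cx=110.0, cy=110.0):
    """Return the 2*n vertices of a symmetric n-pointed star.

    Vertices alternate between the outer radius (the tips) and the
    inner radius (the notches), spaced evenly around the centre.
    """
    points = []
    for i in range(2 * n):
        radius = outer if i % 2 == 0 else inner
        # Start at the top of the canvas (SVG's y axis points down).
        angle = math.pi * i / n - math.pi / 2
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points


def star_svg(n, size=220):
    """Wrap the star vertices in a minimal standalone SVG document."""
    coords = " ".join(f"{x:.2f},{y:.2f}" for x, y in star_points(n))
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">'
            f'<polygon points="{coords}" fill="gold"/></svg>')


nine = star_svg(9)   # the original prompt
eight = star_svg(8)  # the adjustment: one point fewer, still symmetric
```

Saving either string to a `.svg` file and opening it in a browser shows the star. Dropping from 9 to 8 points keeps the symmetry automatically, because the vertices are always spaced evenly around the centre.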
Alexander the Irate
Prompt
A historical oil painting of Alexander the Great riding a hippity hop toy into battle. The hippity hop toy was a old children’s toy that looked like a giant rubber ball that a child could straddle and hold onto rubber handles.
Adjustment
You’ve got Alexander grabbing a separate handle with each hand. The original toy only has a single rubber handle that you grip with both hands.
Animation
The core workout of riding a hippity hop made Alexander the Great’s march through the Indian subcontinent much more difficult, but think of the GAINS! 💪
Verdict
We cannot condone bouncing of the seventh variety.