There isn’t a date in the article, but I know I had read this months ago. And sure enough, wayback has the text-to-image page from April.
But the image editing page linked at the top is more recent and was added sometime in September (and was presumably the intended link). I hadn’t read that page yet. Odd that there are no dates; at first glance one might think the pages were made at the same time.
I'd assume that behind the scenes the models generate several passes and only show the user the best one. That would be smart, so as to make it seem their model is better than others.
It's also pretty obvious that the models have some built-in system prompt rules that give the final output a certain style. They seem very consistent.
It also looks like 4o has the temperature turned way down, to ensure max adherence, while Midjourney etc. seem to have a higher temperature: more interesting end results, flourishes, complex materials and backgrounds.
Also, what's with 4o's sepia tones? Post-editing in the gen workflows?
I don't believe any of these just generate the image, though. There are likely several steps in each workflow to present the final image to the user in the absolute best light.
There are numbers on how many tries it took. I would also find the individual prompts and images interesting.
Actual link seems to be: https://genai-showdown.specr.net/image-editing
This is the editing link yes. I just got done looking at it from the other link.
The other stuff is text to image (not editing)
I had to upvote immediately once I got to Alexander the Great on a Hippity Hop
The horse chimera is much better
Slight nit: it lists “OpenAI 4o” but the model used by ChatGPT is a distinct model labeled “gpt-image-1” iirc
A prompt I'd love to see: person riding in a kangaroo pouch.
Most of the pure diffusion models haven’t been able to do it in my experience.
Edit: another commenter pointed out the analog clock test, let's add the “analog clock showing 3:15” as well (:
The link is to the imagegen test not the editing one. Here 4o was used to preprocess the prompt.
> "A dolphin is using its fluke to discipline a mermaid by paddling it across the backside."
If this one were shown in a US work environment, I might say a collegial something privately to the person, about it not seeming the most work-appropriate.
I think I’d probably say that the prompts are telling me more about the author than I think is necessary for these tests… I hope they were at least sampled from responses.
The "editing" showdown is very good. Introduced me to the Seedream model, which I didn't know about until now.
I don't fully understand the iterative methodology tho - they allow multiple attempts, which are judged by another multimodal LLM? Won't that judge have limited accuracy itself?
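To put rough numbers on that worry, here is a minimal sketch of how an imperfect judge interacts with multiple attempts. Everything in it is an assumption for illustration: the function names are made up, the rates (`p_correct`, `tpr`, `fpr`) are invented, and it treats attempts as independent. It is not the site's actual methodology.

```python
def accept_prob(p_correct: float, tpr: float, fpr: float) -> float:
    """Chance that a single attempt is accepted by the judge.

    p_correct: assumed chance the generator gets the image right
    tpr: assumed chance the judge accepts a correct image
    fpr: assumed chance the judge accepts an incorrect image
    """
    return p_correct * tpr + (1.0 - p_correct) * fpr


def pass_within(p_correct: float, tpr: float, fpr: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries is accepted."""
    return 1.0 - (1.0 - accept_prob(p_correct, tpr, fpr)) ** attempts


def accepted_is_correct(p_correct: float, tpr: float, fpr: float) -> float:
    """Chance that an accepted image is actually correct (judge precision)."""
    a = accept_prob(p_correct, tpr, fpr)
    return (p_correct * tpr / a) if a > 0 else 0.0


# Hypothetical hard prompt: generator is right 10% of the time,
# judge is right 90% of the time in both directions.
for n in (1, 4, 8):
    print(n, round(pass_within(0.1, 0.9, 0.1, n), 3))
print("precision:", round(accepted_is_correct(0.1, 0.9, 0.1), 3))
```

The point of the sketch: more attempts raise the chance of a recorded "pass", but they do nothing for the judge's precision, so on hard prompts a decent share of passes could be false positives unless a human checks them.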
>Cephalopodic Puppet Show
I'm pretty sure that only Gemini made it. Other models did not meet the 'each tentacle covered' criteria.
For the OpenAI 4o model on the octopus sock puppet prompt: the prompt clearly states that each tentacle should have a sock puppet, whereas the OpenAI 4o image only has 6 puppets, with 2 tentacles puppetless. I’m not sure we can call that a pass.
Please fix the title, or change the link.
The title of this article is "image editing showdown", but the subject is actually prompt adherence in image generation from prompting.
Midjourney and Flux Dev aren't image editing models. (Midjourney is an aesthetically pleasing image generation model with low prompt adherence.)
Image editing is a task distinct from image generation. Image editing models include Nano Banana (Gemini Flash), Flux Kontext, and a handful of others. gpt-image-1 sort of counts, though it changes the global image pixels such that it isn't 1:1 with the input.
I expect that as image editing models get better and more "instructive", classical tools like Photoshop and modern hacks like ComfyUI will both fall away to a thin facade over the models themselves. Adobe needs to figure out their future, because Photoshop's days are numbered.
Edit: Dang, can you please fix this? Someone else posted the actual link, and it's far more interesting than the linked article:
https://genai-showdown.specr.net/image-editing
This article is great.
What about the classic: An analog watch that shows the time 08:15?
Did current models overcome the 10:10 bias?
This would be easy to patch the models to fix. Just gather a small amount of training data for these cases, e.g. "change the clock hands to 5:30" with the corresponding edit.
Each example is a 3-tuple: (original image, text edit instruction, final image).
Easy to patch for editing models, anyway. Maybe not text to image models.
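For concreteness, a minimal sketch of what those triples could look like as a dataset. The class name, field names, file names, and the JSONL layout are all hypothetical; no particular model's fine-tuning format is implied.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EditExample:
    original_image: str  # path to the input image (hypothetical file name)
    instruction: str     # text edit instruction
    edited_image: str    # path to the desired result (hypothetical file name)


def clock_instruction(hour: int, minute: int) -> str:
    """Build the edit instruction for a target clock time."""
    return f"change the clock hands to {hour}:{minute:02d}"


def to_jsonl(examples: list[EditExample]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


examples = [
    EditExample("clock_0001.png", clock_instruction(5, 30), "clock_0001_0530.png"),
    EditExample("clock_0002.png", clock_instruction(8, 15), "clock_0002_0815.png"),
]
print(to_jsonl(examples))
```

The edited images would still have to come from somewhere trustworthy, e.g. rendered clock faces where the hand angles are known, rather than from the model being patched.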