OpenAI’s GPT-4o, released about a year ago, has been updated with new image generation features. The model can create detailed, high-quality pictures and follow natural-language instructions to modify them until they match what you imagined.
Older AI models had trouble with text. If you asked them to create a sign, they would produce gibberish words at best, or squiggles that aren’t letters at all. Check this out:
GPT-4o can create images with perfectly legible text
Image generation typically starts with a text prompt, and you improve the result by rewriting that prompt. GPT-4o works differently: you ask it for an image, tell it what to change, then ask for more changes, and so on until you get the result you want. Here are some examples:
Generating and modifying an image through plain English
You can follow the Source link below to examine the prompts that created these images. Note that OpenAI did some cherry-picking: a lot of the images are “best of 2” or even “best of 8”, meaning the model needed several tries to get it right. Still, the results look quite impressive and the UI is as simple as it gets.
Here is another example. GPT-4o can start from scratch or it can modify an image you give it. Here, the user gives it a photo of a cat and asks the AI to add a detective hat and monocle. The user then keeps refining the image, turning it into something that could pass for a screenshot from an RPG.
Prototyping a cat detective RPG
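For readers who think in code, the back-and-forth described above can be sketched as a small session object that keeps every instruction and re-renders from the full history. This is only an illustration of the workflow, not OpenAI’s API: `ImageSession`, `ImageBackend` and `fake_backend` are hypothetical names, and the "image" here is just a string standing in for a real picture.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical backend signature (an assumption, not a real OpenAI call):
# it receives the full instruction history and returns an image handle.
ImageBackend = Callable[[List[str]], str]


@dataclass
class ImageSession:
    backend: ImageBackend
    instructions: List[str] = field(default_factory=list)
    versions: List[str] = field(default_factory=list)

    def request(self, instruction: str) -> str:
        """Add one instruction and render a new version from the whole history."""
        self.instructions.append(instruction)
        image = self.backend(list(self.instructions))
        self.versions.append(image)
        return image


def fake_backend(history: List[str]) -> str:
    # Stand-in for a real model: it only describes what it was asked to draw.
    return " + ".join(history)


session = ImageSession(backend=fake_backend)
session.request("a photo of a cat")
session.request("add a detective hat and monocle")
latest = session.request("restyle as an RPG screenshot")
print(latest)
```

Keeping every earlier version in `versions` mirrors how a user can look back at previous results in the chat before asking for the next change.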
You can start with multiple images too and integrate elements from each image into the final result. OpenAI says that GPT-4o is great at following detailed instructions – it can manipulate 10-20 different objects in a scene without getting tripped up (other AI models can only handle 5-8 objects, says the company).
GPT-4o is not perfect, and OpenAI is the first to admit it. It sometimes crops images at the bottom, hallucinations are still an issue, scenes with more than 10-20 objects can trip it up, and rendering text in non-Latin scripts needs work, among other limitations.
Examples of GPT-4o getting it wrong
Finally, here are some video demonstrations showing off GPT-4o’s new image generation skills: