OpenAI launches GPT-4o with improved text rendering and instruction following

OpenAI’s GPT-4o, released about a year ago, has been refined with new features. The AI model can create high-quality, detailed pictures and can follow natural-language instructions to modify them until they are exactly what you imagined.

Older AI models had trouble with text. If you asked them to create a sign, they would produce gibberish words at best, or squiggles that aren’t letters at all. Check this out:

GPT-4o can create images with perfectly legible text

Image generation typically starts with entering a text prompt, then you refine the image by refining the original prompt. GPT-4o works differently – you ask it for an image, then tell it what to change, then ask it to change more things and so on until you get your result. Here are some examples:

Generating and modifying an image through plain English

You can follow the Source link below to examine the prompts that created these images. Note that OpenAI did some cherry picking – a lot of the images are “best of 2” or even “best of 8”, so the model needed a few tries to get it right. Still, the results look quite impressive and the UI is as simple as it gets.

Here is another example. GPT-4o can start from scratch or it can modify an image you give it. Here, the user gives it a photo of a cat and asks the AI to give it a detective hat and monocle. Then the user proceeds to refine the image, turning it into something that could be a screenshot from an RPG.

Prototyping a cat detective RPG

You can start with multiple images too and integrate elements from each image into the final result. OpenAI says that GPT-4o is great at following detailed instructions – it can manipulate 10-20 different objects in a scene without getting tripped up (other AI models can only handle 5-8 objects, says the company).

GPT-4o is not perfect, and OpenAI is the first to admit it. It sometimes crops images at the bottom, hallucinations are still an issue, scenes with more than 10-20 objects can trip it up, and rendering text in non-Latin characters still needs work, among other limitations.

Examples of GPT-4o getting it wrong

Finally, here are some video demonstrations showing off GPT-4o’s new image generation skills:

source
