AI Briefing: Writer’s CTO on How to Make AI Models Think More Creatively

by Marty Swant • January 3, 2025

When the training data for large language models is largely the same, vendors need other ways to make their models more creative and differentiated. That reality has led more enterprise customers to ask how to make AI more creative when generating content – and how to use AI to support the process of creative thinking itself.

The AI startup Writer launched a new LLM last month called Palmyra Creative, which aims to help enterprises squeeze more creativity from generative AI. The goal is not just to assist with outputs, but also to help businesses use AI in more creative ways. Palmyra Creative is the latest domain-specific LLM from Writer, following the healthcare-focused Palmyra Med and the finance-focused Palmyra Fin. Writer’s customers include Qualcomm, Vanguard, Salesforce, Kenvue, Uber and Dropbox.

AI models have evolved a lot in the last few years, especially when it comes to creative thinking. Some experts have found that LLMs outperform humans in areas such as divergent thinking. Researchers at the University of Arkansas published a paper last year exploring how OpenAI’s GPT-4 is able to generate multiple creative ideas, find diverse solutions to problems and explore various angles. But current LLMs are still largely limited by their own knowledge, which is derived from training data. They are not able to draw on the lived experiences and learned lessons that humans can.

Writer’s process involves creating AI models that are “self-adapting” or “self-evolving,” said Writer CTO Waseem Al Shikh, who founded the company in 2020 with Writer CEO May Habib. Al Shikh explained that the company is now developing models using a framework built on three buckets: model reasoning, model knowledge and model behaviors.

In an interview with Digiday last month, Al Shikh said it’s not enough for a model to be creative. “It’s like a person, right? The funny thing is that we don’t just create all the ideas based on one theme.” The plan for the future is to build self-evolving functionality into all of Writer’s models and to put creativity at the top. He described NIMs as being like a flight control system that decides which AI model to use and when, depending on a company’s knowledge and the task.

“With workflows, you know where to start and what steps to take,” Al Shikh said. “This concept of NIMs seems very futuristic. We can get there, but you’ll still need all these models. We’re building domain-specific models to address this. You can have three, four or five specific models that are self-evolving based on a customer’s behaviors.”

Unlocking creative ways of thinking could give marketers new ways to come up with ideas and break out of AI echo chambers. Writer sees retailers using Palmyra Creative to enhance loyalty programs or personalize marketing campaigns. The model could also help healthcare providers simplify patient communication, give financial firms more educational tools, or give B2B technology companies ideas for product positioning and refining technical documents. This conversation has been edited for clarity and brevity.

What makes Palmyra Creative unique from other models?

Our bigger models – such as finance or medical – focus more on what we call knowledge. We want them to be accurate for each and every formula they use, as well as every medicine. When you use a financial model, you need to focus on the core reasoning and math equations. The behavior will also change. General models attempt to balance [knowledge, reasoning and behavior] between them.

What made the model development process different?

Since the models all have similar architectures and training data, it’s a matter of finding similarity in the weights and in how those weights look. We decided to use the same training data we use today, but be more creative with the weights. We trained three different models, then started merging them and shuffling layers between them. The result is a unique relationship between weights that does not exist in any other model. We also discovered that the model had interesting behaviors. It can actually push back, and it does not follow the traditional path everyone else follows, because its weights are unique. We call this dynamic merging of layers.

The idea of merging models isn’t new, but the technique and the way it’s used is. The difference is that we are slicing between the models, and we have a way to ensure the relationship between them does not break, so you do not end up with gibberish or strange hallucinations. It’s a fine line between what is hallucination and what creativity looks like.
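Writer hasn’t published the mechanics of its “dynamic merging,” but the general idea of blending corresponding layers from several same-architecture models can be sketched in toy form. Everything below – the function name, the rotating-donor scheme, the blend weight `alpha` – is hypothetical, an illustration of layer-wise merging rather than Writer’s actual method:

```python
# Toy sketch of layer-wise model merging: blend each layer of a base
# model with the same layer from a rotating "donor" model, so the
# merged weights come from a shuffle across several parents.
# All names and logic here are illustrative assumptions.

def merge_models(models, alpha=0.5):
    """Blend corresponding layers of same-shaped models.

    models: list of models; each model is a list of layers, and each
    layer is a flat list of weights. Returns a merged model where layer i
    averages the base model's layer with the layer from donor i % len(models).
    """
    base = models[0]
    merged = []
    for i, layer in enumerate(base):
        donor = models[i % len(models)]  # rotate donors across layers
        merged.append([
            alpha * w_base + (1 - alpha) * w_donor
            for w_base, w_donor in zip(layer, donor[i])
        ])
    return merged

# Three toy "models," each with two layers of three weights
m1 = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
m2 = [[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]
m3 = [[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]]

merged = merge_models([m1, m2, m3])
```

In a real system the “layers” would be tensors from checkpoints of identical architecture, and the interesting engineering – which Al Shikh alludes to – is keeping inter-layer relationships coherent so the merged model doesn’t produce gibberish.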

It reminds me of how creativity often occurs in the blurred lines between fact and fiction.

One hundred percent. We have to define this, especially for enterprise customers. We say that we want the model’s output to be as broad and varied as possible, but we also need it to be cautious about one thing: claims. There’s a big difference between “let me tell you a crazy thought” and a claim that goes unchecked. We spent a lot of time on what we call controlled statements. We don’t know the source of truth [for the model], since we can’t consider Wikipedia a source of truth, can we? It’s full of random stuff. And we cannot accept that every single thing that comes from every single government in the world is the truth. So we decided to keep the model creative but not make any claims.

When models hallucinate, they often have to justify themselves. Is this less of a problem when there are no claims to verify?

Exactly. We decided to start at the root and control the claim. The [Palmyra] Creative model is less about knowledge and a lot more about behavior. We believe enterprises will love the creative model. They can use it to write case studies, find new use cases, or write more creative stories to explain how to adopt products. Controlling the claim was key. As you said, if you do not make a claim, you don’t need to explain it.
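Writer hasn’t described how “controlled statements” are implemented; one way to picture the idea is a post-processing filter that separates creative framing from factual claims and withholds anything claim-like. The marker list and function names below are entirely hypothetical:

```python
# Hedged illustration of "controlled statements": treat sentences that
# assert checkable facts as claims and withhold them, while letting
# creative, non-assertive output through. The heuristic is a toy
# stand-in, not Writer's mechanism.

CLAIM_MARKERS = ("according to", "studies show", "the fact is", "% of")

def is_claim(sentence):
    """Crude check: does the sentence read like an unverified factual claim?"""
    s = sentence.lower()
    return any(marker in s for marker in CLAIM_MARKERS)

def control_statements(sentences):
    """Keep creative statements; drop sentences flagged as claims."""
    return [s for s in sentences if not is_claim(s)]

kept = control_statements([
    "Imagine a loyalty program that feels like a treasure hunt.",
    "Studies show 90% of customers prefer treasure hunts.",
])
```

A production version would presumably use a trained classifier rather than keyword matching, but the design choice is the same one Al Shikh describes: block the claim at the root instead of verifying it afterward.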

What is the best way to guide a model in terms of when it should be creative or evolve and when it needs to be consistent?

We’ve been working on this since early summer. What if these models could think more like humans? What if models could reflect, rotate and remember? Can we get them to work outside of the training set, in real time, essentially? All models are still stuck on their training data. Without the training data, you can’t get them to do anything. This is what we mean by self-evolving. Self-evolving means that you don’t have to teach the models. The model will update its weights in real time. The model will reflect. The model can verify the information.

Let me give you a simple example: If I say that my name is Waseem but that I am the president of the United States, the model is smart enough to know that maybe you are Waseem, but you are not the president. This is really important, and if you use the model more, it will gain more knowledge and control. It’s high-level, and it takes a long time to explain, but it’s a standard transformer design with memory. Each layer of the neural network is accompanied by a memory layer. You can talk to it, and it will change.
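The “memory layer” idea – storing corrections at inference time so a known-wrong answer isn’t repeated – can be sketched in miniature. This class and its method names are hypothetical, a sketch of the behavior Al Shikh describes rather than Writer’s transformer-with-memory design:

```python
# Hedged sketch: an inference-time memory that records answers marked
# wrong for a given prompt, so later generations can avoid them.
# Illustrative only; a real memory layer would live alongside each
# transformer layer and influence the weights, not filter strings.

class MemoryLayer:
    def __init__(self):
        self.corrections = {}  # prompt -> set of known-wrong answers

    def remember_mistake(self, prompt, wrong_answer):
        """Record an answer the user (or a check) flagged as wrong."""
        self.corrections.setdefault(prompt, set()).add(wrong_answer)

    def filter(self, prompt, candidates):
        """Drop candidate answers previously marked wrong for this prompt."""
        wrong = self.corrections.get(prompt, set())
        return [c for c in candidates if c not in wrong]

mem = MemoryLayer()
mem.remember_mistake("Who is the president?", "Waseem")
answers = mem.filter("Who is the president?", ["Waseem", "the actual president"])
```

The point of the sketch is the payoff Al Shikh mentions next: once the wrong answer is remembered, the model shouldn’t make the same mistake twice.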

The model will not make the same mistake twice, because we know the wrong answer. It will remember the wrong answer so that it can try again the next time. I always tell my team that most humans – but not all – learn from their mistakes and don’t repeat them.

This week

  • Rembrand is a generative AI startup that helps brands place virtual products in social media and other content.
  • Lucid Motors is partnering with SoundHound AI to integrate a voice assistant into its cars that will give drivers real-time information and let them control more features in the vehicle.
  • TurboTax’s new campaign promotes AI agents as well as “AI-powered experts” for the Intuit app that helps people file their taxes.
  • CES 2025 will feature a lot of AI as tech giants, startups and brands descend on the Nevada desert next week to promote their updates and partnerships.

AI stories from across Digiday

  • How AI could shape content and ads in 2025
  • Generative AI grows up: Digiday’s 2024 timeline of transformation
  • The definitive Digiday guide to what’s in and out for advertising in 2025
  • 2024 in review: A timeline of the major deals between publishers and AI companies
  • Why early generative AI ads aren’t working and how creatives will shift to integrate the tech into their work
  • How Omnicom’s purchase of IPG changes the notion of an agency holding company

https://digiday.com/?p=564480
