ChatGPT could have achieved greater success sooner.
Jakob Uszkoreit. Credit: Jakob Uszkoreit / Getty Images
In 2017, eight machine-learning researchers at Google released a groundbreaking research paper called "Attention Is All You Need," which introduced the Transformer AI architecture that underpins almost all of today's high-profile generative AI models.
Using neural networks, the Transformer became a key enabler of the modern AI boom by converting (or "transforming") input chunks called "tokens" into another desired form of output. Transformer-based models power GPT-4o and ChatGPT, the audio synthesis engines behind Google NotebookLM and OpenAI's Advanced Voice Mode, image synthesis models like Midjourney, and video synthesis models like Sora.

Jakob Uszkoreit spoke with Ars Technica at TED AI 2024 in October about the development of the Transformer, Google's early work on large language models, and his new venture into biological computing. In the interview, Uszkoreit revealed that while his Google team had high hopes for the technology's potential, they didn't quite anticipate its pivotal role in products like ChatGPT. What follows is the Ars interview with Jakob Uszkoreit.
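For readers curious what the paper's central mechanism actually does, the scaled dot-product self-attention at the heart of the Transformer can be sketched in a few lines. This is a toy NumPy illustration with made-up dimensions, not any production implementation: each token's output becomes a weighted mixture of every token's "value" vector, with the weights computed from pairwise token similarities.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X:  (seq_len, d_model) input token embeddings
    Wq, Wk, Wv: learned projection matrices of shape (d_model, d_k)
    Returns a (seq_len, d_k) array where each row mixes information
    from all tokens in the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise token affinities
    # Softmax over each row turns affinities into mixing weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

# Toy example: 4 tokens, model width 8, attention width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 4)
```

Because every token attends to every other token in one matrix multiplication, this replaces the step-by-step recurrence of earlier sequence models, which is the substitution Uszkoreit describes below.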
Ars Technica: How did you contribute to the "Attention Is All You Need" paper?
Jakob Uszkoreit: You can find it in the footnotes, but my main contribution to the "Attention Is All You Need" paper was to suggest that attention, and specifically self-attention, could replace recurrence in the dominant sequence transduction models of the time, and that it could be more effective and efficient.
Ars: Did you have any idea what would happen after your group published this paper? Did you anticipate the industry it would create and its ramifications?
JU: First of all, I think it's important to remember that when we did this, we were standing on the shoulders of giants. It wasn't just that one paper; it was the result of a long series of works by many of us. To say that this paper started something or created something may be a storyteller's view, but it may not be accurate.
My team at Google had been pushing attention models for years before that. It was a much longer slog with much, much more to it, and that's just my team. We had high hopes for this work, as did many others, and we were confident it would help push technology forward. But did we believe it would be a key factor in enabling, or at least seemingly flipping the switch to facilitate, products like ChatGPT? I don't believe so. And to be clear, I'm talking here about LLMs and their capabilities. Even around the time we published the paper, we saw phenomena that we found quite astounding.
We didn't release those to the world, in part because Google was perhaps a little conservative about products at the time. But even with those signs, we also weren't confident that this stuff alone would be enough to make a compelling product. Did we have high expectations, though? Yeah.
Ars: Given that you knew Google had large language models internally, what was your initial reaction when ChatGPT became a public success? "Damn, they got it, and we didn't?"
JU: There was a bit of, "that could have happened here," or, "oh dang, they got it first," but it wasn't really like that. It was more like, "Whoa, that could have happened sooner." Was I still amazed by how quickly people got super creative with this stuff? Yes, that was breathtaking.
Ars: You weren't working at Google anymore at that time, right?
JU: No, I wasn't. And in a certain sense, you could say Google wasn't the right place to do this. I left not because I didn't enjoy Google, but because I felt I had to do something else, so I started Inceptive.
It was motivated by a huge opportunity to design better drugs and have a direct impact on people's lives.
Ars: What's funny about ChatGPT is that I was using GPT-3 before it launched, so for people already familiar with the technology, ChatGPT wasn't such a big deal.
JU: Yeah, exactly. If you used these things beforehand, you could see the progression. Even though we weren't at the same companies, we would discuss the early GPTs that OpenAI developed with Alec Radford and other people. And I'm certain there was excitement about it, but how the actual ChatGPT would be received, by how many people and how quickly, is still something I don't believe anyone really anticipated.
Ars: Neither did I when I covered the story. It felt like, "Oh, this is a chatbot hack of GPT-3 that feeds its context in a loop." I didn't consider it a breakthrough moment, but it was fascinating.
JU: There are different kinds of breakthroughs. It wasn't a technological breakthrough. It was the breakthrough of realizing that the technology, at that level of capability, had such high utility.
That, and the realization that you always have to take into account how your users will actually use the tool you create. You might not anticipate how creative they will be in their use of it, or how broad those use cases turn out to be.
You can often only learn that by putting your work out there, which is why it's so important to be willing to experiment and to embrace failure. Most of the time it won't work; sometimes it will, but rarely.
Ars: You have to take a chance. And Google wasn't willing to take that risk?
JU: Not at that time, no. But if you think about it and look back, it's actually quite interesting. Google Translate, which I worked on many years ago, was similar. When we launched the very first version of Google Translate, it was at best a joke, yet Google put it out anyway, even though the results were sometimes embarrassingly bad. And in a fairly short time, we turned it into a genuinely useful tool. That was in 2008, 2009, 2010.
Ars: Do you remember AltaVista's Babel Fish?
JU: Oh yeah, of course. We would translate text back and forth between languages for fun and watch it garble the text worse and worse. Yeah.
Programming biological computers
After his time at Google, Uszkoreit founded Inceptive to apply deep learning to biochemistry. The company is working on what he calls "biological software," in which AI compilers translate specified behaviors into RNA sequences that perform the desired functions once introduced into biological systems.
Ars: What are you doing these days?
JU: In 2020, I co-founded Inceptive in order to use deep learning and high-throughput biochemistry experiments to design better medicines that can truly be programmed. We see this as a first step towards what we call biological software.
Biological software is a bit like computer software in that you start with a specification of the behavior you want, and a compiler translates that into computer software that then runs on a computer, exhibiting the functions or functionality you specified.
In biological software, you specify and compile a biological program in the same way, only not with an engineered compiler, because life hasn't been engineered the way computers have. Instead, with a learned AI compiler, you translate or compile that program into molecules that, when inserted into organisms and biological systems, into our cells, exhibit the functions you have programmed.
Ars: Is that anything like how the mRNA COVID vaccines work?
JU: A very, very simple example of that is the mRNA COVID vaccines, where the program says, "make this modified viral antigen," and then our cells make that protein. But you could imagine molecules that exhibit far more complex behaviors. And if you want a picture of how complex those behaviors could be, just remember that RNA viruses are just that: an RNA molecule that, when entering an organism, exhibits incredibly complex behavior, such as distributing itself across an organism, distributing itself across the world, doing certain things only in a subset of your cells for a certain period of time, and so on and so forth.
And so you can imagine that if we managed to design molecules with even just a teeny tiny fraction of such functionality, with the goal of course not of making people sick, but of making them healthy, it would truly transform medicine.
Ars: How do you not accidentally create a monster RNA sequence that just wrecks everything?
JU: The amazing thing is that medicine has, in a certain sense, long existed outside of science. Drugs weren't truly understood, and we still often don't truly understand their actual mechanisms of action.
As a result, humanity had to develop all of these safeguards and clinical trials. And even before you enter the clinic, all of these empirical safeguards prevent us from accidentally doing [something dangerous]. Those systems have been in place for as long as modern medicine has existed, so we're going to keep using them, with all the diligence necessary. We'll start with very small systems, individual cells, in early experimentation, and follow the same established protocols that medicine has always had to follow in order to ensure that these molecules are safe.
Ars: Thank you for taking the time to do this.
JU: No, thank you.
Benj Edwards is Ars Technica's Senior AI Reporter and founded the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.