
Google’s Project Astra could be the killer app for generative AI

Last week, MIT Technology Review had the opportunity to experience Astra in a live demo behind closed doors. It was an impressive experience, but there is a gulf between a polished promo video and a live demonstration.

Astra uses Gemini 2.0’s built-in framework to answer questions, carry out tasks, and make sense of images and video, calling up existing Google apps like Search, Maps, and Lens as needed. “It’s merging together some of the most powerful information retrieval systems of our time,” says Bibo Xu, product manager for Astra.

Gemini 2.0 and Astra are joined by Mariner, a Gemini-powered agent that can browse the web for you; Jules, a Gemini-powered coding assistant; and Gemini for Games, an experimental assistant you can chat with and ask for tips as you play video games.

And let’s not forget that Google DeepMind also announced Veo, a video generation model; Imagen 3, a new version of its image generation model; and Willow, a new type of chip for quantum computers. Whew. Demis Hassabis, CEO of Google DeepMind, was in Sweden yesterday receiving his Nobel Prize.

Google DeepMind says that Gemini 2.0 is twice as fast as the previous version, Gemini 1.5, and outperforms it on a number of standard benchmarks, including MMLU-Pro, a large set of multiple-choice questions designed to test the abilities of large language models across a range of subjects, from math and physics to health, psychology, and philosophy.

But the gap between top-end models like Gemini 2.0 and those from rival labs like OpenAI and Anthropic is now slim. These days, advances in large language models are less about how good they are and more about what you can do with them.

That’s where agents come in.

Project Astra: hands-on

Last week I was taken through an unmarked door on an upper floor in London’s King’s Cross district into a room with a strong secret-project vibe. The word “ASTRA” was emblazoned in giant letters across one wall. Charlie, Xu’s dog and the project’s de facto mascot, wandered between desks where engineers and researchers were busy building a product that Google is betting its future on.

“I told my mum that we’re building an AI that has eyes, ears, and a voice,” says Greg Wayne, co-lead of the Astra team. It can be anywhere with you, and it can help with whatever you’re doing. “It isn’t there yet, but it’s that kind of vision.”

Xu, Wayne and their colleagues have been working on a “universal assistant” for a while, though they’re still figuring out what that means.

At one end of the Astra room, the team had set up two stages for demonstrations: a mock-up art gallery and a drinks bar. Xu brought me to the bar first. Praveen Srinivasan said the team had hired a cocktail expert to teach them how to make cocktails. “We recorded those conversations and used them to train our first model,” he said.

Xu opened a cookbook to a recipe for chicken curry, pointed her phone at it, and woke Astra up. “Ni hao, Bibo!” said a female voice.

“Oh! Why are you speaking to me in Mandarin?” Xu asked her phone. “Can you please speak to me in English?”

“My apologies. I was following an earlier instruction to speak in Mandarin. I will now speak in English as you requested.”

Astra can remember previous conversations, Xu told me, and it also keeps track of the last 10 minutes of video. In the promo video Google released in May, Astra tells the person giving the demo where she left her glasses, having spotted them on a table a few moments earlier. But I saw nothing like that in the live demo.

Back to the cookbook. Holding her phone’s camera over the recipe page for a few moments, Xu asked Astra to read the recipe and tell her what spices were in it. “I recall the recipe mentioning one teaspoon of black peppercorns, a teaspoon of hot chili powder, and a cinnamon stick,” it replied.

“I think you’re missing a few,” said Xu. “Take another look.”

“My apologies,” it said. “I also see curry leaves and ground turmeric in the ingredients.”

Seeing this technology in action, two things strike you right away. First, it’s glitchy and often needs correcting. Second, those glitches can be corrected with just a few spoken words: you interrupt the voice and repeat your instructions, and it moves on. It feels more like coaching a kid than arguing with broken software.

Xu then pointed her phone at a row of wine bottles and asked Astra which one would pair best with the chicken curry. It chose a rioja and explained why. Xu asked how much a bottle would cost. Astra replied that it would need to use Search to check prices online. A few seconds later, it came back with an answer.

As we moved over to the art gallery, Xu showed Astra a series of screens displaying famous paintings: Munch’s The Scream, the Mona Lisa, a Vermeer, a Seurat. “Ni hao, Bibo!” the voice said.

“You’re speaking to me in Mandarin again,” Xu said. “Please try to speak to me in English.”

“I’m sorry, I misunderstood. I will answer in English.” (I should know better, but I swear I heard snark.)

It was my turn next. Xu handed me her phone.

Astra was not amused by my attempts to trip it up. It refused to guess which famous art gallery we were in. When I asked why it had identified the paintings as replicas, it began to apologize for its mistake (Astra apologizes a lot). I felt compelled to interrupt: “No, no, you’re right, it’s not a mistake. You’re correct to identify paintings on screens as fake paintings.” I couldn’t help feeling a little bad: I had confused an app that exists only to please.

When it works well, Astra is captivating. The experience of striking up a conversation with your phone about whatever you’re pointing it at feels fresh and seamless. In a recent media briefing, Google DeepMind shared a video demonstrating other uses: reading an email on your phone’s screen to find a door code (and then reminding you of that code later), pointing your phone at a passing bus and asking where it goes, quizzing it about a public artwork as you walk by. This could be generative AI’s killer app.

But there’s still a long way to go before most people can get their hands on this technology. There’s no mention of a release date. Google DeepMind also shared videos of Astra working on a pair of smart glasses, but that technology is even further down the company’s list of priorities.

Mixing it up

For now, researchers outside Google DeepMind are keeping a close eye on its progress. “The way things are being combined is impressive,” says Maria Liakata, who works on large language models at Queen Mary University of London and the Alan Turing Institute. “It’s hard enough to reason with language, but here you have to bring in images and more. That’s not trivial.”

Liakata was also impressed by Astra’s ability to recall things it has seen or heard. She works on what she calls long-range context, which involves getting models to keep track of information they have come across before. “This is exciting,” says Liakata. “Even doing it in a single modality is exciting.”

She admits that much of her assessment is guesswork. Multimodal reasoning is cutting-edge, she says, but it’s hard to know exactly where Google is, because the company hasn’t said much about the technology itself. “We don’t even know how Google does it,” says another researcher.

If Google were more transparent about the technology it’s building, he says, it would help consumers understand its limitations. “Consumers need to know how these systems work,” he says. “You want users to be able to see what the system has learned about them, to correct errors, or to remove things they’d rather keep private.”

Liakata, too, is concerned about the privacy implications, pointing out that people could be monitored without their consent. “I’m excited and concerned about different things,” she says. There’s something unnerving about the idea of your phone becoming your eyes.

“But it has become a race among the companies,” she says. “It’s problematic, because we don’t agree on how to evaluate this technology.”

Google DeepMind says it carefully examines privacy, security, and safety for all its new products, and its technology is tested by teams of trusted users for months before being released to the public. “We’ve got to think about misuse,” says Dawn Bloxwich, director of responsible innovation and development at the company. “We’ve got to think about what happens when something goes wrong. There is huge potential. The productivity gains are enormous. But it’s also risky.”

No team of testers can anticipate all the ways people will use and misuse a new technology. So what’s the plan for when the inevitable happens? Companies need to design products that can be recalled or switched off quickly in case of emergency, says Bloxwich.
