
AI Godmother Fei-Fei Li Has a Vision for Computer Vision


Stanford University professor Fei-Fei Li is already a legend in the world of AI. She played a key role in the deep-learning revolution through her years of work on the ImageNet dataset and the associated competition, which challenged AI systems to identify objects and animals across 1,000 categories. In 2012, the AlexNet neural network sent shockwaves through the AI research community when it resoundingly beat all other approaches to win the ImageNet competition. Neural networks took off from there, fueled by the massive amounts of free data available on the Internet and by GPUs with unprecedented computing power.

In the 13 years since, computer vision researchers have mastered object recognition and moved on to image and video generation. Li cofounded Stanford's Institute for Human-Centered AI and has continued to push the boundaries of computer vision. Just this year, she launched World Labs, a startup that generates 3D scenes for users to explore. World Labs aims to give AI "spatial intelligence": the ability to generate, reason within, and interact with 3D worlds. Li gave a keynote speech yesterday at NeurIPS, the massive AI conference, about her vision for machine vision, and she gave an exclusive interview to IEEE Spectrum before her talk.

Why did you title your talk "Ascending the Ladder of Visual Intelligence"?

Fei-Fei Li: It's intuitive to me that intelligence is a complex and sophisticated thing. In my talk, I want to convey the sense that the visual intelligence we have developed, especially in the last 10 years, is just astounding. We are becoming more and more capable with the technology. I was also inspired by Judea Pearl's "ladder of causation" [in his 2018 book The Book of Why].

The talk also has a subtitle, "From Seeing to Doing." This is something people don't appreciate enough: that seeing and doing are closely linked, both for animals and for AI agents. And this is a departure from language. Language is a tool for communicating ideas. These are, in my opinion, complementary but equally profound modalities.

Are you saying that we react instinctively to certain sights?

Li: It's not only about instinct. When you look at the evolution of perception and the evolution of animal intelligence, they are deeply, deeply intertwined. Every time an animal can gather more information about its environment, the evolutionary force pushes forward. If you can't sense your environment, your relationship with the world is very passive; whether you eat or are eaten is a passive act. But as soon as you can take cues from your environment through perception, evolutionary pressure ramps up, and that drives intelligence forward.

Do you think that's how we'll create deeper and deeper machine intelligence? By allowing machines to perceive more of their surroundings?

Li: I don't know if "deep" is the adjective I would use. I think we are creating more capabilities. It is becoming more complex and more capable. I do think it's true that solving the problem of spatial intelligence is a critical and fundamental step toward full-scale intelligence.

I've seen the World Labs demos. Why do you want to pursue spatial intelligence and build these 3D worlds?

Li: I think spatial intelligence is the future of visual intelligence. If we are serious about cracking the problem of vision and also connecting it to doing, there's an extremely simple, laid-out-in-the-daylight fact: The world is 3D. We don't live in a flat world. Our physical agents, whether they are robots or devices, will live in a 3D world. Even the virtual world is becoming more and more 3D. When artists, game designers, architects, and doctors work in virtual worlds, they will tell you that much of what they do is 3D. It is important to recognize this simple yet profound fact.

How do the scenes in World Labs maintain object stability and conform to the laws of physics? That's a big step beyond video-generation tools such as Sora, which still struggle with these things.

Li: Once the 3D-ness of a scene is respected, much of this comes naturally. For example, in one of the videos we posted on social media, we drop basketballs into a scene. That's possible because the scene is 3D. If the scene were just 2D pixels, the ball would have nowhere to go.
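The point can be made concrete with a toy sketch (this is purely illustrative, not World Labs code): once an object has real 3D coordinates, "dropping" it is just integrating its position under gravity until it meets the scene's ground plane. In a flat image there is no height coordinate to update, so no such dynamics exist.

```python
# Illustrative sketch: a dropped ball in an explicit 3D scene.
# Plain Euler integration of height under gravity, with the
# scene's ground plane at z = 0. All numbers are assumptions.

GRAVITY = -9.81  # m/s^2, along the z axis
DT = 0.01        # integration time step, in seconds

def drop_ball(z0, steps):
    """Integrate a ball's height; stop when it hits the floor."""
    z, vz = z0, 0.0
    for _ in range(steps):
        vz += GRAVITY * DT   # update velocity
        z += vz * DT         # update position
        if z <= 0.0:         # collision with the ground plane
            return 0.0
    return z

# A ball dropped from 2 m reaches the floor in well under a
# second of simulated time (100 steps = 1 second here).
print(drop_ball(2.0, 100))
```

The interesting part is what the 2D case lacks: without a z coordinate there is nothing for gravity to act on, which is why pixel-only generators struggle to make objects fall convincingly.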

Or it might go somewhere but then disappear, as in Sora. What are the biggest technical challenges you face as you push this technology forward?

Li: This problem is not solved, right? It's very, very hard. You can see [in a World Labs demo video] how we have taken a Van Gogh painting and generated the entire scene around it in a consistent style: the artistic style of the painting, the lighting, even what kind of buildings would be in that neighborhood. If you turned around and it became skyscrapers, it would be completely unconvincing. And it has to be 3D. You have to be able to navigate into it. It's not just pixels.

Could you tell me more about the data that you used to train this program?

Li: Quite a lot.

Are you facing any technical challenges in terms of the compute burden?

Li: It's a lot. It's the kind of compute that the public sector cannot afford right now. That's part of why I'm excited to take this sabbatical in the private sector. And it's also part of why I have been advocating for public-sector access to compute, because my own experience underscores how important adequate resourcing is for innovation.

It would be nice to empower the public sector, since it tends to be motivated more by knowledge for its own sake and knowledge that benefits humanity.

Li: The discovery of knowledge needs to be supported with resources, right? In the age of Galileo, it was the best telescope of the time that let astronomers observe new celestial objects. It was Hooke who realized that a magnifying glass could be turned into a microscope, and he discovered cells. Every time new technological tools arrive, knowledge-seeking benefits. In the age of AI, that technological tooling involves compute and data. We have to recognize that for the public sector.

How would you like federal resources to be allocated?

Li: Stanford HAI has been working on this for the past five years. We have worked with Congress, the Senate, the White House, other universities, and industry to create NAIRR, the National AI Research Resource.

What do we gain if we can get AI to understand the 3D world?

Li: People will be able to unleash a lot more creativity and productivity. I would love to design my house more efficiently. I know that many medical applications require understanding a very particular 3D world, namely the human body. We often talk about a future where humans build robots to help us, but robots navigate a 3D world and require spatial intelligence as part of their brains. We also talk about virtual worlds that allow people to learn concepts, visit places, or be entertained. Those use 3D technology, especially the hybrids we call AR [augmented reality]. I would love to walk through a park wearing a pair of glasses that tell me about the trees, the path, and the clouds. I would also love to learn new skills through the help of spatial intelligence.

Which skills?

Li: I'll give you a lame example: What do I do if I get a flat tire on the highway? Right now, I watch a "how to change a tire" video. But if I could put on glasses, see what's going on with my car, and be guided through the process, that would be cool. But that's a lame example. You can think about cooking or sculpting; those are fun things.

Do you think we'll see all of this progress in our lifetimes?

Li: Oh, I think it will happen in our lifetimes, because the pace of technological progress is really fast. You've seen what the past 10 years have brought. That's definitely an indication of what's coming next.
