Apple’s Machine Learning team has announced an interesting 3D AI model called Matrix3D, developed in collaboration with researchers at Nanjing University and The Hong Kong University of Science and Technology.
Billed as a Large Photogrammetry Model, it can reconstruct 3D scenes and objects from just a few 2D images, and it differs from current pipelines in a meaningful way. Here’s why it’s a big deal.
Let’s start with photogrammetry, the technique of using photos to create 3D maps or models. Current approaches chain together different models for steps such as pose estimation and depth prediction, which can lead to inefficiency and compounding errors. Matrix3D simplifies the process by doing everything in one go: it handles images, camera parameters (such as angle and focal length), and depth data within a single unified architecture. This streamlines the workflow and improves accuracy.
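To make the "unified" idea concrete, here is a minimal sketch of how the three modalities could be packed into one input a single model attends over, rather than being routed through separate pose and depth models. This is purely illustrative: the function name, shapes, and flattening scheme are hypothetical, not Matrix3D's actual architecture.

```python
import numpy as np

def flatten_modalities(image, camera_params, depth_map):
    """Hypothetical sketch: pack images, camera parameters, and depth
    into one sequence so a single model processes them jointly,
    instead of separate pose-estimation and depth-prediction models."""
    return np.concatenate([
        image.reshape(-1),          # pixel values
        camera_params.reshape(-1),  # e.g. angle, focal length
        depth_map.reshape(-1),      # per-pixel depth
    ])

image = np.random.rand(8, 8, 3)
camera = np.array([30.0, 26.0])     # [angle_deg, focal_length_mm] (made up)
depth = np.random.rand(8, 8)
seq = flatten_modalities(image, camera, depth)
print(seq.shape)  # one unified input: (8*8*3 + 2 + 8*8,) = (258,)
```

The point of the sketch is the design choice, not the details: once all three modalities live in one input, a single network can exploit correlations between them directly.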
The researchers randomly hid parts of the input data, forcing Matrix3D to learn how to fill in the gaps. This masking technique is crucial because it allows Matrix3D to train effectively on smaller or incomplete datasets.
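The masking idea above can be sketched in a few lines. This is a generic masked-training toy, not Matrix3D's implementation; the mask ratio and zero fill value are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_inputs(tokens, mask_ratio=0.5):
    """Randomly hide a fraction of the input; the hidden values become
    the training targets, so the model learns to fill in the gaps."""
    mask = rng.random(tokens.shape) < mask_ratio  # which entries to hide
    masked = tokens.copy()
    masked[mask] = 0.0            # hidden entries replaced with a fill value
    targets = tokens[mask]        # what the model must reconstruct
    return masked, mask, targets

tokens = rng.random(100)
masked, mask, targets = mask_inputs(tokens)
print(mask.sum(), targets.shape)
```

Because the "labels" are just the hidden parts of the input itself, no extra annotation is needed, which is why this style of training works even with smaller or incomplete datasets.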
The results are impressive: Matrix3D generates 3D reconstructions from just three images. This could be very useful for immersive headsets such as the Apple Vision Pro.
The researchers made the Matrix3D source code available on GitHub and published their paper on arXiv. They also created a website where you can view more sample videos and even interact with some point cloud recreations.