REST3D: Single Image to Simulation-Ready 3D Scenes

1

single RGB image

REST3D starts from one casual image, not a dense scan or multi-view capture.

3

pipeline stages

Scene-tree construction, scene initialization/canonicalization, and physics-constrained optimization.

93%+

reported stable rates

REST3D reports stable rates of 95.8% on Replica, 93.6% on ScanNet++, and 95.5% on its Custom scene set.

VR

real interaction demo

REST3D demonstrates hand-based human-object interaction with Meta Quest Pro and Isaac Gym.

What is REST3D?

REST3D is single-image 3D reconstruction built for physical stability, not just visual plausibility.

REST3D means REconstructing physically STable 3D scenes. The central idea is brutal and simple: a 3D scene that looks correct is not enough if objects float, intersect, explode under gravity, or collapse in a simulator. REST3D uses physical scene understanding and physics-constrained optimization so the reconstructed scene can behave like a usable digital asset.

Audience

REST3D searchers want proof, demos, code status, metrics, and a fast explanation of why physical stability matters.

Researchers

They look for the REST3D paper, method diagram, baselines, datasets, metrics, ablations, limitations, BibTeX, and reproducibility notes.

3D / VR / game creators

They care about whether REST3D can convert a casual image into simulation-ready assets, stable layouts, VR demos, and interactive scenes.

Robotics and embodied AI builders

They care about gravity, support relations, collision rate, stable rate, real-to-sim, Isaac Gym, object contact, and reliable manipulation scenes.

Important disambiguation

REST3D is a CMU computer-vision research framework, not a sleep supplement, Unity game, or generic 3D tool.

Searches for "REST3D" pull up several unrelated products and projects. To help visitors and search engines, we explicitly list what REST3D is NOT:

Not a sleep supplement

Anabolix Nutrition sells a product called REST3D (a $69.95 sleep aid). This site has no affiliation with Anabolix Nutrition. Here, REST3D refers to REconstructing physically STable 3D scenes, an arXiv computer-vision paper by CMU authors.

Not a Unity game

Unity Play hosts a game named REST3D. This is unrelated to the REST3D research paper. REST3D.org focuses on single-image 3D scene reconstruction for simulation-ready assets.

Not ryanj/rest3d (GitHub)

A different GitHub repository called rest3d by ryanj provides a 3D client/server (MIT license), whose README references www.rest3d.org. This is a separate project. The official REST3D code repository is ShirleyMaxx/REST3D for the CMU paper on physically stable 3D scene reconstruction.

Not rest3d.wordpress.com

A WordPress blog ("3D for the REST of us") occupies rest3d.wordpress.com. REST3D.org is an independent research resource for the REST3D paper specifically.

This disambiguation makes clear, to visitors and search engines alike, that REST3D.org covers only the research paper. The sleep supplement, Unity game, ryanj/rest3d repository, and WordPress blog are distinct entities with different search intents.

Abstract

REST3D abstract: Reconstructing Physically Stable 3D Scenes from a Single Image

Reconstructing physically stable 3D scenes from a single RGB image enables casual images to be converted into simulation-ready digital assets for applications such as immersive interaction and content creation. However, existing single-image reconstruction methods fall short in capturing the physical structure of a scene. As a result, they often produce geometrically plausible but physically inconsistent results, including object floating and penetration, which lead to unstable behavior in physics simulations. Image-conditioned scene generation methods improve physical plausibility but often rely on strong scene priors, yielding plausible yet inaccurate object arrangements that fail to match the input image. We propose REST3D, a single-image reconstruction framework that can REconstruct physically STable 3D scenes by integrating physical scene understanding with physics-constrained refinement. We first introduce an agentic physical scene understanding technique that constructs a scene-tree representation capturing object physical states and inter-object relationships from a gravity-support perspective, providing a structural prior for reconstruction. Leveraging this structure, we initialize the scene using image-to-3D models, followed by scene-tree-guided alignment and physics-constrained optimization to resolve physical violations while preserving visual consistency with the input image. Experiments show that our method significantly reduces physical errors and improves simulation stability on both synthetic and real-world datasets while maintaining strong reconstruction quality. We further demonstrate the reconstructed scenes in VR-based human-object interaction, showing their potential for immersive applications.

TL;DR

From a single casual image to a visually consistent and physically stable interactive 3D scene.

This is the REST3D promise in one sentence: the scene should not merely look plausible; it should settle, support objects, avoid severe interpenetration, and survive physics simulation.

The problem

Single-image 3D reconstruction often fails when gravity asks the obvious question: should this object stand, fall, or explode?

Object floating

Visually plausible reconstructions can place objects above their support surfaces. REST3D targets support consistency.

Object penetration

Objects may overlap in 3D. Under physics, that collision can cause explosive separation and unstable behavior.

Plausible but inaccurate generation

Image-conditioned scene generation can produce a physically plausible scene that does not match the input image. REST3D is designed to preserve visual consistency.
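The floating and penetration failure modes above can be illustrated with a simple vertical-gap test between an object and its support. This is a minimal sketch, not the paper's detection method: the `Box` type, the z-up world frame, and the 5 mm tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned bounding box in a z-up world frame (meters). Hypothetical type."""
    min_xyz: Vec3
    max_xyz: Vec3


def support_gap(obj: Box, support: Box) -> float:
    """Signed vertical distance from the support's top face to the object's bottom face."""
    return obj.min_xyz[2] - support.max_xyz[2]


def classify_contact(obj: Box, support: Box, tol: float = 0.005) -> str:
    """Label the contact; `tol` is an assumed tolerance, not a value from the paper."""
    gap = support_gap(obj, support)
    if gap > tol:
        return "floating"      # object hovers above its support
    if gap < -tol:
        return "penetrating"   # object bottom sits below the support's top face
    return "resting"


table = Box((0.0, 0.0, 0.0), (1.0, 1.0, 0.75))
mug = Box((0.4, 0.4, 0.80), (0.5, 0.5, 0.90))
print(classify_contact(mug, table))  # floating (the mug starts 5 cm above the table top)
```

A bounding-box test like this only catches vertical errors along the gravity axis; mesh-level contact and sideways overlap would need a real collision checker.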

Pipeline

REST3D combines three stages: scene-tree construction, scene initialization and canonicalization (scene-tree-guided alignment), and physics-constrained optimization.

Scene-Tree Construction

REST3D infers a hierarchical scene tree that captures objects, physical states, and inter-object spatial support relationships from a gravity-support perspective.

Scene Initialization and Canonicalization

REST3D initializes object meshes using image-to-3D models, then uses the scene tree to correct global orientation and enforce coarse support constraints.

Physics-Constrained Optimization

REST3D refines object poses through simulation-based optimization to reduce floating, penetration, drift, and instability while preserving the input image layout.

Figure: REST3D pipeline — Stage 1 constructs a scene tree, Stage 2 initializes meshes via image-to-3D and canonicalizes, Stage 3 runs physics-constrained optimization in Isaac Gym.

Scene tree

REST3D scene-tree construction models gravity-support relationships: ground, wall, ceiling, and ground-wall.

A REST3D scene tree is not a decorative hierarchy. It is the structural prior that says which object supports which object: table on ground, plant on table, poster attached to wall, radiator supported by ground-wall. This is the hidden skeleton that lets REST3D keep visual reconstruction and physical behavior aligned.

Figure: REST3D scene tree — hierarchical support relations from a gravity-support perspective. Each node has a parent surface and a support type (on, attached-to, hanging, wall, ceiling, ground-wall).
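To make the support hierarchy concrete, here is one way the examples above (table on ground, plant on table, poster attached to wall, radiator supported by ground-wall) could be stored. It is a sketch under stated assumptions: the `Support` record, the split into structural surfaces and relation labels, and the `support_chain` helper are hypothetical and do not come from the REST3D code.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed split: structural surfaces are roots, relations label each edge.
STRUCTURAL = {"ground", "wall", "ceiling", "ground-wall"}
RELATIONS = {"on", "attached-to", "hanging"}


@dataclass(frozen=True)
class Support:
    parent: str    # name of the supporting object or structural surface
    relation: str  # one of RELATIONS


def support_chain(scene: Dict[str, Support], name: str) -> List[str]:
    """Follow parents upward until a structural surface is reached."""
    chain = [name]
    while chain[-1] not in STRUCTURAL:
        edge = scene[chain[-1]]
        if edge.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {edge.relation}")
        if edge.parent in chain:
            raise ValueError(f"support cycle at {edge.parent}")
        chain.append(edge.parent)
    return chain


scene = {
    "table": Support("ground", "on"),
    "plant": Support("table", "on"),
    "poster": Support("wall", "attached-to"),
    "radiator": Support("ground-wall", "on"),
}

print(support_chain(scene, "plant"))   # ['plant', 'table', 'ground']
print(support_chain(scene, "poster"))  # ['poster', 'wall']
```

Storing one parent per object keeps the structure a tree, which is what lets later stages process supports in a well-defined order.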

Physical scene understanding

REST3D uses agentic physical scene understanding to identify objects, segment instances, and reason about spatial support.

Open-vocabulary object list analysis

REST3D asks a vision-language model to identify distinct objects with descriptive attributes, not just coarse labels.

Agentic instance segmentation

REST3D uses a segmentation agent and verifier loop to refine prompts and masks for each object instance.

Spatial relationship reasoning

REST3D infers support parents and support types from a gravity-aware perspective.
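The segmentation agent and verifier loop can be pictured as a retry loop that refines the prompt until the verifier accepts a mask. The sketch below only illustrates that control flow: the callables `segment`, `verify`, and `refine`, the round limit, and the toy stand-ins are hypothetical, and no real VLM or segmentation API is called.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Mask = FrozenSet[Tuple[int, int]]  # assumed mask format: a set of (row, col) pixels


def segment_with_verifier(
    label: str,
    segment: Callable[[str], Mask],
    verify: Callable[[str, Mask], Tuple[bool, str]],
    refine: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Optional[Mask]:
    """Try a prompt, let the verifier judge the mask, and refine the prompt on rejection."""
    prompt = label
    for _ in range(max_rounds):
        mask = segment(prompt)
        accepted, feedback = verify(label, mask)
        if accepted:
            return mask
        prompt = refine(prompt, feedback)
    return None  # no accepted mask within the round budget


# Toy stand-ins so the loop can run end to end.
def toy_segment(prompt: str) -> Mask:
    return frozenset({(3, 4), (3, 5)}) if "white" in prompt else frozenset()


def toy_verify(label: str, mask: Mask) -> Tuple[bool, str]:
    return (len(mask) > 0, "" if mask else "white")


def toy_refine(prompt: str, feedback: str) -> str:
    return f"{feedback} {prompt}"


result = segment_with_verifier("mug", toy_segment, toy_verify, toy_refine)
print(sorted(result))  # [(3, 4), (3, 5)]
```

Here the first prompt "mug" yields an empty mask, the verifier rejects it, and the refined prompt "white mug" succeeds on the second round.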

Initialization & canonicalization

REST3D initializes the 3D scene with image-to-3D models, then canonicalizes the layout so physics has a fighting chance.

REST3D starts with raw image-to-3D output, then uses the scene tree to correct coarse orientation, enforce support, and produce a structured initial scene. Canonicalization alone is not enough; it improves stability but still needs the full REST3D physics-constrained optimization stage.
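As a rough illustration of enforcing coarse support from the scene tree, the sketch below walks the tree top-down and places each child on the top face of its parent. It is an assumption-laden simplification: it handles only "on" supports along a vertical z axis, ignores orientation correction, and uses a hypothetical `Node` type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical scene-tree node with a vertical extent in meters (z-up)."""
    name: str
    bottom_z: float
    height: float
    children: List["Node"] = field(default_factory=list)


def enforce_support(node: Node, bottom_z: float) -> None:
    """Set this node's bottom, then rest every child on this node's top face."""
    node.bottom_z = bottom_z
    top_z = bottom_z + node.height
    for child in node.children:
        enforce_support(child, top_z)


# Raw image-to-3D output: the table hovers 2 cm up and the plant floats above it.
plant = Node("plant", bottom_z=0.80, height=0.30)
table = Node("table", bottom_z=0.02, height=0.75, children=[plant])
ground = Node("ground", bottom_z=0.0, height=0.0, children=[table])

enforce_support(ground, 0.0)
print(table.bottom_z, plant.bottom_z)  # 0.0 0.75
```

Because parents are fixed before their children, a correction to the table automatically carries the plant with it. As the text notes, a coarse snap like this is only a starting point for the physics-constrained stage.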

Physics optimization

REST3D physics-constrained optimization resolves physical violations while preserving visual consistency.

Local group optimization

REST3D decomposes complex scenes according to the scene tree and optimizes smaller support groups so crowded scenes can converge more reliably.

Global group optimization

REST3D then refines the whole scene to reduce collision, drift, velocity, and instability under simulated gravity.
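The local-then-global schedule can be sketched as a decomposition of the scene tree into small support groups, each a parent plus its direct children, visited from the ground up. This is an illustrative guess at the grouping, not the paper's algorithm: the `parent_of` mapping, the `support_groups` helper, and the breadth-first order are assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def support_groups(parent_of: Dict[str, str], root: str = "ground") -> List[Tuple[str, List[str]]]:
    """Return (parent, children) groups in breadth-first order starting at `root`."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    groups = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        kids = children.get(parent, [])
        if kids:
            groups.append((parent, kids))  # one local optimization problem
        queue.extend(kids)
    return groups


parent_of = {
    "table": "ground",
    "plant": "table",
    "mug": "table",
    "sofa": "ground",
    "cushion": "sofa",
}

for parent, kids in support_groups(parent_of):
    print(f"optimize {kids} resting on {parent}")
# A global pass over all objects together would follow the local passes.
```

Visiting supporters before the objects they carry means each local group is solved against a parent that has already been placed.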

Figure: Before (left) — objects float and collide causing instability. After REST3D optimization (right) — physical violations resolved, scene is simulation-ready.

Simulation-ready assets

REST3D targets simulation-ready digital assets for immersive interaction, content creation, gaming, and embodied AI.

The high-value promise of REST3D is practical conversion: one casual image becomes a 3D scene with object meshes and a world-frame layout that can be imported into a physics simulator. Simulation-ready digital assets are the real difference from ordinary image-to-3D reconstruction.

Interactive simulation

REST3D Interactive 3D Physics Simulation in Isaac Gym

Explore the physics simulation of reconstructed scenes in Isaac Gym. Users can rotate by dragging, zoom by scrolling, inspect simulation, press Play, Reset, adjust Speed, and compare synchronized methods.

Controls included

▶ Play · ↻ Reset · Speed · Run Simulation · Click or press Space

Methods included

Input Image · Ours · DigitalCousins · Gen3DSR · SceneGen · SAM3D.

Interactive 3D physics simulation in Isaac Gym — click any scene to view its simulation video. Use Play, Reset, Speed controls on the official project page.

Baseline failures

REST3D highlights why baseline methods can explosively separate when gravity is applied.

Due to object interpenetration in baseline methods, applying gravity in a physics simulator can cause objects to explosively separate and become unstable. REST3D is built around the opposite expectation: reconstructed scenes should quickly settle into stable states.

Results

REST3D results show high-resolution physics simulation of reconstructed scenes in Isaac Gym.

Objects are placed sequentially for clarity and then simulated jointly. REST3D reconstructed scenes are simulation-ready and quickly settle into stable states.

Figure: REST3D reconstructed scenes in Isaac Gym — objects settle stably under gravity. Click to view full video on the official project page.

VR interaction

REST3D reconstructs an immersive and physically grounded 3D scene for VR hand-based interaction.

REST3D includes an interactive VR system that reconstructs an immersive, physically grounded 3D scene from a single image, enabling users to naturally interact with stable virtual objects through hand-based interactions. The demo was recorded with Meta Quest Pro and played back at 3× speed.

In the paper, hand motions are tracked and mapped to a dexterous robotic hand in Isaac Gym, with the simulation rendered back to a VR headset.

Figure: REST3D VR interaction pipeline — real-time hand tracking on Meta Quest Pro maps to a dexterous robotic hand in Isaac Gym, with simulation rendered back to VR.

Simpson Room (VR, 3x speed)

Room 1 (VR, 3x speed)

Room 4 (VR, 3x speed)

VR Demos: Hand-based human-object interaction recorded with Meta Quest Pro (played back at 3x speed). Click to view full videos.

SOTA comparison

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

The REST3D comparison focuses on physics simulation of reconstructed scenes in Isaac Gym. Existing methods struggle to balance reconstruction fidelity and physical stability, while REST3D produces stable, simulation-ready scenes that settle with only minor adjustments.

SOTA comparison on Simpson Room — click each thumbnail to view the physics simulation video.

Figure: REST3D vs SOTA methods — baselines struggle with retrieval mismatch (DigitalCousins), visual instability (Gen3DSR), generation inaccuracy (SceneGen), and object-level-only reconstruction (SAM3D).

Physical metrics

REST3D metric snapshot: low collision, high stability, low drift.

Dataset	Method	Failure Rate	Collision Rate	Stable Rate	Position Drift	Linear Velocity	Angular Velocity
Replica	REST3D / Ours	0.0%	0.0%	95.8%	0.094 m	0.152 m/s	0.557 rad/s
ScanNet++	REST3D / Ours	0.0%	5.9%	93.6%	0.080 m	0.159 m/s	1.039 rad/s
Custom	REST3D / Ours	0.0%	1.2%	95.5%	0.017 m	0.140 m/s	0.468 rad/s

Figure: Physical stability metrics. REST3D reaches stable rates of 93.6% to 95.8% across Replica, ScanNet++, and Custom, with a 0% failure rate and collision rates of at most 5.9%.

Evaluation

REST3D reports physical plausibility and geometric reconstruction quality.

Physical metrics

Failure rate, collision rate, stable rate, position drift, peak linear velocity, and peak angular velocity.
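To show how metrics of this kind might be computed from simulator output, the sketch below derives position drift and a stable rate from per-object trajectories. The exact definitions and thresholds used in the paper are not given on this page, so the per-object stability rule and the 5 cm `drift_tol` are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def position_drift(trajectory: Sequence[Vec3]) -> float:
    """Euclidean distance between an object's first and last simulated position."""
    return math.dist(trajectory[0], trajectory[-1])


def summarize(trajectories: Dict[str, List[Vec3]], drift_tol: float = 0.05) -> Dict[str, float]:
    """Assumed rule: an object counts as stable when its drift stays within drift_tol meters."""
    drifts = [position_drift(t) for t in trajectories.values()]
    stable = sum(d <= drift_tol for d in drifts)
    return {
        "stable_rate": stable / len(drifts),
        "mean_drift_m": sum(drifts) / len(drifts),
    }


trajectories = {
    "mug": [(0.0, 0.0, 0.8), (0.0, 0.0, 0.8)],   # stays put
    "book": [(1.0, 0.0, 0.5), (1.0, 0.3, 0.1)],  # slides and falls about 0.5 m
}
print(summarize(trajectories))
```

In this toy run one of two objects stays within tolerance, so the stable rate is 0.5 and the mean drift is roughly 0.25 m.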

Geometric metrics

Chamfer Distance, F-score@0.05, and B-IoU are used when ground-truth meshes exist.
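For readers unfamiliar with the geometric metrics, here is a small pure-Python sketch of Chamfer Distance and F-score at a 0.05 threshold on point sets. Chamfer Distance has several variants (squared or unsquared, summed or averaged); this version sums the two mean nearest-neighbor distances and may differ from the paper's convention. B-IoU is omitted because its definition is not given here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def nearest_dists(src: Sequence[Vec3], dst: Sequence[Vec3]) -> List[float]:
    """Distance from each source point to its nearest neighbor in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Symmetric Chamfer Distance: mean pred-to-gt plus mean gt-to-pred distance."""
    d_pg = nearest_dists(pred, gt)
    d_gp = nearest_dists(gt, pred)
    return sum(d_pg) / len(d_pg) + sum(d_gp) / len(d_gp)


def f_score(pred: Sequence[Vec3], gt: Sequence[Vec3], tau: float = 0.05) -> float:
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pg = nearest_dists(pred, gt)
    d_gp = nearest_dists(gt, pred)
    precision = sum(d <= tau for d in d_pg) / len(d_pg)
    recall = sum(d <= tau for d in d_gp) / len(d_gp)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.2)]
print(round(chamfer_distance(pred, gt), 3))  # 0.21
print(f_score(pred, gt))                     # 0.5
```

The brute-force nearest-neighbor search is quadratic in the number of points; real evaluations on dense meshes would sample points and use a spatial index such as a k-d tree.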

Alignment

Replica and ScanNet++ reconstructions are aligned to ground truth with ICP before geometric evaluation.

vs DigitalCousins

REST3D differs from DigitalCousins by emphasizing input-faithful reconstruction plus physics stability.

DigitalCousins-style approaches can improve physical plausibility by retrieving and assembling 3D assets, but retrieval can be constrained by the asset database and may yield mismatched objects. REST3D instead uses image-to-3D priors and physics-constrained refinement to preserve visual consistency while reducing physical errors.

vs Gen3DSR

REST3D targets physical stability beyond divide-and-conquer scene reconstruction.

Gen3DSR is a strong single-view 3D scene reconstruction baseline. REST3D compares to Gen3DSR and focuses on the failure mode that matters in simulation: a scene can be reconstructed but still physically unstable under gravity.

vs SceneGen

REST3D prioritizes physical consistency with the observed image, while scene generation can trade accuracy for plausibility.

SceneGen-style methods synthesize multiple 3D assets and positions from a single scene image. REST3D argues that generation priors can be physically plausible yet inaccurate relative to the input. REST3D is framed as reconstruction: match the image and obey physics.

vs SAM3D

REST3D pushes beyond object-level reconstruction toward scene-level physical validity.

SAM3D can recover high-fidelity individual objects, but scene-level reconstruction also needs global orientation, wall attachment, support, collision handling, and stable contacts. REST3D explicitly focuses on those scene-level physical constraints.

Use cases

REST3D use cases cluster around interactive 3D, VR, game content, robotics, and real-to-sim.

Content creation and gaming

REST3D can become a reference point for converting casual images into stable, editable, simulation-ready scenes for immersive production.

Tools

REST3D.org keeps every high-intent tool one click away.

Paper tools

arXiv, abstract, citation, BibTeX, author links, project page, publication status, and release notes.

Demo tools

Interactive 3D, Play, Reset, Speed, synchronized method comparison, high-resolution videos, and VR demos.

Reproducibility tools

GitHub repository, code status, datasets, baselines, metrics, implementation details, limitations, and future work.

Keyword map

REST3D keyword clusters for headings, internal anchors, meta tags, and long-tail search coverage.

Core keyword

REST3D, REST3D.org, REST3D paper, REST3D arXiv, REST3D code, REST3D GitHub, REST3D demo, REST3D citation.

Long-tail keywords

REST3D reconstructing physically stable 3D scenes from a single image; REST3D single RGB image to simulation-ready 3D assets; REST3D physics-constrained optimization; REST3D scene-tree construction.

Surrounding keywords

single image 3D reconstruction, physically plausible 3D scene, Isaac Gym 3D simulation, VR human-object interaction, DigitalCousins, Gen3DSR, SceneGen, SAM3D, object penetration, object floating.

Visual system

REST3D.org uses a Material Design 3 token-driven palette with M3 baseline colors, WCAG AAA contrast, and surface container hierarchy for spatial depth.

The REST3D audience expects a technical research interface, not a lifestyle landing page. This design uses Google Material Design 3 tokens (--md-sys-color-*) for all colors, M3 surface container hierarchy for depth, primary for interactive elements, tertiary for AI cues, and error/success/warning for semantic status. The page includes light/dark theme toggle, forced-colors support, and high-contrast text conforming to WCAG AAA standards for vision accessibility.

Primary

#D0BCFF (dark) / #6750A4 (light) for REST3D interactive highlights, links, and CTAs.

Surface hierarchy

M3 container-lowest, low, container, high, highest — luminance stepping for depth.

Tertiary

#EFB8C8 (dark) / #7D5260 (light) for AI and generation accents.

Semantic status

Success #81C784, Warning #FFD54F, Error #F2B8B5 — WCAG AAA compliant.
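Contrast claims like these can be checked with the WCAG contrast-ratio formula, which compares the relative luminance of two sRGB colors. The helper names below are hypothetical. The 7:1 figure is the WCAG AAA threshold for normal-size text. The background each token is paired with on the live site is not stated here, so the example only uses black and white.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aaa_normal_text(foreground: str, background: str) -> bool:
    """AAA for normal-size text requires a ratio of at least 7:1."""
    return contrast_ratio(foreground, background) >= 7.0


print(round(contrast_ratio("#FFFFFF", "#000000"), 1))  # 21.0
print(passes_aaa_normal_text("#FFFFFF", "#000000"))    # True
```

To audit a token pair, pass the token's hex value together with the actual surface color it is rendered on, for example `contrast_ratio("#D0BCFF", surface_hex)` with `surface_hex` taken from the site's stylesheet.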

Limitations

REST3D is strong, but not magic: VLM robustness and deformable objects remain future-work territory.

REST3D relies on the robustness of vision-language models for physical scene understanding and may fail in challenging cases. The current REST3D paper focuses on rigid objects and does not explicitly model deformable or non-rigid objects, leaving those cases for future work.

Code & reproducibility

REST3D code repository exists, but the public README currently says code coming soon.

This site links to the repository without claiming a released implementation. Star or watch it on GitHub, and check there for the actual code release.

Citation

Cite REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image

@article{ma2026rest3d,
  title     = {REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image},
  author    = {Ma, Xiaoxuan and Wang, Jiashun and Ugrinovic, Nicol\'{a}s and Litman, Yehonathan and Kitani, Kris},
  journal   = {arXiv preprint arXiv:2605.30338},
  year      = {2026}
}

Acknowledgement

REST3D acknowledgement

The authors would like to thank Yuxuan Kuang, Yufei Wang, and Maxwell Jones for their insightful discussions.

FAQ

REST3D FAQ for searchers, researchers, creators, and builders.

What is REST3D?

REST3D is a single-image reconstruction framework that reconstructs physically stable 3D scenes by integrating physical scene understanding with physics-constrained refinement.

What does REST3D stand for?

REST3D expands as REconstructing physically STable 3D scenes.

What is the main REST3D difference from ordinary image-to-3D?

REST3D focuses on scene-level physical plausibility: support relations, collision reduction, stability under gravity, and simulation-ready behavior.

Is REST3D code available?

The REST3D GitHub repository exists, but the current public README says Code coming soon. This site links to the repository without claiming a released implementation.

What are the REST3D baselines?

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

What are the REST3D datasets?

REST3D evaluates on Replica, ScanNet++, and a Custom set with casual images including bedrooms, living rooms, and cartoon-style scenes.

What are the REST3D applications?

REST3D is relevant to immersive interaction, content creation, gaming, simulation-ready assets, robotics, embodied AI, real-to-sim, and VR human-object interaction.

Glossary

REST3D glossary: terms users search after they understand the headline.

Scene tree

A support-relation structure that represents which objects are on, hanging from, or attached to other objects or surfaces.

Physics-constrained optimization

Simulation-based refinement that moves object poses toward stable, low-collision configurations while preserving the input layout.

Stable rate

A physical plausibility metric indicating whether reconstructed scenes settle into stable states under simulation.

Sources

REST3D primary links

arXiv

REST3D arXiv page and REST3D HTML paper.

Project page

Official REST3D project page with interactive 3D, results, VR demos, comparison, citation, and acknowledgements.

Code repository

REST3D GitHub repository. Current public status: code coming soon.

Keep exploring

REST3D related research and recommended resources

After understanding REST3D, explore related work in single-image 3D reconstruction, physics simulation, scene understanding, and interactive content creation.

3D Scene Reconstruction

Papers on single-view and multi-view 3D scene reconstruction including DigitalCousins, Gen3DSR, SceneGen, and SAM3D — all compared against REST3D in the paper.

Physics Simulation

NVIDIA Isaac Gym, MuJoCo, PyBullet — simulation environments used for physics-constrained optimization and stability evaluation in reconstruction pipelines.

Scene Understanding

Vision-language models (VLM), SAM (Segment Anything), and agentic frameworks for physical scene understanding, object detection, and spatial relationship reasoning.

About & contact

About REST3D.org

REST3D.org is an independent research resource maintained to help researchers, creators, and builders understand and use REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image. This site links to the public paper, project page, and GitHub repository and preserves author attribution.

Authors

REST3D was developed by Xiaoxuan Ma, Jiashun Wang, Nicolás Ugrinovic, Yehonathan Litman, and Kris Kitani at Carnegie Mellon University.

For research questions about the REST3D paper, contact the authors via the official channels listed on the project page.

Contact this site

For issues with REST3D.org (broken links, content suggestions, SEO feedback), reach out via contact@rest3d.org or open an issue on the official GitHub repository.

REST3D.org is not affiliated with Anabolix Nutrition (REST3D sleep supplement), the Unity REST3D game, ryanj/rest3d, or rest3d.wordpress.com. This site specifically covers the CMU computer vision paper arXiv:2605.30338.

REST3D is evaluated on synthetic Replica, real-world ScanNet++, and a challenging Custom set.

Replica

ScanNet++

Custom casual images

Computer vision and graphics research

Embodied AI and real-to-sim
