REST3D starts from one casual image, not a dense scan or multi-view capture.
REST3D runs in three stages: scene-tree construction, scene initialization and canonicalization, and physics-constrained optimization.
REST3D reports high stability on Replica, ScanNet++, and Custom scene sets.
REST3D demonstrates hand-based human-object interaction with Meta Quest Pro and Isaac Gym.
REST3D is single-image 3D reconstruction built for physical stability, not just visual plausibility.
REST3D means REconstructing physically STable 3D scenes. The central idea is brutal and simple: a 3D scene that looks correct is not enough if objects float, intersect, explode under gravity, or collapse in a simulator. REST3D uses physical scene understanding and physics-constrained optimization so the reconstructed scene can behave like a usable digital asset.
People looking up REST3D want proof, demos, code status, metrics, and a fast explanation of why physical stability matters.
Researchers
They look for the REST3D paper, method diagram, baselines, datasets, metrics, ablations, limitations, BibTeX, and reproducibility notes.
3D / VR / game creators
They care about whether REST3D can convert a casual image into simulation-ready assets, stable layouts, VR demos, and interactive scenes.
Robotics and embodied AI builders
They care about gravity, support relations, collision rate, stable rate, real-to-sim, Isaac Gym, object contact, and reliable manipulation scenes.
REST3D abstract: Reconstructing Physically Stable 3D Scenes from a Single Image
Reconstructing physically stable 3D scenes from a single RGB image enables casual images to be converted into simulation-ready digital assets for applications such as immersive interaction and content creation. However, existing single-image reconstruction methods fall short in capturing the physical structure of a scene. As a result, they often produce geometrically plausible but physically inconsistent results, including object floating and penetration, which lead to unstable behavior in physics simulations. Image-conditioned scene generation methods improve physical plausibility but often rely on strong scene priors, yielding plausible yet inaccurate object arrangements that fail to match the input image. We propose REST3D, a single-image reconstruction framework that can REconstruct physically STable 3D scenes by integrating physical scene understanding with physics-constrained refinement. We first introduce an agentic physical scene understanding technique that constructs a scene-tree representation capturing object physical states and inter-object relationships from a gravity-support perspective, providing a structural prior for reconstruction. Leveraging this structure, we initialize the scene using image-to-3D models, followed by scene-tree-guided alignment and physics-constrained optimization to resolve physical violations while preserving visual consistency with the input image. Experiments show that our method significantly reduces physical errors and improves simulation stability on both synthetic and real-world datasets while maintaining strong reconstruction quality. We further demonstrate the reconstructed scenes in VR-based human-object interaction, showing their potential for immersive applications.
From a single casual image to a visually consistent and physically stable interactive 3D scene.
Single-image 3D reconstruction often fails when gravity asks the obvious question: should this object stand, fall, or explode?
Object floating
Visually plausible reconstructions can place objects above their support surfaces. REST3D targets support consistency.
Object penetration
Reconstructed objects may overlap in 3D. In a physics simulator, that interpenetration can cause explosive separation and unstable behavior.
Plausible but inaccurate generation
Image-conditioned scene generation can produce a physically plausible scene that does not match the input image. REST3D is designed to preserve visual consistency.
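The floating and penetration failures above can be made concrete with a small check. This is a minimal sketch, not the REST3D implementation: it assumes axis-aligned bounding boxes with z pointing up, and the names `Box`, `penetration_depth`, and `floating_gap` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; z is assumed to point up in this sketch."""
    lo: Vec3
    hi: Vec3


def penetration_depth(a: Box, b: Box) -> float:
    """Smallest per-axis overlap of two boxes, or 0.0 if they do not overlap."""
    overlaps = [min(a.hi[i], b.hi[i]) - max(a.lo[i], b.lo[i]) for i in range(3)]
    return min(overlaps) if all(o > 0 for o in overlaps) else 0.0


def floating_gap(obj: Box, support: Box) -> float:
    """Vertical gap between the object's bottom and the support's top (0.0 if none)."""
    return max(0.0, obj.lo[2] - support.hi[2])


table = Box((0.0, 0.0, 0.0), (1.0, 1.0, 0.75))
mug_floating = Box((0.4, 0.4, 0.80), (0.5, 0.5, 0.90))  # hovers above the table top
mug_sunk = Box((0.4, 0.4, 0.70), (0.5, 0.5, 0.80))      # dips into the table top

print(round(floating_gap(mug_floating, table), 3))   # 0.05
print(round(penetration_depth(mug_sunk, table), 3))  # 0.05
```

Real meshes need mesh-level collision queries rather than boxes, but the two quantities are the same ones a simulator reacts to: a gap makes the object drop, an overlap makes the solver push the bodies apart.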
REST3D combines scene-tree construction, scene-tree-guided alignment, and physics-constrained optimization.
Scene-Tree Construction
REST3D infers a hierarchical scene tree that captures objects, physical states, and inter-object spatial support relationships from a gravity-support perspective.
Scene Initialization and Canonicalization
REST3D initializes object meshes using image-to-3D models, then uses the scene tree to correct global orientation and enforce coarse support constraints.
Physics-Constrained Optimization
REST3D refines object poses through simulation-based optimization to reduce floating, penetration, drift, and instability while preserving the input image layout.
REST3D scene-tree construction models gravity-support relationships: ground, wall, ceiling, and ground-wall.
A REST3D scene tree is not a decorative hierarchy. It is the structural prior that says which object supports which object: table on ground, plant on table, poster attached to wall, radiator supported by ground-wall. This is the hidden skeleton that lets REST3D keep visual reconstruction and physical behavior aligned.
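One way to picture such a scene tree is as a small recursive data structure. The sketch below is illustrative only: the four structural support types come from the description above, while the `OBJECT` label for object-on-object support, the `Node` class, and `support_path` are assumptions made for this example and are not taken from the REST3D code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Support(Enum):
    GROUND = "ground"
    WALL = "wall"
    CEILING = "ceiling"
    GROUND_WALL = "ground-wall"
    OBJECT = "object"  # assumption: label for an object resting on another object


@dataclass
class Node:
    name: str
    support: Optional[Support] = None  # how the parent supports this node; None for the root
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, support: Support) -> "Node":
        """Attach a child supported by this node and return it."""
        child = Node(name, support)
        self.children.append(child)
        return child


def support_path(node: Node, target: str) -> Optional[List[str]]:
    """Names from the root down to `target`, or None if it is not in the tree."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        tail = support_path(child, target)
        if tail is not None:
            return [node.name] + tail
    return None


scene = Node("scene")
table = scene.add("table", Support.GROUND)
table.add("plant", Support.OBJECT)
scene.add("poster", Support.WALL)
scene.add("radiator", Support.GROUND_WALL)

print(support_path(scene, "plant"))  # ['scene', 'table', 'plant']
```

Reading the path top-down gives the order in which supports must be settled: the table has to be stable before the plant on it can be.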
REST3D uses agentic physical scene understanding to identify objects, segment instances, and reason about spatial support.
Open-vocabulary object list analysis
REST3D asks a vision-language model to identify distinct objects with descriptive attributes, not just coarse labels.
Agentic instance segmentation
REST3D uses a segmentation agent and verifier loop to refine prompts and masks for each object instance.
Spatial relationship reasoning
REST3D infers support parents and support types from a gravity-aware perspective.
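The segmentation agent and verifier loop can be sketched as a propose, check, and retry cycle. This is a hedged illustration of the general pattern, not the REST3D pipeline: `segment_with_verifier`, the `segment` and `verify` callables, and the toy data are all hypothetical stand-ins for a real segmentation model and a vision-language verifier.

```python
from typing import Callable, Dict, List, Optional, Tuple


def segment_with_verifier(
    prompt: str,
    segment: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[List[str]]:
    """Propose masks for a prompt, ask a verifier, and retry with its refined prompt."""
    for _ in range(max_rounds):
        masks = segment(prompt)
        accepted, refined_prompt = verify(prompt, masks)
        if accepted:
            return masks
        prompt = refined_prompt
    return None  # give up after max_rounds; the caller decides how to handle it


# Toy stand-ins: a lookup table plays the segmenter, a rule plays the verifier.
TOY_MASKS: Dict[str, List[str]] = {
    "chair": ["mask_a", "mask_b"],         # ambiguous: two chairs match
    "red chair near window": ["mask_a"],   # specific enough for one instance
}


def toy_segment(prompt: str) -> List[str]:
    return TOY_MASKS.get(prompt, [])


def toy_verify(prompt: str, masks: List[str]) -> Tuple[bool, str]:
    # Accept only a single-instance result; otherwise suggest a more descriptive prompt.
    if len(masks) == 1:
        return True, prompt
    return False, "red chair near window"


print(segment_with_verifier("chair", toy_segment, toy_verify))  # ['mask_a']
```

The toy run also shows why descriptive attributes matter: the coarse label "chair" is rejected, and the attribute-rich prompt resolves to a single instance.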
REST3D initializes the 3D scene with image-to-3D models, then canonicalizes the layout so physics has a fighting chance.
REST3D starts with raw image-to-3D output, then uses the scene tree to correct coarse orientation, enforce support, and produce a structured initial scene. Canonicalization alone is not enough; it improves stability but still needs the full REST3D physics-constrained optimization stage.
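As a rough picture of what "enforce support" can mean, the sketch below stacks objects so each one rests exactly on top of its support parent. It is a simplified assumption, not the REST3D canonicalization: it only handles vertical support with z up, ignores wall and ceiling attachment, and `rest_on_parents`, `parent_of`, and `height` are hypothetical names.

```python
from typing import Dict, Optional


def rest_on_parents(
    parent_of: Dict[str, Optional[str]],
    height: Dict[str, float],
    ground_z: float = 0.0,
) -> Dict[str, float]:
    """Bottom height for every object so that it sits on top of its support parent."""
    bottoms: Dict[str, float] = {}

    def place(name: str) -> float:
        # Parents are placed first, so a child always lands on a settled support.
        if name not in bottoms:
            parent = parent_of[name]
            bottoms[name] = ground_z if parent is None else place(parent) + height[parent]
        return bottoms[name]

    for name in parent_of:
        place(name)
    return bottoms


parent_of = {"plant": "table", "table": None}  # None means the object rests on the ground
height = {"table": 0.75, "plant": 0.30}

print(rest_on_parents(parent_of, height))  # {'table': 0.0, 'plant': 0.75}
```

A snap like this removes gross floating and sinking, but it does nothing about sideways overlap or balance, which is consistent with canonicalization needing a physics-constrained stage after it.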
REST3D physics-constrained optimization resolves physical violations while preserving visual consistency.
Local group optimization
REST3D decomposes complex scenes according to the scene tree and optimizes smaller support groups so crowded scenes can converge more reliably.
Global group optimization
REST3D then refines the whole scene to reduce collision, drift, velocity, and instability under simulated gravity.
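The local-then-global ordering can be sketched as a schedule derived from the scene tree. This is an assumption-laden illustration: treating each support parent with its direct children as one group, and visiting shallower groups first, are choices made for this example; `support_groups` and `optimization_schedule` are hypothetical names, and the actual optimizer is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def support_groups(parent_of: Dict[str, Optional[str]]) -> List[Tuple[str, List[str]]]:
    """Group objects by their direct support parent, shallowest parents first."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    def depth(name: str) -> int:
        # Number of support hops from this node up to a structural surface.
        steps = 0
        while parent_of.get(name) is not None:
            name = parent_of[name]
            steps += 1
        return steps

    groups = [(parent, sorted(kids)) for parent, kids in children.items()]
    return sorted(groups, key=lambda group: (depth(group[0]), group[0]))


def optimization_schedule(parent_of: Dict[str, Optional[str]]) -> List[Tuple[str, List[str]]]:
    """Local passes over each support group, then one global pass over all objects."""
    schedule = [("local", [parent] + kids) for parent, kids in support_groups(parent_of)]
    schedule.append(("global", sorted(parent_of)))
    return schedule


parent_of = {
    "table": "ground", "plant": "table", "mug": "table",
    "sofa": "ground", "cushion": "sofa",
}
for stage, members in optimization_schedule(parent_of):
    print(stage, members)
```

Each local pass touches only a handful of bodies, which is the intuition behind crowded scenes converging more reliably before the final whole-scene refinement.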
REST3D targets simulation-ready digital assets for immersive interaction, content creation, gaming, and embodied AI.
The practical promise of REST3D is conversion: one casual image becomes a 3D scene with object meshes and a world-frame layout that can be imported into a physics simulator. Simulation-ready digital assets are the real difference from ordinary image-to-3D reconstruction.
REST3D Interactive 3D Physics Simulation in Isaac Gym
Explore the physics simulation of reconstructed scenes in Isaac Gym. Users can rotate by dragging, zoom by scrolling, inspect the simulation, press Play or Reset, adjust Speed, and compare methods in synchronized views.
Controls included
▶ Play · ↻ Reset · Speed · Run Simulation · Click or press Space
Methods included
Input Image · Ours · DigitalCousins · Gen3DSR · SceneGen · SAM3D.
REST3D highlights why baseline methods can explosively separate when gravity is applied.
Due to object interpenetration in baseline methods, applying gravity in a physics simulator can cause objects to explosively separate and become unstable. REST3D is built around the opposite expectation: reconstructed scenes should quickly settle into stable states.
REST3D results show high-resolution physics simulation of reconstructed scenes in Isaac Gym.
Objects are placed sequentially for clarity and then simulated jointly. REST3D reconstructed scenes are simulation-ready and quickly settle into stable states.
REST3D reconstructs an immersive and physically grounded 3D scene for VR hand-based interaction.
REST3D includes an interactive VR system that reconstructs an immersive, physically grounded 3D scene from a single image, enabling users to naturally interact with stable virtual objects through hand-based interactions. The demo was recorded with Meta Quest Pro and played back at 3× speed.
REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.
The REST3D comparison focuses on physics simulation of reconstructed scenes in Isaac Gym. Existing methods struggle to balance reconstruction fidelity and physical stability, while REST3D produces stable, simulation-ready scenes that settle with only minor adjustments.
REST3D metric snapshot: low collision, high stability, low drift.
| Dataset | Method | Failure Rate | Collision Rate | Stable Rate | Position Drift | Peak Linear Velocity | Peak Angular Velocity |
|---|---|---|---|---|---|---|---|
| Replica | REST3D / Ours | 0.0% | 0.0% | 95.8% | 0.094 m | 0.152 m/s | 0.557 rad/s |
| ScanNet++ | REST3D / Ours | 0.0% | 5.9% | 93.6% | 0.080 m | 0.159 m/s | 1.039 rad/s |
| Custom | REST3D / Ours | 0.0% | 1.2% | 95.5% | 0.017 m | 0.140 m/s | 0.468 rad/s |
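To make the motion columns in the table easier to read, here is a minimal sketch of how drift and peak linear velocity could be computed from a simulated trajectory. The exact definitions and thresholds used in the REST3D paper are not given here, so everything below is an assumption: drift is taken as start-to-end displacement, velocity as finite differences over a fixed time step `dt`, and `stable_fraction` uses a hypothetical drift threshold.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def position_drift(trajectory: Sequence[Vec3]) -> float:
    """Distance between the first and last position of one object, in meters."""
    return math.dist(trajectory[0], trajectory[-1])


def peak_linear_velocity(trajectory: Sequence[Vec3], dt: float) -> float:
    """Largest finite-difference speed along the trajectory, in m/s."""
    return max(math.dist(a, b) / dt for a, b in zip(trajectory, trajectory[1:]))


def stable_fraction(drifts: List[float], max_drift: float) -> float:
    """Share of objects whose drift stays within an assumed threshold."""
    return sum(d <= max_drift for d in drifts) / len(drifts)


# One object that drops 5 cm and then comes to rest; positions sampled every 0.1 s.
trajectory = [(0.0, 0.0, 1.00), (0.0, 0.0, 0.97), (0.0, 0.0, 0.95), (0.0, 0.0, 0.95)]

print(round(position_drift(trajectory), 3))             # 0.05
print(round(peak_linear_velocity(trajectory, 0.1), 3))  # 0.3
```

Under these assumed definitions, an object that explodes out of an interpenetration shows up as both a large drift and a large velocity peak, while a settled object has both near zero.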
REST3D is evaluated on synthetic Replica, real-world ScanNet++, and a challenging Custom set.
Replica
A synthetic dataset with ground-truth scene meshes, used for physical metrics and geometric metrics.
ScanNet++
A real-world dataset covering scenes such as meeting rooms, classrooms, and offices.
Custom casual images
A harder set including bedrooms, living rooms, and cartoon-style scenes to test REST3D robustness.
REST3D reports physical plausibility and geometric reconstruction quality.
Physical metrics
Failure rate, collision rate, stable rate, position drift, peak linear velocity, and peak angular velocity.
Geometric metrics
Chamfer Distance, F-score@0.05, and B-IoU are used when ground-truth meshes exist.
Alignment
Replica and ScanNet++ reconstructions are aligned to ground truth with ICP before geometric evaluation.
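For readers unfamiliar with the geometric metrics, the sketch below computes Chamfer Distance and an F-score at a 0.05 threshold on two tiny point sets. Conventions vary between papers (squared or unsquared distances, sums or means), and the one used by REST3D is not stated here, so this assumes unsquared nearest-neighbor distances averaged in each direction. It uses brute-force search and is only meant for small examples; the point sets are assumed to be already aligned.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def nearest(point: Vec3, cloud: Sequence[Vec3]) -> float:
    """Distance from a point to its nearest neighbor in a point set."""
    return min(math.dist(point, other) for other in cloud)


def chamfer_distance(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    """Mean nearest-neighbor distance, summed over both directions."""
    pred_to_gt = sum(nearest(p, gt) for p in pred) / len(pred)
    gt_to_pred = sum(nearest(q, pred) for q in gt) / len(gt)
    return pred_to_gt + gt_to_pred


def f_score(pred: Sequence[Vec3], gt: Sequence[Vec3], tau: float = 0.05) -> float:
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(nearest(p, gt) <= tau for p in pred) / len(pred)
    recall = sum(nearest(q, pred) <= tau for q in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.02), (1.0, 0.0, 0.2)]  # one point within 0.05 of ground truth, one not

print(round(chamfer_distance(pred, gt), 3))  # 0.22
print(f_score(pred, gt))                     # 0.5
```

In practice both sets would be sampled densely from the reconstructed and ground-truth meshes, and a spatial index such as a k-d tree would replace the brute-force search.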
REST3D differs from DigitalCousins by emphasizing input-faithful reconstruction plus physics stability.
DigitalCousins-style approaches can improve physical plausibility by retrieving and assembling 3D assets, but retrieval can be constrained by the asset database and may yield mismatched objects. REST3D instead uses image-to-3D priors and physics-constrained refinement to preserve visual consistency while reducing physical errors.
REST3D targets physical stability beyond divide-and-conquer scene reconstruction.
Gen3DSR is a strong single-view 3D scene reconstruction baseline. REST3D compares to Gen3DSR and focuses on the failure mode that matters in simulation: a scene can be reconstructed but still physically unstable under gravity.
REST3D prioritizes physical consistency with the observed image, while scene generation can trade accuracy for plausibility.
SceneGen-style methods synthesize multiple 3D assets and positions from a single scene image. REST3D argues that generation priors can be physically plausible yet inaccurate relative to the input. REST3D is framed as reconstruction: match the image and obey physics.
REST3D pushes beyond object-level reconstruction toward scene-level physical validity.
SAM3D can recover high-fidelity individual objects, but scene-level reconstruction also needs global orientation, wall attachment, support, collision handling, and stable contacts. REST3D explicitly focuses on those scene-level physical constraints.
REST3D use cases cluster around interactive 3D, VR, game content, robotics, and real-to-sim.
Content creation and gaming
REST3D can become a reference point for converting casual images into stable, editable, simulation-ready scenes for immersive production.
Computer vision and graphics research
REST3D gives researchers a focused entry point for physical scene understanding, image-to-3D, scene generation, and evaluation under simulation.
Embodied AI and real-to-sim
REST3D matters to robotics because stable object contacts and plausible support relationships are necessary for manipulation and policy training environments.
REST3D.org should make every high-intent tool one click away.
Paper tools
arXiv, abstract, citation, BibTeX, author links, project page, publication status, and release notes.
Demo tools
Interactive 3D, Play, Reset, Speed, synchronized method comparison, high-resolution videos, and VR demos.
Reproducibility tools
GitHub repository, code status, datasets, baselines, metrics, implementation details, limitations, and future work.
REST3D keyword clusters for headings, internal anchors, meta tags, and long-tail search coverage.
Core keyword
REST3D, REST3D.org, REST3D paper, REST3D arXiv, REST3D code, REST3D GitHub, REST3D demo, REST3D citation.
Long-tail keywords
REST3D reconstructing physically stable 3D scenes from a single image; REST3D single RGB image to simulation-ready 3D assets; REST3D physics-constrained optimization; REST3D scene-tree construction.
Surrounding keywords
single image 3D reconstruction, physically plausible 3D scene, Isaac Gym 3D simulation, VR human-object interaction, DigitalCousins, Gen3DSR, SceneGen, SAM3D, object penetration, object floating.
REST3D.org uses a dark lab-grade palette with cyan, violet, and green accents for AI, 3D, simulation, and VR audiences.
The REST3D audience expects a technical research interface, not a lifestyle landing page. This design uses deep navy for spatial depth, cyan for reconstruction and links, violet for generative AI cues, and green for stable physics. The page also includes a light theme toggle and high-contrast text for readability.
#07111f background for 3D depth.
#43e6ff for REST3D links and technical highlights.
#a98bff for AI and scene generation accents.
#83f7bd for stable, settled, simulation-ready cues.
REST3D is strong, but not magic: VLM robustness and deformable objects remain future-work territory.
REST3D relies on the robustness of vision-language models for physical scene understanding and may fail in challenging cases. The current REST3D paper focuses on rigid objects and does not explicitly model deformable or non-rigid objects, leaving those cases for future work.
REST3D code repository exists, but the public README currently says code coming soon.
Cite REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image
@article{ma2026rest3d,
title = {REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image},
author = {Ma, Xiaoxuan and Wang, Jiashun and Ugrinovic, Nicol\'{a}s and Litman, Yehonathan and Kitani, Kris},
journal = {arXiv preprint arXiv:2605.30338},
year = {2026}
}
REST3D acknowledgement
The authors would like to thank Yuxuan Kuang, Yufei Wang, and Maxwell Jones for their insightful discussions.
REST3D FAQ for searchers, researchers, creators, and builders.
What is REST3D?
REST3D is a single-image reconstruction framework that reconstructs physically stable 3D scenes by integrating physical scene understanding with physics-constrained refinement.
What does REST3D stand for?
REST3D expands as REconstructing physically STable 3D scenes.
What is the main REST3D difference from ordinary image-to-3D?
REST3D focuses on scene-level physical plausibility: support relations, collision reduction, stability under gravity, and simulation-ready behavior.
Is REST3D code available?
The REST3D GitHub repository exists, but the current public README says Code coming soon. This site links to the repository without claiming a released implementation.
What are the REST3D baselines?
REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.
What are the REST3D datasets?
REST3D evaluates on Replica, ScanNet++, and a Custom set with casual images including bedrooms, living rooms, and cartoon-style scenes.
What are the REST3D applications?
REST3D is relevant to immersive interaction, content creation, gaming, simulation-ready assets, robotics, embodied AI, real-to-sim, and VR human-object interaction.
REST3D glossary: terms users search after they understand the headline.
Scene tree
A support-relation structure that represents which objects are on, hanging from, or attached to other objects or surfaces.
Physics-constrained optimization
Simulation-based refinement that moves object poses toward stable, low-collision configurations while preserving the input layout.
Stable rate
A physical plausibility metric indicating whether reconstructed scenes settle into stable states under simulation.
REST3D primary links
arXiv
Project page
Official REST3D project page with interactive 3D, results, VR demos, comparison, citation, and acknowledgements.
Code repository
REST3D GitHub repository. Current public status: code coming soon.