SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model
Yukai Shi1,3, Weiyu Li2,4, Zihao Wang4, Hongyang Li3, Xingyu Chen3, Ping Tan2,4, Lei Zhang3.
In this work, we propose SceneMaker, a decoupled 3D scene generation framework. Due to the lack of sufficient open-set de-occlusion and pose estimation priors, existing methods struggle to simultaneously produce high-quality geometry and accurate poses under severe occlusion and in open-set settings. To address these issues, we first decouple the de-occlusion model from 3D object generation and enhance it by leveraging image datasets together with our collected de-occlusion datasets, covering much more diverse open-set occlusion patterns. We then propose a unified pose estimation model that integrates global and local mechanisms into both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further extend the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets will be released.
Our framework consists of scene perception, 3D object generation under occlusion, and pose estimation. We decouple the de-occlusion model from 3D object generation, and construct a unified pose estimation model that incorporates both global and local attention mechanisms.
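To make the global/local design concrete, the following is a minimal PyTorch sketch of one possible pose-estimation transformer block: object tokens first attend globally across all objects, then locally within each object via an attention mask, and finally cross-attend to scene image features. All names (GlobalLocalBlock, local_mask, the token layout) are our own illustrative assumptions, not the released SceneMaker implementation.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Sketch of a block integrating global and local mechanisms
    into both self-attention and cross-attention (names assumed)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.global_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, obj_tokens, scene_tokens, local_mask):
        # Global self-attention: every object token sees all others,
        # capturing scene-level relations between objects.
        h = self.norms[0](obj_tokens)
        x = obj_tokens + self.global_self(h, h, h, need_weights=False)[0]
        # Local self-attention: the boolean mask (True = blocked) restricts
        # each token to tokens of the same object, refining per-object cues.
        h = self.norms[1](x)
        x = x + self.local_self(h, h, h, attn_mask=local_mask,
                                need_weights=False)[0]
        # Cross-attention from object tokens to scene image features.
        h = self.norms[2](x)
        x = x + self.cross(h, scene_tokens, scene_tokens, need_weights=False)[0]
        return x + self.mlp(self.norms[3](x))

if __name__ == "__main__":
    block = GlobalLocalBlock(dim=256, heads=8)
    obj = torch.randn(2, 16, 256)     # 2 scenes, 16 object tokens each
    scene = torch.randn(2, 64, 256)   # scene image feature tokens
    ids = torch.arange(16) // 4       # assume 4 tokens per object
    local_mask = ids[:, None] != ids[None, :]
    print(block(obj, scene, local_mask).shape)  # torch.Size([2, 16, 256])
```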
We decouple and develop a robust de-occlusion model by leveraging image datasets as an open-set occlusion prior. Our model achieves higher-quality and more text-controllable results under severe occlusion and open-set conditions.
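Since de-occlusion is decoupled from 3D generation, the overall data flow can be sketched as below: each occluded object crop is first amodally completed in 2D, then passed to an image-to-3D generator, and finally placed by the pose estimation model. Every name here (DeOccluder, generate_scene, the stub layers, the callables) is a hypothetical placeholder showing the decoupled interface, not the authors' actual models.

```python
import torch
import torch.nn as nn

class DeOccluder(nn.Module):
    """Stub de-occlusion network: (RGB crop + occlusion mask) -> completed RGB.
    A real model would be a strong generative image model; this stub only
    illustrates the interface of the decoupled stage."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, occ_mask):
        # occ_mask marks pixels hidden by other objects; the model fills
        # them in from the visible context.
        return self.net(torch.cat([rgb, occ_mask], dim=1))

def generate_scene(image, instances, deoccluder, image_to_3d, estimate_pose):
    """Decoupled pipeline sketch: de-occlude first, then generate 3D,
    then estimate each object's pose in the scene."""
    objects = []
    for crop, occ_mask in instances:
        completed = deoccluder(crop, occ_mask)  # 2D amodal completion
        mesh = image_to_3d(completed)           # per-object 3D generation
        pose = estimate_pose(mesh, image)       # object placement in scene
        objects.append((mesh, pose))
    return objects
```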