SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

Yukai Shi1,3, Weiyu Li2,4, Zihao Wang4, Hongyang Li3, Xingyu Chen3, Ping Tan2,4, Lei Zhang3

1 Tsinghua University    2 HKUST    3 IDEA Research    4 LightIllusions

Paper (arXiv)
Datasets (uploading)
Code (GitHub)

Abstract

We propose SceneMaker, a decoupled 3D scene generation framework. Because existing methods lack sufficient open-set de-occlusion and pose estimation priors, they struggle to produce high-quality geometry and accurate poses at the same time under severe occlusion and in open-set settings. To address this, we first decouple the de-occlusion model from 3D object generation and strengthen it with image datasets and collected de-occlusion datasets, which cover far more diverse open-set occlusion patterns. We then propose a unified pose estimation model that integrates global and local mechanisms in both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further improve the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets will be released.

[Figure: input scene images with the corresponding normal maps and generated 3D scenes]

Framework

Our framework consists of scene perception, 3D object generation under occlusion, and pose estimation. We decouple the de-occlusion model from 3D object generation. We construct a unified pose estimation model that incorporates both global and local attention mechanisms.
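The stage ordering described above can be illustrated with a minimal sketch. This is not the released SceneMaker code: the `SceneObject` container, the `generate_scene` function, and the four stage callables and their signatures are hypothetical names introduced here only to show how a decoupled pipeline might be wired together, with de-occlusion run as its own step before 3D object generation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SceneObject:
    """One object found in the scene image (hypothetical container)."""
    label: str
    occluded_crop: Any                      # crop or mask from scene perception
    completed_image: Optional[Any] = None   # output of the de-occlusion model
    geometry: Optional[Any] = None          # output of 3D object generation
    pose: Optional[Any] = None              # output of pose estimation


def generate_scene(
    scene_image: Any,
    perceive: Callable[[Any], List[SceneObject]],
    deocclude: Callable[[Any], Any],
    generate_3d: Callable[[Any], Any],
    estimate_pose: Callable[[Any, SceneObject], Any],
) -> List[SceneObject]:
    """Run the stages in order; every stage is a placeholder callable."""
    # Stage 1: scene perception splits the image into per-object crops.
    objects = perceive(scene_image)

    # Stage 2: de-occlusion runs separately from, and before, 3D generation,
    # so the 3D generator only ever sees a completed object image.
    for obj in objects:
        obj.completed_image = deocclude(obj.occluded_crop)
        obj.geometry = generate_3d(obj.completed_image)

    # Stage 3: pose estimation places each generated object in the scene.
    for obj in objects:
        obj.pose = estimate_pose(scene_image, obj)

    return objects
```

Because each stage is passed in as a callable, the de-occlusion model could in principle be trained or swapped independently of the 3D generator, which is the point of the decoupling; the actual interfaces between the models are an assumption here.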


Qualitative Comparison of Object Generation under Occlusion

We decouple the de-occlusion model and make it robust by leveraging image datasets as an open-set occlusion prior. Our model achieves higher-quality and more text-controllable results under severe occlusion and open-set conditions.


Qualitative Comparison on Scene Generation

[Figure: scene images compared across MIDI3D, PartCrafter, and SceneMaker (normal map and results)]

Citation