DreamWaltz: Make a Scene with Complex 3D Animatable Avatars

We present DreamWaltz, a novel framework for generating and animating complex 3D avatars given text guidance and a parametric human body prior. While recent methods have shown encouraging results for text-to-3D generation of common objects, creating high-quality and animatable 3D avatars remains challenging. To create high-quality 3D avatars, DreamWaltz proposes 3D-consistent occlusion-aware Score Distillation Sampling (SDS) to optimize implicit neural representations with canonical poses. It provides view-aligned supervision via 3D-aware skeleton conditioning, which enables complex avatar generation without artifacts and multiple faces. For animation, our method learns an animatable 3D avatar representation from the abundant image priors of a diffusion model conditioned on various poses, which can animate complex non-rigged avatars given arbitrary poses without retraining. Extensive evaluations demonstrate that DreamWaltz is an effective and robust approach for creating 3D avatars that can take on complex shapes and appearances as well as novel poses for animation. The proposed framework further enables the creation of complex scenes with diverse compositions, including avatar-avatar, avatar-object, and avatar-scene interactions.
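To make the SDS idea concrete, here is a minimal toy sketch in plain Python. It is not the DreamWaltz implementation: `toy_noise_predictor` is a hypothetical stand-in for the pose-conditioned diffusion model (it is the exact noise predictor for a "dataset" containing one target image), the weighting `w = 1 - alpha_bar` is an assumed choice, and the pixels are optimized directly instead of back-propagating through a NeRF renderer.

```python
import math
import random


def toy_noise_predictor(x_t, alpha_bar, target):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    This is the exact noise predictor when the data distribution is a
    single image `target`; a real model would be a trained network.
    """
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [(xt - a * m) / s for xt, m in zip(x_t, target)]


def sds_gradient(x, target, rng):
    """One Monte Carlo sample of the SDS gradient w.r.t. the image x."""
    alpha_bar = rng.uniform(0.02, 0.98)        # random noise level
    eps = [rng.gauss(0.0, 1.0) for _ in x]     # injected Gaussian noise
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    x_t = [a * xi + s * e for xi, e in zip(x, eps)]  # forward diffusion
    eps_hat = toy_noise_predictor(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar                        # assumed weighting w(t)
    # SDS uses the weighted noise residual as the gradient signal.
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(x, target, steps=200, lr=0.5, seed=0):
    """Gradient descent on the image using sampled SDS gradients."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        g = sds_gradient(x, target, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

Calling `optimize([0.0, 1.0, -2.0], [0.5, 0.5, 0.5])` pulls the image toward the single mode of the toy prior. In a real pipeline the same residual would be propagated through the renderer to the 3D representation's parameters, with the noise predictor additionally conditioned on the text prompt and the skeleton image.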

🚀 2024-10-15: We present DreamWaltz-G, a 3DGS version of DreamWaltz!
🚀 2023-11-12: Results of high-resolution 3D avatars are provided!
🔥 2023-10-11: Training and inference codes are released!
📢 2023-09-21: Accepted by NeurIPS 2023!

The original DreamWaltz uses only a 64x64-resolution Latent-NeRF as its 3D avatar representation. We upgrade it to a 512x512-resolution RGB-NeRF, which significantly improves visual quality.

Moreover, we adopt the advanced ControlNet-openpose-v1.1 and SMPL-X for DreamWaltz.
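As an illustration of view-aligned skeleton conditioning, the sketch below projects a 3D joint (for example, one taken from a posed body model) into a camera orbiting the avatar, which is the kind of 2D keypoint an OpenPose-style ControlNet input is drawn from. The function names, the camera convention, and the normal-based visibility rule are assumptions made for this example, not the project's code.

```python
import math


def project_joint(joint, azimuth_deg, radius=2.0, focal=100.0,
                  center=(64.0, 64.0)):
    """Project a 3D joint (x, y, z), y up, into an orbiting pinhole camera.

    The camera sits at (radius*sin(az), 0, radius*cos(az)) and looks at
    the origin. Returns (u, v, depth) with v increasing downward.
    """
    x, y, z = joint
    az = math.radians(azimuth_deg)
    # Rotate the world so that the camera lies on the +z axis.
    x_cam = x * math.cos(az) - z * math.sin(az)
    z_cam = x * math.sin(az) + z * math.cos(az)
    depth = radius - z_cam
    u = center[0] + focal * x_cam / depth
    v = center[1] - focal * y / depth
    return u, v, depth


def faces_camera(normal, azimuth_deg):
    """Crude visibility test (assumed rule): keep a keypoint only if its
    outward normal points toward the camera."""
    nx, _, nz = normal
    az = math.radians(azimuth_deg)
    return nx * math.sin(az) + nz * math.cos(az) > 0.0
```

For a joint at `(0.5, 1.0, 0.0)`, the front view (azimuth 0) and the back view (azimuth 180) give horizontally mirrored pixel positions, and a keypoint whose normal points forward, such as the nose, fails `faces_camera` from behind. Dropping such keypoints from the conditioning image is one simple way to avoid asking the diffusion model for a face on the back of the head.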

For customized avatar creation (e.g. the bottom right Bocchi example), LoRA weights from Civitai are supported.

Textured meshes can be extracted from our generated avatars using the Marching Cubes algorithm.

The following results are mesh animations produced with Mixamo, which complement our native NeRF animation method.

Given only a textual description, DreamWaltz can generate the corresponding canonical 3D avatar in an hour, without multi-face artifacts and without being limited to skin-tight appearances.

Given motion sequences, DreamWaltz can animate 3D avatars and produce 3D-aware videos without retraining.

DreamWaltz can compose scenes with animatable avatars and diverse interactions, including avatar-object, avatar-scene, and avatar-avatar interactions.

BibTeX

@article{huang2023dreamwaltz,
    title={DreamWaltz: Make a Scene with Complex 3D Animatable Avatars},
    author={Yukun Huang and Jianan Wang and Ailing Zeng and He Cao and Xianbiao Qi and Yukai Shi and Zheng-Jun Zha and Lei Zhang},
    year={2023},
    eprint={2305.12529},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
}

@article{huang2023dreamtime,
    title={DreamTime: An Improved Optimization Strategy for Text-to-3D Content Creation},
    author={Yukun Huang and Jianan Wang and Yukai Shi and Xianbiao Qi and Zheng-Jun Zha and Lei Zhang},
    year={2023},
    eprint={2306.12422},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
}

DreamWaltz enables high-quality animatable avatar generation from text (a), ready for 3D scene composition with diverse avatar-object (b), avatar-scene (c), and avatar-avatar (d) interactions.
