Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars

*Equal Contribution
1Tsinghua University, 2Gala Sports

We propose Animatable 3D Gaussian, a novel neural representation for fast, high-fidelity reconstruction of multiple animatable human avatars that can be animated and rendered at interactive rates.

Abstract

Neural radiance fields can reconstruct high-quality drivable human avatars, but they are expensive to train and render and are not suitable for multi-human scenes with complex shadows. To reduce these costs, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming the 3D Gaussians to posed space according to the input poses. We introduce a multi-head hash encoder for pose-dependent shape and appearance and a time-dependent ambient occlusion module to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method achieves higher reconstruction quality than InstantAvatar with less training time (1/60), less GPU memory (1/4), and faster rendering speed (7x). Our method can be easily extended to multi-human scenes and achieves comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.

Method

The proposed Animatable 3D Gaussian consists of a set of skinned 3D Gaussians and a corresponding canonical skeleton. Each skinned 3D Gaussian contains a center x0, rotation R, scale S, opacity α0, and skinning weights w. First, we sample spherical harmonic coefficients SH, vertex displacement δx, and ambient occlusion ao from the hash-encoded parameter field at the center x0; the multilayer perceptron for ao additionally takes the frequency-encoded time γ(t) as input. Next, we concatenate the sampled parameters, the original parameters, and a shifted center x0' in canonical space. Finally, we deform the 3D Gaussians to posed space according to the input pose St, Tt and render them to the image using 3D Gaussian rasterization.
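The deformation step and the time encoding described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes standard linear blend skinning with per-bone 4x4 rigid transforms (standing in for the input pose St, Tt) and a NeRF-style sinusoidal encoding for γ(t). The function names, array shapes, and the `num_freqs` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def frequency_encode(t, num_freqs=4):
    """Sinusoidal encoding of time (assumed NeRF-style form of gamma(t)).

    Returns [sin(2^k * pi * t), cos(2^k * pi * t)] for k = 0..num_freqs-1,
    concatenated along the last axis.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (F,)
    angles = np.asarray(t, dtype=np.float64)[..., None] * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def deform_gaussians(centers, rotations, weights, bone_transforms):
    """Move canonical Gaussians to posed space with linear blend skinning.

    centers:         (N, 3)    canonical (already shifted) centers
    rotations:       (N, 3, 3) canonical rotation matrices
    weights:         (N, B)    skinning weights, each row sums to 1
    bone_transforms: (B, 4, 4) per-bone canonical-to-posed transforms
                               (hypothetical stand-in for the input pose)
    """
    # Blend the bone transforms per Gaussian: (N, 4, 4).
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)
    blend_rot = blended[:, :3, :3]
    blend_trans = blended[:, :3, 3]

    # Apply the blended transform to each center.
    posed_centers = np.einsum("nij,nj->ni", blend_rot, centers) + blend_trans

    # Rotate each Gaussian's orientation by the blended linear part.
    # A blend of rotations is not orthonormal in general; this sketch
    # leaves it unnormalized.
    posed_rotations = blend_rot @ rotations
    return posed_centers, posed_rotations


if __name__ == "__main__":
    centers = np.array([[1.0, 1.0, 1.0]])
    rotations = np.eye(3)[None]
    weights = np.array([[0.5, 0.5]])
    bones = np.stack([np.eye(4), np.eye(4)])
    bones[1, :3, 3] = [2.0, 0.0, 0.0]  # second bone translates along x
    posed, _ = deform_gaussians(centers, rotations, weights, bones)
    print(posed)
    print(frequency_encode(0.5, num_freqs=2))
```

In this sketch the blended transform is a weighted sum of bone matrices, so a Gaussian bound equally to a fixed bone and a bone translated by 2 units along x moves by 1 unit. The posed centers and rotations would then be passed, together with scale, opacity, and color, to a 3D Gaussian rasterizer.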

Results

Training

Comparison on PeopleSnapshot

Novel View

Novel Pose

Double-Human Scene

Multi-Human Scene

Free-Viewpoint

Ten-Player Novel Pose

BibTeX

@article{liu2023animatable,
  title={Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars},
  author={Liu, Yang and Huang, Xiang and Qin, Minghan and Lin, Qinwei and Wang, Haoqian},
  journal={arXiv preprint arXiv:2311.16482},
  year={2023}
}