Figure 1: RGAvatar enables high-fidelity 4D Gaussian avatar reconstruction from monocular videos with controllable pose, expression, and illumination.
Relightable 4D avatar reconstruction with high-fidelity, real-time rendering remains a crucial but challenging problem, especially from monocular videos. Previous NeRF-based 4D avatars enable photo-realistic relighting but render too slowly, while point-based or mesh-based 4D avatars are efficient but offer limited rendering quality. The recent success of 3D Gaussian Splatting (3DGS) has inspired a series of impressive 4D Gaussian avatars; however, most of them focus only on faithful appearance reconstruction and are not relightable. To address these issues, this paper proposes a new Relightable 4D Gaussian Avatar, RGAvatar, tailored for high-fidelity relightable rendering from monocular videos. Our key idea is to introduce a new relightable 4D Gaussian representation, on which high-fidelity Physically Based Rendering (PBR) can be performed directly, together with an effective joint learning mechanism for compact 4D Gaussian reconstruction with SDF regulation and accurate material and lighting decomposition. Compared with previous state-of-the-art approaches, RGAvatar significantly outperforms them in both relightable rendering quality and speed. To the best of our knowledge, RGAvatar is a new state-of-the-art 4D Gaussian avatar from monocular videos, enabling high-fidelity relightable rendering efficiently.
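The abstract names SDF regulation of the compact 4D Gaussians but does not give its form. The sketch below is a minimal illustration under stated assumptions: the regulation is taken to penalize the signed distance at each Gaussian centroid so that centroids are pulled onto the SDF zero-level set. An analytic unit sphere (hypothetical) stands in for the learned SDF field, and the opacity weighting and one-step projection are illustrative choices, not the paper's stated loss.

```python
import numpy as np

# Hypothetical stand-in for the learned SDF field: a sphere of given radius at the origin.
def sphere_sdf(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    return np.linalg.norm(points, axis=-1) - radius


def sphere_sdf_grad(points: np.ndarray) -> np.ndarray:
    # Analytic gradient of the sphere SDF: the unit radial direction.
    return points / np.linalg.norm(points, axis=-1, keepdims=True)


def sdf_surface_loss(centroids: np.ndarray, opacities: np.ndarray) -> float:
    """Assumed regulation term: opacity-weighted mean |SDF| at Gaussian centroids.
    It is zero exactly when every centroid lies on the zero-level set."""
    d = np.abs(sphere_sdf(centroids))
    return float(np.sum(opacities * d) / np.sum(opacities))


def project_to_surface(centroids: np.ndarray) -> np.ndarray:
    """One Newton-style step mu <- mu - sdf(mu) * grad / |grad|^2 toward the zero-level set."""
    g = sphere_sdf_grad(centroids)
    d = sphere_sdf(centroids)[:, None]
    return centroids - d * g / np.sum(g * g, axis=-1, keepdims=True)


mu = np.array([[2.0, 0.0, 0.0], [0.0, 0.5, 0.0]])   # one centroid outside, one inside
alpha = np.ones(2)                                  # hypothetical opacities
print(sdf_surface_loss(mu, alpha))                  # 0.75
print(project_to_surface(mu))                       # centroids moved onto the unit sphere
```

In a learned setting the loss would be minimized by gradient descent on the centroids (and the SDF network) rather than by explicit projection; the projection step only shows where the penalty pushes each centroid.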
This video presents some qualitative results, including reconstruction evaluation and comparisons with SOTA methods.
Figure 2: The pipeline of RGAvatar. Given a monocular video as input, we learn the R-4DGS in two stages. In the first stage, we aim at a compact 4D Gaussian reconstruction (left), which forces the R-4DGS to stay compactly coherent with the underlying surface of an extra SDF field $\mathcal{F}^{g}$ (top left). We deform the canonical R-4DGS $\mathcal{\bar{G}}=\{\bar{g}\}$ to $g$ using a deformation field $\mathcal{D}$, in which a Motion Deformation Module deforms the centroid position from its canonical state $\bar{\mu}$ to the dynamic state $\mu$, and a Shape Deformation Module deforms the rotation and scale. In the second stage, we perform a neural material and lighting decomposition (right), leveraging the strong geometry prior from the compact 4D Gaussian reconstruction to guide geometry-aware light visibility prediction with the network $\mathcal{F}^{v}$, which takes the SDF encoding $f_{sdf}$ and the LBS deformation encoding $f_{lbs}$ as input. This tailored visibility prediction module leads to more accurate learning of the materials (albedo $b$, roughness $r$, and Fresnel reflectivity $f_0$) and the incident light ($L_{env}$) for the 4D Gaussians.
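The caption does not give the equations of the deformation field $\mathcal{D}$. The following is a minimal sketch, assuming the Motion Deformation Module behaves like linear blend skinning (LBS) with per-Gaussian skinning weights over K bones, and that the Shape Deformation Module rotates each Gaussian's canonical frame by the blended bone rotation and rescales it by a log-scale offset. All function and parameter names are hypothetical; in the actual method these quantities come from learned modules.

```python
import numpy as np


def lbs_centroids(mu_bar, weights, bone_R, bone_t):
    """Hypothetical LBS step: blend per-bone rigid transforms and apply them to centroids.

    mu_bar: (N, 3) canonical centroids; weights: (N, K) skinning weights, rows sum to 1;
    bone_R: (K, 3, 3) bone rotations; bone_t: (K, 3) bone translations.
    Returns deformed centroids (N, 3) and blended rotations (N, 3, 3), which are
    generally not orthonormal.
    """
    R_blend = np.einsum("nk,kij->nij", weights, bone_R)
    t_blend = weights @ bone_t
    mu = np.einsum("nij,nj->ni", R_blend, mu_bar) + t_blend
    return mu, R_blend


def deform_shape(R_bar, s_bar, R_blend, log_scale_offset):
    """Hypothetical shape step: rotate each canonical Gaussian frame and rescale it.

    R_bar: (N, 3, 3) canonical rotations; s_bar: (N, 3) canonical scales;
    log_scale_offset: (N, 3), e.g. predicted by a network (assumption).
    Returns rotations, scales, and covariances Sigma = R diag(s^2) R^T.
    """
    # A weighted sum of rotations is not a rotation in general: project it back onto
    # SO(3) via SVD (polar decomposition), flipping one axis if it is a reflection.
    U, _, Vt = np.linalg.svd(R_blend)
    sign = np.sign(np.linalg.det(U @ Vt))
    U[:, :, -1] *= sign[:, None]
    R = (U @ Vt) @ R_bar
    s = s_bar * np.exp(log_scale_offset)
    cov = R @ (s[:, :, None] ** 2 * np.eye(3)) @ np.transpose(R, (0, 2, 1))
    return R, s, cov


# Two toy bones: identity, and a 90-degree rotation about z plus a translation.
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
bone_R = np.stack([np.eye(3), Rz90])
bone_t = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
mu_bar = np.tile([1.0, 0.0, 0.0], (3, 1))

mu, R_blend = lbs_centroids(mu_bar, weights, bone_R, bone_t)
R, s, cov = deform_shape(np.tile(np.eye(3), (3, 1, 1)), np.tile([1.0, 2.0, 3.0], (3, 1)),
                         R_blend, np.zeros((3, 3)))
print(mu)   # third Gaussian lands halfway between the two rigidly moved copies
```

Re-orthonormalizing the blended rotation keeps each deformed covariance a valid rotated ellipsoid; an alternative would be to blend quaternions, which this sketch does not attempt.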
The R-4DGS reconstruction results for one subject from our collected dataset.
Figure 3: Visual results of our R-4DGS reconstruction for different subjects. Specifically, we show the diffuse and specular components, normals, BRDFs, and relighting results.
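Figure 3 separates each reconstruction into diffuse and specular components computed from the per-Gaussian materials named in the pipeline (albedo $b$, roughness $r$, Fresnel reflectivity $f_0$) and predicted light visibility. The page does not spell out the BRDF, so the following is a sketch assuming a standard Cook-Torrance microfacet model (GGX distribution, Schlick Fresnel, Schlick-GGX geometry term) evaluated at one Gaussian under K directional samples of the environment light. The function `shade` and its parameters are hypothetical.

```python
import numpy as np


def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def shade(n, wo, albedo, roughness, f0, light_dirs, light_rgb, visibility):
    """Assumed Cook-Torrance shading of one Gaussian under K directional light samples.

    n, wo: (3,) normal and view direction; albedo: (3,); roughness, f0: scalars;
    light_dirs: (K, 3) directions toward the lights; light_rgb: (K, 3) radiance;
    visibility: (K,) values in [0, 1], e.g. predicted by a visibility network.
    Returns (diffuse, specular), each (3,).
    """
    n, wo, wi = normalize(n), normalize(wo), normalize(light_dirs)
    h = normalize(wi + wo)                        # half vectors, (K, 3)
    n_dot_l = np.clip(wi @ n, 0.0, None)          # (K,)
    n_dot_v = max(float(n @ wo), 1e-4)
    n_dot_h = np.clip(h @ n, 0.0, None)
    v_dot_h = np.clip(h @ wo, 0.0, None)

    a2 = roughness ** 4                           # alpha = roughness^2, squared
    D = a2 / (np.pi * (n_dot_h ** 2 * (a2 - 1.0) + 1.0) ** 2)      # GGX distribution
    F = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5                     # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0                               # Schlick-GGX k
    G = (n_dot_l / (n_dot_l * (1 - k) + k)) * (n_dot_v / (n_dot_v * (1 - k) + k))
    spec_brdf = D * F * G / (4.0 * np.maximum(n_dot_l, 1e-4) * n_dot_v)

    # Incident radiance attenuated by visibility and the cosine term, per light sample.
    irradiance = (visibility * n_dot_l)[:, None] * light_rgb       # (K, 3)
    diffuse = (albedo / np.pi) * irradiance.sum(axis=0)
    specular = (spec_brdf[:, None] * irradiance).sum(axis=0)
    return diffuse, specular


albedo = np.array([0.8, 0.5, 0.3])
light_dirs = np.array([[0.0, 0.0, 1.0]])
light_rgb = np.ones((1, 3))
diffuse, specular = shade([0, 0, 1], [0, 0, 1], albedo, 0.5, 0.04,
                          light_dirs, light_rgb, np.array([1.0]))
print(diffuse + specular)   # shaded color = diffuse + specular
```

Under this model, relighting amounts to calling `shade` again with new `light_dirs` and `light_rgb` while keeping the materials fixed; a zero visibility entry removes that light's contribution entirely.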
Visual comparison of different 4D avatars, including INSTA, FlashAvatar, SplattingAvatar, PointAvatar, FLARE, and ours.
Qualitative comparison across five different identities.
Relighting comparison against FLARE and PointAvatar under different illumination conditions.
Qualitative comparison across four different identities.
@article{fan2025rgavatar,
title={RGAvatar: Relightable 4D Gaussian avatar from monocular videos},
author={Fan, Zhe and Huang, Shi-Sheng and Zhang, Yichi and Shang, Dachao and Zhang, Juyong and Guo, Yudong and Huang, Hua},
journal={IEEE Transactions on Visualization and Computer Graphics},
year={2025},
publisher={IEEE}
}