Hybrid Volumetric-Textural Rendering
for Human Avatars

3DV 2022

Tao Hu1,    Tao Yu2,   Zerong Zheng2,    He Zhang2,    Yebin Liu*2,    Matthias Zwicker1
1University of Maryland, College Park      2Tsinghua University    

Key idea of HVTR: two-stage hybrid rendering.


We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR), which synthesizes virtual human avatars from arbitrary poses efficiently and at high quality. First, we learn to encode articulated human motions on a dense UV manifold of the human body surface. To handle complicated motions (e.g., self-occlusions), we then leverage the encoded information on the UV manifold to construct a 3D volumetric representation based on a dynamic pose-conditioned neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a rough volumetric representation using a pose-conditioned downsampled neural radiance field (PD-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar by a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR to handle complicated motions, render high-quality avatars under user-controlled poses/shapes and even loose clothing, and most importantly, be efficient at inference time. Our experiments also demonstrate state-of-the-art quantitative results. HVTR is differentiable, and can be trained end-to-end using only 2D images.



Given a coarse SMPL mesh $M_P$ with pose P and a target viewpoint (o, d), our system renders a detailed avatar using four main components: pose encoding, 2D textural feature encoding, 3D volumetric representation, and hybrid rendering. ① Pose Encoding in UV space: We learn human motions on the UV manifold of the body mesh surface by recording the 3D positions of the mesh on a UV positional map and introducing optimizable geometry and texture latents to capture local motion/appearance details. This step yields pose-dependent features in UV space, which are projected into 2D textural features $\Psi^{im}_{tex}$ in ② 2D Tex-Encoding. ③ 3D Vol-Rep: To capture the rough geometry and address self-occlusion problems, we further learn a volumetric representation by constructing a pose-conditioned downsampled neural radiance field (PD-NeRF) to encode 3D pose-dependent features. ④ Hybrid Rendering: PD-NeRF is rasterized into image-space features $\Psi^{im}_{vol}$ by volume rendering, which preserves the 3D volumetric features. Both the 2D textural and 3D volumetric features are pixel-aligned in image space, fused by Attentional Feature Fusion (AFF), and then converted into a realistic image and a mask by TexRenderer.
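The fusion step ④ can be sketched as follows. This is a toy NumPy stand-in for the learned AFF module, not the paper's implementation: the real AFF is a small attention network, whereas here a single sigmoid-gated linear map plays its role, and all names and shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(psi_vol, psi_tex, w_att, b_att):
    """Fuse pixel-aligned volumetric and textural feature maps.

    psi_vol, psi_tex: (H, W, C) feature maps in image space.
    w_att, b_att: parameters of a toy per-pixel linear attention layer
    (the actual AFF module is a small learned network).
    Returns the attention-weighted blend a * vol + (1 - a) * tex.
    """
    # Attention weights in (0, 1) from the concatenated features.
    concat = np.concatenate([psi_vol, psi_tex], axis=-1)   # (H, W, 2C)
    att = sigmoid(concat @ w_att + b_att)                  # (H, W, C)
    return att * psi_vol + (1.0 - att) * psi_tex

H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
psi_vol = rng.standard_normal((H, W, C))   # rendered PD-NeRF features
psi_tex = rng.standard_normal((H, W, C))   # projected textural features
w_att = rng.standard_normal((2 * C, C)) * 0.1
b_att = np.zeros(C)
fused = fuse_features(psi_vol, psi_tex, w_att, b_att)
print(fused.shape)  # (4, 4, 8)
```

Because the gate is a per-element convex combination, the fused features always lie between the volumetric and textural inputs; the downstream TexRenderer then decodes them into the final image and mask.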

Render Loose Clothing

Our method is capable of rendering loose clothing such as skirts with the GAN-based textural renderer. In the current setup (learning the pose-conditioned NeRF from 64 x 96 images, e.g., the regressed PD-NeRF image), we cannot reconstruct the full offsets of the skirt, though we do recover some offset details and the hair knot ②③. Even so, the GAN lets us render skirts from only rough geometry.

Geometry Reconstruction

As a byproduct of our method, we can also reconstruct a rough 3D geometry by learning the pose-conditioned downsampled NeRF from 45 x 45 images (1/16 of the full resolution) with only 7 sampled points along each ray during training: left, predicted geometry; right, reference image. This coarse setup enables efficient training and inference.
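The coarse setting above works because volume rendering only needs a handful of quadrature samples per ray at low resolution. Below is a minimal NumPy sketch of the standard NeRF-style compositing along one ray with 7 samples; it is not the authors' code, and the densities and features here are random stand-ins for PD-NeRF outputs.

```python
import numpy as np

def volume_render_ray(sigmas, feats, deltas):
    """Composite features along one ray via standard NeRF quadrature.

    sigmas: (N,) densities at the N sampled points (N = 7 here).
    feats:  (N, C) per-point features (or colors) from the field.
    deltas: (N,) distances between adjacent samples.
    Returns the composited feature and the accumulated opacity.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)       # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas                      # (N,), sums to <= 1
    rendered = weights @ feats                    # (C,)
    return rendered, weights.sum()

rng = np.random.default_rng(0)
sigmas = rng.uniform(0.0, 5.0, 7)    # 7 samples per ray, as in training
feats = rng.standard_normal((7, 3))
deltas = np.full(7, 0.1)
rgb, opacity = volume_render_ray(sigmas, feats, deltas)
print(rgb.shape)  # (3,)
```

With only 7 samples the per-ray cost is tiny, which is what makes rendering PD-NeRF at low resolution cheap enough to pair with the fast GAN-based upsampling renderer.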


HVTR can render human avatars with both pose and shape control from arbitrary viewpoints.

Pose Driven Avatars

Render human avatars under different poses and viewpoints.

Shape Editing

Render human avatars under different shape parameters.


@inproceedings{hu2022hvtr,
      title     = {HVTR: Hybrid Volumetric-Textural Rendering for Human Avatars},
      author    = {Hu, Tao and Yu, Tao and Zheng, Zerong and Zhang, He and Liu, Yebin and Zwicker, Matthias},
      booktitle = {2022 International Conference on 3D Vision (3DV)},
      year      = {2022}
}