Co-speech Gesture Video Generation with 3D Human Meshes

1 Carnegie Mellon University    2 University of Science and Technology of China   
3 Ping An Technology    4 PAII Inc.

ECCV 2024


Ablation Study (Audio to Video)

Given an audio input, we study the role of (1) depth and normal maps and (2) textured meshes for generating videos. We compare our full method, which uses depth maps, normal maps, and textured renderings of the meshes as input to generate video, against (1) Ours (w/o depth and normal maps), which does not use the additional depth and normal map conditioning, and (2) Ours (w/o textured mesh), which uses only renderings of untextured meshes as input. Please play the audio in each video to listen to the input speech.
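For illustration, below is a minimal sketch (not the authors' code) of how the three conditioning signals compared in this ablation could be assembled channel-wise as input to an image-to-image GAN generator. The tensor shapes, the placeholder Generator module, and the variable names are assumptions made only to clarify the ablation settings.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Placeholder image-to-image generator; the actual model is a video GAN."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # output RGB frame
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Per-frame renderings of the SMPL-X mesh (batch of 1; 256x256 resolution assumed).
textured_rgb = torch.rand(1, 3, 256, 256)  # textured mesh rendering
depth_map    = torch.rand(1, 1, 256, 256)  # rendered depth map
normal_map   = torch.rand(1, 3, 256, 256)  # rendered surface normals

# Full method: concatenate textured rendering, depth, and normals channel-wise.
cond_full = torch.cat([textured_rgb, depth_map, normal_map], dim=1)   # 7 channels

# Ablation "w/o depth and normal maps": textured rendering only.
cond_no_depth_normal = textured_rgb                                   # 3 channels

# Ablation "w/o textured mesh": untextured (gray-shaded) rendering plus depth and normals.
untextured = torch.rand(1, 1, 256, 256).repeat(1, 3, 1, 1)
cond_no_texture = torch.cat([untextured, depth_map, normal_map], dim=1)  # 7 channels

frame = Generator(in_channels=cond_full.shape[1])(cond_full)
print(frame.shape)  # torch.Size([1, 3, 256, 256])
```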

Each row below shows one subject (Oliver, Conan, Seth*, Chemistry*) with four videos: Input Mesh, Ours (w/o Depth + Normal), Ours (w/o Textured Mesh), and Ours.

Oliver

Conan

Seth*

Chemistry*

* The hands of "Seth" and "Chemistry" in the videos generated from audio input contain cloudy artifacts because (1) the number of training frames for these examples is extremely small (< 7K frames), and (2) the SMPL-X mesh sequence generated from audio at inference can differ from the training mesh distribution. We observe that more than 25K training frames are needed for the GAN model to be robust to out-of-domain mesh sequences at inference (as in the case of "Oliver").