Co-speech Gesture Video Generation with 3D Human Meshes

1 Carnegie Mellon University    2 University of Science and Technology of China   
3 Ping An Technology    4 PAII Inc.

ECCV 2024

Baseline Comparison (Mesh to Video)

We compare videos generated by our method, which uses intermediate renderings of 3D meshes as conditioning, with a baseline method that uses 2D keypoints as the intermediate representation. The 2D keypoints are extracted with MediaPipe from the ground-truth video. We also provide the corresponding ground-truth video for comparison.
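
To make the baseline's conditioning signal concrete, here is a minimal sketch of how per-frame 2D keypoints could be turned into a keypoint map. It is an illustration under assumptions, not the pipeline used for the results on this page: the landmark detection step is omitted, the landmarks are assumed to be already available as normalized (x, y) pairs in [0, 1] (the convention MediaPipe uses for image landmarks), and the function name `rasterize_keypoints`, the square marker shape, and the example values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def rasterize_keypoints(
    landmarks: Sequence[Tuple[float, float]],
    width: int,
    height: int,
    radius: int = 1,
) -> List[List[int]]:
    """Draw normalized (x, y) landmarks as filled squares on a binary map."""
    canvas = [[0] * width for _ in range(height)]
    for x_norm, y_norm in landmarks:
        # Skip landmarks that fall outside the frame.
        if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
            continue
        # Convert normalized coordinates to pixel indices, clamped to the frame.
        cx = min(int(x_norm * width), width - 1)
        cy = min(int(y_norm * height), height - 1)
        # Fill a (2 * radius + 1)-wide square, clipped at the image border.
        for y in range(max(cy - radius, 0), min(cy + radius + 1, height)):
            for x in range(max(cx - radius, 0), min(cx + radius + 1, width)):
                canvas[y][x] = 1
    return canvas


# Hypothetical landmarks for one frame; the last one lies outside the image.
frame_landmarks = [(0.5, 0.5), (0.0, 0.0), (1.2, 0.3)]
keypoint_map = rasterize_keypoints(frame_landmarks, width=8, height=6, radius=1)
```

A map like this carries only 2D positions, whereas a rendered mesh also conveys surface shape and depth ordering, which is the difference between the two conditioning inputs compared here.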

Panels shown for each subject: Keypoint Maps (MediaPipe) | 2D Baseline | Input Mesh | Ours | Ground Truth Video

Subjects: Oliver, Conan, Seth, Chemistry