Given a sequence of untextured input meshes, we study the role of (1) depth and normal maps and (2) textured meshes in video generation. We compare our full method, which takes depth maps, normal maps, and textured renderings of the meshes as input, against (1) Ours (w/o depth and normal maps), which omits the additional depth and normal map conditioning, and (2) Ours (w/o textured mesh), which uses only renderings of the untextured meshes as input. We also provide the corresponding ground-truth video for comparison.
[Video comparison grid. Columns: Input Mesh, Ours (three variants), Ground Truth Video. Rows: Oliver, Conan, Seth, Chemistry.]