Abstract
Our goal is to train a generative model of 3D hand motions conditioned on natural language descriptions that specify motion characteristics such as handshapes, locations, and finger/hand/arm movements. To this end, we automatically build pairs of 3D hand motions and their associated textual labels at unprecedented scale. Specifically, we leverage a large-scale sign language video dataset, along with noisy pseudo-annotated sign categories, which we translate into hand motion descriptions via an LLM that utilizes a dictionary of sign attributes, as well as our complementary motion-script cues. This data enables training a text-conditioned hand motion diffusion model, HandMDM, that is robust across domains: not only unseen sign categories from the same sign language, but also signs from another sign language and non-sign hand movements. We contribute an extensive experimental investigation of these scenarios and will make our trained models and data publicly available to support future research in this relatively new field.