Co-speech gestures are everywhere. People gesture when they chat with others, give a public speech, talk on the phone, and even think aloud. Despite this ubiquity, few datasets are available, mainly because recruiting actors and tracking precise body motion is expensive. A few datasets do exist (e.g., MSP AVATAR [17] and Personality Dyads Corpus [18]), but each contains less than 3 hours of data and lacks diversity in speech content and speakers. The gestures can also be unnatural, owing to cumbersome body-tracking suits and acting in a lab environment.

Thus, we collected a new dataset of co-speech gestures: the TED Gesture Dataset. TED is a conference where people share their ideas from a stage, and recordings of these talks are available online. Using TED talks has the following advantages over existing datasets:

• Large enough to learn the mapping from speech to gestures, and the number of videos continues to grow.
• Diverse speech content and speakers: there are thousands of unique speakers, each talking about their own ideas and stories.
• Well-prepared speeches, so we expect the speakers to use proper hand gestures.
• Favorable for automating data collection and annotation: all talks come with transcripts, and the flat backgrounds and steady shots make it easier to extract human poses with computer vision (see the sketch after this list).
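As a rough illustration of the automated pose-extraction step, here is a minimal Python sketch that samples frames from a downloaded talk video and runs an off-the-shelf 2D pose estimator. The text above only says "computer vision technology", so MediaPipe Pose, the frame stride, and the file name are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: estimate 2D poses on every `stride`-th frame of a
# downloaded TED talk video. MediaPipe Pose is used as a stand-in for
# whatever pose estimator the dataset pipeline actually used.
import cv2
import mediapipe as mp

def extract_poses(video_path, stride=3):
    """Yield (frame_index, landmarks) for every `stride`-th frame."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.pose_landmarks is not None:
                yield idx, result.pose_landmarks
        idx += 1
    cap.release()
    pose.close()

# Example: collect poses for one talk, keyed by frame index
# (the file name is a placeholder).
poses = dict(extract_poses("ted_talk_0001.mp4"))
```

Flat backgrounds and steady shots matter here because single-person pose estimators degrade with camera motion and background clutter; TED's stage setup keeps the speaker well framed for most of each talk.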

