YTD-18M is a large-scale corpus of 18M video-based dialogues constructed from web videos. Crucial to the data collection pipeline is a pretrained language model that converts error-prone automatic transcripts into a cleaner dialogue format while preserving their meaning.
Source: CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos