YTD-18M

Introduced by Han et al. in CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

YTD-18M is a large-scale corpus of 18M video-based dialogues constructed from web videos. A key part of the data collection pipeline is a pretrained language model that converts error-prone automatic transcripts into a cleaner dialogue format while preserving their meaning.

Source: CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

Tasks

Text Generation

Similar Datasets

OpenViDial

MMChat

DialogCC

Image-Chat

Source: https://seungjuhan.me/champagne/.
