no code implementations • 19 Apr 2024 • Shayne Longpre, Robert Mahari, Naana Obeng-Marnu, William Brannon, Tobin South, Katy Gero, Sandy Pentland, Jad Kabbara
New capabilities in foundation models are owed in large part to massive, widely-sourced, and under-documented training data collections.
1 code implementation • 25 Oct 2023 • Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker
The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners.
1 code implementation • 23 May 2023 • William Brannon, Suyash Fulay, Hang Jiang, Wonjune Kang, Brandon Roy, Jad Kabbara, Deb Roy
We propose ConGraT (Contrastive Graph-Text pretraining), a general, self-supervised method for jointly learning separate representations of texts and nodes in a parent (or "supervening") graph, where each text is associated with one of the nodes.
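The core of a contrastive text-graph objective like this can be sketched as a symmetric InfoNCE loss over paired text and node embeddings, in the style of CLIP. The sketch below is illustrative only and is not the paper's implementation; the function name, temperature value, and NumPy formulation are assumptions.

```python
import numpy as np

def contrastive_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired text/node embeddings (illustrative sketch).

    Row i of text_emb is assumed to correspond to row i of node_emb
    (each text attached to its node). Embeddings are L2-normalized so
    dot products are cosine similarities, similarities are scaled by a
    temperature, and cross-entropy is applied in both directions
    (text -> node and node -> text), pulling matched pairs together.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = t @ n.T / temperature  # (batch, batch); diagonal = matched pairs

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    loss_t2n = -np.diag(log_softmax(logits, axis=1)).mean()  # text -> node
    loss_n2t = -np.diag(log_softmax(logits, axis=0)).mean()  # node -> text
    return (loss_t2n + loss_n2t) / 2
```

With perfectly aligned embeddings the loss approaches zero, while independent random embeddings yield a loss near log(batch size), which is what a trainer would minimize when fitting the two encoders jointly.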
1 code implementation • 23 Dec 2022 • William Brannon, Yogesh Virkar, Brian Thompson
We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles.
1 code implementation • 16 Jul 2019 • Doug Beeferman, William Brannon, Deb Roy
We introduce RadioTalk, a corpus of speech recognition transcripts sampled from talk radio broadcasts in the United States between October of 2018 and March of 2019.