no code implementations • 19 Apr 2024 • Shayne Longpre, Robert Mahari, Naana Obeng-Marnu, William Brannon, Tobin South, Katy Gero, Sandy Pentland, Jad Kabbara
New capabilities in foundation models are owed in large part to massive, widely-sourced, and under-documented training data collections.
1 code implementation • 25 Oct 2023 • Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker
The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners.
1 code implementation • 23 May 2023 • William Brannon, Suyash Fulay, Hang Jiang, Wonjune Kang, Brandon Roy, Jad Kabbara, Deb Roy
We propose ConGraT (Contrastive Graph-Text pretraining), a general, self-supervised method for jointly learning separate representations of texts and nodes in a parent (or "supervening") graph, where each text is associated with one of the nodes.
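The core of a contrastive text-graph objective like this can be sketched as a symmetric InfoNCE loss over paired text and node embeddings, in the style of CLIP. The sketch below is illustrative only and is not the paper's implementation; the function name, temperature value, and NumPy formulation are assumptions.

```python
import numpy as np

def contrastive_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired text/node embeddings (illustrative sketch).

    Row i of text_emb is assumed to correspond to row i of node_emb
    (each text attached to its node). Embeddings are L2-normalized so
    dot products are cosine similarities, similarities are scaled by a
    temperature, and cross-entropy is applied in both directions
    (text -> node and node -> text), pulling matched pairs together.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    n = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
    logits = t @ n.T / temperature  # (batch, batch); diagonal = matched pairs

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    loss_t2n = -np.diag(log_softmax(logits, axis=1)).mean()  # text -> node
    loss_n2t = -np.diag(log_softmax(logits, axis=0)).mean()  # node -> text
    return (loss_t2n + loss_n2t) / 2
```

With perfectly aligned embeddings the loss approaches zero, while independent random embeddings yield a loss near log(batch size), which is what a trainer would minimize when fitting the two encoders jointly.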
1 code implementation • 23 Dec 2022 • William Brannon, Yogesh Virkar, Brian Thompson
We investigate how humans perform the task of dubbing video content from one language into another, leveraging a novel corpus of 319.57 hours of video from 54 professionally produced titles.
1 code implementation • 16 Jul 2019 • Doug Beeferman, William Brannon, Deb Roy
We introduce RadioTalk, a corpus of speech recognition transcripts sampled from talk radio broadcasts in the United States between October of 2018 and March of 2019.