We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Ranked #1 on Speech Recognition on Common Voice English (using extra training data)
1 code implementation • 24 Jan 2022 • Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng
Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20. 8% relative improvement over prior best work on code search.
Ranked #1 on Code Search on CodeSearchNet
3 code implementations • • Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt
Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.
Ranked #10 on Image Classification on ObjectNet (using extra training data)
Recently, there have been breakthroughs in computer vision ("CV") models that are more generalizable with the advent of models such as CLIP and ALIGN.
41 code implementations • 26 Feb 2021 • Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
Ranked #1 on Zero-Shot Learning on COCO-MLT (using extra training data)
Multi-party machine learning is a paradigm in which multiple participants collaboratively train a machine learning model to achieve a common learning objective without sharing their privately owned data.
A point-of-interest (POI) recommendation system performs an important role in location-based services because it can help people to explore new locations and promote advertisers to launch advertisements at appropriate locations.
no code implementations • 24 Aug 2019 • Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang
Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more.
Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance.
The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process --- creating audio samples from a score and instrument information --- is modeled using generative neural networks.
To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics.