Search Results for author: Adam Polyak

Found 29 papers, 15 papers with code

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

1 code implementation • EMNLP (ACL) 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Speech Synthesis

29,233 stars

Video Editing via Factorized Diffusion Distillation

no code implementations • 14 Mar 2024 • Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

We introduce Emu Video Edit (EVE), a model that establishes a new state of the art in video editing without relying on any supervised video editing data.

Video Editing Video Generation

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

no code implementations • 16 Nov 2023 • Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman

Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.

Image Inpainting Multi-Task Learning +1

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

1 code implementation • 5 Sep 2023 • Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan

It is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs.

Ranked #2 on Text-to-Image Generation on MS COCO

Language Modelling Retrieval +2

318 stars

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

1 code implementation • NeurIPS 2023 • Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy

Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images.

Text-to-Image Generation

345 stars

X&Fuse: Fusing Visual Information in Text-to-Image Generation

no code implementations • 2 Mar 2023 • Yuval Kirstain, Omer Levy, Adam Polyak

We introduce X&Fuse, a general approach for conditioning on visual information when generating images from text.

Text-to-Image Generation

Text-To-4D Dynamic Scene Generation

no code implementations • 26 Jan 2023 • Uriel Singer, Shelly Sheynin, Adam Polyak, Oron Ashual, Iurii Makarov, Filippos Kokkinos, Naman Goyal, Andrea Vedaldi, Devi Parikh, Justin Johnson, Yaniv Taigman

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.

Scene Generation

AudioGen: Textually Guided Audio Generation

1 code implementation • 30 Sep 2022 • Felix Kreuk, Gabriel Synnaeve, Adam Polyak, Uriel Singer, Alexandre Défossez, Jade Copet, Devi Parikh, Yaniv Taigman, Yossi Adi

Finally, we explore the ability of the proposed method to generate audio continuation conditionally and unconditionally.

Ranked #12 on Audio Generation on AudioCaps

Audio Generation Descriptive

19,607 stars

Make-A-Video: Text-to-Video Generation without Text-Video Data

2 code implementations • 29 Sep 2022 • Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, Yaniv Taigman

We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V).

Ranked #3 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Super-Resolution +2

1,837 stars

KNN-Diffusion: Image Generation via Large-Scale Retrieval

no code implementations • 6 Apr 2022 • Shelly Sheynin, Oron Ashual, Adam Polyak, Uriel Singer, Oran Gafni, Eliya Nachmani, Yaniv Taigman

Recent text-to-image models have achieved impressive results.

Ranked #34 on Text-to-Image Generation on MS COCO

Retrieval Text-to-Image Generation

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

1 code implementation • 24 Mar 2022 • Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, Yaniv Taigman

Recent text-to-image generation methods provide a simple yet exciting conversion capability between text and image domains.

Ranked #20 on Text-to-Image Generation on MS COCO (using extra training data)

Semantic Segmentation Text-to-Image Generation

312 stars

Locally Shifted Attention With Early Global Integration

1 code implementation • 9 Dec 2021 • Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf

The separation of the attention layer into local and global counterparts allows for a low computational cost in the number of patches, while still supporting data-dependent localization already at the first layer, as opposed to the static positioning in other visual transformers.

Image Classification

Textless Speech Emotion Conversion using Decomposed & Discrete Representations

no code implementations • arXiv 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

no code implementations • 14 Nov 2021 • Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.

Local-Global Shifting Vision Transformers

no code implementations • 29 Sep 2021 • Shelly Sheynin, Sagie Benaim, Adam Polyak, Lior Wolf

Due to the expensive quadratic cost of the attention mechanism, either a large patch size is used, resulting in coarse-grained global interactions, or alternatively, attention is applied only on a local region of the image at the expense of long-range interactions.

Image Classification

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

4 code implementations • 14 Sep 2021 • Changhan Wang, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Ann Lee, Peng-Jen Chen, Jiatao Gu, Juan Pino

This paper presents fairseq S^2, a fairseq extension for speech synthesis.

Speech Synthesis

29,233 stars

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation • ACL 2022 • Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training; it replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

29,233 stars

Direct speech-to-speech translation with discrete units

1 code implementation • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu

When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.

Speech-to-Speech Translation Text Generation +1

157 stars

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations

2 code implementations • 1 Apr 2021 • Adam Polyak, Yossi Adi, Jade Copet, Eugene Kharitonov, Kushal Lakhotia, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux

We propose using self-supervised discrete representations for the task of speech resynthesis.

Disentanglement Resynthesis +2

353 stars

Generative Spoken Language Modeling from Raw Audio

2 code implementations • 1 Feb 2021 • Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Ranked #1 on Resynthesis on LibriSpeech

Language Modelling Resynthesis

29,233 stars

High Fidelity Speech Regeneration with Application to Speech Enhancement

no code implementations • 31 Jan 2021 • Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Speech enhancement has seen great improvement in recent years mainly through contributions in denoising, speaker separation, and dereverberation methods that mostly deal with environmental effects on vocal audio.

Denoising Speaker Separation +3

Unsupervised Cross-Domain Singing Voice Conversion

no code implementations • 6 Aug 2020 • Adam Polyak, Lior Wolf, Yossi Adi, Yaniv Taigman

We present a wav-to-wav generative model for the task of singing voice conversion from any identity.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Autoencoder-based Music Translation

no code implementations • ICLR 2019 • Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman

We present a method for translating music across musical instruments and styles.

Translation

TTS Skins: Speaker Conversion via ASR

no code implementations • 18 Apr 2019 • Adam Polyak, Lior Wolf, Yaniv Taigman

We present a fully convolutional wav-to-wav network for converting between speakers' voices, without relying on text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Universal Music Translation Network

4 code implementations • 21 May 2018 • Noam Mor, Lior Wolf, Adam Polyak, Yaniv Taigman

We present a method for translating music across musical instruments, genres, and styles.

Translation

452 stars

Fitting New Speakers Based on a Short Untranscribed Sample

no code implementations • ICML 2018 • Eliya Nachmani, Adam Polyak, Yaniv Taigman, Lior Wolf

Learning-based text-to-speech systems have the potential to generalize from one speaker to the next and thus require only a relatively short sample of any new voice.

Speech Synthesis

VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop

2 code implementations • ICLR 2018 • Yaniv Taigman, Lior Wolf, Adam Polyak, Eliya Nachmani

We present a new neural text-to-speech (TTS) method that is able to transform text to speech in voices that are sampled in the wild.

Sentence

870 stars

Unsupervised Creation of Parameterized Avatars

no code implementations • ICCV 2017 • Lior Wolf, Yaniv Taigman, Adam Polyak

We study the problem of mapping an input image to a tied pair consisting of a vector of parameters and an image that is created using a graphical engine from the vector of parameters.

Unsupervised Domain Adaptation

Unsupervised Cross-Domain Image Generation

6 code implementations • 7 Nov 2016 • Yaniv Taigman, Adam Polyak, Lior Wolf

We study the problem of transferring a sample in one domain to an analog sample in another domain.

Ranked #2 on Unsupervised Image-To-Image Translation on SVHN-to-MNIST

Domain Adaptation Unsupervised Image-To-Image Translation

2,475 stars
