no code implementations • 31 Mar 2024 • Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi
On the other hand, in scenarios where unit-tests are unavailable, a ranking-based selection of candidates from the smaller model falls short of the performance of a single output from larger ones.
no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
2 code implementations • 24 Aug 2023 • Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
Ranked #26 on Code Generation on MBPP
no code implementations • 10 Aug 2023 • Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).
2 code implementations • NeurIPS 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
We tackle the task of conditional music generation.
Ranked #4 on Text-to-Music Generation on MusicCaps
1 code implementation • NeurIPS 2023 • Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Défossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from a pretrained textual language model.
no code implementations • CVPR 2023 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.
no code implementations • 21 Dec 2022 • Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi
Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR.
Ranked #1 on Speech Recognition on EasyCom
no code implementations • 20 Jul 2022 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey
We identify several limitations of previous work on audio-visual on-screen sound separation, including the coarse resolution of spatio-temporal attention, poor convergence of the audio separation model, limited variety in training and evaluation data, and failure to account for the trade-off between preservation of on-screen sounds and suppression of off-screen sounds.
no code implementations • CVPR 2022 • Michael Hassid, Michelle Tadmor Ramanovich, Brendan Shillingford, Miaosen Wang, Ye Jia, Tal Remez
In this paper we present VDTTS, a Visually-Driven Text-to-Speech model.
no code implementations • 19 Jul 2021 • Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end.
no code implementations • 17 Jun 2021 • Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey
We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos.
no code implementations • ICLR 2021 • Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey
For evaluation and semi-supervised experiments, we collected human labels for presence of on-screen and off-screen sounds on a small subset of clips.
1 code implementation • 20 Aug 2018 • Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
We propose a fully-convolutional neural-network architecture for image denoising which is simple yet powerful.
1 code implementation • ECCV 2018 • Tal Remez, Jonathan Huang, Matthew Brown
This paper presents a weakly-supervised approach to object instance segmentation.
1 code implementation • 25 Jul 2017 • Zorah Lähner, Matthias Vestner, Amit Boyarski, Or Litany, Ron Slossberg, Tal Remez, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Ron Kimmel, Daniel Cremers
We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality.
3 code implementations • ICCV 2017 • Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, Michael M. Bronstein
We introduce a new framework for learning dense correspondence between deformable 3D shapes.
1 code implementation • 6 Jan 2017 • Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
We further show that a significant boost in performance of up to $0.4$ dB PSNR can be achieved by making our network class-aware, namely, by fine-tuning it for images belonging to a specific semantic class.
2 code implementations • 6 Jan 2017 • Tal Remez, Or Litany, Raja Giryes, Alex M. Bronstein
The Poisson distribution is used for modeling noise in photon-limited imaging.
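As a rough illustration of what Poisson noise in photon-limited imaging looks like, the sketch below simulates a capture at two light levels and compares the PSNR of the raw noisy images. It is not taken from the paper: the function names, the synthetic gradient image, and the `peak` values (expected photon count at the brightest pixel) are all assumptions chosen for the example.

```python
import numpy as np

def add_poisson_noise(image, peak, rng):
    """Simulate a photon-limited capture (hypothetical helper).

    `image` is a clean image with values in [0, 1]; `peak` is the expected
    photon count at intensity 1. Returns integer-valued photon counts.
    """
    expected_photons = np.clip(image, 0.0, 1.0) * peak
    return rng.poisson(expected_photons).astype(np.float64)

def psnr(clean, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
# Synthetic 64x64 horizontal gradient standing in for a real photograph.
clean = np.tile(np.linspace(0.1, 1.0, 64), (64, 1))

# Rescale the counts back to [0, 1] so they are comparable with `clean`.
noisy_low = add_poisson_noise(clean, peak=4, rng=rng) / 4
noisy_high = add_poisson_noise(clean, peak=400, rng=rng) / 400

psnr_low = psnr(clean, noisy_low)
psnr_high = psnr(clean, noisy_high)
print(f"peak=4:   {psnr_low:.1f} dB")
print(f"peak=400: {psnr_high:.1f} dB")
```

Because the variance of a Poisson variable equals its mean, the noise is signal-dependent, and the relative noise grows as the photon count drops, which is why the low-peak image scores a much lower PSNR.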
3 code implementations • 15 Dec 2016 • Or Litany, Tal Remez, Alex Bronstein
With the development of range sensors such as LIDAR and time-of-flight cameras, 3D point cloud scans have become ubiquitous in computer vision applications, the most prominent ones being gesture recognition and autonomous driving.
no code implementations • 3 Aug 2016 • Tal Remez, Or Litany, Shachar Yoseff, Harel Haim, Alex Bronstein
We present a proof-of-concept end-to-end system for computational extended depth of field (EDOF) imaging.
no code implementations • 6 Dec 2015 • Or Litany, Tal Remez, Alex Bronstein
Recently, the dense binary pixel Gigavision camera was introduced, emulating a digital version of photographic film.
no code implementations • 4 Dec 2015 • Or Litany, Tal Remez, Daniel Freedman, Lior Shapira, Alex Bronstein, Ran Gal
We present ASIST, a technique for transforming point clouds by replacing objects with their semantically equivalent counterparts.
no code implementations • 9 Nov 2015 • Tal Remez, Shai Avidan
Each tree in the forest produces a segmentation of the image plane and the boundaries of the segmentations of all trees are aggregated to produce a final hierarchical contour map.
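The aggregation step can be sketched as follows, under the assumption that "aggregated" means averaging per-segmentation boundary maps; the snippet above does not specify the actual rule, so this is an illustration, not the authors' method. The toy label maps and the helper names `boundary_map` and `aggregate_contours` are hypothetical.

```python
import numpy as np

def boundary_map(labels):
    """Mark pixels whose right or lower neighbour carries a different label."""
    edges = np.zeros(labels.shape, dtype=bool)
    edges[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    edges[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return edges

def aggregate_contours(segmentations):
    """Average the boundary maps (assumed aggregation rule).

    Each pixel receives the fraction of segmentations that place a
    boundary at that location.
    """
    return np.mean([boundary_map(s) for s in segmentations], axis=0)

# Three toy 4x6 segmentations standing in for the outputs of three trees.
seg_a = np.zeros((4, 6), dtype=int)
seg_a[:, 3:] = 1                      # left/right split
seg_b = seg_a.copy()
seg_b[2:, :] += 2                     # same split plus a top/bottom split
seg_c = np.zeros((4, 6), dtype=int)   # a single region, no boundaries

contours = aggregate_contours([seg_a, seg_b, seg_c])

# Thresholding the soft map at decreasing levels yields nested contour sets.
strong = contours >= 0.5
weak = contours >= 0.25
print(contours.round(2))
```

Boundaries that many segmentations agree on receive high values, so lowering the threshold only ever adds contours, which gives the map its hierarchical character.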
no code implementations • 15 Oct 2015 • Tal Remez, Or Litany, Alex Bronstein
In this work, we study a variant of a sensor with binary threshold pixels and propose a reconstruction algorithm combining an ML data fitting term with a sparse synthesis prior.