no code implementations • 5 Oct 2023 • Tianhong Li, Sangnie Bhardwaj, Yonglong Tian, Han Zhang, Jarred Barber, Dina Katabi, Guillaume Lajoie, Huiwen Chang, Dilip Krishnan
We demonstrate image generation and captioning performance on par with state-of-the-art text-to-image and image-to-text models, using orders of magnitude less paired image-text data (only 3M pairs).
3 code implementations • 1 Jun 2023 • Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
Pre-trained large text-to-image models synthesize impressive images given appropriately crafted text prompts.
no code implementations • 3 Feb 2023 • John Harvill, Jarred Barber, Arun Nair, Ramin Pishehvar
Self-supervised representation learning approaches have grown in popularity because they can be trained on large amounts of unlabeled data, and they have demonstrated success in diverse fields such as natural language processing, computer vision, and speech.
4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
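The parallel-decoding idea behind that speedup can be sketched simply: instead of emitting one token per forward pass as an autoregressive model does, a masked-token model commits many confident predictions at once. A minimal illustration with a random stand-in for the real transformer (`dummy_predict`, the fill schedule, and all constants are assumptions for illustration, not Muse's actual procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def dummy_predict(tokens, mask):
    # Stand-in for a transformer: returns hypothetical token ids and
    # confidences for the masked positions (a real model predicts these).
    n = int(mask.sum())
    return rng.integers(0, 1024, size=n), rng.random(n)

def parallel_decode(seq_len=16, steps=4):
    """Parallel decoding sketch: start fully masked and, at each of a few
    refinement steps, commit the most confident predictions, so many
    tokens are filled per forward pass instead of one per step."""
    tokens = np.full(seq_len, -1)              # -1 marks a masked position
    for step in range(steps):
        mask = tokens == -1
        mask_idx = np.flatnonzero(mask)
        if mask_idx.size == 0:
            break
        preds, conf = dummy_predict(tokens, mask)
        # Simplistic schedule: fill an equal share of the remaining tokens.
        k = int(np.ceil(mask_idx.size / (steps - step)))
        order = np.argsort(conf)[-k:]          # most confident predictions
        tokens[mask_idx[order]] = preds[order]
    return tokens
```

Here 16 tokens are decoded in 4 forward passes rather than the 16 an autoregressive model would need, which is the source of the efficiency gap the abstract describes.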
Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)
no code implementations • 27 Jun 2022 • Gregory Ciccarelli, Jarred Barber, Arun Nair, Israel Cohen, Tao Zhang
We review current solutions and technical challenges for automatic speech recognition, keyword spotting, device arbitration, speech enhancement, and source localization in multidevice home environments to provide context for the INTERSPEECH 2022 special session, "Challenges and opportunities for signal processing and machine learning for multiple smart devices".
Automatic Speech Recognition (ASR) +3
2 code implementations • 13 Jan 2022 • Peyman Bateni, Jarred Barber, Raghav Goyal, Vaden Masrani, Jan-Willem van de Meent, Leonid Sigal, Frank Wood
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance-based classifier combined with a state-of-the-art neural adaptive feature extractor to achieve strong performance on the Meta-Dataset, mini-ImageNet, and tiered-ImageNet benchmarks.
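A Mahalanobis-distance classifier over fixed features can be sketched as below; the hierarchical regularization is approximated here by blending each class covariance with a task-level covariance, and `lam` and the identity jitter are hypothetical choices for illustration, not the paper's exact estimator:

```python
import numpy as np

def mahalanobis_classify(train_x, train_y, test_x, lam=0.5):
    """Assign each test point to the class with the smallest Mahalanobis
    distance. Each class covariance is regularized toward a shared
    task-level covariance (a simplified stand-in for Simple CNAPS's
    hierarchical regularization)."""
    classes = np.unique(train_y)
    # Task-level covariance with an identity jitter for invertibility.
    task_cov = np.cov(train_x, rowvar=False) + np.eye(train_x.shape[1])
    scores = []
    for c in classes:
        xc = train_x[train_y == c]
        mu = xc.mean(axis=0)
        # Blend class and task covariance (lam is a hypothetical weight).
        cov = lam * np.cov(xc, rowvar=False) + (1 - lam) * task_cov
        prec = np.linalg.inv(cov)
        d = test_x - mu
        # Squared Mahalanobis distance d^T P d for every test point.
        scores.append(np.einsum('ij,jk,ik->i', d, prec, d))
    return classes[np.argmin(np.stack(scores), axis=0)]
```

Unlike plain Euclidean nearest-centroid classification, the Mahalanobis distance accounts for the shape of each class's feature distribution, which is what makes the regularized covariance estimate matter in the few-shot regime.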
no code implementations • 8 Dec 2021 • Jarred Barber, Yifeng Fan, Tao Zhang
We introduce a variant of the speaker localization problem, which we call device arbitration.
2 code implementations • 28 Sep 2020 • Peyman Bateni, Jarred Barber, Jan-Willem van de Meent, Frank Wood
We propose a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance.
2 code implementations • 17 Jun 2020 • Peyman Bateni, Jarred Barber, Jan-Willem van de Meent, Frank Wood
We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance.
Ranked #1 on Few-Shot Image Classification on Tiered ImageNet 10-way (1-shot) (using extra training data)
no code implementations • 5 Jun 2020 • Jarred Barber
Gaussian processes are powerful models for probabilistic machine learning, but are limited in application by their $O(N^3)$ inference complexity.
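The $O(N^3)$ cost comes from factorizing the $N \times N$ kernel matrix in exact GP inference, as this minimal sketch of the posterior-mean computation shows (standard textbook GP regression with an RBF kernel, not the paper's proposed method; the hyperparameter values are illustrative):

```python
import numpy as np

def gp_posterior_mean(X, y, X_star, lengthscale=1.0, noise=0.1):
    """Exact GP regression posterior mean with an RBF kernel.
    The Cholesky factorization of the N x N kernel matrix costs O(N^3),
    which is the scaling bottleneck referenced in the abstract."""
    def rbf(A, B):
        # Squared Euclidean distances between all pairs of rows.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    K = rbf(X, X) + noise ** 2 * np.eye(len(X))   # N x N kernel matrix
    L = np.linalg.cholesky(K)                     # the O(N^3) step
    # Solve K alpha = y via two triangular solves (O(N^2) each).
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(X_star, X) @ alpha
```

Because the Cholesky step dominates, most scalable GP methods replace the exact factorization with a structured or low-rank approximation of `K`.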