Search Results

Per-Pixel Classification is Not All You Need for Semantic Segmentation

3 code implementations NeurIPS 2021

Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results.

Classification Panoptic Segmentation +1

Attention Is All You Need

567 code implementations NeurIPS 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Ranked #2 on Multimodal Machine Translation on Multi30K (BLEU (DE-EN) metric)

Abstractive Text Summarization Coreference Resolution +8

Language Is Not All You Need: Aligning Perception with Language Models

1 code implementation NeurIPS 2023

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.

Image Captioning Language Modelling +4

Neural HMMs are all you need (for high-quality attention-free TTS)

2 code implementations 30 Aug 2021

Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs.

Speech Synthesis

DeeperGCN: All You Need to Train Deeper GCNs

3 code implementations 13 Jun 2020

Graph Convolutional Networks (GCNs) have been drawing significant attention with the power of representation learning on graphs.

Graph Learning Graph Property Prediction +3

Attention is All You Need in Speech Separation

4 code implementations 25 Oct 2020

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.

Speech Separation
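A minimal sketch, assuming PyTorch, of the substitution the SepFormer snippet above describes: multi-head self-attention relates all time frames in parallel where an RNN would process them recurrently. The shapes and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

batch, frames, dim = 2, 100, 64            # (B, T, D) mixture features
x = torch.randn(batch, frames, dim)

# Recurrent baseline: frames are processed one step at a time.
rnn_out, _ = nn.GRU(dim, dim, batch_first=True)(x)

# Attention alternative: every frame attends to every other frame in parallel.
mha = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
attn_out, _ = mha(x, x, x)                 # self-attention: query = key = value

print(rnn_out.shape, attn_out.shape)       # both torch.Size([2, 100, 64])
```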

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

4 code implementations 23 Aug 2020

However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.

Ranked #1 on Unconstrained Lip-synchronization on LRS3 (using extra training data)

MORPH Unconstrained Lip-synchronization

ReZero is All You Need: Fast Convergence at Large Depth

13 code implementations 10 Mar 2020

Deep networks often suffer from vanishing or exploding gradients due to inefficient signal propagation, leading to long training times or convergence difficulties.

Language Modelling
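The ReZero trick itself is small enough to show inline: each residual branch is scaled by a learned scalar initialized to zero, so the network starts as the identity map and deep stacks stay trainable. A minimal sketch, assuming PyTorch; the Linear sublayer is a placeholder for the paper's attention or feed-forward blocks, not its actual architecture.

```python
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.sublayer = nn.Linear(dim, dim)        # stand-in for attention/FFN
        self.alpha = nn.Parameter(torch.zeros(1))  # ReZero scalar, init to 0

    def forward(self, x):
        # x_{i+1} = x_i + alpha_i * F(x_i); the block is the identity at init,
        # so signal and gradients pass through unchanged at any depth.
        return x + self.alpha * self.sublayer(x)

x = torch.randn(8, 64)
block = ReZeroBlock(64)
assert torch.allclose(block(x), x)  # identity at initialization
```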

Bytes Are All You Need: Transformers Operating Directly On File Bytes

1 code implementation 31 May 2023

Our model, ByteFormer, achieves an ImageNet Top-1 classification accuracy of 77.33% when training and testing directly on TIFF file bytes, using a transformer backbone with a configuration similar to DeiT-Ti (72.2% accuracy when operating on RGB images).

Classification Image Classification +1
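As an illustration of the "operate directly on file bytes" idea in the entry above: treat each byte as a token from a 256-symbol vocabulary, embed it, and run a standard transformer encoder over the sequence. A minimal sketch, assuming PyTorch; the layer sizes are arbitrary and do not reproduce ByteFormer or DeiT-Ti.

```python
import torch
import torch.nn as nn

# Stand-in for bytes read from a file, e.g. open(path, "rb").read().
tokens = torch.randint(0, 256, (1, 1024))            # (B, T) byte values 0..255

embed = nn.Embedding(256, 192)                       # one vector per byte value
layer = nn.TransformerEncoderLayer(d_model=192, nhead=3, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

features = encoder(embed(tokens))                    # (1, 1024, 192)
logits = nn.Linear(192, 1000)(features.mean(dim=1))  # mean-pool, then classify
print(logits.shape)                                  # torch.Size([1, 1000])
```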