Search Results

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

huggingface/transformers NeurIPS 2021

Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure?

Object Detection

XLNet: Generalized Autoregressive Pretraining for Language Understanding

huggingface/transformers NeurIPS 2019

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

Ranked #1 on Sentiment Analysis on IMDb (using extra training data)

Audio Question Answering Chinese Reading Comprehension +9

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

huggingface/transformers NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Quantization Self-Supervised Learning +1

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

huggingface/transformers NeurIPS 2021

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders.

Semantic Segmentation Thermal Image Segmentation

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

huggingface/transformers 18 Apr 2021

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Document Image Classification

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

huggingface/transformers 31 Dec 2019

In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.

Document AI Document Image Classification +2

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

huggingface/transformers 20 Apr 2020

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions.


LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

huggingface/transformers ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification Document Layout Analysis +4

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

huggingface/transformers 14 Jun 2021

Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation.

Ranked #3 on Speech Recognition on LibriSpeech test-other (using extra training data)

Representation Learning Speech Recognition