May 18, 2021

Papers with Code Newsletter #9

Welcome to the 9th issue of the Papers with Code newsletter. In this edition, we cover:

  • simplified alternative architectures for competitive image classification,
  • effective invertible architectures with smaller memory footprint,
  • a diffusion model for state-of-the-art image synthesis,
  • our latest collaboration with arXiv to support links to datasets,
  • the latest development in online fake news detection,
  • ...and much more!

Trending Papers with Code 📄

Simplified Architectures for Image Classification

The MLP-Mixer architecture consisting of per-patch linear embeddings, Mixer layers, and a classifier head. Figure source: Tolstikhin et al. (2021)

CNNs are widely regarded as the go-to model for computer vision tasks. Attention-based architectures have also emerged as promising approaches that perform well on a variety of vision tasks. Despite this trend and the successes of attention-based and CNN architectures, Tolstikhin et al. (2021) recently proposed a simple alternative architecture, MLP-Mixer, based on multi-layer perceptrons, that produces competitive results on image classification benchmarks.

What it is: An MLP-Mixer, shown in the figure above, contains two types of layers: one with MLPs applied independently to image patches and another with MLPs applied across patches. These layers mix per-location features and spatial information, respectively. Results show that, when trained on large datasets, MLP-Mixer achieves competitive results on the ImageNet benchmark, with pretraining and inference cost comparable to state-of-the-art models. The authors hope that this work sparks further research, including how the approach might apply to NLP, as well as more practical and theoretical studies of these methods.
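To make the two mixing steps concrete, here is a minimal NumPy sketch of a single Mixer layer. It is a toy illustration, not the paper's implementation: weights are random, ReLU stands in for GELU, and layer normalization is omitted.

```python
import numpy as np

def mlp(x, w1, w2):
    # two-layer MLP; ReLU used here for simplicity (the paper uses GELU)
    return np.maximum(x @ w1, 0) @ w2

def mixer_layer(tokens, rng):
    """One Mixer layer on `tokens` of shape (patches, channels).

    Token-mixing MLP acts across patches (mixing spatial information);
    channel-mixing MLP acts across channels within each patch
    (mixing per-location features). Both use skip connections.
    """
    p, c = tokens.shape
    # token mixing: transpose so the MLP runs over the patch dimension
    w1 = rng.standard_normal((p, 2 * p)) * 0.02
    w2 = rng.standard_normal((2 * p, p)) * 0.02
    tokens = tokens + mlp(tokens.T, w1, w2).T
    # channel mixing: the MLP runs over the channel dimension, per patch
    w3 = rng.standard_normal((c, 2 * c)) * 0.02
    w4 = rng.standard_normal((2 * c, c)) * 0.02
    tokens = tokens + mlp(tokens, w3, w4)
    return tokens

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))   # 16 patches, 8 channels each
y = mixer_layer(x, rng)
print(y.shape)  # (16, 8)
```

The transpose trick is the whole idea: the same MLP machinery alternately mixes within patches and across them, with no convolutions or attention.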

You might also like: 

  • Touvron et al. (2021) also recently proposed an architecture built entirely upon MLPs, called ResMLP, for image classification. It consists of a simple residual network trained with a modern training strategy using heavy data augmentation and, optionally, distillation. The network is made up of a linear layer in which image patches interact and a two-layer feed-forward network in which channels interact independently per patch. This method achieves good accuracy/complexity trade-offs on the ImageNet benchmark.
  • Luke Melas-Kyriazi (2021) recently proposed replacing the attention layer in a vision transformer with feed-forward layers applied over the patch and feature dimensions in an alternating fashion. The architecture achieves good results on ImageNet, and the author reports that aspects of vision transformers other than attention, such as patch embedding, may be more responsible for their strong performance than previously thought.
  • Ding et al. (2021) recently proposed RepMLP, an MLP-style neural network building block composed of a series of fully-connected layers. The method uses a re-parameterization technique that adds a local prior into an FC layer to improve its image recognition capabilities. On CIFAR, a simple pure-MLP model shows performance close to CNNs. When RepMLP is incorporated into CNNs, it improves ResNets on tasks such as image classification and face recognition.

Momentum Residual Neural Networks

Comparison of memory needed for computing gradients of the loss, with ResNets (activations stored) and Momentum ResNets (activations are not stored). Figure source: Sander et al. (2021)

ResNets trained with backpropagation have a memory cost that linearly increases with network depth. Reversible architectures have been used to circumvent the issue. Recent work by Sander et al. (2021) proposes an invertible architecture, Momentum ResNets, that modifies the forward rule of a ResNet by adding a momentum term. 

Why it matters: The authors claim that Momentum ResNets can be used as a drop-in replacement for any existing ResNet block. Results show that adding momentum progressively increases the representation capabilities of Momentum ResNets. The reported analysis reveals that Momentum ResNets can learn any linear mapping up to a multiplicative factor, while ResNets cannot. Momentum ResNets produce results similar to ResNets on CIFAR and ImageNet while having a much smaller memory footprint, which makes them promising for fine-tuning pre-trained models.
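The memory saving comes from invertibility: because each forward step can be run backwards exactly, activations need not be stored for backpropagation. A minimal sketch of a momentum-style forward rule and its exact inverse, with a toy `tanh` standing in for the residual function (an assumption, not the paper's network):

```python
import numpy as np

def f(x):
    # toy residual function standing in for a learned ResNet block
    return np.tanh(x)

def momentum_forward(x, v, gamma=0.9, steps=4):
    # forward rule: velocity accumulates the residual, then updates x
    for _ in range(steps):
        v = gamma * v + (1 - gamma) * f(x)
        x = x + v
    return x, v

def momentum_inverse(x, v, gamma=0.9, steps=4):
    # exact inversion: recover earlier activations instead of storing them
    for _ in range(steps):
        x = x - v
        v = (v - (1 - gamma) * f(x)) / gamma
    return x, v

x0 = np.array([0.3, -1.2, 0.7])
v0 = np.zeros(3)
xT, vT = momentum_forward(x0, v0)
xr, vr = momentum_inverse(xT, vT)
print(np.allclose(x0, xr), np.allclose(v0, vr))  # True True
```

Since the inverse reconstructs every intermediate activation on the fly, gradient computation only needs the final state, which is why memory no longer grows linearly with depth.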

🔗 Paper & Code

Diffusion Models Beat GANs on Image Synthesis

Samples generated by the best ImageNet 512x512 model. Figure source: Dhariwal et al. (2021) 

GANs have been the go-to method over the past few years for tasks such as image synthesis, human speech generation, and music generation. Despite this success, there is still room for improvement as these methods find real-world applications such as graphic design and games. Notable drawbacks of GANs include being difficult to train and capturing less diversity than other likelihood-based models. As an effective alternative, Dhariwal et al. (2021) recently proposed using diffusion models for state-of-the-art image synthesis.

What's new: Diffusion models are a class of likelihood-based models that have been shown to produce high-quality images with desirable properties such as distribution coverage and easy scalability. These models generate samples by gradually removing noise from a signal, and previous research has shown that they improve reliably with increased compute. The proposed method brings improvements to diffusion models that have worked for GANs, such as improved model architecture and a scheme to trade off diversity for quality. The resulting diffusion model achieves several state-of-the-art results, surpassing GANs on several metrics and datasets such as ImageNet 128x128 and LSUN Cat 256x256.
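The "gradually removing noise" idea can be illustrated with the standard forward/reverse noising algebra. In this toy sketch, the true noise vector stands in for the learned noise-prediction network (a hypothetical, perfect denoiser), which lets us verify the arithmetic exactly; the chosen noise schedule is likewise illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50
betas = np.linspace(1e-4, 0.05, T)     # illustrative noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)         # cumulative signal retention

def q_sample(x0, t, eps):
    # forward process: blend the clean signal with Gaussian noise at step t
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

x0 = np.array([1.0, -0.5, 2.0])        # toy "clean image"
eps = rng.standard_normal(3)
xT = q_sample(x0, T - 1, eps)          # heavily noised signal

# reverse process: given a perfect noise estimate (here the true eps,
# standing in for the trained network), the clean signal is recovered
x0_hat = (xT - np.sqrt(1 - alpha_bar[T - 1]) * eps) / np.sqrt(alpha_bar[T - 1])
print(np.allclose(x0, x0_hat))  # True
```

In a real diffusion model, a neural network predicts `eps` from the noisy input, and sampling applies this denoising step iteratively from pure noise down to a clean image.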

🔗 Paper, Code, and Results

User Preference-aware Fake News Detection

The UPFD framework for user preference-aware fake news detection. Figure source: Dou et al. (2021)

The negative effects of online misinformation and fake news have sparked considerable interest in building more effective fake news detection systems. Most current algorithms for fake news detection mainly focus on mining news content and/or exogenous context to discover deceptive signals. However, the endogenous preferences of users when they decide whether or not to spread fake news are mostly ignored. In an effort to improve fake news detection systems, Dou et al. (2021) recently proposed a new framework, UPFD, that incorporates user preferences into these systems.

What's new: UPFD is a fake news detection framework that incorporates social media users' historical social engagements as information representing their preferences towards news. Various signals are captured from user preferences by joint content and graph modeling. Given social posts and engagements, the framework extracts exogenous context and encodes endogenous information based on users' historical posts and news texts (see the full architecture in the figure above). A GNN encoder fuses both channels of information, and the final news embedding is used to predict the credibility of the news. Experimental results show the effectiveness of the model compared to other approaches on real-world datasets.
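As a rough sketch of the graph-fusion step, the toy code below pools node features over a small engagement graph and scores credibility. All inputs here are hypothetical random vectors standing in for the paper's learned text and preference encoders, and mean-neighbour aggregation stands in for the GNN encoder; it only illustrates the data flow, not UPFD itself.

```python
import numpy as np

def mean_aggregate(feats, edges):
    """One round of mean-neighbour aggregation (a minimal GNN-style layer)."""
    out = feats.copy()
    for i in range(len(feats)):
        nbrs = [b for a, b in edges if a == i] + [a for a, b in edges if b == i]
        if nbrs:
            out[i] = (feats[i] + feats[nbrs].sum(axis=0)) / (1 + len(nbrs))
    return out

rng = np.random.default_rng(0)
# hypothetical inputs: node 0 is the news post, nodes 1-3 are engaging users;
# each feature vector stands in for fused exogenous-context and
# endogenous user-preference embeddings
feats = rng.standard_normal((4, 8))
edges = [(0, 1), (0, 2), (2, 3)]        # engagement (e.g. retweet) graph

news_embedding = mean_aggregate(feats, edges).mean(axis=0)  # graph readout
w = rng.standard_normal(8)              # stand-in for a trained classifier
credibility = 1 / (1 + np.exp(-news_embedding @ w))
print(0.0 < credibility < 1.0)  # True
```

The key design point carried over from the paper is that user-level signals propagate through the engagement graph into a single news-level embedding before classification.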

🔗 Paper & Code

Trending Libraries and Datasets 🛠

Trending Datasets

ExpMRC - a benchmark for explainability evaluation on machine reading comprehension.

DeformingThings4D - a synthetic dataset containing animation sequences spanning 31 categories of humanoids and animals.

QASPER - a dataset for question answering on NLP research papers.

Trending Libraries/Tools

PyTorch Geometric Temporal - a deep learning framework combining state-of-the-art machine learning algorithms for neural spatiotemporal signal processing.

SummVis - an open-source tool for visualizing abstractive summaries that enables fine-grained analysis of the models, data, and evaluation metrics associated with text summarization.

skweak - a Python-based toolkit for applying weak supervision to a wide range of NLP tasks. 

Community Highlights ✍️

We would like to thank:

  • @debeshjha1 for contributing to several leaderboards and datasets, including the Kvasir-Capsule-SEG dataset used for polyp segmentation.
  • @Xiaoshui_Juang for contributing to datasets, including 3DCSR, a 3D cross-source point cloud dataset for the registration task.
  • @pcolombo for updating several leaderboards and for adding the SILICONE benchmark used for evaluating NLU systems designed for spoken language.
  • @gsxia for many contributions to Datasets, including DOTA which is a large-scale dataset for object detection in aerial images. 
  • @raysonlaroca for several contributions to Leaderboards, Datasets, and Methods, including the addition of a new dataset for image-based automatic dial meter reading.

Special thanks to @pbateni, @sailorzhang, @Sanqing, @muhaochen, @kanishk95, and the hundreds of contributors for all their contributions to Papers with Code.

More from Papers with Code 🗣

Datasets on arXiv


Datasets used and introduced in papers are now available directly from the Code & Data tab on arXiv.

We (Papers with Code in collaboration with arXiv) are excited to announce our latest partnership to support links to datasets on arXiv papers.

The new "Code & Data" tab shows datasets used and introduced in arXiv articles.

🔗 Read More

Follow our new @paperswithdata account on Twitter for a curated daily feed of newly published datasets in machine learning. 

On Adopting the ML Reproducibility Challenge to your Course

In a recent blog post, Ana Lučić writes about how the University of Amsterdam incorporated the Machine Learning Reproducibility Challenge (MLRC) into a graduate-level course for students in the Master AI study program. They report that 9 of the course's papers were accepted at the MLRC.

🔗 Read more

---

We would be happy to hear your thoughts and suggestions on the newsletter. Please reply to elvis@paperswithcode.com.

See previous issues

Join us on Slack and Twitter