January 07, 2021

Papers with Code Newsletter #2

Hi everyone,

Welcome to issue #2 of the Papers with Code Newsletter. This issue features works that range from an efficient image transformer trained using distillation through attention to an approach for cross-document language modeling to advances in real-time instance segmentation and recognizing emotion cause in conversations. The issue also highlights a federated learning framework and an open-source video perception toolbox, among other developments in NLP, generative models, object detection, and face identity disentanglement.

Trending Papers with Code

Computer Vision 🏞

Training data-efficient image transformers & distillation through attention (paper & code) - This work introduces a teacher-student strategy for training based on distillation through attention specific to transformers. The vision transformer model, called DeiT, achieves a top-1 accuracy of 83.1% (single-crop evaluation) with no external data and 86M parameters. The implementation builds on top of the TIMM library by Ross Wightman.
Trending with 1008 ★

Throughput and accuracy on ImageNet compared to several state-of-the-art convnets, trained on ImageNet-1k only. (Figure source: Touvron et al. (2020))
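
To make the teacher-student idea concrete, here is a minimal sketch of a hard-label distillation loss in plain Python. It is an illustration under assumptions, not the DeiT code: the function names, the equal 0.5 weighting, and the use of the teacher's top prediction as a hard target are choices made for this example. One student output is supervised by the true label and a second student output by the teacher's predicted class.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a hard class index."""
    return -math.log(softmax(logits)[target])


def hard_distillation_loss(class_logits, dist_logits, teacher_logits, label):
    """Average of two cross-entropy terms (hypothetical helper).

    class_logits: student output supervised by the ground-truth label.
    dist_logits: student output supervised by the teacher's predicted class.
    Assumption: equal 0.5 weights and a hard (argmax) teacher target.
    """
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    true_term = cross_entropy(class_logits, label)
    teacher_term = cross_entropy(dist_logits, teacher_label)
    return 0.5 * true_term + 0.5 * teacher_term
```

The loss shrinks when the second student output agrees with the teacher's prediction, which is the signal the student learns from in this sketch.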

YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS) (paper & code) - YolactEdge is an approach leveraging useful techniques (like quantizing network parameters and partial feature transformation) that aims to reduce compute for real-time instance segmentation on small edge devices.
Trending with 557 ★

Focal Frequency Loss for Generative Models (paper & code) - Previous studies of generative models have observed that a frequency-domain gap between real and fake images may be a common issue. To address this, Jiang et al. propose a novel focal frequency loss (FFL) that brings the optimization of generative models directly into the frequency domain. The approach improves various baselines in both perceptual quality and quantitative performance.
Trending with 111 ★

Pix2pix image-to-image translation results (with and without FFL) on CMP Facades (256 × 256) and edges → shoes (256 × 256) datasets. (Figure source: Jiang et al. (2020))
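
The idea of comparing images in the frequency domain can be sketched in a few lines of plain Python. The version below is a toy under stated assumptions, not the authors' implementation: it uses a naive 2D DFT, weights each frequency by its own error raised to a hypothetical exponent `alpha` (normalized to [0, 1] by the largest error), and averages the weighted squared distances. The function names are made up for this example.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a list-of-lists image."""
    rows, cols = len(image), len(image[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2 * cmath.pi * (u * x / rows + v * y / cols)
                    acc += image[x][y] * cmath.exp(1j * angle)
            spectrum[u][v] = acc
    return spectrum


def focal_frequency_loss(real, fake, alpha=1.0):
    """Weighted mean squared distance between two spectra.

    Assumption: the weight of a frequency is its own distance raised to
    alpha, divided by the largest such value, so hard frequencies dominate.
    """
    f_real, f_fake = dft2(real), dft2(fake)
    dists = [
        abs(r - f)
        for row_real, row_fake in zip(f_real, f_fake)
        for r, f in zip(row_real, row_fake)
    ]
    max_dist = max(dists)
    if max_dist == 0.0:
        return 0.0  # identical spectra, nothing to penalize
    weights = [(d / max_dist) ** alpha for d in dists]
    return sum(w * d * d for w, d in zip(weights, dists)) / len(dists)
```

A real training setup would use a fast FFT on tensors and backpropagate through the loss; the quadruple loop here is only meant to keep the example dependency-free.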

SWA Object Detection (paper & code) - This work uses a recipe inspired by Stochastic Weight Averaging (SWA), an approach for improving generalization in neural networks. The authors show that performing SWA with the proposed training policy consistently improves object detection results by ~1.0 AP across different detectors such as Mask R-CNN and YOLOv3.
Trending with 86 ★
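
The core operation behind weight averaging is simple enough to show directly. The sketch below is a simplified illustration, not the paper's training recipe: it assumes each checkpoint is a plain dict mapping a parameter name to a list of floats, and the helper name `average_checkpoints` is hypothetical.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of several checkpoints.

    Each checkpoint is assumed to be a dict {parameter_name: [floats]}
    with identical names and shapes across checkpoints.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    count = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / count for col in columns]
    return averaged
```

The averaged weights are then loaded into a single model for evaluation, so inference cost stays the same as for one checkpoint.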

Face Identity Disentanglement via Latent Space Mapping (paper & code) - Current methods for learning disentangled representations of data rely on extensive supervision and training. Nitzan et al. propose an approach that requires minimal supervision through decoupling the process of disentanglement and synthesis. Authors claim to achieve success with disentangling identity from other facial attributes such as pose and expression, and preserving one while manipulating the other.
Trending with 52 ★

Disentanglement scheme used in the human head domain. (Figure source: Nitzan et al. (2020))

Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder (paper & code) - Daniel and Tamar propose a variant of IntroVAE called Soft-IntroVAE, with a smooth exponential loss that allows for theoretical analysis and yields competitive results on image generation and reconstruction. Soft-IntroVAE aims to strengthen the loss formulation and stabilize training.
Trending with 36 ★

NLP 🔠

Cross-Document Language Modeling (paper & code) - This work proposes a new pretraining approach for language modeling to support multi-document NLP tasks. The proposed cross-document language model extends Longformer and claims state-of-the-art results on several multi-text tasks, such as event and entity cross-document coreference resolution.
Top model on Entity and Event Cross-Document Coreference Resolution

DynaSent: A Dynamic Benchmark for Sentiment Analysis (paper & code) - DynaSent is a new English-language dynamic benchmark for ternary sentiment analysis. DynaSent has a total of 121,634 sentences. The authors discuss the data collection and strategy to continue the dynamic evolution of the benchmark. 
Trending with 128 ★

Voice Separation with an Unknown Number of Multiple Speakers (paper & code) - Voice separation deals with separating mixed audio sequences. Nachmani et al. introduce an approach based on RNNs for separating voices at multiple processing steps. In particular, the approach performs well compared to current methods as the unknown number of speakers increases.
Trending with 624 ★

Learning Dense Representations of Phrases at Scale (paper & code) - This work proposes a model called DensePhrases that learns dense, query-agnostic phrase representations via question generation and distillation to achieve competitive results in open-domain question answering (QA). DensePhrases improves performance across popular QA datasets, including state-of-the-art results on NQ (79.6) and KILT: T-REx (27.84).
Top model on KILT: T-REx and Natural Questions (NQ)

On Generating Extended Summaries of Long Documents (paper & code) - This work develops a multi-task learning approach to extended summarization, which aims to capture salient points from long-form documents such as research papers. The approach outperforms previous systems on the blind test set of the LongSumm shared task.
Trending with 43 ★

Libraries and Community Implementations 🛠

MMTracking (code) - An open-source video perception toolbox based on PyTorch that supports multiple tasks in a unified framework: video object detection, multiple object tracking, and single object tracking.
Trending with 615 ★

Demonstrates video perception capabilities like multiple object detection. (Figure source: MMTracking)

Flower: A Friendly Federated Learning Research Framework (paper & code) - Flower is a federated learning (FL) framework built to be agnostic to heterogeneous client environments and to scale to a large number of clients, including mobile and embedded devices. The paper provides further design and implementation details about Flower.
Trending with 219 ★
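
For readers new to FL, the sketch below shows federated averaging, a common server-side aggregation rule, in plain Python. It does not use the Flower API and is not taken from the paper: the function name `federated_average` and the data layout (a list of `(weights, num_examples)` pairs) are assumptions made for this example.

```python
def federated_average(client_updates):
    """Combine client weights, weighting each client by its data size.

    client_updates: list of (weights, num_examples) pairs, where weights
    is a flat list of floats with the same length for every client.
    """
    total_examples = sum(num for _, num in client_updates)
    if total_examples <= 0:
        raise ValueError("clients must report at least one example in total")
    size = len(client_updates[0][0])
    aggregated = [0.0] * size
    for weights, num_examples in client_updates:
        share = num_examples / total_examples  # this client's contribution
        for index, value in enumerate(weights):
            aggregated[index] += value * share
    return aggregated
```

Clients with more local data pull the global model further toward their update, which is why the example weights by `num_examples` rather than taking a plain mean.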

kōan: A Corrected CBOW Implementation (paper & code) - kōan implements a corrected CBOW algorithm that the authors claim yields embeddings competitive with skip-gram on various intrinsic and extrinsic tasks.
Trending with 173 ★

Found the highlights interesting? Check out more of the latest trending papers with code here.

Community Highlights

Bio/Social NLP 🔠

Clinical Entity Recognition (paper & code) - Kocaman & Talby present a clinical text mining system that can recognize over 100 different clinical entity types and perform clinical assertion status detection to distinguish between clinical facts that are present, absent, conditional, or about someone other than the patient. The work improves on the previous best-performing results for clinical assertion status detection.
Thanks to Veysel Kocaman for adding results and task descriptions.

Emotion Cause in Conversations (paper & code) - Recognizing emotion cause in text is challenging, yet it is a key step toward causal reasoning that also aids model interpretability. Poria et al. (2020) constructed a dataset called RECCON comprising emotion causes for 6200 emotional utterances in 1000 dialogues. The authors introduce two challenging subtasks of RECCON, 1) Causal Span Extraction and 2) Causal Emotion Entailment, on which Transformer- and BERT-type models struggle to achieve decent performance, leaving room for future model improvement.
Thanks to Soujanya Poria for adding results and task descriptions.

Emotion causes in conversations. (Figure source: Poria et al. (2020))

Interpretability 🔍

Interpretable agents for RL tasks (paper & code) - This work by Custode & Iacca focuses on obtaining interpretable agents for reinforcement learning tasks. The results are competitive with the state of the art on several reinforcement learning benchmarks. The authors show that by analyzing the solutions obtained, one can understand the agent's inner workings and gain knowledge about the problem faced.
Thanks to Leonardo Lucio Custode for adding results and task descriptions.

Image Clustering 🧩

Contrastive Clustering (paper & code) - By treating labels as representations, Li et al. observe that the rows and columns of the feature matrix correspond to instance and cluster representations, respectively. Based on this observation, the authors propose an online clustering method that simultaneously performs contrastive learning at both the instance and cluster level. The proposed method achieves competitive results on six image datasets.
Thanks to Yufan Li for adding results and task descriptions.

Notable Community Mentions ✍️

A huge thanks to @evalai, @Draguns, @DeLightCMU, @alexiszam, @nwoyecid and hundreds of other contributors for their multiple contributions to results, methods, and tasks.

If you want to learn how to contribute to results, methods, and tasks, reach out at elvis@paperswithcode.com or join the Slack group (#contributions channel).

Announcements and Final Words

Papers with Code now integrates with ongoing competitions! Our first partner is EvalAI - huge thanks to @rishabhjain2018! Check out some of the new EvalAI leaderboards. Learn more about how to integrate your competitions here.

Last week we published our Year in Review for 2020, where we looked back at the top trending papers, libraries, and benchmarks of 2020.

We would love to hear your thoughts, feedback and suggestions for the newsletter. Please reach out to elvis@paperswithcode.com

Join the community: Slack | Twitter 

Thanks for reading,
The Papers With Code Team