YOLOX: Exceeding YOLO Series in 2021

Megvii-BaseDetection/YOLOX 18 Jul 2021

In this report, we present several experience-based improvements to the YOLO series, forming a new high-performance detector: YOLOX.

Autonomous Driving

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

xinntao/Real-ESRGAN 22 Jul 2021

Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.


Contextual Transformer Networks for Visual Recognition

JDAI-CV/CoTNet 26 Jul 2021

This design fully capitalizes on the contextual information among input keys to guide the learning of the dynamic attention matrix, and thus strengthens the capacity of visual representation.

Instance Segmentation, Object Detection, +1 more

Human Pose Regression with Residual Log-likelihood Estimation

Jeff-sjtu/res-loglikelihood-regression 23 Jul 2021

In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.

Multi-Person Pose Estimation

U$^2$-Net: Going Deeper with Nested U-Structure for Salient Object Detection

nadermx/backgroundremover 18 May 2020

In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD).

Image Classification, RGB Salient Object Detection, +2 more

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

codestella/putting-nerf-on-a-diet 1 Apr 2021

We present DietNeRF, a 3D neural scene representation estimated from a few images.

Image Reconstruction

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

huawei-noah/Speech-Backbones 13 May 2021

Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions, while stochastic calculus has provided a unified point of view on these techniques, allowing for flexible inference schemes.

Speech Synthesis, Text-To-Speech Synthesis

Highly accurate protein structure prediction with AlphaFold

deepmind/alphafold Nature 2021

Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics.

Protein Folding, Protein Structure Prediction

LARGE: Latent-Based Regression through GAN Semantics

YotamNitzan/LARGE 22 Jul 2021

For modern generative frameworks, this semantic encoding manifests as smooth, linear directions that affect image attributes in a disentangled manner.

Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation

google/brax 24 Jun 2021

We present Brax, an open-source library for rigid body simulation, written in JAX, with a focus on performance and parallelism on accelerators.

OpenAI Gym

