TorchScale: Transformers at Scale

microsoft/torchscale 23 Nov 2022

Large Transformers have achieved state-of-the-art performance across many tasks.

Language Modelling, Machine Translation, +1

SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction

haomo-ai/superfusion 28 Nov 2022

To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels.

Autonomous Driving

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

facebookresearch/diplomacy_cicero Science 2022

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.


FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

csbhr/ffhq-uv 25 Nov 2022

Our pipeline utilizes the recent advances in StyleGAN-based facial image editing approaches to generate multi-view normalized face images from single-image inputs.

3D Face Reconstruction

Fast-SNARF: A Fast Deformer for Articulated Neural Fields

xuchen-ethz/fast-snarf 28 Nov 2022

A key challenge in making such methods applicable to articulated objects, such as the human body, is to model the deformation of 3D locations between the rest pose (a canonical space) and the deformed space.

3D Reconstruction, Novel View Synthesis

MetaFormer Baselines for Vision

facebookresearch/xformers 24 Oct 2022

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Image Classification

Towards Robust Blind Face Restoration with Codebook Lookup Transformer

sczhou/codeformer 22 Jun 2022

In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces.

Blind Face Restoration

LiT: Zero-Shot Transfer with Locked-image text Tuning

mlfoundations/open_clip CVPR 2022

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.

Image Classification, Retrieval, +2

ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech

ubisoft/ubisoft-laforge-ZeroEGGS 15 Sep 2022

In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles.

Gesture Generation

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model

shi-labs/versatile-diffusion 15 Nov 2022

Through our experiments, we demonstrate that VD and its underlying framework have the following merits: a) VD handles all subtasks with competitive quality; b) VD initiates novel extensions and applications such as disentanglement of style and semantic, image-text dual-guided generation, etc.

Disentanglement, Image Captioning, +4

