Dilated Neighborhood Attention Transformer

SHI-Labs/Neighborhood-Attention-Transformer 29 Sep 2022

These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention.

Image Classification, Instance Segmentation +2
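As a rough illustration of the sliding-window Neighborhood Attention mentioned in the entry above, the following is a minimal 1D sketch in plain Python. It is not the SHI-Labs implementation: the function name `neighborhood_attention_1d`, the single-head list-of-lists layout, and the border handling (the window is shifted inward so that every token still sees `window` neighbors) are assumptions made for this example, and dilation is left out.

```python
import math


def neighborhood_attention_1d(q, k, v, window=3):
    """Hypothetical single-head sketch: query i attends only to `window` nearby keys.

    q, k, v are lists of equal length n; each item is a list of floats.
    """
    n = len(q)
    d = len(q[0])
    if window % 2 != 1 or window > n:
        raise ValueError("window must be odd and no larger than the sequence")
    half = window // 2
    out = []
    for i in range(n):
        # Assumption: near the borders the window is shifted inward,
        # so each token still attends to exactly `window` positions.
        start = min(max(i - half, 0), n - window)
        idx = list(range(start, start + window))
        # Scaled dot-product scores against the local keys only.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in idx
        ]
        # Numerically stable softmax over the local window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the local values.
        out.append(
            [
                sum(w / total * v[j][c] for w, j in zip(weights, idx))
                for c in range(len(v[0]))
            ]
        )
    return out
```

With all-zero queries and keys the attention weights are uniform, so each output is simply the mean of the values inside that token's window, which makes the locality easy to check by hand.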

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

atosystem/speechclip 3 Oct 2022

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis

zju3dv/intrinsicnerf 2 Oct 2022

Given that intrinsic decomposition is a fundamentally ambiguous and under-constrained inverse problem, we propose a novel distance-aware point sampling and adaptive reflectance iterative clustering optimization method that enables IntrinsicNeRF with traditional intrinsic decomposition constraints to be trained in an unsupervised manner, resulting in temporally consistent intrinsic decomposition results.

Neural Rendering, Novel View Synthesis

facebookresearch/rl ICLR 2021

A modular, primitive-first, Python-first PyTorch library for Reinforcement Learning.


VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

facebookresearch/vip 30 Sep 2022

Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question.

Offline RL, Representation Learning

Efficient Few-Shot Learning Without Prompts

huggingface/setfit 22 Sep 2022

This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques.

Few-Shot Learning

VToonify: Controllable High-Resolution Portrait Video Style Transfer

williamyang1991/vtoonify 22 Sep 2022

Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.

Face Alignment, Style Transfer +1

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition

High-Resolution Image Synthesis with Latent Diffusion Models

compvis/stable-diffusion CVPR 2022

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.

Denoising, Image Inpainting +3

PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming

khalil-research/pyepo 28 Jun 2022

It provides two base algorithms: the first is based on the convex surrogate loss function from the seminal work of Elmachtoub & Grigas (2021), and the second is based on the differentiable black-box solver approach of Vlastelica et al. (2019).
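To make the first of these two algorithms more concrete, here is a small sketch of the SPO+ convex surrogate loss from Elmachtoub & Grigas for a minimization problem, written in plain Python. It does not use PyEPO's API: the helper names `dot`, `solve`, and `spo_plus_loss` are hypothetical, and the feasible region is assumed to be a small explicit list of candidate solutions so that the "solver" can be a brute-force search.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def solve(cost, feasible):
    """Brute-force stand-in for an LP/IP solver: minimize cost^T w over `feasible`."""
    return min(feasible, key=lambda w: dot(cost, w))


def spo_plus_loss(pred_cost, true_cost, feasible):
    """SPO+ loss for a minimization problem:

    max_w (c - 2*c_hat)^T w  +  2*c_hat^T w*(c)  -  z*(c)

    where w*(c) is an optimal solution under the true cost c
    and z*(c) = c^T w*(c) is its objective value.
    """
    w_star = solve(true_cost, feasible)
    z_star = dot(true_cost, w_star)
    shifted = [c - 2 * p for c, p in zip(true_cost, pred_cost)]
    max_term = max(dot(shifted, w) for w in feasible)
    return max_term + 2 * dot(pred_cost, w_star) - z_star


# Two candidate decisions: pick item 0 or pick item 1.
feasible = [(1, 0), (0, 1)]
true_cost = [1, 2]

loss_exact = spo_plus_loss([1, 2], true_cost, feasible)  # prediction equals the truth
loss_wrong = spo_plus_loss([3, 1], true_cost, feasible)  # prediction flips the ranking
```

In this toy setting the loss is zero when the predicted costs match the true costs and positive when the prediction would lead to the worse decision. A real predict-then-optimize pipeline would replace `solve` with a call to an actual optimization solver.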

