TabDDPM: Modelling Tabular Data with Diffusion Models

rotot0/tab-ddpm 30 Sep 2022

Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities.
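As background for the excerpt above, here is a minimal sketch of the Gaussian forward (noising) process used in standard denoising diffusion probabilistic models. In closed form, a clean sample $x_0$ is corrupted to step $t$ as $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$, with $\varepsilon \sim \mathcal{N}(0, I)$. This is not TabDDPM's code and does not show how TabDDPM handles tabular columns. The linear schedule values (1e-4 to 0.02 over 1000 steps) and the function names `linear_beta_schedule` and `q_sample` are illustrative assumptions.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T (linear schedule; assumed values)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns x_t together with the Gaussian noise that was used. In DDPM-style
    training, a network is fit to predict this noise from (x_t, t).
    """
    noise = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise

betas = linear_beta_schedule()
# alpha_bar_t is the running product of (1 - beta_s) for s <= t.
alpha_bars = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 3))  # hypothetical batch: 4 rows, 3 numeric columns
x_early, _ = q_sample(x0, 10, alpha_bars, rng)    # still close to x0
x_late, eps = q_sample(x0, 999, alpha_bars, rng)  # almost pure noise
```

Because $\bar\alpha_t$ shrinks monotonically toward zero, late steps retain almost no signal from $x_0$. A generative model learns to reverse this corruption one step at a time.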
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

atosystem/speechclip 3 Oct 2022

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling

Dilated Neighborhood Attention Transformer

SHI-Labs/Neighborhood-Attention-Transformer 29 Sep 2022

Hierarchical vision transformers typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention.

Image Classification, Instance Segmentation
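To make "sliding-window Neighborhood Attention" concrete, here is a naive single-head 1D sketch in NumPy. Each token attends only to a fixed-size window of `kernel_size` neighbors. At the sequence borders, the window is shifted inward so that it always contains exactly `kernel_size` tokens. The optional `dilation` argument illustrates the dilated variant named in the title: tokens are split into `dilation` interleaved groups, and attention is applied within each group. This is an assumption-laden illustration, not the authors' kernels or the NATTEN library's API. It omits learned projections, multiple heads, and relative positional biases, and the function names are hypothetical.

```python
import numpy as np

def neighborhood_indices(i, n, kernel_size, dilation=1):
    """Indices of the kernel_size tokens that token i attends to (1D, hypothetical helper).

    Token i belongs to group i % dilation. Within that group it attends to a
    contiguous run of kernel_size group members centred on itself. Near the
    borders, the run is shifted inward so it always has kernel_size members.
    """
    group = i % dilation
    local = i // dilation
    group_len = len(range(group, n, dilation))
    if group_len < kernel_size:
        raise ValueError("sequence too short for this kernel_size and dilation")
    start = min(max(local - kernel_size // 2, 0), group_len - kernel_size)
    return group + dilation * np.arange(start, start + kernel_size)

def neighborhood_attention_1d(q, keys, values, kernel_size, dilation=1):
    """Scaled dot-product attention restricted to each token's neighborhood."""
    n, dim = q.shape
    out = np.empty((n, values.shape[1]))
    for i in range(n):
        idx = neighborhood_indices(i, n, kernel_size, dilation)
        scores = keys[idx] @ q[i] / np.sqrt(dim)   # shape (kernel_size,)
        weights = np.exp(scores - scores.max())    # numerically stable softmax
        weights /= weights.sum()
        out[i] = weights @ values[idx]
    return out

n, dim = 10, 4
rng = np.random.default_rng(0)
q, keys, values = (rng.standard_normal((n, dim)) for _ in range(3))
out = neighborhood_attention_1d(q, keys, values, kernel_size=3, dilation=2)
print(neighborhood_indices(5, n, kernel_size=3, dilation=2))  # [3 5 7]
```

Each query scores only `kernel_size` keys instead of all `n`, so the cost grows linearly with sequence length. Dilation widens the receptive field without adding keys. When `kernel_size == n` and `dilation == 1`, the sketch reduces to ordinary full self-attention.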

facebookresearch/rl 9 Sep 2015

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

Continuous Control, Q-Learning

LAVIS: A Library for Language-Vision Intelligence

salesforce/lavis 15 Sep 2022

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

Image Captioning, Image Retrieval

SoundStream: An End-to-End Neural Audio Codec

google/lyra 7 Jul 2021

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.

Speech Enhancement

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

facebookresearch/vip 30 Sep 2022

Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question.

Offline RL, Representation Learning

Robust Speech Recognition via Large-Scale Weak Supervision

openai/whisper Preprint 2022

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.

Robust Speech Recognition

Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering

amazon-research/mintaka 4 Oct 2022

We introduce Mintaka, a complex, natural, and multilingual dataset designed for experimenting with end-to-end question-answering models.

Question Answering

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis

zju3dv/intrinsicnerf 2 Oct 2022

Given that intrinsic decomposition is a fundamentally ambiguous and under-constrained inverse problem, we propose a novel distance-aware point sampling and adaptive reflectance iterative clustering optimization method that enables IntrinsicNeRF with traditional intrinsic decomposition constraints to be trained in an unsupervised manner, resulting in temporally consistent intrinsic decomposition results.

Neural Rendering, Novel View Synthesis

