GAUDI: A Neural Architect for Immersive 3D Scene Generation

apple/ml-gaudi 27 Jul 2022

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation, Scene Generation

Hybrid Spectrogram and Waveform Source Separation

facebookresearch/demucs 5 Nov 2021

Source separation models work in either the spectrogram domain or the waveform domain.

Music Source Separation

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

nvlabs/minvis 3 Aug 2022

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Instance Segmentation

Expanding Language-Image Pretrained Models for General Video Recognition

microsoft/VideoX 4 Aug 2022

Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.

Action Classification, Action Recognition

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

google-research/jax3d 30 Jul 2022

Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views.

Novel View Synthesis

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

microsoft/2D-TAN 4 Dec 2020

Moment localization with natural language is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.

Real-Time Object Detection

NAFSSR: Stereo Image Super-Resolution Using NAFNet

megvii-research/NAFNet 19 Apr 2022

This paper inherits a strong and simple image restoration model, NAFNet, for single-view feature extraction and extends it by adding cross attention modules to fuse features between views to adapt to binocular scenarios.

Image Restoration, Stereo Image Super-Resolution

Elucidating the Design Space of Diffusion-Based Generative Models

lucidrains/imagen-pytorch 1 Jun 2022

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.

Image Generation

