GAUDI: A Neural Architect for Immersive 3D Scene Generation

apple/ml-gaudi 27 Jul 2022

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation Scene Generation

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

nvlabs/minvis 3 Aug 2022

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Instance Segmentation

Hybrid Spectrogram and Waveform Source Separation

facebookresearch/demucs 5 Nov 2021

Source separation models either work on the spectrogram or waveform domain.

Music Source Separation

Expanding Language-Image Pretrained Models for General Video Recognition

microsoft/VideoX 4 Aug 2022

Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.

Action Classification Action Recognition

MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

google-research/jax3d 30 Jul 2022

Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views.

Novel View Synthesis

NAFSSR: Stereo Image Super-Resolution Using NAFNet

megvii-research/NAFNet 19 Apr 2022

This paper inherits a strong and simple image restoration model, NAFNet, for single-view feature extraction and extends it by adding cross attention modules to fuse features between views to adapt to binocular scenarios.

Image Restoration Stereo Image Super-Resolution
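NAFSSR's key addition is a set of cross attention modules that fuse features between the left and right views. The sketch below is only a rough illustration of that idea. It shows generic scaled dot-product cross-attention between two views, restricted to pixels in the same image row. This restriction assumes rectified stereo pairs, where corresponding points share a row. The names cross_view_attention and fuse_views are hypothetical. The learned query/key/value projections and normalization that a real module would use are omitted, so this is not NAFSSR's actual implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Row = List[Vector]      # one image row: W pixels, each a C-dim feature vector
Image = List[Row]       # H rows


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_view_attention(query_row: Row, key_row: Row) -> Row:
    """Each pixel in query_row attends over every pixel of key_row (same row).

    Hypothetical simplification: the raw features serve as queries, keys and
    values; a learned module would project them first.
    """
    c = len(query_row[0])
    scale = 1.0 / math.sqrt(c)
    out: Row = []
    for q in query_row:
        weights = softmax([dot(q, k) * scale for k in key_row])
        # Weighted average of the other view's features along this row.
        out.append([sum(w * k[i] for w, k in zip(weights, key_row)) for i in range(c)])
    return out


def fuse_views(left: Image, right: Image) -> Tuple[Image, Image]:
    """Residual fusion: each view adds features gathered from the other view."""
    new_left: Image = []
    new_right: Image = []
    for l_row, r_row in zip(left, right):
        from_right = cross_view_attention(l_row, r_row)
        from_left = cross_view_attention(r_row, l_row)
        new_left.append([[a + b for a, b in zip(p, f)] for p, f in zip(l_row, from_right)])
        new_right.append([[a + b for a, b in zip(p, f)] for p, f in zip(r_row, from_left)])
    return new_left, new_right
```

Restricting attention to a row keeps the cost proportional to the image width per pixel instead of the whole image. For a 1x1 "image" the attention weight is 1, so each view simply gains the other view's feature: fuse_views([[[1.0, 2.0]]], [[[3.0, 4.0]]]) gives [[[4.0, 6.0]]] for both views.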

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion 2 Aug 2022

Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

Text-to-Image Generation

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

microsoft/2D-TAN 4 Dec 2020

It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

wongkinyiu/yolov7 6 Jul 2022

YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.

Real-Time Object Detection

