DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

jiayuzou2020/diffbev 15 Mar 2023

Diffusion models naturally have the ability to denoise noisy samples toward the ideal data, which motivates us to use a diffusion model to obtain a better BEV representation.

3D Object Detection, Autonomous Driving

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

vvictoryuki/freedom 17 Mar 2023

In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) that can be used with various conditions.

Face Detection

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope 15 Mar 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.

Denoising, Image Generation
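The forward diffusion process described in the VideoFusion summary can be sketched in a few lines. This is a minimal, standard-library sketch of the generic DDPM-style forward process, not VideoFusion's decomposed noise model; the linear beta schedule (1e-4 to 0.02 over 1000 steps) and the function names are assumptions chosen for illustration. It uses the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, so any noise level can be sampled directly without iterating through every step.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based step index):
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [1.0, -0.5, 0.25]
    # Early steps stay close to x0; by the last step the signal is almost pure noise.
    for t in (0, 250, 999):
        print(t, round(alpha_bars[t], 5), q_sample(x0, t, alpha_bars, rng))
```

The learned part of a DPM is the reverse process, which is trained to undo these noising steps; the forward process itself has no parameters.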

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

opengvlab/internimage 18 Nov 2022

The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.

3D Object Detection

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

vision-cair/chatcaptioner 12 Mar 2023

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning, Question Answering
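The ask-and-answer loop implied by the ChatCaptioner summary can be outlined as follows. This is a hypothetical control-flow sketch, not the project's code: `ask`, `answer` and `summarize` are placeholder callables standing in for the questioner (ChatGPT's role), the visual question answerer (BLIP-2's role) and the final caption writer, and the fixed round count is an assumption. The point it illustrates is that each new question is generated with the whole dialogue so far in view.

```python
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # one (question, answer) turn


def chat_caption(
    image: object,
    ask: Callable[[List[QA]], str],           # hypothetical questioner interface
    answer: Callable[[object, str], str],     # hypothetical VQA interface
    summarize: Callable[[List[QA]], str],     # hypothetical summarizer interface
    rounds: int = 5,                          # assumed fixed number of rounds
) -> Tuple[str, List[QA]]:
    """Alternate questions and answers for `rounds` turns, then summarize."""
    dialogue: List[QA] = []
    for _ in range(rounds):
        question = ask(dialogue)  # the questioner sees every earlier answer
        dialogue.append((question, answer(image, question)))
    return summarize(dialogue), dialogue


if __name__ == "__main__":
    # Trivial stand-ins, only to show the data flow.
    caption, turns = chat_caption(
        image="img",
        ask=lambda d: f"Q{len(d)}",
        answer=lambda img, q: f"{img}:{q}",
        summarize=lambda d: " | ".join(a for _, a in d),
        rounds=3,
    )
    print(turns)
    print(caption)
```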

Effectively Modeling Time Series with Simple Discrete State Spaces

hazyresearch/spacetime 16 Mar 2023

For expressivity, we propose a new SSM parameterization based on the companion matrix (a canonical representation for discrete-time processes), which enables SpaceTime's SSM layers to learn desirable autoregressive processes.

Time Series Classification
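Why a companion matrix lets an SSM express autoregressive processes can be checked numerically. The sketch below uses the first-row companion layout (AR coefficients in row 0, ones on the subdiagonal), which is one standard choice; SpaceTime's exact layout and learned parameterization may differ, and the function names are illustrative. With state x_k = [y_{k-1}, ..., y_{k-p}], the recurrence x_{k+1} = A x_k + B u_k, y_k = C x_k + u_k with C = phi and B = e_1 reproduces the AR(p) process y_k = sum_i phi_i y_{k-i} + u_k exactly.

```python
def companion_matrix(phi):
    """First-row companion matrix for AR coefficients phi = [phi_1, ..., phi_p]:
    row 0 holds the coefficients, the subdiagonal of ones shifts the history."""
    p = len(phi)
    A = [[0.0] * p for _ in range(p)]
    A[0] = list(phi)
    for i in range(1, p):
        A[i][i - 1] = 1.0
    return A


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def ssm_ar(phi, inputs):
    """Run the discrete SSM with A = companion(phi), C = phi, B = e_1."""
    A = companion_matrix(phi)
    x = [0.0] * len(phi)  # zero initial history
    outputs = []
    for u in inputs:
        y = sum(c * v for c, v in zip(phi, x)) + u  # y_k = C x_k + u_k
        outputs.append(y)
        x = matvec(A, x)  # row 0 becomes phi . x; other entries shift down
        x[0] += u         # + B u_k, so x_{k+1}[0] = y_k
    return outputs


def direct_ar(phi, inputs):
    """Reference AR(p) recursion y_k = sum_i phi_i * y_{k-i} + u_k."""
    ys = []
    for k, u in enumerate(inputs):
        y = u + sum(phi[i] * ys[k - 1 - i] for i in range(len(phi)) if k - 1 - i >= 0)
        ys.append(y)
    return ys


if __name__ == "__main__":
    phi = [0.5, -0.2, 0.1]
    u = [1.0, 0.0, 0.0, 0.0, 0.3, -1.0]
    print(ssm_ar(phi, u))
    print(direct_ar(phi, u))
```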

Eliciting Latent Predictions from Transformers with the Tuned Lens

alignmentresearch/tuned-lens 14 Mar 2023

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

Language Modelling

Universal Instance Perception as Object Discovery and Retrieval

MasterBin-IIAU/UNINEXT 12 Mar 2023

All instance perception tasks aim to find certain objects specified by queries such as category names, language expressions, and target annotations, but the field as a whole has been split into multiple independent subtasks.

Ranked #1 on Referring Expression Comprehension on RefCOCOg-test (using extra training data)

Multi-Object Tracking and Segmentation, Multiple Object Tracking

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

zrrskywalker/point-nn 14 Mar 2023

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists purely of non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, combined with trigonometric functions.

3D Point Cloud Classification, Training-free 3D Part Segmentation
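The non-learnable building blocks named in the Point-NN summary can be combined into one parameter-free encoding stage. This is a simplified sketch, not Point-NN's encoder: the paper's stages differ in detail (for example, how coordinates are normalized and embedded), and the embedding constants `alpha` and `beta` here are hypothetical values. The stage picks well-spread centers with FPS, groups each center's k nearest neighbors, embeds coordinates with sine/cosine functions, and aggregates with max and mean pooling. Nothing in it is trained.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_sampling(points, m, start=0):
    """Greedy FPS: repeatedly add the point farthest from the already-chosen set."""
    chosen = [start]
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < m:
        idx = max(range(len(points)), key=lambda i: min_d[i])
        chosen.append(idx)
        min_d = [min(d, sq_dist(p, points[idx])) for d, p in zip(min_d, points)]
    return chosen


def knn(points, center, k):
    """Indices of the k nearest neighbors of `center` (a sampled center includes itself)."""
    order = sorted(range(len(points)), key=lambda i: sq_dist(points[i], center))
    return order[:k]


def trig_embed(xyz, num_freqs=3, alpha=1000.0, beta=100.0):
    """Fixed sin/cos embedding of each coordinate (alpha, beta are hypothetical)."""
    feats = []
    for c in xyz:
        for j in range(num_freqs):
            angle = beta * c / (alpha ** (j / num_freqs))
            feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def encode(points, m, k):
    """One parameter-free stage: FPS -> k-NN grouping -> trig embedding -> max + mean pooling."""
    centers = farthest_point_sampling(points, m)
    features = []
    for c in centers:
        embs = [trig_embed(points[i]) for i in knn(points, points[c], k)]
        dim = len(embs[0])
        pooled_max = [max(e[d] for e in embs) for d in range(dim)]
        pooled_mean = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
        features.append([a + b for a, b in zip(pooled_max, pooled_mean)])
    return centers, features


if __name__ == "__main__":
    pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
    centers, feats = encode(pts, m=2, k=3)
    print(centers, len(feats[0]))
```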

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

timdettmers/bitsandbytes 15 Aug 2022

We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference in half while retaining full-precision performance.

Language Modelling, Linguistic Acceptability
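The idea behind Int8 matrix multiplication with outlier handling can be illustrated in plain Python. This is a teaching sketch under stated assumptions, not the bitsandbytes kernel: it applies vector-wise absmax quantization (one scale per row of X and per column of W) to the regular hidden dimensions, and keeps any hidden dimension containing a value with magnitude at or above `threshold` (6.0 here, an assumed default) in floating point, then sums both parts. Function names are illustrative. Storing weights as 8-bit integers instead of 16-bit floats is where the halved memory comes from.

```python
def absmax_quantize_rows(rows):
    """Vector-wise absmax quantization: q = round(v * 127 / max|row|) per row.
    Returns integers in [-127, 127] and the per-row scale factors."""
    q, scales = [], []
    for row in rows:
        m = max((abs(v) for v in row), default=0.0)
        s = 127.0 / m if m > 0 else 1.0
        q.append([round(v * s) for v in row])
        scales.append(s)
    return q, scales


def matmul(X, W):
    """Float reference product of X (n x h) and W (h x o)."""
    return [[sum(x[j] * W[j][o] for j in range(len(W))) for o in range(len(W[0]))] for x in X]


def int8_matmul_with_outliers(X, W, threshold=6.0):
    h, n_out = len(W), len(W[0])
    # Hidden dimensions where any input reaches |value| >= threshold stay in float.
    outliers = [j for j in range(h) if any(abs(x[j]) >= threshold for x in X)]
    regular = [j for j in range(h) if j not in outliers]
    # Quantize rows of X and columns of W, restricted to the regular dimensions.
    qx, sx = absmax_quantize_rows([[x[j] for j in regular] for x in X])
    qw, sw = absmax_quantize_rows([[W[j][o] for j in regular] for o in range(n_out)])
    out = []
    for i, xq in enumerate(qx):
        row = []
        for o, wq in enumerate(qw):
            acc = sum(a * b for a, b in zip(xq, wq))          # integer dot product
            val = acc / (sx[i] * sw[o])                       # dequantize with both scales
            val += sum(X[i][j] * W[j][o] for j in outliers)   # float outlier contribution
            row.append(val)
        out.append(row)
    return out


if __name__ == "__main__":
    X = [[0.5, -1.2, 7.0], [0.1, 0.3, -0.2]]  # hidden dim 2 holds an outlier (7.0)
    W = [[0.2, -0.4], [1.0, 0.5], [0.3, 0.9]]
    print(matmul(X, W))
    print(int8_matmul_with_outliers(X, W))
```

Without the outlier split, the single large value 7.0 would set the scale for its whole row and crush the precision of the small entries beside it; routing that dimension through floating point avoids this.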

