SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

ganwanshui/simpleoccupancy 16 Mar 2023

Towards a more comprehensive perception of 3D scenes, in this paper we propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.

3D Object Detection Autonomous Driving +2

46
0.56 stars / hour

Wavelet Diffusion Models are fast and scalable Image Generators

vinairesearch/wavediff 29 Nov 2022

Diffusion models are rising as a powerful solution for high-fidelity image generation, exceeding GANs in quality in many circumstances.

Image Generation

92
0.54 stars / hour

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

vvictoryuki/freedom 17 Mar 2023

In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) used for various conditions.

Face Detection

49
0.53 stars / hour

Nerfstudio: A Modular Framework for Neural Radiance Field Development

nerfstudio-project/nerfstudio 8 Feb 2023

Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more.

4,143
0.51 stars / hour

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

vision-cair/chatcaptioner 12 Mar 2023

By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate more enriched image descriptions.

Image Captioning Question Answering

127
0.51 stars / hour

Effectively Modeling Time Series with Simple Discrete State Spaces

hazyresearch/spacetime 16 Mar 2023

For expressivity, we propose a new SSM parameterization based on the companion matrix -- a canonical representation for discrete-time processes -- which enables SpaceTime's SSM layers to learn desirable autoregressive processes.

Time Series Classification

67
0.50 stars / hour
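
As a rough illustration of the companion-matrix idea mentioned in the SpaceTime summary above, the sketch below shows how an autoregressive recurrence can be written as a linear state update x_{t+1} = A x_t. It is not taken from the hazyresearch/spacetime code: the function names `companion_matrix` and `ar_rollout` are hypothetical, and the layout used here (coefficients in the first row) is one common textbook form that may differ from the parameterization in the paper.

```python
def companion_matrix(coeffs):
    """Build a p x p companion matrix for AR coefficients a_1..a_p.

    Assumed layout (illustrative only): coefficients in the first row,
    ones on the sub-diagonal so that older values shift down one slot.
    """
    p = len(coeffs)
    rows = [list(coeffs)]
    for i in range(p - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(p)])
    return rows


def ar_rollout(coeffs, init, steps):
    """Roll out y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} using x_{t+1} = A x_t.

    `init` is the starting state [y_{t-1}, ..., y_{t-p}].
    """
    A = companion_matrix(coeffs)
    x = list(init)
    out = []
    for _ in range(steps):
        # One matrix-vector product: the first entry becomes the new value,
        # the remaining entries are the previous values shifted by one.
        x = [sum(a * v for a, v in zip(row, x)) for row in A]
        out.append(x[0])
    return out


if __name__ == "__main__":
    # With a_1 = a_2 = 1 the recurrence is y_t = y_{t-1} + y_{t-2}.
    print(ar_rollout([1.0, 1.0], [1.0, 1.0], 5))
```

In this form the first row of the matrix holds all the free coefficients, which is why a model that learns those entries can represent an autoregressive process of order p.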

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

anosorae/irra 22 Mar 2023

To alleviate these issues, we present IRRA: a cross-modal Implicit Relation Reasoning and Aligning framework that learns relations between local visual-textual tokens and enhances global image-text matching without requiring additional prior supervision.

Ranked #1 on Text based Person Retrieval on RSTPReid (using extra training data)

Language Modelling Masked Language Modeling +6

25
0.50 stars / hour

SHERF: Generalizable Human NeRF from a Single Image

skhu101/sherf 22 Mar 2023

To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.

3D Human Reconstruction

20
0.50 stars / hour

ReAct: Synergizing Reasoning and Acting in Language Models

ysymyth/ReAct 6 Oct 2022

While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics.

Decision Making Fact Verification +1

163
0.48 stars / hour

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

eleutherai/gpt-neox 20 May 2022

Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation.

Language Modelling

4,388
0.46 stars / hour