Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

lukashoel/text2room 21 Mar 2023

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Monocular Depth Estimation

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

amshaker/swiftformer 27 Mar 2023

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.

Equivariant Similarity for Vision-Language Foundation Models

wangt-cn/eqben 25 Mar 2023

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Retrieval, Text Retrieval, +1 more

ReVersion: Diffusion-Based Relation Inversion from Images

ziqihuangg/reversion 23 Mar 2023

Specifically, we propose a novel relation-steering contrastive learning scheme to impose two critical properties of the relation prompt: 1) The relation prompt should capture the interaction between objects, enforced by the preposition prior.

Contrastive Learning

Reflexion: an autonomous agent with dynamic memory and self-reflection

noahshinn024/reflexion 20 Mar 2023

To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment.

Decision Making, Language Modelling

ADAPT: Action-aware Driving Caption Transformer

jxbbb/adapt 1 Feb 2023

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving, Decision Making

Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

dvlab-research/ref-npr 6 Dec 2022

We propose a ray registration process based on the stylized reference view to obtain pseudo-ray supervision in novel views.

Semantic correspondence

Data-centric Artificial Intelligence: A Survey

daochenzha/data-centric-ai 17 Mar 2023

Artificial Intelligence (AI) is making a profound impact in almost every domain.

GLM-130B: An Open Bilingual Pre-trained Model

thudm/glm-130b 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

Language Modelling, Multi-task Language Understanding, +1 more

