Search Results for author: Yihan Zeng

Found 18 papers, 6 papers with code

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

1 code implementation • 14 Jan 2025 • Yabo Zhang, Xinpeng Zhou, Yihan Zeng, Hang Xu, Hui Li, WangMeng Zuo

We highlight the effectiveness and efficiency of FramePainter across various editing signals: it dominantly outperforms previous state-of-the-art methods with far less training data, achieving highly seamless and coherent editing of images, e.g., automatically adjusting the reflection of the cup.

Image to Video Generation

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

1 code implementation • 18 Nov 2024 • Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang

In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs).

Mathematical Reasoning

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

no code implementations • 17 Jul 2024 • Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations.

3D Generation • Text to 3D

DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

1 code implementation • 3 Jun 2024 • Tianyu Huang, Haoze Zhang, Yihan Zeng, Zhilu Zhang, Hui Li, WangMeng Zuo, Rynson W. H. Lau

In this work, to combine the strengths of the above two solutions and compensate for their shortcomings, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions.

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

1 code implementation • 2 Jun 2024 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, and then enhances the model's ability to localize more novel objects.

3D Object Detection • cross-modal alignment • +3

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation

no code implementations • 18 Mar 2024 • Haochen Jiang, Yueming Xu, Yihan Zeng, Hang Xu, Wei Zhang, Jianfeng Feng, Li Zhang

We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.

3D Reconstruction • 3D Scene Reconstruction • +3

GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data

no code implementations • 9 Feb 2024 • Haoyuan Li, Yanpeng Zhou, Yihan Zeng, Hang Xu, Xiaodan Liang

3D shapes represented as point clouds have achieved advancements in multimodal pre-training to align image and language descriptions, which is crucial to object identification, classification, and retrieval.

3DGS • Language Modeling • +2

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

no code implementations • 27 Dec 2023 • Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges.

Computational Efficiency • Denoising • +1

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

1 code implementation • CVPR 2024 • Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo

The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures.

3D Generation • NeRF • +1

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

1 code implementation • NeurIPS 2023 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature.

3D Object Detection • cross-modal alignment • +4

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability

no code implementations • ICCV 2023 • Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei Zhang, Hang Xu

DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder conditioned on the image.

Image Generation • Zero-Shot Learning

SUIT: Learning Significance-guided Information for 3D Temporal Detection

no code implementations • 4 Jul 2023 • Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang

To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames.

3D Object Detection • Autonomous Driving • +2

CLIP2: Contrastive Language-Image-Point Pretraining From Real-World Point Cloud Data

no code implementations • CVPR 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-Yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu

Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.

3D geometry
