1 code implementation • 14 Jan 2025 • Yabo Zhang, Xinpeng Zhou, Yihan Zeng, Hang Xu, Hui Li, WangMeng Zuo
We highlight the effectiveness and efficiency of FramePainter across various of editing signals: it domainantly outperforms previous state-of-the-art methods with far less training data, achieving highly seamless and coherent editing of images, \eg, automatically adjust the reflection of the cup.
1 code implementation • 18 Nov 2024 • Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang
In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of ``slow thinking" into multimodal large language models (MLLMs).
no code implementations • 26 Sep 2024 • Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-yan Yeung, Xiao Chen, Zhenguo Li, Wei zhang, Qun Liu, Jun Yao, Lanqing Hong, Lu Hou, Hang Xu
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models.
no code implementations • 17 Jul 2024 • Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei zhang, Hang Xu, Dit-yan Yeung
However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations.
1 code implementation • 3 Jun 2024 • Tianyu Huang, Haoze Zhang, Yihan Zeng, Zhilu Zhang, Hui Li, WangMeng Zuo, Rynson W. H. Lau
In this work, to combine the strengths and complementing shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions.
1 code implementation • 2 Jun 2024 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu
3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, and then enhances the model's ability to localize more novel objects.
no code implementations • 18 Mar 2024 • Haochen Jiang, Yueming Xu, Yihan Zeng, Hang Xu, Wei zhang, Jianfeng Feng, Li Zhang
We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference.
no code implementations • 9 Feb 2024 • Haoyuan Li, Yanpeng Zhou, Yihan Zeng, Hang Xu, Xiaodan Liang
3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions, which is curial to object identification, classification, and retrieval.
no code implementations • 27 Dec 2023 • Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei zhang, Hang Xu
Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges.
1 code implementation • CVPR 2024 • Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo
The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures.
1 code implementation • NeurIPS 2023 • Yang Cao, Yihan Zeng, Hang Xu, Dan Xu
Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature.
no code implementations • 29 Sep 2023 • Tianyu Huang, Yihan Zeng, Bowen Dong, Hang Xu, Songcen Xu, Rynson W. H. Lau, WangMeng Zuo
To this end, an NTFGen module is proposed to model general text latent code in noisy fields.
no code implementations • ICCV 2023 • Cuican Yu, Guansong Lu, Yihan Zeng, Jian Sun, Xiaodan Liang, Huibin Li, Zongben Xu, Songcen Xu, Wei zhang, Hang Xu
In this paper, we propose a text-guided 3D faces generation method, refer as TG-3DFace, for generating realistic 3D faces using text guidance.
no code implementations • ICCV 2023 • Runhui Huang, Jianhua Han, Guansong Lu, Xiaodan Liang, Yihan Zeng, Wei zhang, Hang Xu
DiffDis first formulates the image-text discriminative problem as a generative diffusion process of the text embedding from the text encoder conditioned on the image.
no code implementations • 4 Jul 2023 • Zheyuan Zhou, Jiachen Lu, Yihan Zeng, Hang Xu, Li Zhang
To this end, we propose to learn Significance-gUided Information for 3D Temporal detection (SUIT), which simplifies temporal information as sparse features for information fusion across frames.
no code implementations • 22 Mar 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu
Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.
Ranked #3 on
Zero-shot 3D Point Cloud Classification
on ScanNetV2
no code implementations • CVPR 2023 • Yihan Zeng, Chenhan Jiang, Jiageng Mao, Jianhua Han, Chaoqiang Ye, Qingqiu Huang, Dit-yan Yeung, Zhen Yang, Xiaodan Liang, Hang Xu
Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks.
no code implementations • CVPR 2022 • Yihan Zeng, Da Zhang, Chunwei Wang, Zhenwei Miao, Ting Liu, Xin Zhan, Dayang Hao, Chao Ma
LiDAR and camera are two common sensors to collect data in time for 3D object detection under the autonomous driving context.