2 code implementations • 24 Dec 2024 • Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, DaCheng Tao
Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question.
1 code implementation • 9 Dec 2024 • Zheng Chen, Chenming Wu, Zhelun Shen, Chen Zhao, Weicai Ye, Haocheng Feng, Errui Ding, Song-Hai Zhang
Wide-baseline panoramic images are frequently used in applications like VR and simulations to minimize capturing labor costs and storage needs.
no code implementations • 14 Oct 2024 • Jiazhi Guan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Shengyi He, Zhiliang Xu, Haocheng Feng, Errui Ding, Jingdong Wang, Hongtao Xie, Youjian Zhao, Ziwei Liu
We propose a Motion-Enhanced Textural Alignment module to enhance the bond between driving and target signals.
1 code implementation • 24 Sep 2024 • Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang
Most existing multimodality methods use separate backbones for autoregression-based discrete text generation and diffusion-based continuous visual generation, or the same backbone by discretizing the visual data to use autoregression for both text and visual generation.
no code implementations • 6 Aug 2024 • Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu
Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers.
no code implementations • 26 Jun 2024 • Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han
Dynamic Gaussian splatting has led to impressive advances in scene reconstruction and novel-view image synthesis.
no code implementations • 26 Jun 2024 • Hao Li, Ming Yuan, Yan Zhang, Chenming Wu, Chen Zhao, Chunyu Song, Haocheng Feng, Errui Ding, Dingwen Zhang, Jingdong Wang
To address this, this paper presents a novel driving view synthesis dataset and benchmark specifically designed for autonomous driving simulations.
no code implementations • 4 Jun 2024 • Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang
To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency.
1 code implementation • 22 May 2024 • Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang
We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs.
1 code implementation • 18 May 2024 • Mengxi Zhang, Wenhao Wu, Yu Lu, Yuxin Song, Kang Rong, Huanjin Yao, Jianbo Zhao, Fanglong Liu, Yifan Sun, Haocheng Feng, Jingdong Wang
To verify our viewpoint, we present the Automated Multi-level Preference (AMP) framework for MLLMs.
no code implementations • 22 Mar 2024 • Jinbo Wu, Xing Liu, Chenming Wu, Xiaobo Gao, Jialun Liu, Xinqi Liu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang
We propose an optimal viewpoint selection strategy that finds the smallest set of viewpoints covering all the faces of a mesh.
no code implementations • 15 Mar 2024 • Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han
Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model.
no code implementations • 26 Feb 2024 • Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang
In this paper, we present GVA, a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs.
no code implementations • CVPR 2024 • Yu Wang, Xin Li, Shengzhao Weng, Gang Zhang, Haixiao Yue, Haocheng Feng, Junyu Han, Errui Ding
Based on this observation, we propose the first general knowledge distillation paradigm for DETR (KD-DETR), with consistent distillation point sampling for both homogeneous and heterogeneous distillation.
no code implementations • CVPR 2024 • Jialun Liu, Chenming Wu, Xinqi Liu, Xing Liu, Jinbo Wu, Haotian Peng, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding
This model gradually reduces the texture noise on the octree nodes, resulting in the restoration of fine texture.
1 code implementation • 8 Dec 2023 • Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang
Our method achieves state-of-the-art performance in both relighting and novel view synthesis tasks among the recently proposed inverse rendering methods while achieving real-time rendering.
no code implementations • 11 Oct 2023 • Deli Yu, Teng Xi, Jianwei Li, Baopu Li, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
On the one hand, different images share more similar attention patterns in early layers than in later layers, indicating that the dynamic query-by-key self-attention matrix may be replaced with a static self-attention matrix in early layers.
no code implementations • 1 Sep 2023 • Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang
In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion.
no code implementations • 30 Jul 2023 • Jinbo Wu, Xiaobo Gao, Xing Liu, Zhengyang Shen, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding
In this paper, we study Text-to-3D content generation leveraging 2D diffusion priors to enhance the quality and detail of the generated 3D models.
no code implementations • CVPR 2023 • Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang
Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.
no code implementations • CVPR 2023 • Zhongwei Qiu, Yang Qiansheng, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Chang Xu, Dongmei Fu, Jingdong Wang
To handle the variances of objects as time proceeds, a novel scheme of progressive decoding is used to update pose and shape queries at each frame.
Ranked #32 on 3D Human Pose Estimation on 3DPW
1 code implementation • 26 Jan 2023 • Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng
In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences.
Ranked #16 on Skeleton Based Action Recognition on NTU RGB+D
1 code implementation • 7 Dec 2022 • Haixiao Yue, Keyao Wang, Guosheng Zhang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang
We further extend CDFTN for multi-target domain adaptation by leveraging data from more unlabeled target domains.
2 code implementations • 15 Nov 2022 • Yu Wang, Xin Li, Shengzhao Wen, Fukui Yang, Wanping Zhang, Gang Zhang, Haocheng Feng, Junyu Han, Errui Ding
In this paper, we focus on the compression of DETR with knowledge distillation.
no code implementations • arXiv 2022 • Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Objects365, and finally finetuning it on COCO.
Ranked #8 on Object Detection on COCO test-dev (using extra training data)
1 code implementation • 13 Oct 2022 • Jian Wang, Chenhui Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang
Recently, transformer-based networks have shown impressive results in semantic segmentation.
Ranked #2 on Real-Time Semantic Segmentation on CamVid
2 code implementations • ICCV 2023 • Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.
1 code implementation • 21 Jul 2022 • Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang, Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
UFO aims to benefit each individual task through large-scale pretraining on all tasks.
1 code implementation • 13 Jun 2022 • Yanpeng Sun, Qiang Chen, Xiangyu He, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jian Cheng, Zechao Li, Jingdong Wang
In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of the parameters in the backbone.
Ranked #12 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
1 code implementation • CVPR 2021 • Bi Li, Teng Xi, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Wenyu Liu
Since only a subset of classes is selected for each iteration, the computing requirement is reduced.
Ranked #4 on Face Recognition on AgeDB-30
6 code implementations • 8 May 2020 • Haocheng Feng, Zhibin Hong, Haixiao Yue, Yang Chen, Keyao Wang, Junyu Han, Jingtuo Liu, Errui Ding
In this paper, we reformulate FAS from an anomaly detection perspective and propose a residual-learning framework to learn the discriminative live-spoof differences, which are defined as the spoof cues.