no code implementations • 15 Jan 2025 • Siqi Li, Zhengkai Jiang, Jiawei Zhou, Zhihong Liu, Xiaowei Chi, Haoqian Wang
Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body.
no code implementations • 3 Jan 2025 • Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Yue Liao, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren
We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces.
no code implementations • 22 Nov 2024 • Yiyang Cai, Zhengkai Jiang, Yulong Liu, Chunyang Jiang, Wei Xue, Wenhan Luo, Yike Guo
Facial personalization represents a crucial downstream task in the domain of text-to-image generation.
1 code implementation • 10 Nov 2024 • Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai
Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications.
no code implementations • 26 Sep 2024 • Xin Li, Siyuan Huang, Qiaojun Yu, Zhengkai Jiang, Ce Hao, Yimeng Zhu, Hongsheng Li, Peng Gao, Cewu Lu
In addition, this research underscores the potential of VLMs to unify various garment manipulation tasks within a single framework, paving the way for broader applications in home automation and assistive robotics in the future.
no code implementations • 17 Sep 2024 • Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang, Wenbing Zhu, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang
Video diffusion models have shown great potential in generating high-quality videos, making them an increasingly popular focus.
no code implementations • 9 Sep 2024 • Tianwu Lei, Silin Chen, Bohan Wang, Zhengkai Jiang, Ningmu Zou
We propose test-time adaptation to eliminate the bias between the unseen test sample representation and the feature distribution learned by the expert model.
no code implementations • 6 Aug 2024 • Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi
In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences.
1 code implementation • 30 May 2024 • Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang
Multimodal large language models (MLLMs) provide a powerful mechanism for understanding visual information by building on large language models.
no code implementations • 28 May 2024 • Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu
This enables accurate alignment of pose and shape in the generated videos, providing a robust framework capable of handling a wide range of body shapes and dynamic hand movements.
no code implementations • 28 May 2024 • Sihe Zhang, Qingdong He, Jinlong Peng, Yuxi Li, Zhengkai Jiang, Jiafu Wu, Mingmin Chi, Yabiao Wang, Chengjie Wang
To mitigate this issue, we introduce a novel setting for low-quality image retrieval, and propose an Adaptive Noise-Based Network (AdapNet) to learn robust abstract representations.
1 code implementation • 17 May 2024 • Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.
2 code implementations • 9 May 2024 • Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.
1 code implementation • CVPR 2024 • Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, JunHao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Ranked #25 on RGB-T Tracking on RGBT234
1 code implementation • 18 Mar 2024 • Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang
In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination.
no code implementations • 11 Mar 2024 • Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang
On top of that, PointSeg can incorporate with various foundation models and even surpasses the specialist training-based methods by 3.4$\%$-5.4$\%$ mAP across various datasets, serving as an effective generalist model.
no code implementations • 10 Mar 2024 • Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji
Aiming to simultaneously generate the object and its matting annotation, we build a matting head to perform green color removal in the latent space of the VAE decoder.
1 code implementation • 21 Jan 2024 • Qingdong He, Jinlong Peng, Zhengkai Jiang, Kai Wu, Xiaozhong Ji, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Mingang Chen, Yunsheng Wu
3D open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space.
no code implementations • 15 Dec 2023 • Shizhan Liu, Zhengkai Jiang, Yuxi Li, Jinlong Peng, Yabiao Wang, Weiyao Lin
Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation.
1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo
Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.
no code implementations • 24 May 2023 • Zhengkai Jiang, Liang Liu, Jiangning Zhang, Yabiao Wang, Mingang Chen, Chengjie Wang
This paper introduces a novel attention mechanism, called dual attention, which is both efficient and effective.
1 code implementation • 18 May 2023 • Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li
This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks.
1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li
Driven by large-data pre-training, the Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing segmentation models.
Ranked #2 on Personalized Segmentation on PerSeg
1 code implementation • ICCV 2023 • Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang
This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.
1 code implementation • 14 Jul 2022 • Zhengkai Jiang, Yuxi Li, Ceyuan Yang, Peng Gao, Yabiao Wang, Ying Tai, Chengjie Wang
Unsupervised Domain Adaptation (UDA) aims to adapt the model trained on the labeled source domain to an unlabeled target domain.
Ranked #15 on Unsupervised Domain Adaptation on SYNTHIA-to-Cityscapes
1 code implementation • 30 May 2022 • Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada
Challenging illumination conditions (low-light, under-exposure and over-exposure) in the real world not only cast an unpleasant visual appearance but also taint the computer vision tasks.
Ranked #3 on Image Enhancement on Exposure-Errors
no code implementations • 8 Feb 2022 • Zhengkai Jiang, Zhangxuan Gu, Jinlong Peng, Hang Zhou, Liang Liu, Yabiao Wang, Ying Tai, Chengjie Wang, Liqing Zhang
In contrast, we present a simple and efficient single-stage VIS framework based on the instance segmentation method CondInst by adding an extra tracking head.
Ranked #27 on Video Instance Segmentation on YouTube-VIS validation
3 code implementations • 27 Jul 2021 • Qingyu Song, Changan Wang, Zhengkai Jiang, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yang Wu
In this paper, we propose a purely point-based framework for joint crowd counting and individual localization.
Ranked #6 on Crowd Counting on ShanghaiTech A
no code implementations • 24 May 2021 • Jinlong Peng, Zhengkai Jiang, Yueyang Gu, Yang Wu, Yabiao Wang, Ying Tai, Chengjie Wang, Weiyao Lin
In addition, we add a localization branch to predict the localization accuracy, so that it can work as the replacement of the regression assistance link during inference.
1 code implementation • ICCV 2021 • Qingyu Song, Changan Wang, Zhengkai Jiang, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Yang Wu
In this paper, we propose a purely point-based framework for joint crowd counting and individual localization.
1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Xiangyu Zhang, Hongbin Sun, Jian Sun, Nanning Zheng
The Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng
To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further releases the ability of multi-scale feature representation.
no code implementations • 26 Jul 2020 • Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su
We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.
2 code implementations • 7 Jul 2020 • Benjin Zhu, Jian-Feng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, Jian Sun
During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions.
1 code implementation • ECCV 2020 • Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan
Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.
Ranked #24 on Video Object Detection on ImageNet VID
3 code implementations • 26 Aug 2019 • Benjin Zhu, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu
This report presents our method, which won the nuScenes 3D Detection Challenge [17] held in the Workshop on Autonomous Driving (WAD, CVPR 2019).
Ranked #6 on 3D Object Detection on nuScenes LiDAR only
no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering.