Search Results for author: Zhengkai Jiang

Found 37 papers, 20 papers with code

RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency

no code implementations • 15 Jan 2025 • Siqi Li, Zhengkai Jiang, Jiawei Zhou, Zhihong Liu, Xiaowei Chi, Haoqian Wang

Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body.

Virtual Try-on

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

1 code implementation • 10 Nov 2024 • Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai

Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications.

Attribute · RAG +1

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

no code implementations • 26 Sep 2024 • Xin Li, Siyuan Huang, Qiaojun Yu, Zhengkai Jiang, Ce Hao, Yimeng Zhu, Hongsheng Li, Peng Gao, Cewu Lu

In addition, this research underscores the potential of VLMs to unify various garment manipulation tasks within a single framework, paving the way for broader applications in home automation and assistive robotics in the future.

Keypoint Detection

OSV: One Step is Enough for High-Quality Image to Video Generation

no code implementations • 17 Sep 2024 • Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang, Wenbing Zhu, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang

Video diffusion models have shown great potential in generating high-quality videos, making them an increasingly popular focus.

Image to Video Generation

Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection

no code implementations • 9 Sep 2024 • Tianwu Lei, Silin Chen, Bohan Wang, Zhengkai Jiang, Ningmu Zou

We propose test-time adaption to eliminate the bias between the unseen test sample representation and the feature distribution learned by the expert model.

Representation Learning · Unsupervised Anomaly Detection

MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation

no code implementations • 6 Aug 2024 • Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi

In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences.

Denoising · Gesture Generation

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

1 code implementation • 30 May 2024 • Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

Multimodal large language models (MLLMs) provide a powerful mechanism for understanding visual information by building on large language models.

Hallucination

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation

no code implementations • 28 May 2024 • Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

This enables accurate alignment of pose and shape in the generated videos, providing a robust framework capable of handling a wide range of body shapes and dynamic hand movements.

Image Animation

AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval

no code implementations • 28 May 2024 • Sihe Zhang, Qingdong He, Jinlong Peng, Yuxi Li, Zhengkai Jiang, Jiafu Wu, Mingmin Chi, Yabiao Wang, Chengjie Wang

To mitigate this issue, we introduce a novel setting for low-quality image retrieval, and propose an Adaptive Noise-Based Network (AdapNet) to learn robust abstract representations.

Image Retrieval · Re-Ranking +1

Efficient Multimodal Large Language Models: A Survey

1 code implementation • 17 May 2024 • Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.

Edge-computing · Question Answering +2

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

2 code implementations • 9 May 2024 • Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

1 code implementation • 18 Mar 2024 • Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination.

Multi-class Anomaly Detection
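The "learning shortcut" failure mode above can be demonstrated in a few lines: if a reconstruction model collapses to the identity mapping, its reconstruction error is zero for anomalies as well, so the anomaly score carries no signal. A toy sketch, where `proj_to_normal` is a hypothetical stand-in for a model that reconstructs toward normal patterns, not the paper's method:

```python
import numpy as np

def recon_error(reconstruct, x):
    # Per-sample reconstruction error, used as the anomaly score.
    return np.mean((reconstruct(x) - x) ** 2, axis=1)

rng = np.random.default_rng(0)
anomaly = rng.normal(loc=5.0, scale=1.0, size=(8, 16))  # far from normal data

identity = lambda x: x                            # the "shortcut" solution
proj_to_normal = lambda x: np.clip(x, -3.0, 3.0)  # crude stand-in: pull inputs
                                                  # back toward the normal range

print(recon_error(identity, anomaly))        # all zeros: anomalies invisible
print(recon_error(proj_to_normal, anomaly))  # large: anomalies stand out
```

This is why a unified reference representation is needed: the reconstruction must map inputs toward normal patterns rather than copy them through.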

PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

no code implementations • 11 Mar 2024 • Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang

On top of that, PointSeg can be combined with various foundation models and even surpasses specialist training-based methods by 3.4%-5.4% mAP across various datasets, serving as an effective generalist model.

Scene Segmentation

DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

no code implementations • 10 Mar 2024 • Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

Aiming to simultaneously generate the object and its matting annotation, we build a matting head that removes the green color in the latent space of the VAE decoder.

Image Matting · Object
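For intuition, the "green removal" above can be compared to a classic pixel-space chroma key, where alpha falls as green dominates the other channels. DiffuMatting performs this with a learned matting head in the VAE decoder's latent space, so the sketch below is only a loose pixel-space analogue, not the paper's method.

```python
import numpy as np

def green_screen_alpha(rgb, strength=1.0):
    """Crude chroma-key: alpha drops as green exceeds the other channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    green_excess = g - np.maximum(r, b)
    return np.clip(1.0 - strength * green_excess, 0.0, 1.0)

pure_green = np.array([[[0.0, 1.0, 0.0]]])  # background pixel
gray = np.array([[[0.5, 0.5, 0.5]]])        # foreground pixel
print(green_screen_alpha(pure_green))  # -> [[0.]] fully transparent
print(green_screen_alpha(gray))        # -> [[1.]] fully opaque
```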

Density Matters: Improved Core-set for Active Domain Adaptive Segmentation

no code implementations • 15 Dec 2023 • Shizhan Liu, Zhengkai Jiang, Yuxi Li, Jinlong Peng, Yabiao Wang, Weiyao Lin

Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation.

Domain Adaptation · Semantic Segmentation

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.

Image Generation · Language Modelling +1

Dual Path Transformer with Partition Attention

no code implementations • 24 May 2023 • Zhengkai Jiang, Liang Liu, Jiangning Zhang, Yabiao Wang, Mingang Chen, Chengjie Wang

This paper introduces a novel attention mechanism, called dual attention, which is both efficient and effective.

Image Classification · object-detection +2

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

1 code implementation • 18 May 2023 • Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li

This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks.

Language Modeling · Language Modelling +3

Personalize Segment Anything Model with One Shot

1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing segmentation models.

model · Personalized Segmentation +5

Rethinking Mobile Block for Efficient Attention-based Models

1 code implementation • ICCV 2023 • Jiangning Zhang, Xiangtai Li, Jian Li, Liang Liu, Zhucun Xue, Boshen Zhang, Zhengkai Jiang, Tianxin Huang, Yabiao Wang, Chengjie Wang

This paper focuses on developing modern, efficient, lightweight models for dense predictions while trading off parameters, FLOPs, and performance.

Unity

You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction

1 code implementation • 30 May 2022 • Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada

Challenging illumination conditions (low-light, under-exposure and over-exposure) in the real world not only cast an unpleasant visual appearance but also taint the computer vision tasks.

Exposure Correction · Low-Light Image Enhancement +3

SiamRCR: Reciprocal Classification and Regression for Visual Object Tracking

no code implementations • 24 May 2021 • Jinlong Peng, Zhengkai Jiang, Yueyang Gu, Yang Wu, Yabiao Wang, Ying Tai, Chengjie Wang, Weiyao Lin

In addition, we add a localization branch to predict the localization accuracy, so that it can serve as a replacement for the regression assistance link during inference.

Classification · Object +2

Fine-Grained Dynamic Head for Object Detection

1 code implementation • NeurIPS 2020 • Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, we propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance, which further releases the ability of multi-scale feature representation.

Object · object-detection +1
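The pixel-level scale selection described above can be sketched as a per-pixel soft gate over FPN levels resized to a common resolution. This is a minimal, non-sparse simplification under assumed shapes, not the paper's dynamic head, whose gates are learned and conditioned per instance.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_scale_fusion(feats, gate_logits):
    """feats: (S, C, H, W) FPN levels resized to one resolution;
    gate_logits: (S, H, W) per-pixel scale preferences.
    Each pixel softly mixes the S scales according to its own gate."""
    gates = softmax(gate_logits, axis=0)         # (S, H, W), sums to 1 per pixel
    return (feats * gates[:, None]).sum(axis=0)  # (C, H, W)

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8, 4, 4))     # 3 scales, 8 channels, 4x4 map
gate_logits = rng.normal(size=(3, 4, 4))
fused = dynamic_scale_fusion(feats, gate_logits)
print(fused.shape)  # (8, 4, 4)
```

Replacing the softmax with a hard argmax per pixel would recover a discrete scale choice, but the soft form keeps the assignment differentiable.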

Contrastive Visual-Linguistic Pretraining

no code implementations • 26 Jul 2020 • Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.

Contrastive Learning · regression +2

AutoAssign: Differentiable Label Assignment for Dense Object Detection

2 code implementations • 7 Jul 2020 • Benjin Zhu, Jian-Feng Wang, Zhengkai Jiang, Fuhang Zong, Songtao Liu, Zeming Li, Jian Sun

During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions.

Dense Object Detection · Object +1
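The Center Weighting idea above can be illustrated with a Gaussian-shaped prior over distance from the box center, with a per-category bandwidth. This is a hedged sketch of the concept under assumed parameters, not AutoAssign's exact learned, differentiable formulation.

```python
import numpy as np

def center_weight(points, box_center, sigma):
    """Gaussian center prior: locations near the object center receive
    higher weight in the label assignment; sigma is per-category."""
    d = (points - box_center) / sigma
    return np.exp(-0.5 * (d ** 2).sum(axis=1))

points = np.array([[5.0, 5.0], [5.0, 7.0], [9.0, 9.0]])
w = center_weight(points, box_center=np.array([5.0, 5.0]),
                  sigma=np.array([2.0, 2.0]))
print(w)  # weight 1.0 at the center, decaying with distance
```

Making `sigma` a learnable per-category parameter lets the prior adapt to category characteristics, which is the role Center Weighting plays during training.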

Learning Where to Focus for Efficient Video Object Detection

1 code implementation • ECCV 2020 • Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.

Object · object-detection +1

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection

3 code implementations • 26 Aug 2019 • Benjin Zhu, Zhengkai Jiang, Xiangxin Zhou, Zeming Li, Gang Yu

This report presents our method, which won the nuScenes 3D Detection Challenge [17] held at the Workshop on Autonomous Driving (WAD, CVPR 2019).

3D Object Detection · Autonomous Driving +1

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering.

Question Answering · Visual Question Answering
