no code implementations • 9 Jun 2025 • Ruiyang Zhang, Hu Zhang, Hao Fei, Zhedong Zheng
Large Multimodal Models (LMMs), harnessing the complementarity among diverse modalities, are often considered more robust than pure Large Language Models (LLMs); yet do LMMs know what they do not know?
1 code implementation • 31 May 2025 • Yule Zhu, Ping Liu, Zhedong Zheng, Wei Liu
However, sequential editing introduces significant challenges in edit attribution and detection robustness, further complicated by the lack of large-scale, finely annotated benchmarks tailored explicitly for this task.
no code implementations • 25 May 2025 • Jintao Sun, Hu Zhang, Gangyi Ding, Zhedong Zheng
Modern end-to-end autonomous driving systems suffer from a critical limitation: their planners lack mechanisms to enforce temporal consistency between predicted trajectories and evolving scene dynamics.
1 code implementation • 26 Apr 2025 • Hang Yu, Jiahao Wen, Zhedong Zheng
Due to the high cost of annotation and privacy protection, researchers resort to synthesized data for the paradigm of pretraining and fine-tuning.
1 code implementation • 31 Mar 2025 • Lingyu Liu, Yaxiong Wang, Li Zhu, Zhedong Zheng
Our approach introduces synthetic proxy images through two key innovations: (1) Dual-path score distillation: We employ a dual-path architecture to distill motion priors from both real and synthetic data, preserving static details from the original painting while learning dynamic characteristics from synthetic frames.
no code implementations • 16 Mar 2025 • Jianwu Fang, Lei-Lei Li, Zhedong Zheng, Hongkai Yu, Jianru Xue, Zhengguo Li, Tat-Seng Chua
Traffic Accident Anticipation (TAA) in traffic scenes is a challenging problem for achieving zero fatalities in the future.
no code implementations • 21 Feb 2025 • Yinan Zhou, Yaxiong Wang, Haokun Lin, Chen Ma, Li Zhu, Zhedong Zheng
To address this issue, this paper proposes to synthesize the training triplets to augment the training resource for the CIR problem.
no code implementations • 27 Dec 2024 • Jingchun Lian, Lingyu Liu, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng
Leveraging the MMTT dataset, we develop ForgeryTalker, an architecture designed for concurrent forgery localization and interpretation.
no code implementations • 16 Dec 2024 • Quan Chen, Tingyu Wang, Rongfeng Lu, Bolun Zheng, Zhedong Zheng, Chenggang Yan
Specifically, we propose a distance-guided dynamic partition learning strategy (DGDPL), consisting of a square partition strategy and a distance-guided adjustment strategy.
no code implementations • 16 Dec 2024 • Bingwen Hu, Heng Liu, Zhedong Zheng, Ping Liu
Our proposed multi-modal collaborative framework enables the production of realistic and high-quality SR images at significant up-scaling factors.
1 code implementation • 3 Dec 2024 • Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang, Hao Fei
T3DEM is the most crucial step in determining the quality of Emo3D generation and encompasses three key challenges: Expression Diversity, Emotion-Content Consistency, and Expression Fluidity.
no code implementations • 28 Nov 2024 • Jiacheng Wang, Zhedong Zheng, Wei Xu, Ping Liu
Given a single image of a target object, image-to-3D generation aims to reconstruct its texture and geometric shape.
1 code implementation • 26 Nov 2024 • Shuyu Yang, Yaxiong Wang, Li Zhu, Zhedong Zheng
To enable the training and evaluation of this new task, we construct a large-scale image-text Pedestrian Anomaly Behavior (PAB) benchmark, featuring a broad spectrum of actions, e.g., running, performing, playing soccer, and the corresponding anomalies, e.g., lying, being hit, and falling of the same identity.
1 code implementation • 20 Nov 2024 • Hao Ju, Zhedong Zheng
To address these limitations, we formulate a new video-based drone geo-localization task and propose the Video2BEV paradigm.
1 code implementation • 18 Nov 2024 • Ruiyang Zhang, Hu Zhang, Zhedong Zheng
Given the higher information load processed by large vision-language models (LVLMs) compared to single-modal LLMs, detecting LVLM hallucinations requires more human effort and time, and thus raises wider safety concerns.
no code implementations • 15 Oct 2024 • Guiyu Zhang, Huan-ang Gao, Zijian Jiang, Hao Zhao, Zhedong Zheng
Based on the uncertainty estimation, we regularize the model training by adaptively rectifying the reward.
1 code implementation • 1 Aug 2024 • Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng
(2) Based on the assessed uncertainty, we adaptively adjust the weight of every 3D bbox coordinate via uncertainty regularization, refining the training process on pseudo bboxes.
no code implementations • 24 Jul 2024 • Mu Chen, Zhedong Zheng, Yi Yang
Unsupervised domain adaptive segmentation aims to improve the segmentation accuracy of models on target domains without relying on labeled data from those domains.
1 code implementation • 11 Jul 2024 • Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng
In this paper, we make an early attempt to integrate LiDAR data with 2D images for unsupervised 3D detection and introduce a new method, dubbed LiDAR-2D Self-paced Learning (LiSe).
1 code implementation • 16 Apr 2024 • Jintao Sun, Hao Fei, Zhedong Zheng, Gangyi Ding
In text-based person search endeavors, data generation has emerged as a prevailing practice, addressing concerns over privacy preservation and the arduous task of manual annotation.
Ranked #2 on Text based Person Retrieval on ICFG-PEDES
no code implementations • 16 Jan 2024 • Lidong Zeng, Zhedong Zheng, Yinwei Wei, Tat-Seng Chua
This paper delves into the text-guided image editing task, focusing on modifying a reference image according to user-specified textual feedback to embody specific attributes.
2 code implementations • 21 Nov 2023 • Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua
Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data.
2 code implementations • 21 Nov 2023 • Mu Chen, Zhedong Zheng, Yi Yang
Based on this observation, we propose a depth-aware framework to explicitly leverage depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning, in an end-to-end manner.
1 code implementation • 26 Sep 2023 • Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-Seng Chua
We aspire for our work to pave the way for automatic 3D prototyping via natural language descriptions.
1 code implementation • 5 Jun 2023 • Shuyu Yang, Yinan Zhou, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng
To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework, considering the shared knowledge between attribute and text.
Ranked #1 on Text based Person Retrieval on ICFG-PEDES (using extra training data)
no code implementations • 3 Jun 2023 • Xu Zhang, Zhedong Zheng, Linchao Zhu, Yi Yang
Triplet ambiguity refers to a type of semantic ambiguity that arises between the reference image, the relative caption, and the target image.
Content-Based Image Retrieval
Image Retrieval with Multi-Modal Query
1 code implementation • 6 May 2023 • Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao
Existing task-oriented conversational search systems heavily rely on domain ontologies with pre-defined slots and candidate value sets.
1 code implementation • 25 Apr 2023 • Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
1 code implementation • CVPR 2023 • Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua
Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.
1 code implementation • CVPR 2023 • Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang
(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.
no code implementations • 25 Dec 2022 • Xiaolong Shen, Zhedong Zheng, Yi Yang
As its name suggests, it is made up of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling.
Ranked #1 on Sign Language Recognition on BOBSL
1 code implementation • 19 Dec 2022 • Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, Tat-Seng Chua
In particular, the text description provides dense semantic guidance for the primary context of the traffic scene, while the driver attention provides traction to focus on the critical region closely correlating with safe driving.
1 code implementation • 14 Nov 2022 • Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua
In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.
Ranked #1 on Semantic Segmentation on SYNTHIA-to-Cityscapes
1 code implementation • 14 Nov 2022 • Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, Tat-Seng Chua
The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.
Composed Image Retrieval (CoIR)
Image Retrieval with Multi-Modal Query
no code implementations • 10 Nov 2022 • Tingyu Wang, Zhedong Zheng, Zunjie Zhu, Yuhan Gao, Yi Yang, Chenggang Yan
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2022 • Chuchu Han, Zhedong Zheng, Kai Su, Dongdong Yu, Zehuan Yuan, Changxin Gao, Nong Sang, Yi Yang
Person search aims at localizing and recognizing query persons from raw video frames, which is a combination of two sub-tasks, i.e., pedestrian detection and person re-identification.
Ranked #4 on Person Search on PRW
no code implementations • 8 Jul 2022 • Yucheng Suo, Zhedong Zheng, Xiaohan Wang, Bang Zhang, Yi Yang
We optimize the two losses and keypoint detector network in an end-to-end manner.
1 code implementation • IEEE Transactions on Image Processing (TIP) 2022 • Jinliang Lin, Zhedong Zheng, Zhun Zhong, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe
Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network.
Ranked #2 on Drone navigation on University-1652
1 code implementation • 27 Apr 2022 • Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua
This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single image.
Ranked #1 on Single-View 3D Reconstruction on CUB-200-2011
2 code implementations • 18 Apr 2022 • Tingyu Wang, Zhedong Zheng, Yaoqi Sun, Chenggang Yan, Yi Yang, Tat-Seng Chua
This task is mostly regarded as an image retrieval problem.
1 code implementation • CVPR 2022 • Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang
To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.
1 code implementation • 1 Sep 2021 • Chao Sun, Zhedong Zheng, Xiaohan Wang, Mingliang Xu, Yi Yang
Albeit simple, the pre-trained encoder can capture the key points of an unseen point cloud and surpasses the encoder trained from scratch on downstream tasks.
Ranked #46 on 3D Part Segmentation on ShapeNet-Part
no code implementations • 3 Aug 2021 • Bingwen Hu, Ping Liu, Zhedong Zheng, Mingwu Ren
Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, preserving the details of the desired clothes under the desired pose.
no code implementations • 3 Jun 2021 • Kezhou Lin, Xiaohan Wang, Zhedong Zheng, Linchao Zhu, Yi Yang
Obtaining viewer responses from videos can be useful for creators and streaming platforms to analyze the video performance and improve the future user experience.
1 code implementation • 31 May 2021 • Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang
In this paper, we apply one new modality, i.e., the language description, to search for the vehicle of interest and explore the potential of this task in the real-world scenario.
2 code implementations • 29 Mar 2021 • Zhedong Zheng, Yi Yang
Domain adaptation aims to transfer the shared knowledge learned from the source domain to a new environment, i.e., the target domain.
1 code implementation • 22 Feb 2021 • Chuchu Han, Zhedong Zheng, Changxin Gao, Nong Sang, Yi Yang
Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework.
Ranked #9 on Person Search on PRW
1 code implementation • 14 Dec 2020 • Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang
We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph.
Ranked #1 on Image Retrieval on Oxford5k
1 code implementation • 26 Aug 2020 • Tingyu Wang, Zhedong Zheng, Chenggang Yan, Jiyong Zhang, Yaoqi Sun, Bolun Zheng, Yi Yang
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center, but underestimate the contextual information in neighbor areas.
Ranked #3 on Drone navigation on University-1652
1 code implementation • 8 Jun 2020 • Zhedong Zheng, Nenggan Zheng, Yi Yang
To our knowledge, this is among the first attempts to conduct person re-identification in the 3D space.
3 code implementations • 14 Apr 2020 • Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei
This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.
Ranked #1 on Vehicle Re-Identification on VehicleID
2 code implementations • 8 Mar 2020 • Zhedong Zheng, Yi Yang
This paper focuses on the unsupervised domain adaptation of transferring the knowledge from the source domain to the target domain in the context of semantic segmentation.
3 code implementations • 27 Feb 2020 • Zhedong Zheng, Yunchao Wei, Yi Yang
To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation.
Ranked #6 on Drone navigation on University-1652
no code implementations • 24 Jan 2020 • Xiaodong Wang, Zhedong Zheng, Yang He, Fei Yan, Zhiqiang Zeng, Yi Yang
To verify this, we evaluate our method on two widely-used image retrieval datasets, i.e., Oxford5k and Paris6K, and one person re-identification dataset, i.e., Market-1501.
1 code implementation • 24 Dec 2019 • Zhedong Zheng, Yi Yang
We consider the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data.
Ranked #1 on Domain Adaptation on SYNTHIA-to-Cityscapes Labels
1 code implementation • 16 Sep 2019 • Bingwen Hu, Zhedong Zheng, Ping Liu, Wankou Yang, Mingwu Ren
Given two facial images with and without eyeglasses, the proposed model learns to swap the eye area in two faces.
12 code implementations • CVPR 2019 • Zhedong Zheng, Xiaodong Yang, Zhiding Yu, Liang Zheng, Yi Yang, Jan Kautz
To this end, we propose a joint learning framework that couples re-id learning and data generation end-to-end.
Ranked #1 on Person Re-Identification on UAV-Human
Image-to-Image Translation
Unsupervised Domain Adaptation
2 code implementations • 7 Sep 2018 • Zhedong Zheng, Liang Zheng, Yi Yang, Fei Wu
Opposite-Direction Feature Attack (ODFA) effectively exploits feature-level adversarial gradients and takes advantage of feature distance in the representation space.
1 code implementation • ECCV 2018 • Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang
To address the two kinds of inconsistencies, this paper proposes the Macro-Micro Adversarial Net (MMAN).
Ranked #12 on Semantic Segmentation on LIP val
1 code implementation • 30 Jan 2018 • Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, Yi Yang
This paper considers the task of thorax disease classification on chest X-ray images.
no code implementations • 21 Jan 2018 • Yan Huang, Jinsong Xu, Qiang Wu, Zhedong Zheng, Zhao-Xiang Zhang, Jian Zhang
Unlike a traditional label, which is usually a single integer, the virtual label proposed in this work, called the multi-pseudo label, is a set of weight-based values, each a number in (0, 1], that reflects the degree of relation between each generated sample and every pre-defined class of real data.
10 code implementations • CVPR 2018 • Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, Yi Yang
In this paper, we explicitly consider this challenge by introducing camera style (CamStyle) adaptation.
Ranked #75 on Person Re-Identification on DukeMTMC-reID
2 code implementations • 15 Nov 2017 • Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, Yi-Dong Shen
In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space.
Ranked #1 on Cross-Modal Retrieval on CUHK-PEDES
1 code implementation • 3 Jul 2017 • Zhedong Zheng, Liang Zheng, Yi Yang
This task aims to search for a query person in a large image pool.
Ranked #1 on Person Re-Identification on CUHK03 (detected)
2 code implementations • 21 Mar 2017 • Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Zhilan Hu, Chenggang Yan, Yi Yang
Person re-identification (re-ID) and attribute recognition share a common target of learning pedestrian descriptions.
Ranked #79 on Person Re-Identification on DukeMTMC-reID
8 code implementations • ICCV 2017 • Zhedong Zheng, Liang Zheng, Yi Yang
We verify the proposed method on a practical problem: person re-identification (re-ID).
Ranked #4 on Person Re-Identification on CUHK03
Fine-Grained Image Classification
Generative Adversarial Network
4 code implementations • 17 Nov 2016 • Zhedong Zheng, Liang Zheng, Yi Yang
We revisit two popular convolutional neural networks (CNN) in person re-identification (re-ID), i.e., verification and classification models.
Ranked #1 on Person Re-Identification on Market-1501+500k