Instilling Multi-round Thinking to Text-guided Image Generation

no code implementations 16 Jan 2024 Lidong Zeng, Zhedong Zheng, Yinwei Wei, Tat-Seng Chua

This paper delves into the text-guided image editing task, focusing on modifying a reference image according to user-specified textual feedback to embody specific attributes.

Image Generation text-guided-generation +1

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation

no code implementations 21 Nov 2023 Mu Chen, Zhedong Zheng, Yi Yang

Based on such observation, we propose a depth-aware framework to explicitly leverage depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning in an end-to-end manner.

Depth Estimation Scene Segmentation +2

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

no code implementations 21 Nov 2023 Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua

Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data.

Drone navigation Language Modelling +2

Progressive Text-to-3D Generation for Automatic 3D Prototyping

1 code implementation 26 Sep 2023 Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-Seng Chua

We aspire for our work to pave the way for automatic 3D prototyping via natural language descriptions.

3D Generation Text to 3D

Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

1 code implementation 5 Jun 2023 Shuyu Yang, Yinan Zhou, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng

To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework, considering the shared knowledge between attribute and text.

Attribute Image-text matching +7

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

no code implementations 3 Jun 2023 Xu Zhang, Zhedong Zheng, Xiaohan Wang, Yi Yang

We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.

Image Retrieval Image Retrieval with Multi-Modal Query +1

Actively Discovering New Slots for Task-oriented Conversation

1 code implementation 6 May 2023 Yuxia Wu, Tianhao Dai, Zhedong Zheng, Lizi Liao

Existing task-oriented conversational search systems heavily rely on domain ontologies with pre-defined slots and candidate value sets.

Active Learning Conversational Search

Learnable Pillar-based Re-ranking for Image-Text Retrieval

1 code implementation 25 Apr 2023 Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua

Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.

Re-Ranking Retrieval +1

Context-Aware Pretraining for Efficient Blind Image Decomposition

1 code implementation CVPR 2023 Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang

(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.

Attribute Image Reconstruction +1

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

1 code implementation CVPR 2023 Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua

Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.

Active Learning Moment Retrieval +1

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

no code implementations 25 Dec 2022 Xiaolong Shen, Zhedong Zheng, Yi Yang

As its name suggests, it is made up of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling.

Optical Flow Estimation Sign Language Recognition

Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark

1 code implementation 19 Dec 2022 Jianwu Fang, Lei-Lei Li, Kuan Yang, Zhedong Zheng, Jianru Xue, Tat-Seng Chua

In particular, the text description provides dense semantic guidance for the primary context of the traffic scene, while the driver attention draws focus to the critical region closely correlated with safe driving.

Decision Making

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

1 code implementation 14 Nov 2022 Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua

In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.

Self-Supervised Learning Semantic Segmentation +2

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

1 code implementation 14 Nov 2022 Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, Tat-Seng Chua

The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.

Composed Image Retrieval (CoIR) Image Retrieval with Multi-Modal Query +1

Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization

no code implementations 10 Nov 2022 Tingyu Wang, Zhedong Zheng, Zunjie Zhu, Yuhan Gao, Yi Yang, Chenggang Yan

Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.

Joint Representation Learning and Keypoint Detection for Cross-view Geo-localization

1 code implementation IEEE Transactions on Image Processing (TIP) 2022 Jinliang Lin, Zhedong Zheng, Zhun Zhong, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe

Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network.

Drone navigation Drone-view target localization +3

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

1 code implementation 27 Apr 2022 Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.

3D Reconstruction Person Re-Identification +2

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

1 code implementation CVPR 2022 Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang

To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.

3D-Aware Image Synthesis

Self-supervised Point Cloud Representation Learning via Separating Mixed Shapes

1 code implementation 1 Sep 2021 Chao Sun, Zhedong Zheng, Xiaohan Wang, Mingliang Xu, Yi Yang

Albeit simple, the pre-trained encoder can capture the key points of an unseen point cloud and surpasses the encoder trained from scratch on downstream tasks.

3D Part Segmentation 3D Point Cloud Classification +3

SPG-VTON: Semantic Prediction Guidance for Multi-pose Virtual Try-on

no code implementations 3 Aug 2021 Bingwen Hu, Ping Liu, Zhedong Zheng, Mingwu Ren

Third, a Try-on Synthesis Module (TSM) combines the coarse result and the warped clothes to generate the final virtual try-on image, preserving details of the desired clothes and under the desired pose.

Virtual Try-on

Less is More: Sparse Sampling for Dense Reaction Predictions

no code implementations 3 Jun 2021 Kezhou Lin, Xiaohan Wang, Zhedong Zheng, Linchao Zhu, Yi Yang

Obtaining viewer responses from videos can be useful for creators and streaming platforms to analyze the video performance and improve the future user experience.

Connecting Language and Vision for Natural Language-Based Vehicle Retrieval

1 code implementation 31 May 2021 Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang

In this paper, we apply one new modality, i.e., the language description, to search the vehicle of interest and explore the potential of this task in the real-world scenario.

Language Modelling Management +2

Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search

no code implementations 22 Feb 2021 Chuchu Han, Zhedong Zheng, Changxin Gao, Nong Sang, Yi Yang

Specifically, to reconcile the conflicts of multiple objectives, we simplify the standard tightly coupled pipelines and establish a deeply decoupled multi-task learning framework.

Metric Learning Multi-Task Learning +2

Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective

1 code implementation 14 Dec 2020 Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang

We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph.

Drone-view target localization Image Retrieval +4
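
The two-phase view in the excerpt above (build a k-nearest neighbor graph, then spread messages within it) can be illustrated with a toy sketch. This is not the paper's implementation: the function names `knn_graph`, `propagate`, and `rank`, the Euclidean distance, the mean aggregation, and the mixing weight `alpha` are all assumptions chosen only to make the idea concrete.

```python
import math


def knn_graph(features, k):
    """Phase 1 (assumed form): list each item's k nearest neighbours by Euclidean distance."""
    neighbours = []
    for i, fi in enumerate(features):
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def propagate(features, neighbours, alpha=0.5, steps=1):
    """Phase 2 (assumed form): mix each feature with the mean of its neighbours' features."""
    feats = [list(f) for f in features]
    for _ in range(steps):
        updated = []
        for i, f in enumerate(feats):
            nbrs = neighbours[i]
            mean = [sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(len(f))]
            updated.append([(1 - alpha) * x + alpha * m for x, m in zip(f, mean)])
        feats = updated
    return feats


def rank(query_index, feats):
    """Rank all other items by distance to the query in the propagated feature space."""
    q = feats[query_index]
    others = [j for j in range(len(feats)) if j != query_index]
    return sorted(others, key=lambda j: math.dist(q, feats[j]))


# Hypothetical 2-D features: three items near the origin, two far away.
features = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10]]
neighbours = knn_graph(features, k=2)
refined = propagate(features, neighbours)
ranking = rank(0, refined)
```

In this sketch the re-ranked list for item 0 is computed from the smoothed features rather than the raw ones, which is the sense in which message passing over the neighbor graph refines an initial retrieval result.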

Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization

1 code implementation 26 Aug 2020 Tingyu Wang, Zhedong Zheng, Chenggang Yan, Jiyong Zhang, Yaoqi Sun, Bolun Zheng, Yi Yang

Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center, but underestimate the contextual information in neighbor areas.

Drone navigation Drone-view target localization +2

VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification

3 code implementations 14 Apr 2020 Zhedong Zheng, Tao Ruan, Yunchao Wei, Yi Yang, Tao Mei

This stage relaxes the full alignment between the training and testing domains, as it is agnostic to the target vehicle domain.

Representation Learning Vehicle Re-Identification

Rectifying Pseudo Label Learning via Uncertainty Estimation for Domain Adaptive Semantic Segmentation

3 code implementations 8 Mar 2020 Zhedong Zheng, Yi Yang

This paper focuses on the unsupervised domain adaptation of transferring the knowledge from the source domain to the target domain in the context of semantic segmentation.

Pseudo Label Segmentation +3

University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization

3 code implementations 27 Feb 2020 Zhedong Zheng, Yunchao Wei, Yi Yang

To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation.

Drone navigation Drone-view target localization +2

Progressive Local Filter Pruning for Image Retrieval Acceleration

no code implementations 24 Jan 2020 Xiaodong Wang, Zhedong Zheng, Yang He, Fei Yan, Zhiqiang Zeng, Yi Yang

To verify this, we evaluate our method on two widely-used image retrieval datasets, i.e., Oxford5k and Paris6K, and one person re-identification dataset, i.e., Market-1501.

Image Retrieval Network Pruning +2

Unsupervised Scene Adaptation with Memory Regularization in vivo

2 code implementations 24 Dec 2019 Zhedong Zheng, Yi Yang

We consider the unsupervised scene adaptation problem of learning from both labeled source data and unlabeled target data.

Semantic Segmentation Synthetic-to-Real Translation +1

Unsupervised Eyeglasses Removal in the Wild

1 code implementation 16 Sep 2019 Bingwen Hu, Zhedong Zheng, Ping Liu, Wankou Yang, Mingwu Ren

Given two facial images with and without eyeglasses, the proposed model learns to swap the eye area in two faces.

Face Reconstruction Face Verification +3

Query Attack via Opposite-Direction Feature: Towards Robust Image Retrieval

2 code implementations 7 Sep 2018 Zhedong Zheng, Liang Zheng, Yi Yang, Fei Wu

Opposite-Direction Feature Attack (ODFA) effectively exploits feature-level adversarial gradients and takes advantage of feature distance in the representation space.

Adversarial Attack General Classification +3

Multi-pseudo Regularized Label for Generated Data in Person Re-Identification

no code implementations 21 Jan 2018 Yan Huang, Jinsong Xu, Qiang Wu, Zhedong Zheng, Zhao-Xiang Zhang, Jian Zhang

Unlike the traditional label, which is usually a single integer, the virtual label proposed in this work is a set of weight-based values, each a number in (0, 1]. Called the multi-pseudo label, it reflects the degree of relation between each generated sample and every pre-defined class of real data.

Generative Adversarial Network Person Re-Identification +1

A Discriminatively Learned CNN Embedding for Person Re-identification

4 code implementations 17 Nov 2016 Zhedong Zheng, Liang Zheng, Yi Yang

We revisit two popular convolutional neural networks (CNN) in person re-identification (re-ID), i.e., verification and classification models.

General Classification Image Retrieval +2

