no code implementations • 22 Apr 2024 • Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, QinGhua Hu
Note that constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs.
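As a rough illustration of how similarity and dissimilarity constraints can be phrased as a contrastive objective, here is a minimal sketch of a generic InfoNCE-style loss. It is not the loss from the paper above, whose exact form is not given in this listing; the function names, the cosine similarity, and the `temperature` value are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical helper, not the paper's loss).

    The loss is small when the anchor is close to its positive and far
    from every negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# A well-separated pair yields a lower loss than a mismatched one.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

How the positive and negative pairs are built is the part the paper designs; the sketch only shows what such pairs would feed into.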
1 code implementation • 13 Apr 2024 • Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, QinGhua Hu
To this end, we disassemble three key components involved in computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze the effect on performance of few-shot classification.
1 code implementation • 19 Mar 2024 • Pengfei Zhu, Yang Sun, Bing Cao, QinGhua Hu
These adapters are shared across different tasks and constrained by mutual information regularization, ensuring compatibility with different tasks while preserving complementarity for multi-source images.
1 code implementation • 12 Jan 2024 • Yu Wang, Junxian Mu, Pengfei Zhu, QinGhua Hu
We show that the differences in attention maps can lead to diverse representations so that the fused representations can well handle the open space.
1 code implementation • 12 Jan 2024 • Pengfei Zhu, Qian Wang, Yu Wang, Jialu Li, QinGhua Hu
In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance.
no code implementations • 5 Jan 2024 • Yuxin Yang, Pengfei Zhu, Mengshi Qi, Huadong Ma
To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network.
1 code implementation • 27 Dec 2023 • Yan Fan, Yu Wang, Pengfei Zhu, QinGhua Hu
In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories.
1 code implementation • 17 Dec 2023 • Bing Cao, Junliang Guo, Pengfei Zhu, QinGhua Hu
To handle this problem, we propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter, cross-prompting multiple modalities mutually.
Ranked #6 on RGB-T Tracking on LasHeR
1 code implementation • ICCV 2023 • Mingze Gao, Qilong Wang, Zhenyi Lin, Pengfei Zhu, QinGhua Hu, Jingbo Zhou
Unlike LP, which builds a linear classification head on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP applies a linear classifier to the feature distribution, which provides stronger representation ability by exploiting richer statistical information inherent in features.
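To make the contrast between probing the mean and probing the feature distribution concrete, the sketch below builds a descriptor from the first moment (mean) and second central moment (covariance) of a set of token features. This is only an assumed illustration: the listing does not say which moments or normalizations MP actually uses, and `moment_features` is a hypothetical helper name.

```python
def moment_features(tokens):
    """Concatenate first- and second-order statistics of token features.

    tokens: list of N token vectors, each a list of D floats.
    Returns the D-dimensional mean followed by the upper triangle of the
    D x D covariance matrix (D * (D + 1) / 2 values). A plain linear probe
    would see only the mean part.
    """
    n, d = len(tokens), len(tokens[0])
    mean = [sum(t[i] for t in tokens) / n for i in range(d)]
    centered = [[t[i] - mean[i] for i in range(d)] for t in tokens]
    cov = [
        [sum(c[i] * c[j] for c in centered) / n for j in range(d)]
        for i in range(d)
    ]
    # The covariance is symmetric, so the upper triangle carries all of it.
    upper = [cov[i][j] for i in range(d) for j in range(i, d)]
    return mean + upper


features = moment_features([[1.0, 2.0], [3.0, 6.0]])
# features == [2.0, 4.0, 1.0, 2.0, 4.0]
```

A linear classifier would then be trained on `features` instead of on the mean alone.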
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2023 • Guanlin Chen, Pengfei Zhu, Bing Cao, Xing Wang, QinGhua Hu
During the tracking process, a cross-drone mapping mechanism is proposed by using the surrounding information of the drone with promising tracking status as reference, assisting drones that lost targets to re-calibrate, which implements real-time cross-drone information interaction.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2023 • Guosong Jiang, Pengfei Zhu, Yu Wang, QinGhua Hu
In this paper, we point out that balancing between structural risk and open space risk is crucial for open set recognition, and re-formalize it as open set structural risk.
no code implementations • ICCV 2023 • Pengfei Zhu, Mengshi Qi, Xia Li, Weijian Li, Huadong Ma
Predicting attention regions of interest is an important yet challenging task for self-driving systems.
no code implementations • 9 Feb 2023 • Pengfei Zhu, Chao Pang, Yekun Chai, Lei LI, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu
In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinned by the utilization of diffusion models.
1 code implementation • ICCV 2023 • Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu
The MoLE performs specialized learning of multi-modal local features, prompting the fused images to retain the local information in a sample-adaptive manner, while the MoGE focuses on the global information that complements the fused image with overall texture detail and contrast.
1 code implementation • 13 Dec 2022 • Qinghe Wang, Lijie Liu, Miao Hua, Pengfei Zhu, WangMeng Zuo, QinGhua Hu, Huchuan Lu, Bing Cao
We blend the semantic layouts of source head and source body, and then inpaint the transition region by the semantic layout generator, achieving a coarse-grained head swapping.
2 code implementations • 7 Nov 2022 • Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu
In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.
1 code implementation • ACMMM 2022 • Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu
We cascade the image fusion network with the detection networks of both modalities and use the detection loss of the fused images to provide guidance on task-related information for the optimization of the image fusion network.
no code implementations • 29 Aug 2022 • Pengfei Zhu, Xinjie Yao, Yu Wang, Meng Cao, Binyuan Hui, Shuai Zhao, QinGhua Hu
Multi-view learning has progressed rapidly in recent years.
no code implementations • 21 Jun 2022 • Junwen Pan, Guanlin Chen, Yi Liu, Jiexiang Wang, Cheng Bian, Pengfei Zhu, Zhicheng Zhang
Answer grounding aims to reveal the visual evidence for visual question answering (VQA), which entails highlighting relevant positions in the image when answering questions about images.
1 code implementation • 19 Mar 2022 • Junwen Pan, Pengfei Zhu, Kaihua Zhang, Bing Cao, Yu Wang, Dingwen Zhang, Junwei Han, QinGhua Hu
Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently.
Ranked #34 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
no code implementations • 10 Mar 2022 • Junwen Pan, Qi Bi, Yanzhan Yang, Pengfei Zhu, Cheng Bian
Due to the lack of expertise for medical image annotation, the investigation of label-efficient methodology for medical image segmentation has become a hot topic.
no code implementations • 23 Nov 2021 • Pengfei Zhu, Hongtao Yu, Kaihua Zhang, Yu Wang, Shuai Zhao, Lei Wang, Tianzhu Zhang, QinGhua Hu
To address this issue, segmentation-based trackers have been proposed that employ per-pixel matching to improve the tracking performance of deformable objects effectively.
no code implementations • 31 Aug 2021 • Pengfei Zhu, Xiaoguang Li, Jian Li, Hai Zhao
Open-domain Question Answering (ODQA) has achieved significant results under supervised learning.
Machine Reading Comprehension, Open-Domain Question Answering
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
1 code implementation • CVPR 2021 • Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu
To promote the development of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured large-scale dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios.
2 code implementations • 5 Jan 2021 • Binyuan Hui, Ruiying Geng, Qiyu Ren, Binhua Li, Yongbin Li, Jian Sun, Fei Huang, Luo Si, Pengfei Zhu, Xiaodan Zhu
Semantic parsing has long been a fundamental problem in natural language processing.
Ranked #5 on Dialogue State Tracking on CoSQL
no code implementations • ECCV 2020 • Junbing Li, Changqing Zhang, Pengfei Zhu, Baoyuan Wu, Lei Chen, QinGhua Hu
Although significant progress has been achieved, multi-label classification is still challenging due to the complexity of correlations among different labels.
no code implementations • ECCV 2020 • Cong-Cong Li, Dawei Du, Libo Zhang, Longyin Wen, Tiejian Luo, Yanjun Wu, Pengfei Zhu
Specifically, we first build the spatial pyramid representation to capture context information of objects at different scales.
1 code implementation • 16 Mar 2020 • Pengfei Zhu, Jiayu Zheng, Dawei Du, Longyin Wen, Yiming Sun, QinGhua Hu
Moreover, an agent sharing network (ASNet) is proposed by self-supervised template sharing and view-aware fusion of the target from multiple drones, which can improve the tracking accuracy significantly compared with single drone tracking.
2 code implementations • 5 Mar 2020 • Yiming Sun, Bing Cao, Pengfei Zhu, QinGhua Hu
To address this dilemma, we further propose an uncertainty-aware cross-modality vehicle detection (UA-CMDet) framework to extract complementary information from cross-modal images, which can significantly improve the detection performance in low light conditions.
3 code implementations • 26 Jan 2020 • Pengfei Zhu, Hai Zhao, Xiaoguang Li
Multi-choice Machine Reading Comprehension (MRC) requires a model to decide the correct answer from a set of answer options when given a passage and a question.
Ranked #3 on Reading Comprehension on RACE
2 code implementations • 16 Jan 2020 • Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling
We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.
no code implementations • 1 Jan 2020 • Pengfei Zhu, Hai Zhao, Xiaoguang Li
Multi-choice Machine Reading Comprehension (MRC) requires a model to decide the correct answer from a set of answer options when given a passage and a question.
1 code implementation • 4 Dec 2019 • Longyin Wen, Dawei Du, Pengfei Zhu, QinGhua Hu, Qilong Wang, Liefeng Bo, Siwei Lyu
This paper proposes a space-time multi-scale attention network (STANet) to solve density map estimation, localization and tracking in dense crowds of video clips captured by drones with arbitrary crowd density, perspective, and flight altitude.
1 code implementation • International Conference on Computer Vision Workshops 2019 • Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang
Results of 33 object detection algorithms are presented.
12 code implementations • CVPR 2020 • Qilong Wang, Banggu Wu, Pengfei Zhu, Peihua Li, WangMeng Zuo, QinGhua Hu
By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity.
Ranked #735 on Image Classification on ImageNet
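The two ideas in the entry above (no dimensionality reduction, plus local cross-channel interaction) can be sketched in a few lines. This is a simplified, dependency-free illustration and not the authors' implementation: it assumes a 1D convolution over the pooled channel descriptors as the interaction mechanism, and the `kernel` weights, which would normally be learned, are hypothetical.

```python
import math


def channel_attention(features, kernel):
    """Gate each channel using its pooled value and its neighbours.

    features: list of C channels, each a flat list of spatial activations.
    kernel:   odd-length list of 1D weights shared across all channels
              (assumed here; in practice these would be learned).
    """
    k = len(kernel)
    assert k % 2 == 1, "kernel length must be odd"
    pad = k // 2
    # Global average pooling: one descriptor per channel, kept at full
    # channel dimension (no bottleneck).
    pooled = [sum(ch) / len(ch) for ch in features]
    padded = [0.0] * pad + pooled + [0.0] * pad
    # Local cross-channel interaction: each gate depends on k neighbours.
    logits = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(pooled))
    ]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Rescale every activation of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


out = channel_attention([[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]], [0.0, 1.0, 0.0])
```

With a kernel of length `k`, the attention step needs only `k` weights regardless of the number of channels, which is where the reduction in model complexity would come from under these assumptions.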
2 code implementations • 6 Aug 2019 • Pengfei Zhu, Xinjie Yao, Yu Wang, Binyuan Hui, Dawei Du, QinGhua Hu
Dnet learns view-specific self-representation matrices, whereas Unet learns a common self-representation matrix for all views.
Ranked #1 on Multi-view Subspace Clustering on ORL
4 code implementations • CVPR 2019 • Dongwei Ren, WangMeng Zuo, QinGhua Hu, Pengfei Zhu, Deyu Meng
To handle this issue, this paper provides a better and simpler baseline deraining network by considering network architecture, input and output, and loss functions.
Ranked #1 on Single Image Deraining on Rain1400
no code implementations • 3 Oct 2018 • Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X. Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones.
no code implementations • COLING 2018 • Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao
Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning methods.
no code implementations • 7 Aug 2018 • Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao
Machine reading comprehension is a task to model the relationship between passage and query.
1 code implementation • COLING 2018 • Zhuosheng Zhang, Jiangtong Li, Pengfei Zhu, Hai Zhao, Gongshen Liu
In this paper, we formulate previous utterances into context using a proposed deep utterance aggregation model to form a fine-grained context representation.
Ranked #13 on Conversational Response Selection on E-commerce
no code implementations • 20 Apr 2018 • Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Ling, QinGhua Hu
In this paper we present a large-scale visual object detection and tracking benchmark, named VisDrone2018, aiming at advancing visual understanding tasks on the drone platform.
no code implementations • 17 Apr 2018 • Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao
Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go.
no code implementations • CVPR 2017 • Changqing Zhang, QinGhua Hu, Huazhu Fu, Pengfei Zhu, Xiaochun Cao
In this paper, we propose a novel Latent Multi-view Subspace Clustering (LMSC) method, which clusters data points with latent representation and simultaneously explores underlying complementary information from multiple views.
1 code implementation • 26 Apr 2017 • Pengfei Zhu, Xin Li, Pascal Poupart, Guanghui Miao
Deep Reinforcement Learning (RL) recently emerged as one of the most competitive approaches for learning in sequential decision making problems with fully observable environments, e.g., computer Go.
no code implementations • 30 Aug 2013 • Pengfei Zhu, WangMeng Zuo, Lei Zhang, Simon C. K. Shiu, David Zhang
One key issue of ISFR is how to effectively and efficiently represent the query face image set by using the gallery face image sets.