1 code implementation • ACL 2022 • Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, Ruifeng Xu
Then, the descriptions of the objects serve as a bridge to determine the importance of the association between objects in the image modality and contextual words in the text modality, so as to build a cross-modal graph for each multi-modal instance.
no code implementations • ECCV 2020 • Jiadong Liang, Wenjie Pei, Feng Lu
Typical methods for text-to-image synthesis seek to design effective generative architecture to model the text-to-image mapping directly.
1 code implementation • 28 Jul 2024 • Shuang Wu, Songlin Tang, Guangming Lu, Jianzhuang Liu, Wenjie Pei
In this work we design a Unified Voxelization framework for explicit learning of scene representations, dubbed UniVoxel, which allows for efficient modeling of the geometry, materials and illumination jointly, thereby accelerating the inverse rendering significantly.
1 code implementation • 28 Jul 2024 • Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei
In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastive Learning problem, and design a simple yet effective model dubbed WeCromCL that is able to detect each transcription in a scene image in a weakly supervised manner.
1 code implementation • 27 Jun 2024 • Yanan Sun, Yanchen Liu, Yinhao Tang, Wenjie Pei, Kai Chen
To address these challenges, we propose AnyControl, a multi-control image synthesis framework that supports arbitrary combinations of diverse control signals.
1 code implementation • CVPR 2024 • Jiapeng Su, Qi Fan, Guangming Lu, Fanglin Chen, Wenjie Pei
Instead, our key idea is to adapt a small adapter for rectifying diverse target domain styles to the source domain.
no code implementations • 1 Jan 2024 • Wenjie Pei, Weina Xu, Zongze Wu, Weichao Li, Jinfan Wang, Guangming Lu, Xiangrong Wang
In this work, we propose the Saliency-Aware Regularized Graph Neural Network (SAR-GNN) for graph classification, which consists of two core modules: 1) a traditional graph neural network serving as the backbone for learning node features and 2) the Graph Neural Memory designed to distill a compact graph representation from node features of the backbone.
no code implementations • 17 Dec 2023 • Jingwen Zhang, Zikun Zhou, Guangming Lu, Jiandong Tian, Wenjie Pei
Considering that, we propose to construct a synthetic target representation composed of dense and complete point clouds depicting the target shape precisely by shape completion for robust 3D tracking.
1 code implementation • 16 Dec 2023 • Wenjie Pei, Tongqi Xia, Fanglin Chen, Jinsong Li, Jiandong Tian, Guangming Lu
Typical methods for visual prompt tuning follow the sequential modeling paradigm stemming from NLP, which represents an input image as a flattened sequence of token embeddings and then learns a set of unordered parameterized tokens prefixed to the sequence representation as the visual prompts for task adaptation of large vision models.
1 code implementation • 3 Dec 2023 • Wenjie Pei, Qizhong Tan, Guangming Lu, Jiandong Tian
In particular, we devise the anisotropic Deformable Spatio-Temporal Attention module as the core component of D$^2$ST-Adapter, which can be tailored with anisotropic sampling densities along spatial and temporal domains to learn spatial and temporal features specifically in corresponding pathways, allowing our D$^2$ST-Adapter to encode features in a global view in 3D spatio-temporal space while maintaining a lightweight design.
1 code implementation • ICCV 2023 • Xin Feng, Yifeng Xu, Guangming Lu, Wenjie Pei
By detecting corrupted regions through learning contrastive distinctions rather than the semantic patterns of corruptions, our model generalizes well across different corruption patterns.
no code implementations • 9 Aug 2023 • Songlin Tang, Wenjie Pei, Xin Tao, Tanghui Jia, Guangming Lu, Yu-Wing Tai
Existing methods for interactive segmentation in radiance fields entail scene-specific optimization and thus cannot generalize across different scenes, which greatly limits their applicability.
no code implementations • 7 Aug 2023 • Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei
First, our model decouples the learning of source image semantics from the encoding of user guidance to process two types of input domains separately.
1 code implementation • 6 Aug 2023 • Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei
Although extensive research has been conducted on 3D point cloud segmentation, effectively adapting generic models to novel categories remains a formidable challenge.
1 code implementation • 25 Mar 2023 • Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, YaoWei Wang, Zhenyu He
Specifically, RHMNet first uses only the memory at the high-reliability level to locate the high-reliability region belonging to the target, which is highly similar to the initial target scribble.
no code implementations • 17 Jan 2023 • Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He
Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.
no code implementations • 2 Dec 2022 • Dianwen Mei, Wei Zhuo, Jiandong Tian, Guangming Lu, Wenjie Pei
To circumvent these two challenges, we propose to activate the discriminability of novel classes explicitly in both the feature encoding stage and the prediction stage for segmentation.
no code implementations • 27 Nov 2022 • Jiatong Zhang, Zengwei Yao, Fanglin Chen, Guangming Lu, Wenjie Pei
Second, instead of only performing local self-attention within local windows as Swin Transformer does, the proposed SALG performs both 1) local intra-region self-attention for learning fine-grained features within each region and 2) global inter-region feature propagation for modeling global dependencies among all regions.
Ranked #930 on Image Classification on ImageNet
2 code implementations • 30 Oct 2022 • Jing Xu, Xu Luo, Xinglin Pan, Wenjie Pei, Yanan Li, Zenglin Xu
In this paper, we find that this problem usually occurs when the positions of support samples are in the vicinity of task centroid -- the mean of all class centroids in the task.
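The task centroid defined above can be illustrated with a minimal sketch. It assumes each support sample is a plain feature vector grouped by class label; the names `centroid`, `task_centroid`, and `distance`, and the toy data, are hypothetical and not taken from the paper.

```python
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(dim) for dim in zip(*points)]


def task_centroid(support):
    """Return (task centroid, class centroids).

    `support` maps a class label to the list of its support feature vectors.
    The task centroid is the mean of all class centroids in the task.
    """
    class_centroids = {label: centroid(feats) for label, feats in support.items()}
    return centroid(list(class_centroids.values())), class_centroids


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Toy 2-way, 2-shot task (illustrative values only).
support = {
    "cat": [[0.0, 0.0], [2.0, 0.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
center, class_centroids = task_centroid(support)

# How close each support sample lies to the task centroid.
proximity = {
    label: [distance(f, center) for f in feats] for label, feats in support.items()
}
```

Support samples with small values in `proximity` are the ones lying in the vicinity of the task centroid.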
no code implementations • 30 Aug 2022 • Yi Li, Wenjie Pei, Zhenyu He
In this paper, we attempt to build a deep learning model that mimics all four steps in the traditional homography estimation pipeline.
no code implementations • 12 Aug 2022 • Jiadong Liang, Wenjie Pei, Feng Lu
Specifically, we formulate the text-to-layout generation as a sequence-to-sequence modeling task, and build our model upon Transformer to learn the spatial relationships between objects by modeling the sequential dependencies between them.
no code implementations • 25 Jul 2022 • Fengjun Li, Xin Feng, Fanglin Chen, Guangming Lu, Wenjie Pei
Real-world degradations can fall outside the scope simulated by handcrafted degradations; these are referred to as novel degradations.
no code implementations • 25 Jul 2022 • Wenjie Pei, Shuang Wu, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu
In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain the overfitting in both the pre-training stage on base classes and fine-tuning stage on novel classes.
1 code implementation • 23 Jul 2022 • Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang
Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar to each other than to pixels of different objects of the same class, we propose a novel self-support matching strategy to alleviate this problem, which uses query prototypes to match query features, where the query prototypes are collected from high-confidence query predictions.
Ranked #13 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)
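The self-support matching idea described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes per-pixel query features and an initial foreground probability per pixel are already available, averages the features of high-confidence pixels into a query prototype, and scores every pixel by cosine similarity to it. The function names and the `threshold` value of 0.7 are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def self_support_prototype(features, fg_prob, threshold=0.7):
    """Average the features of pixels whose predicted foreground probability
    exceeds `threshold` (an assumed value). Returns None if no pixel qualifies."""
    confident = [f for f, p in zip(features, fg_prob) if p > threshold]
    if not confident:
        return None
    return [sum(dim) / len(confident) for dim in zip(*confident)]


def match(features, prototype):
    """Score each query pixel feature against the query prototype."""
    return [cosine(f, prototype) for f in features]


# Four query "pixels" with 2-D features and initial foreground probabilities.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
fg_prob = [0.95, 0.9, 0.1, 0.5]

prototype = self_support_prototype(features, fg_prob)
scores = match(features, prototype)
```

In this toy case the uncertain fourth pixel scores close to the confident foreground pixels because its feature resembles the prototype built from the query image itself.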
1 code implementation • 22 Jul 2022 • Shuang Wu, Wenjie Pei, Dianwen Mei, Fanglin Chen, Jiandong Tian, Guangming Lu
Most existing methods for few-shot object detection follow the fine-tuning paradigm, which potentially assumes that the class-agnostic generalizable knowledge can be learned and transferred implicitly from base classes with abundant samples to novel classes with limited samples via such a two-stage training strategy.
1 code implementation • 20 Jul 2022 • Wenjie Pei, Xin Feng, Canmiao Fu, Qiong Cao, Guangming Lu, Yu-Wing Tai
The key challenge of sequence representation learning is to capture the long-range temporal dependencies.
1 code implementation • 16 Jul 2022 • Xin Feng, Haobo Ji, Wenjie Pei, Fanglin Chen, Guangming Lu
While research on image background restoration from regular-sized degraded images has achieved remarkable progress, restoring ultra high-resolution (e.g., 4K) images remains an extremely challenging task due to the explosion of computational complexity and memory usage, as well as the deficiency of annotated data.
1 code implementation • 15 Jul 2022 • Jingjing Wu, Pengyuan Lyu, Guangming Lu, Chengquan Zhang, Wenjie Pei
Typical text spotters follow the two-stage spotting paradigm which detects the boundary for a text instance first and then performs text recognition within the detected regions.
Ranked #5 on Text Spotting on ICDAR 2015
1 code implementation • CVPR 2022 • Zikun Zhou, Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu He
While it can exploit the temporal context like historical appearances and locations of the target, a potential limitation of such a strategy is that the local tracker tends to misidentify a nearby distractor as the target instead of activating the re-detector when the real target is out of view.
no code implementations • 14 Dec 2021 • Jing Xu, Xinglin Pan, Xu Luo, Wenjie Pei, Zenglin Xu
To alleviate this problem, we present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
1 code implementation • 13 Dec 2021 • Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang
Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.
no code implementations • 4 Dec 2021 • Haobo Ji, Xin Feng, Wenjie Pei, Jinxing Li, Guangming Lu
While Transformer has achieved remarkable performance in various high-level vision tasks, it is still challenging to exploit the full potential of Transformer in image restoration.
Ranked #21 on Image Dehazing on SOTS Indoor
no code implementations • 17 Nov 2021 • Zebin Lin, Wenjie Pei, Fanglin Chen, David Zhang, Guangming Lu
Instead of learning each of these diverse pedestrian appearance features individually as most existing methods do, we propose to perform contrastive learning to guide the feature learning in such a way that the semantic distance between pedestrians with different appearances in the learned feature space is minimized to eliminate the appearance diversities, whilst the distance between pedestrians and background is maximized.
Ranked #1 on Pedestrian Detection on TJU-Ped-campus
no code implementations • 9 Nov 2021 • Chaozheng Wang, Shuzheng Gao, Cuiyun Gao, Pengyun Wang, Wenjie Pei, Lujia Pan, Zenglin Xu
Real-world data usually present long-tailed distributions.
no code implementations • 10 Oct 2021 • Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang
Existing methods for speech separation either transform the speech signals into frequency domain to perform separation or seek to learn a separable embedding space by constructing a latent domain based on convolutional filters.
Ranked #10 on Speech Separation on WHAMR!
no code implementations • 1 Oct 2021 • Xin Feng, Wenjie Pei, Fengjun Li, Fanglin Chen, David Zhang, Guangming Lu
Most existing methods for image inpainting focus on learning the intra-image priors from the known regions of the current input image to infer the content of the corrupted regions in the same image.
no code implementations • ICCV 2021 • Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao
In order to overcome this problem, we propose a novel conditional variational autoencoder (VAE) that explicitly models one-to-many audio-to-motion mapping by splitting the cross-modal latent code into shared code and motion-specific code.
Ranked #3 on Gesture Generation on BEAT
1 code implementation • ICCV 2021 • Zikun Zhou, Wenjie Pei, Xin Li, Hongpeng Wang, Feng Zheng, Zhenyu He
A potential limitation of such trackers is that not all patches are equally informative for tracking.
no code implementations • 21 Jun 2021 • Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang
While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.
1 code implementation • 15 Apr 2021 • Kai Yang, Zhenyu He, Wenjie Pei, Zikun Zhou, Xin Li, Di Yuan, Haijun Zhang
By tracking a target as a pair of corners, we avoid the need to design the anchor boxes.
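As a small illustration of the corner-pair representation mentioned above, the sketch below converts a predicted top-left and bottom-right corner into a bounding box, with no anchor boxes involved. The `Box` type and `box_from_corners` function are hypothetical names, and how the corners are predicted is outside the scope of this sketch.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box given by its top-left corner, width, and height."""
    x: float
    y: float
    width: float
    height: float


def box_from_corners(
    top_left: Tuple[float, float], bottom_right: Tuple[float, float]
) -> Box:
    """Build a box from a pair of corner points in image coordinates
    (x grows to the right, y grows downward)."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    if x2 < x1 or y2 < y1:
        raise ValueError("bottom-right corner must not precede top-left corner")
    return Box(x1, y1, x2 - x1, y2 - y1)


target = box_from_corners((10, 20), (50, 80))
```

Because the box is fully determined by the two points, its scale and aspect ratio follow directly from the corner locations.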
1 code implementation • 9 Oct 2020 • Xin Feng, Wenjie Pei, Zihui Jia, Fanglin Chen, David Zhang, Guangming Lu
In this work we present the Deep-Masking Generative Network (DMGN), which is a unified framework for background restoration from the superimposed images and is able to cope with different types of noise.
1 code implementation • ECCV 2020 • Qi Fan, Lei Ke, Wenjie Pei, Chi-Keung Tang, Yu-Wing Tai
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Ranked #81 on Instance Segmentation on COCO test-dev
1 code implementation • 18 Dec 2019 • Jiadong Liang, Wenjie Pei, Feng Lu
Typical methods for text-to-image synthesis seek to design effective generative architecture to model the text-to-image mapping directly.
no code implementations • 31 Aug 2019 • Yunqiang Li, Wenjie Pei, Yufei Zha, Jan van Gemert
In this paper we push for quantization: We optimize maximum class separability in the binary space.
no code implementations • ICCV 2019 • Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai
State-of-the-art image captioning methods mostly focus on improving visual features; less attention has been paid to utilizing the inherent properties of language to boost captioning performance.
Ranked #5 on Image Captioning on MS COCO
no code implementations • 28 Aug 2019 • Weinong Wang, Wenjie Pei, Qiong Cao, Shu Liu, Yu-Wing Tai
Person re-identification aims to identify whether pairs of images belong to the same person or not.
no code implementations • ICCV 2019 • Canmiao Fu, Wenjie Pei, Qiong Cao, Chaopeng Zhang, Yong Zhao, Xiaoyong Shen, Yu-Wing Tai
Typical methods for supervised sequence modeling are built upon the recurrent neural networks to capture temporal dependencies.
1 code implementation • CVPR 2019 • Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai
Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed.
no code implementations • 3 Apr 2018 • Wenjie Pei, David M. J. Tax
Sequence data is challenging for machine learning approaches, because the lengths of the sequences may vary between samples.
no code implementations • 23 Nov 2017 • Wenjie Pei, Hamdi Dibeklioğlu, Tadas Baltrušaitis, David M. J. Tax
In this paper, we present an end-to-end architecture for age estimation, called Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions.
no code implementations • 5 Sep 2017 • Wenjie Pei, Jie Yang, Zhu Sun, Jie Zhang, Alessandro Bozzon, David M. J. Tax
In particular, we propose a novel attention scheme to learn the attention scores of user and item history in an interacting way, thereby accounting for the dependencies between user and item dynamics in shaping user-item interactions.
1 code implementation • CVPR 2017 • Wenjie Pei, Tadas Baltrušaitis, David M. J. Tax, Louis-Philippe Morency
An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence.
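To make the interpretability claim concrete, here is a generic sketch of how per-time-step attention weights can be read as salience. It uses a plain softmax over hypothetical relevance scores, which is an assumption for illustration and not necessarily the formulation used in the paper; `softmax`, `attend`, and the toy values are likewise hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, scores):
    """Return (weights, summary): one weight per time step and the
    weighted sum of the hidden states under those weights."""
    weights = softmax(scores)
    dims = len(hidden_states[0])
    summary = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dims)
    ]
    return weights, summary


# Four time steps with 2-D hidden states and assumed relevance scores.
hidden_states = [[0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.3, 0.2]]
scores = [0.1, 0.3, 2.5, 0.2]

weights, summary = attend(hidden_states, scores)
most_salient_step = weights.index(max(weights))
```

Inspecting `weights` shows directly which time steps contribute most to the sequence summary, which is the sense in which such weights are interpretable.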
no code implementations • 15 Mar 2016 • Wenjie Pei, David M. J. Tax, Laurens van der Maaten
Traditional techniques for measuring similarities between time series are based on handcrafted similarity measures, whereas more recent learning-based approaches cannot exploit external supervision.
no code implementations • 16 Jun 2015 • Wenjie Pei, Hamdi Dibeklioğlu, David M. J. Tax, Laurens van der Maaten
We present a new model for time series classification, called the hidden-unit logistic model, that uses binary stochastic hidden units to model latent structure in the data.