no code implementations • 11 Mar 2025 • Fangyuan Wang, Shipeng Lyu, Peng Zhou, Anqing Duan, Guodong Guo, David Navarro-Alarcon
Enabling humanoid robots to perform long-horizon mobile manipulation planning in real-world environments based on embodied perception and comprehension abilities has been a longstanding challenge.
1 code implementation • 10 Mar 2025 • Yan Tai, Luhao Zhu, Zhiqiang Chen, Ynan Ding, Yiying Dong, Xiaohong Liu, Guodong Guo
To address complex visual decoding scenarios, we introduce the Triplet-Based Referring Paradigm (TRP), which explicitly decouples three critical dimensions in visual decoding tasks through a triplet structure: concepts, decoding types, and targets.
no code implementations • 20 Dec 2024 • Xianlin Zeng, Yufeng Wang, Yuqi Sun, Guodong Guo, Baochang Zhang, Wenrui Ding
To tackle these issues, we introduce an unsupervised method based on joint generative and discriminative training to learn graph structure and representation, aiming to improve the discriminative performance of generative models.
no code implementations • 14 Apr 2024 • Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang
In this paper, we investigate cross-modality fusion by associating cross-modal features in a hidden state space based on an improved Mamba with a gating mechanism.
1 code implementation • 29 Jun 2023 • Zichang Tan, Jun Li, Jinhao Du, Jun Wan, Zhen Lei, Guodong Guo
To achieve collaborative learning in the long-tailed setting, balanced online distillation is proposed to enforce consistent predictions among different experts and augmented copies, which reduces learning uncertainty.
no code implementations • 27 Jun 2023 • Yanjing Li, Sheng Xu, Xianbin Cao, Li'an Zhuo, Baochang Zhang, Tian Wang, Guodong Guo
One natural approach is to use 1-bit CNNs to reduce the computation and memory cost of NAS by taking advantage of the strengths of each in a unified framework; however, searching for 1-bit CNNs is more challenging because the processes involved are more complicated.
no code implementations • 5 May 2023 • Ajian Liu, Zichang Tan, Zitong Yu, Chenxu Zhao, Jun Wan, Yanyan Liang, Zhen Lei, Du Zhang, Stan Z. Li, Guodong Guo
The availability of handy multi-modal (i.e., RGB-D) sensors has brought about a surge of face anti-spoofing research.
1 code implementation • CVPR 2023 • Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lu, Baochang Zhang
At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
1 code implementation • 11 Dec 2022 • Fanglei Xue, Qiangchang Wang, Zichang Tan, Zhongsong Ma, Guodong Guo
The proposed APP is employed to select the most informative patches on CNN features, and ATP discards unimportant tokens in ViT.
Ranked #10 on Facial Expression Recognition (FER) on RAF-DB
1 code implementation • 13 Oct 2022 • Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo
The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.
1 code implementation • 3 Oct 2022 • Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma
No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references.
1 code implementation • 28 Sep 2022 • Xiangcheng Liu, Tianyi Wu, Guodong Guo
The learnable thresholds are optimized in budget-aware training to balance accuracy and complexity, performing the corresponding pruning configurations for different input instances.
Ranked #6 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)
2 code implementations • 4 Sep 2022 • Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo
To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.
1 code implementation • 27 Aug 2022 • Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo
Second, according to the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to the knowledge of different difficulties.
1 code implementation • 29 May 2022 • Shangkun Sun, Yuanqi Chen, Yu Zhu, Guodong Guo, Ge Li
In this paper, we propose the Super Kernel Flow Network (SKFlow), a CNN architecture to ameliorate the impacts of occlusions on optical flow estimation.
no code implementations • 28 Apr 2022 • Jianrong Zhang, Tianyi Wu, Chuanghao Ding, Hongwei Zhao, Guodong Guo
Specifically, we first propose a Region Mask Contrastive (RMC) loss and a Region Feature Contrastive (RFC) loss to achieve the region-level contrastive property.
no code implementations • 27 Apr 2022 • Shan Zhang, Tianyi Wu, Sitong Wu, Guodong Guo
In this work, we effectively integrate the context and affinity information via the proposed novel Context and Affinity Transformer (CATrans) in a hierarchical architecture.
1 code implementation • CVPR 2022 • Jun Li, Zichang Tan, Jun Wan, Zhen Lei, Guodong Guo
NCL consists of two core components, namely Nested Individual Learning (NIL) and Nested Balanced Online Distillation (NBOD), which focus on the individual supervised learning for each single expert and the knowledge transferring among multiple experts, respectively.
Ranked #7 on Long-tail Learning on CIFAR-10-LT (ρ=50)
no code implementations • 26 Mar 2022 • Fangjian Lin, Tianyi Wu, Sitong Wu, Shengwei Tian, Guodong Guo
In this work, we focus on fusing multi-scale features from Transformer-based backbones for semantic segmentation, and propose a Feature Selective Transformer (FeSeFormer), which aggregates features from all scales (or levels) for each query feature.
no code implementations • CVPR 2022 • Ge Kan, Jinhu Lü, Tian Wang, Baochang Zhang, Aichun Zhu, Lei Huang, Guodong Guo, Hichem Snoussi
In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs.
1 code implementation • 24 Mar 2022 • Fanglei Xue, Zichang Tan, Yu Zhu, Zhongsong Ma, Guodong Guo
To be specific, the universal features denote the general characteristic of facial emotions within a period and the unique features denote the specific characteristic at this moment.
no code implementations • CVPR 2022 • Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen
In contrast, we redefine the HGT detection task as detecting human head locations and their gaze targets, simultaneously.
no code implementations • 20 Mar 2022 • Danyang Tu, Xiongkuo Min, Huiyu Duan, Guodong Guo, Guangtao Zhai, Wei Shen
Iwin Transformer is a hierarchical Transformer which progressively performs token representation learning and token agglomeration within irregular windows.
no code implementations • 17 Mar 2022 • Runqi Wang, Linlin Yang, Baochang Zhang, Wentao Zhu, David Doermann, Guodong Guo
Research on the generalization ability of deep neural networks (DNNs) has recently attracted a great deal of attention.
2 code implementations • 9 Mar 2022 • He Wang, Yunfeng Diao, Zichang Tan, Guodong Guo
Our method is featured by full Bayesian treatments of the clean data, the adversaries and the classifier, leading to (1) a new Bayesian Energy-based formulation of robust discriminative classifiers, (2) a new adversary sampling scheme based on natural motion manifolds, and (3) a new post-train Bayesian strategy for black-box defense.
no code implementations • 8 Mar 2022 • Kai Liu, Tianyi Wu, Cong Liu, Guodong Guo
To reduce the quadratic computation complexity caused by each query attending to all keys/values, various methods have constrained the range of attention within local regions, where each query only attends to keys/values within a hand-crafted window.
no code implementations • 28 Dec 2021 • Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo
We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.
2 code implementations • 28 Dec 2021 • Sitong Wu, Tianyi Wu, Haoru Tan, Guodong Guo
To reduce the quadratic computation complexity caused by the global self-attention, various methods constrain the range of attention within a local region to improve its efficiency.
no code implementations • 26 Nov 2021 • Sheng Xu, Yanjing Li, Junhe Zhao, Baochang Zhang, Guodong Guo
Real-time point cloud processing is fundamental to many computer vision tasks, but it is still challenged by the computational limits of resource-limited edge devices.
no code implementations • 25 Oct 2021 • Zenghao Bao, Zichang Tan, Yu Zhu, Jun Wan, Xibo Ma, Zhen Lei, Guodong Guo
To improve the performance of facial age estimation, we first formulate a simple standard baseline and build a much stronger one by collecting tricks in pre-training, data augmentation, model architecture, and so on.
no code implementations • 1 Sep 2021 • Ruiqi Zhao, Tianyi Wu, Guodong Guo
Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.
no code implementations • ICCV 2021 • Fanglei Xue, Qiangchang Wang, Guodong Guo
Second, to build rich relations between different local patches, the Vision Transformers (ViT) are used in FER, called ViT-FER.
no code implementations • 23 Aug 2021 • Jian Zhao, Gang Wang, Jianan Li, Lei Jin, Nana Fan, Min Wang, Xiaojuan Wang, Ting Yong, Yafeng Deng, Yandong Guo, Shiming Ge, Guodong Guo
The 2nd Anti-UAV Workshop & Challenge aims to encourage research in developing novel and accurate methods for multi-scale object tracking.
no code implementations • 16 Aug 2021 • Ajian Liu, Chenxu Zhao, Zitong Yu, Anyang Su, Xing Liu, Zijian Kong, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Guodong Guo
The threat of 3D masks to face recognition systems is increasingly serious and has attracted wide concern among researchers.
1 code implementation • ICCV 2021 • Yuan Tian, Guo Lu, Xiongkuo Min, Zhaohui Che, Guangtao Zhai, Guodong Guo, Zhiyong Gao
After optimization, the video downscaled by our framework preserves more meaningful information, which benefits both the upscaling step and downstream tasks, e.g., video action recognition.
1 code implementation • 22 Jul 2021 • Yuan Tian, Yichao Yan, Guangtao Zhai, Guodong Guo, Zhiyong Gao
In this paper, we propose a unified action recognition framework to investigate the dynamic nature of video content by introducing the following designs.
Ranked #15 on Action Recognition on Something-Something V1
no code implementations • CVPR 2022 • Haitao Lin, Zichang Liu, Chilam Cheang, Yanwei Fu, Guodong Guo, Xiangyang Xue
The concatenation of the observed point cloud and symmetric one reconstructs a coarse object shape, thus facilitating object center (3D translation) and 3D size estimation.
1 code implementation • 8 Jun 2021 • Sitong Wu, Tianyi Wu, Fangjian Lin, Shengwei Tian, Guodong Guo
Transformers have shown impressive performance in various natural language processing and computer vision tasks, due to the capability of modeling long-range dependencies.
no code implementations • 31 May 2021 • Xiaoguang Tu, Yingtian Zou, Jian Zhao, Wenjie Ai, Jian Dong, Yuan Yao, Zhikang Wang, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng
Video generation from a single face image is an interesting problem and is usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks.
no code implementations • 12 May 2021 • Xiaoguang Tu, Jian Zhao, Qiankun Liu, Wenjie Ai, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng
First, MDFR is a well-designed encoder-decoder architecture which extracts feature representation from an input face image with arbitrary low-quality factors and restores it to a high-quality counterpart.
no code implementations • 13 Apr 2021 • Ajian Liu, Chenxu Zhao, Zitong Yu, Jun Wan, Anyang Su, Xing Liu, Zichang Tan, Sergio Escalera, Junliang Xing, Yanyan Liang, Guodong Guo, Zhen Lei, Stan Z. Li, Du Zhang
To bridge the gap to real-world applications, we introduce a large-scale High-Fidelity Mask dataset, namely CASIA-SURF HiFiMask (briefly HiFiMask).
1 code implementation • CVPR 2021 • Li Wang, Liang Du, Xiaoqing Ye, Yanwei Fu, Guodong Guo, Xiangyang Xue, Jianfeng Feng, Li Zhang
The objective of this paper is to learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
Ranked #14 on Monocular 3D Object Detection on KITTI Cars Moderate
1 code implementation • 21 Jan 2021 • Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Jian Zhao, Guodong Guo, Zhenjun Han
The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.
no code implementations • ICCV 2021 • Yunhao Li, Wei Shen, Zhongpai Gao, Yucheng Zhu, Guangtao Zhai, Guodong Guo
Specifically, the local region is obtained as a 2D cone-shaped field along the 2D projection of the sight line starting at the human subject's head position, and the distant region is obtained by searching along the sight line in 3D sphere space.
no code implementations • ICCV 2021 • Song Xue, Runqi Wang, Baochang Zhang, Tian Wang, Guodong Guo, David Doermann
Differentiable Architecture Search (DARTS) improves the efficiency of architecture search by learning the architecture and network parameters end-to-end.
no code implementations • International Journal of Computer Vision 2020 • Yunan Li, Jun Wan, Qiguang Miao, Sergio Escalera, Huijuan Fang, Huizhou Chen, Xiangda Qi, Guodong Guo
First impressions strongly influence social interactions, having a high impact in the personal and professional life.
1 code implementation • ECCV 2020 • Tianyi Wu, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, Guodong Guo
GI unit is further improved by the SC-loss to enhance the semantic representations over the exemplar-based semantic graph.
no code implementations • 8 Sep 2020 • Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo
In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing.
1 code implementation • 23 Jun 2020 • Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann
In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages.
no code implementations • 22 Jun 2020 • Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a.
no code implementations • CVPR 2020 • Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji
Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure.
no code implementations • 12 May 2020 • Shan Jia, Xin Li, Chuanbo Hu, Guodong Guo, Zhengquan Xu
We have witnessed rapid advances in both face presentation attack models and presentation attack detection (PAD) in recent years.
no code implementations • 23 Apr 2020 • Ajian Liu, Xuan Li, Jun Wan, Sergio Escalera, Hugo Jair Escalante, Meysam Madadi, Yi Jin, Zhuoyuan Wu, Xiaogang Yu, Zichang Tan, Qi Yuan, Ruikun Yang, Benjia Zhou, Guodong Guo, Stan Z. Li
Although ethnic bias has been verified to severely affect the performance of face recognition systems, it still remains an open research problem in face anti-spoofing.
no code implementations • 11 Mar 2020 • Ajian Li, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li
Ethnic bias has proven to negatively affect the performance of face recognition systems, and it remains an open research problem in face anti-spoofing.
no code implementations • 5 Dec 2019 • Ajian Liu, Zichang Tan, Xuan Li, Jun Wan, Sergio Escalera, Guodong Guo, Stan Z. Li
Regardless of the usage of deep learning and handcrafted methods, the dynamic information from videos and the effect of cross-ethnicity are rarely considered in face anti-spoofing.
no code implementations • 3 Dec 2019 • Jun Jia, Zhongpai Gao, Kang Chen, Menghan Hu, Guangtao Zhai, Guodong Guo, Xiaokang Yang
To train a robust decoder against the physical distortion from the real world, a distortion network based on 3D rendering is inserted between the encoder and the decoder to simulate the camera imaging process.
no code implementations • 25 Nov 2019 • Chunlei Liu, Wenrui Ding, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Guodong Guo
The BGA method is proposed to modify the binary process of GBCNs to alleviate the local minima problem, which can significantly improve the performance of 1-bit DCNNs.
no code implementations • 24 Oct 2019 • Chunlei Liu, Wenrui Ding, Jinyu Yang, Vittorio Murino, Baochang Zhang, Jungong Han, Guodong Guo
In this paper, we propose a novel aggregation signature suitable for small object tracking, especially aiming for the challenge of sudden and large drift.
no code implementations • 22 Oct 2019 • Mohammad Iqbal Nouyed, Guodong Guo
In this paper, we perform a comparative performance analysis of some well-known face detection methods, including the few used in that competition, and compare them to our proposed body-pose-based face detection method.
no code implementations • 25 Sep 2019 • Defa Zhu, Si Liu, Wentao Jiang, Guanbin Li, Tianyi Wu, Guodong Guo
Visual relationship recognition models are limited in the ability to generalize from finite seen predicates to unseen ones.
no code implementations • 25 Sep 2019 • Shifeng Zhang, Yiliang Xie, Jun Wan, Hansheng Xia, Stan Z. Li, Guodong Guo
To narrow this gap and facilitate future pedestrian detection research, we introduce a large and diverse dataset named WiderPerson for dense pedestrian detection in the wild.
Ranked #3 on Object Detection on WiderPerson (mMR metric)
no code implementations • Image and Vision Computing 2019 • Min Jiang, Yuanyuan Shang, Guodong Guo
Various facial representations, including geometry-based and deep-learning-based representations, are comprehensively evaluated and analyzed from three perspectives: the overall performance on visual BMI prediction, the redundancy in facial representations, and the sensitivity to head pose changes.
no code implementations • 21 Aug 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo
Binarized convolutional neural networks (BCNNs) are widely used to improve the memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI-chip-based applications.
no code implementations • ICCV 2019 • Jiaxin Gu, Junhe Zhao, Xiao-Long Jiang, Baochang Zhang, Jianzhuang Liu, Guodong Guo, Rongrong Ji
Deep convolutional neural networks (DCNNs) have dominated recent developments in computer vision by producing various record-breaking models.
no code implementations • 29 Jul 2019 • Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li
The ChaLearn large-scale gesture recognition challenge has been run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams from around the world.
no code implementations • 29 Jul 2019 • Tianyi Wu, Sheng Tang, Rui Zhang, Guodong Guo, Yongdong Zhang
However, classification networks are dominated by the discriminative portion, so directly applying classification networks to scene parsing will result in inconsistent parsing predictions within one instance and among instances of the same category.
no code implementations • 26 Jul 2019 • Defa Zhu, Si Liu, Wentao Jiang, Chen Gao, Tianyi Wu, Qiangchang Wang, Guodong Guo
To address this issue, we propose a method called Untraceable GAN, which has a novel source classifier to differentiate which domain an image is translated from, and determines whether the translated image still retains the characteristics of the source domain.
no code implementations • 6 Jun 2019 • Shan Jia, Chuanbo Hu, Guodong Guo, Zhengquan Xu
Compared to 2D face presentation attacks (e.g., printed photos and video replays), 3D attacks are more challenging for face recognition systems (FRS) because they present 3D characteristics or materials similar to real faces.
no code implementations • 31 May 2019 • Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang
In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), is built to keep the intrinsic semantic relationship in the produced Hamming space.
1 code implementation • 16 May 2019 • Zhaohui Che, Ali Borji, Guangtao Zhai, Xiongkuo Min, Guodong Guo, Patrick Le Callet
Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time consuming and expensive.
no code implementations • 2 Apr 2019 • Zhaohui Che, Ali Borji, Guangtao Zhai, Suiyi Ling, Guodong Guo, Patrick Le Callet
The proposed attack only requires a part of the model information, and is able to generate a sparser and more insidious adversarial perturbation, compared to traditional image-space attacks.
1 code implementation • 11 Dec 2018 • Peng Lu, Gao Huang, Hangyu Lin, Wenming Yang, Guodong Guo, Yanwei Fu
This paper proposes a novel approach for Sketch-Based Image Retrieval (SBIR), for which the key is to bridge the gap between sketches and photos in terms of the data representation.
no code implementations • 23 May 2018 • Xudong Liu, Guodong Guo
To address this question, we deploy deep training for facial attributes prediction, and we explore the inconsistency issue among the attributes computed from each single image.
no code implementations • 28 Nov 2017 • Qiangchang Wang, Guodong Guo, Mohammad Iqbal Nouyed
We propose a new deep network structure for unconstrained face recognition.
no code implementations • 8 Aug 2017 • Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult
Face detection and recognition benchmarks have shifted toward more difficult environments.
no code implementations • ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Visual Understanding with RGB-D Sensors 2015 • Yu Zhu, Wenbin Chen, Guodong Guo
The experiments are conducted on four challenging depth action databases, in order to evaluate and find the best fusion methods generally.
no code implementations • CVPR 2014 • Guodong Guo, Chao Zhang
Further, we study the amount of data needed in the target population to learn a cross-population age estimator.