TDViT: Temporal Dilated Video Transformer for Dense Video Tasks

1 code implementation • 14 Feb 2024 • Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video.

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

1 code implementation • 14 Feb 2024 • Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames.

GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries

no code implementations • 30 Jan 2024 • Yuyuan Feng, Guosheng Hu, Zhihong Zhang

State of health (SOH) is a crucial indicator of battery degradation that cannot be measured directly and must instead be estimated.

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

1 code implementation • 18 Jan 2024 • Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information.

Explainability of Speech Recognition Transformers via Gradient-based Attention Visualization

1 code implementation • IEEE Transactions on Multimedia 2023 • Tianli Sun, Haonan Chen, Guosheng Hu, Lianghua He, Cairong Zhao

In addition, we demonstrate the use of the visualization results in three ways: (1) We visualize attention with respect to connectionist temporal classification (CTC) loss to train an ASR model with adversarial attention erasing regularization, which effectively decreases the word error rate (WER) of the model and improves its generalization capability.

ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection

1 code implementation • IEEE Transactions on Information Forensics and Security 2023 • Cairong Zhao, Chutian Wang, Guosheng Hu, Haonan Chen, Chun Liu, Jinhui Tang

To address these two challenges, in this paper, we propose an Interpretable Spatial-Temporal Video Transformer (ISTVT), which consists of a novel decomposed spatial-temporal self-attention and a self-subtract mechanism to capture spatial artifacts and temporal inconsistency for robust Deepfake detection.

SGLoc: Scene Geometry Encoding for Outdoor LiDAR Localization

no code implementations • CVPR 2023 • Wen Li, Shangshu Yu, Cheng Wang, Guosheng Hu, Siqi Shen, Chenglu Wen

In this work, we propose a novel LiDAR localization framework, SGLoc, which decouples pose estimation into point cloud correspondence regression and pose estimation via this correspondence.

Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets

no code implementations • 15 Aug 2022 • Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides

Beyond classification, Conv-Adapter can generalize to detection and segmentation tasks, using less than half the parameters while achieving performance comparable to traditional full fine-tuning.

Boosting Active Learning via Improving Test Performance

1 code implementation • 10 Dec 2021 • Tianyang Wang, Xingjian Li, Pengkun Yang, Guosheng Hu, Xiangrui Zeng, Siyu Huang, Cheng-Zhong Xu, Min Xu

In this work, we explore such an impact by theoretically proving that selecting unlabeled data with a higher gradient norm leads to a lower upper bound on the test loss, resulting in better test performance.

DPT: Deformable Patch-based Transformer for Visual Recognition

1 code implementation • 30 Jul 2021 • Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang

To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches.

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

1 code implementation • CVPR 2021 • TingTing Liang, Yongtao Wang, Zhi Tang, Guosheng Hu, Haibin Ling

Encouraged by the success, we propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.

Imbalance Robust Softmax for Deep Embeeding Learning

no code implementations • 23 Nov 2020 • Hao Zhu, Yang Yuan, Guosheng Hu, Xiang Wu, Neil Robertson

IR-Softmax can generalise to any softmax and its variants (which are discriminative for open-set problems) by directly setting the weights as their class centers, naturally solving the data imbalance problem.

Learning Flow-based Feature Warping for Face Frontalization with Illumination Inconsistent Supervision

1 code implementation • ECCV 2020 • Yuxiang Wei, Ming Liu, Haolin Wang, Ruifeng Zhu, Guosheng Hu, WangMeng Zuo

Despite recent advances in deep learning-based face frontalization methods, photo-realistic and illumination-preserving frontal face synthesis is still challenging due to large pose and illumination discrepancies during training.

Salvage Reusable Samples from Noisy Data for Robust Learning

1 code implementation • 6 Aug 2020 • Zeren Sun, Xian-Sheng Hua, Yazhou Yao, Xiu-Shen Wei, Guosheng Hu, Jian Zhang

To this end, we propose a certainty-based reusable sample selection and correction approach, termed CRSSC, for coping with label noise when training deep fine-grained (FG) models with web images.


MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning

no code implementations • 27 Aug 2019 • Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen

Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.

Learning Symmetry Consistent Deep CNNs for Face Completion

1 code implementation • 19 Dec 2018 • Xiaoming Li, Ming Liu, Jieru Zhu, WangMeng Zuo, Meng Wang, Guosheng Hu, Lei Zhang

As for missing pixels on both half-faces, we present a generative reconstruction subnet together with a perceptual symmetry loss to enforce symmetry consistency of recovered structures.

Deep Metric Learning by Online Soft Mining and Class-Aware Attention

3 code implementations • 4 Nov 2018 • Xinshao Wang, Yang Hua, Elyor Kodirov, Guosheng Hu, Neil M. Robertson

Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch.

Dictionary Integration using 3D Morphable Face Models for Pose-invariant Collaborative-representation-based Classification

no code implementations • 1 Nov 2016 • Xiaoning Song, Zhen-Hua Feng, Guosheng Hu, Josef Kittler, William Christmas, Xiao-Jun Wu

The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification.

A Multiresolution 3D Morphable Face Model and Fitting Framework

1 code implementation • 1 Feb 2016 • Patrik Huber, Guosheng Hu, Rafael Tena, Pouria Mortazavian, Willem P. Koppen, William Christmas, Matthias Rätsch, Josef Kittler

In this paper, we present the Surrey Face Model, a multi-resolution 3D Morphable Model that we make available to the public for non-commercial purposes.

Identifying Similar Patients Using Self-Organising Maps: A Case Study on Type-1 Diabetes Self-care Survey Responses

no code implementations • 21 Mar 2015 • Santosh Tirunagari, Norman Poh, Guosheng Hu, David Windridge

Diabetes is considered a lifestyle disease, and well-managed self-care plays an important role in its treatment.

