Search Results for author: Xiatian Zhu

Found 139 papers, 68 papers with code

Robust Low-Light Human Pose Estimation through Illumination-Texture Modulation

no code implementations 14 Jan 2025 Feng Zhang, Ze Li, Xiatian Zhu, Lei Chen

As critical visual details become obscured, the low visibility and high ISO noise in extremely low-light images pose a significant challenge to human pose estimation.

Denoising Pose Estimation

4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives

no code implementations 30 Dec 2024 Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S. Torr

Dynamic 3D scene representation and novel view synthesis from captured videos are crucial for enabling immersive experiences required by AR/VR and metaverse applications.

Novel View Synthesis Scene Understanding

Reflective Gaussian Splatting

no code implementations 26 Dec 2024 Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang

Novel view synthesis has experienced significant advancements owing to increasingly capable NeRF- and 3DGS-based methods.

Novel View Synthesis Object Reconstruction

Is Foreground Prototype Sufficient? Few-Shot Medical Image Segmentation with Background-Fused Prototype

no code implementations 4 Dec 2024 Song Tang, Chunxiao Zu, Wenxin Su, Yuan Dong, Mao Ye, Yan Gan, Xiatian Zhu

However, this paradigm is not applicable to medical images where the foreground and background share numerous visual features, necessitating a more detailed description for background.

Few-Shot Semantic Segmentation Image Segmentation +2

Driving Scene Synthesis on Free-form Trajectories with Generative Prior

no code implementations 2 Dec 2024 Zeyu Yang, Zijie Pan, Yuankun Yang, Xiatian Zhu, Li Zhang

To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, by leveraging video generative prior to optimize a 3D model across a variety of trajectories.

Novel View Synthesis

Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data

no code implementations 2 Dec 2024 Wenxin Su, Song Tang, Xiaofeng Liu, Xiaojing Yi, Mao Ye, Chunxiao Zu, Jiahao Li, Xiatian Zhu

Specifically, we first theoretically reformulate conventional perturbation optimization in a generative way: learning a perturbation generation function with a latent input variable.

Diabetic Retinopathy Grading Domain Adaptation

KANs for Computer Vision: An Experimental Study

no code implementations 27 Nov 2024 Karthik Mohan, Hanxiao Wang, Xiatian Zhu

This paper presents an experimental study of Kolmogorov-Arnold Networks (KANs) applied to computer vision tasks, particularly image classification.

Image Classification Kolmogorov-Arnold Networks

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

no code implementations 27 Nov 2024 Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez

We introduce a Frequency Modulation (FM) module that leverages the Fourier domain to improve the global structure consistency, and an Attention Modulation (AM) module which improves the consistency of local texture patterns, a problem largely ignored in prior works.

Image Generation

FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses

no code implementations 5 Nov 2024 Isaac Baglin, Xiatian Zhu, Simon Hadfield

This work highlights a crucial trade-off between privacy and model accuracy in Federated Learning and aims to advance the understanding of security challenges in decentralized machine learning systems, stimulate future research, and enhance reproducibility in evaluating Deep Leakage attacks and defenses.

Federated Learning Privacy Preserving

3D Audio-Visual Segmentation

no code implementations 4 Nov 2024 Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

Recognizing the sounding objects in scenes is a longstanding objective in embodied AI, with diverse applications in robotics and AR/VR/MR.

Segmentation

Motion Forecasting in Continuous Driving

2 code implementations 8 Oct 2024 Nan Song, Bozhou Zhang, Xiatian Zhu, Li Zhang

Motion forecasting for agents in autonomous driving is highly challenging due to the numerous possibilities for each agent's next action and their complex interactions in space and time.

Motion Forecasting

Rethinking Weak-to-Strong Augmentation in Source-Free Domain Adaptive Object Detection

no code implementations 7 Oct 2024 Jiuzheng Yang, Song Tang, Yangkuiyi Zhang, Shuaifeng Li, Mao Ye, Jianwei Zhang, Xiatian Zhu

The core idea is to distill semantics-lossless knowledge in the weak features (from the weak/teacher branch) to guide representation learning on the strong features (from the strong/student branch).

Contrastive Learning object-detection +3

Single Image, Any Face: Generalisable 3D Face Generation

no code implementations 25 Sep 2024 Wenqing Wang, Haosen Yang, Josef Kittler, Xiatian Zhu

To the best of our knowledge, this is the first attempt and benchmark for creating photorealistic 3D human face avatars from single images for generic human subjects across domains.

Face Generation

FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

no code implementations 5 Sep 2024 Xi Chen, Haosen Yang, Sheng Jin, Xiatian Zhu, Hongxun Yao

To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a lightweight transformer decoder for mask proposal generation, the performance bottleneck.

Decoder Segmentation

DeepInteraction++: Multi-Modality Interaction for Autonomous Driving

1 code implementation 9 Aug 2024 Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr

To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.

3D Object Detection Autonomous Driving +3

Bayesian Detector Combination for Object Detection with Crowdsourced Annotations

1 code implementation 10 Jul 2024 Zhi Qin Tan, Olga Isupova, Gustavo Carneiro, Xiatian Zhu, Yunpeng Li

Most prior object detection methods assume accurate annotations. A few recent works have studied object detection with noisy crowdsourced annotations, with evaluation on distinct synthetic crowdsourced datasets of varying setups under artificial assumptions.

Object object-detection +1

PartCraft: Crafting Creative Objects by Parts

1 code implementation 5 Jul 2024 Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

This paper propels creative control in generative visual AI by allowing users to "select".

Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

1 code implementation 26 Jun 2024 Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class.

Few-Shot Semantic Segmentation Image Segmentation +2

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

no code implementations 13 Jun 2024 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the space relation from the listener and sound source.

Audio Synthesis

Gaussian Splatting with Localized Points Management

no code implementations 6 Jun 2024 Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu

To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying those error-contributing zones in the highest demand for both point addition and geometry calibration.

Management

Tetrahedron Splatting for 3D Generation

1 code implementation 3 Jun 2024 Chun Gu, Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang

Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges.

3D Generation

Proxy Denoising for Source-Free Domain Adaptation

1 code implementation 3 Jun 2024 Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

This is grounded in a novel proxy confidence theory that models the domain adaptation effect of the proxy's divergence against the domain-invariant space.

Denoising Source-Free Domain Adaptation

Diffusion Deepfake

no code implementations 2 Apr 2024 Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu

To address this critical issue, we investigate the impact of enhancing training data diversity on representative detection methods.

DeepFake Detection Diversity +2

Unsupervised Audio-Visual Segmentation with Modality Alignment

no code implementations 21 Mar 2024 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.

Contrastive Learning

Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation

1 code implementation 17 Mar 2024 Xi Chen, Haosen Yang, Huicong Zhang, Hongxun Yao, Xiatian Zhu

Source-free unsupervised domain adaptation (SFUDA) aims to enable the utilization of a pre-trained source model in an unlabeled target domain without access to source data.

Contrastive Learning Memorization +3

Unified Source-Free Domain Adaptation

1 code implementation 12 Mar 2024 Song Tang, Wenxin Su, Mao Ye, Jianwei Zhang, Xiatian Zhu

To tackle this unified SFDA problem, we propose a novel approach called Latent Causal Factors Discovery (LCFD).

Language Modelling Source-Free Domain Adaptation +1

Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video

1 code implementation 16 Jan 2024 Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang

An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling. However, this approach would be slow and expensive to scale due to the need for back-propagating the information-limited supervision signals through a large pretrained model.

Image Generation Image to 3D +3

Vision-language Assisted Attribute Learning

no code implementations 12 Dec 2023 Kongming Liang, Xinran Wang, Rui Wang, Donghui Gao, Ling Jin, Weidong Liu, Xiatian Zhu, Zhanyu Ma, Jun Guo

Attribute labeling at large scale is typically incomplete and partial, posing significant challenges to model optimization.

Attribute Language Modeling +3

Typhoon Intensity Prediction with Vision Transformer

1 code implementation 28 Nov 2023 Huanxin Chen, Pengshuai Yin, Huichou Huang, Qingyao Wu, Ruirui Liu, Xiatian Zhu

Predicting typhoon intensity accurately across space and time is crucial for issuing timely disaster warnings and facilitating emergency response.

Representation Learning

DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination

2 code implementations 27 Nov 2023 Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

To bridge this gap, we introduce a novel task, Virtual Creatures Generation: given a set of unlabeled images of the target concepts (e.g., 200 bird species), we aim to train a T2I model capable of creating new, hybrid concepts within diverse backgrounds and contexts.

Disentanglement Novel Concepts

Source-Free Domain Adaptation with Frozen Multimodal Foundation Model

1 code implementation CVPR 2024 Song Tang, Wenxin Su, Mao Ye, Xiatian Zhu

We find that directly applying the ViL model to the target domain in a zero-shot fashion is unsatisfactory, as it is not specialized for this particular task but largely generic.

Source-Free Domain Adaptation

Unified Domain Adaptive Semantic Segmentation

1 code implementation 22 Nov 2023 Zhe Zhang, Gaochang Wu, Jing Zhang, Xiatian Zhu, DaCheng Tao, Tianyou Chai

Under this observation, we advocate unifying the study of UDA-SS across video and image scenarios, enabling a more comprehensive understanding, synergistic advancements, and efficient knowledge sharing.

Data Augmentation Optical Flow Estimation +5

Adaptive-Labeling for Enhancing Remote Sensing Cloud Understanding

no code implementations 9 Nov 2023 Jay Gala, Sauradip Nag, Huichou Huang, Ruirui Liu, Xiatian Zhu

Cloud analysis is a critical component of weather and climate science, impacting various sectors like disaster management.

Management Segmentation

Optimization Efficient Open-World Visual Region Recognition

1 code implementation 2 Nov 2023 Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

Building on the success of powerful image-level vision-language (ViL) foundation models like CLIP, recent efforts have sought to harness their capabilities by either training a contrastive model from scratch with an extensive collection of region-label pairs or aligning the outputs of a detection model with image-level representations of region proposals.

object-detection Object Recognition +1

Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

1 code implementation 19 Oct 2023 Zijie Pan, Jiachen Lu, Xiatian Zhu, Li Zhang

In this framework, a significant challenge arises: to compute gradients for individual image pixels, it is necessary to backpropagate gradients from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM.

3D Generation Transfer Learning

Sarcasm in Sight and Sound: Benchmarking and Expansion to Improve Multimodal Sarcasm Detection

no code implementations 29 Sep 2023 Swapnil Bhosale, Abhra Chaudhuri, Alex Lee Robert Williams, Divyank Tiwari, Anjan Dutta, Xiatian Zhu, Pushpak Bhattacharyya, Diptesh Kanojia

The introduction of the MUStARD dataset, and its emotion recognition extension MUStARD++, has identified sarcasm to be a multi-modal phenomenon, expressed not only in natural language text but also through manners of speech (like tonality and intonation) and visual cues (facial expression).

Benchmarking Diversity +2

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

no code implementations 13 Sep 2023 Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu

Particularly, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel in accurately segmenting overlapped auditory objects.

Segmentation

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

1 code implementation 20 Jul 2023 Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors.

Action Classification Action Recognition In Videos +3

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

no code implementations 15 Jun 2023 Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li

Video Question Answering (VideoQA) has been significantly advanced from the scaling of recent Large Language Models (LLMs).

Ranked #3 on Temporal/Casual QA on NExT-QA (using extra training data)

cross-modal alignment Domain Generalization +3

Independent Feature Decomposition and Instance Alignment for Unsupervised Domain Adaptation

1 code implementation IJCAI 2023 Qichen He, Siying Xiao, Mao Ye, Xiatian Zhu, Ferrante Neri, Dongde Hou

Existing Unsupervised Domain Adaptation (UDA) methods typically attempt to perform knowledge transfer in a domain-invariant space explicitly or implicitly.

Transfer Learning Unsupervised Domain Adaptation

DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion

1 code implementation ICCV 2023 Sauradip Nag, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang

Concretely, we establish the denoising process in the Transformer decoder (e.g., DETR) by introducing a temporal location query design with faster convergence in training.

Action Detection Decoder +1

Generative Semantic Segmentation

2 code implementations CVPR 2023 Jiaqi Chen, Jiachen Lu, Xiatian Zhu, Li Zhang

To that end, the segmentation mask is expressed with a special type of image (dubbed maskige).

Segmentation Semantic Segmentation

PersonalTailor: Personalizing 2D Pattern Design from 3D Garment Point Clouds

no code implementations 17 Mar 2023 Sauradip Nag, Anran Qi, Xiatian Zhu, Ariel Shamir

Garment pattern design aims to convert a 3D garment to the corresponding 2D panels and their sewing structure.

Decoder

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

1 code implementation CVPR 2023 Xiao Han, Xiatian Zhu, Licheng Yu, Li Zhang, Yi-Zhe Song, Tao Xiang

In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image captioning.

Cross-Modal Retrieval Image Captioning +5

Unsupervised Hashing with Similarity Distribution Calibration

1 code implementation 15 Feb 2023 Kam Woh Ng, Xiatian Zhu, Jiun Tian Hoe, Chee Seng Chan, Tianyu Zhang, Yi-Zhe Song, Tao Xiang

However, these methods often overlook the fact that the similarity between data points in the continuous feature space may not be preserved in the discrete hash code space, due to the limited similarity range of hash codes.

Deep Hashing Image Retrieval

Preconditioned Score-based Generative Models

1 code implementation 13 Feb 2023 Hengyuan Ma, Li Zhang, Xiatian Zhu, Jianfeng Feng

Compared with the latest generative models (e.g., CLD-SGM, DDIM, and Analytic-DDIM), PDS can achieve the best sampling quality on CIFAR-10 at an FID score of 1.99.

Image Generation

Homeomorphism Alignment for Unsupervised Domain Adaptation

1 code implementation ICCV 2023 Lihua Zhou, Mao Ye, Xiatian Zhu, Siying Xiao, Xu-Qian Fan, Ferrante Neri

With distribution alignment, it is challenging to acquire a common space which maintains fully the discriminative structure of both domains.

Pseudo Label Self-Supervised Learning +1

Controllable Person Image Synthesis with Pose-Constrained Latent Diffusion

no code implementations ICCV 2023 Xiao Han, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang

Controllable person image synthesis aims at rendering a source image based on user-specified changes in body pose or appearance.

Denoising Image Generation

Post-Processing Temporal Action Detection

1 code implementation CVPR 2023 Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

To address this problem, in this work we introduce a novel model-agnostic post-processing method without model redesign and retraining.

Action Classification Action Detection +1

Multi-Modal Few-Shot Temporal Action Detection

1 code implementation 27 Nov 2022 Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD by leveraging few-shot support videos and new class names jointly.

Action Detection Few-Shot Object Detection +3

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

1 code implementation 9 Oct 2022 Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan

As a result, our model can extract effectively both static appearance and dynamic motion spontaneously, leading to superior spatiotemporal representation learning capability.

Representation Learning Semantic Segmentation +2

Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

no code implementations 24 Aug 2022 Yanbei Chen, Massimiliano Mancini, Xiatian Zhu, Zeynep Akata

Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data.

Survey

DeepInteraction: 3D Object Detection via Modality Interaction

2 code implementations 23 Aug 2022 Zeyu Yang, Jiaqi Chen, Zhenwei Miao, Wei Li, Xiatian Zhu, Li Zhang

Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy.

3D Object Detection Decoder +3

Vision Transformers: From Semantic Segmentation to Dense Prediction

3 code implementations 19 Jul 2022 Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr

Extensive experiments show that our methods achieve appealing performance on a variety of dense prediction tasks (e.g., object detection, instance segmentation, and semantic segmentation) as well as image classification.

Image Classification Instance Segmentation +5

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

1 code implementation 17 Jul 2022 Xiao Han, Licheng Yu, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang

We thus propose a Multi-View Contrastive Learning task for pulling closer the visual representation of one image to the compositional multimodal representation of another image+text.

Contrastive Learning Image Retrieval +2

Zero-Shot Temporal Action Detection via Vision-Language Prompting

1 code implementation 17 Jul 2022 Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Such a novel design effectively eliminates the dependence between localization and classification by breaking the route for error propagation in-between.

Action Detection Classification +3

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

1 code implementation 14 Jul 2022 Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

Such a novel design effectively eliminates the dependence between localization and classification by cutting off the route for error propagation in-between.

Action Detection General Classification +1

Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling

1 code implementation 5 Jul 2022 Hengyuan Ma, Li Zhang, Xiatian Zhu, Jianfeng Feng

However, a fundamental limitation is that their inference is very slow due to a need for many (e.g., 2000) iterations of sequential computations.

Diversity Image Generation

Softmax-free Linear Transformers

1 code implementation 5 Jul 2022 Jiachen Lu, Junge Zhang, Xiatian Zhu, Jianfeng Feng, Tao Xiang, Li Zhang

With linear complexity, much longer token sequences are permitted by SOFT, resulting in superior trade-off between accuracy and complexity.

Computational Efficiency

Multimodal Learning with Transformers: A Survey

no code implementations 13 Jun 2022 Peng Xu, Xiatian Zhu, David A. Clifton

Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks.

Survey

Accelerating Score-based Generative Models for High-Resolution Image Synthesis

no code implementations 8 Jun 2022 Hengyuan Ma, Li Zhang, Xiatian Zhu, Jingfeng Zhang, Jianfeng Feng

To ensure stability of convergence in sampling and generation quality, however, this sequential sampling process has to take a small step size and many sampling iterations (e.g., 2000).

Image Generation Vocal Bursts Intensity Prediction

Learning Ego 3D Representation as Ray Tracing

1 code implementation 8 Jun 2022 Jiachen Lu, Zheyuan Zhou, Xiatian Zhu, Hang Xu, Li Zhang

A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner.

3D Object Detection Computational Efficiency +4

Knowledge Distillation Meets Open-Set Semi-Supervised Learning

1 code implementation 13 May 2022 Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

The key idea is that we leverage the teacher's classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions.

Face Recognition Knowledge Distillation

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers

1 code implementation 6 May 2022 Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Lukasz Dudziak, Hongsheng Li, Georgios Tzimiropoulos, Brais Martinez

In this work, pushing further along this under-studied direction, we introduce EdgeViTs, a new family of light-weight ViTs that, for the first time, enable attention-based vision models to compete with the best light-weight CNNs in the tradeoff between accuracy and on-device efficiency.

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition

no code implementations 10 Apr 2022 Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez

To overcome both limitations, we introduce Self-Supervised Learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected by an off-the-shelf hand-object contact detector.

Action Recognition Object +2

Unsupervised Long-Term Person Re-Identification with Clothes Change

no code implementations 7 Feb 2022 Mingkun Li, Shupeng Cheng, Peng Xu, Xiatian Zhu, Chun-Guang Li, Jun Guo

We investigate unsupervised person re-identification (Re-ID) with clothes change, a new challenging problem with more practical usability and scalability to real-world deployment.

Clustering Unsupervised Long Term Person Re-Identification +2

Source-Free Object Detection by Learning To Overlook Domain Style

1 code implementation CVPR 2022 Shuaifeng Li, Mao Ye, Xiatian Zhu, Lihua Zhou, Lin Xiong

This approach suffers from both unsatisfactory accuracy of pseudo labels due to the presence of domain shift and limited use of target domain training data.

object-detection Object Detection

SOFT: Softmax-free Transformer with Linear Complexity

2 code implementations NeurIPS 2021 Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in superior trade-off between accuracy and complexity.

Computational Efficiency

Few-Shot Temporal Action Localization with Query Adaptive Transformer

1 code implementation 20 Oct 2021 Sauradip Nag, Xiatian Zhu, Tao Xiang

Further, a novel FS-TAL model is proposed which maximizes the knowledge transfer from training classes whilst enabling the model to be dynamically adapted to both the new class and each video of that class simultaneously.

Action Segmentation Few Shot Temporal Action Localization +4

Temporal Action Localization with Global Segmentation Mask Transformers

no code implementations 29 Sep 2021 Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

In this paper, to address the above two challenges, a novel Global Segmentation Mask Transformer (GSMT) is proposed.

Object object-detection +2

Single Person Pose Estimation: A Survey

no code implementations 21 Sep 2021 Feng Zhang, Xiatian Zhu, Chen Wang

Human pose estimation in unconstrained images and videos is a fundamental computer vision task.

Data Augmentation Pose Estimation +1

Low-resolution Human Pose Estimation

no code implementations 19 Sep 2021 Chen Wang, Feng Zhang, Xiatian Zhu, Shuzhi Sam Ge

Human pose estimation has achieved significant progress on images with high imaging resolution.

Pose Estimation

Global Aggregation then Local Distribution for Scene Parsing

1 code implementation 28 Jul 2021 Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang

Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.

Scene Parsing Segmentation +1

DeepChange: A Large Long-Term Person Re-Identification Benchmark with Clothes Change

1 code implementation 31 May 2021 Peng Xu, Xiatian Zhu

Currently, one of the most significant limitations in this field is the lack of a large realistic benchmark.

Person Identification Person Re-Identification

Few-Shot Website Fingerprinting Attack

no code implementations 25 Jan 2021 Mantun Chen, Yongjun Wang, Zhiquan Qin, Xiatian Zhu

To address this, we introduce a model-agnostic, efficient, and Harmonious Data Augmentation (HDA) method that can improve deep WF attacking methods significantly.

Data Augmentation Cryptography and Security 68M25 K.4.1

Few-shot Action Recognition with Prototype-centered Attentive Learning

1 code implementation 20 Jan 2021 Xiatian Zhu, Antoine Toisoul, Juan-Manuel Perez-Rua, Li Zhang, Brais Martinez, Tao Xiang

Extensive experiments on four standard few-shot action benchmarks show that our method clearly outperforms previous state-of-the-art methods, with the improvement particularly significant (10+%) on the most challenging fine-grained action recognition benchmark.

Contrastive Learning Few-Shot action recognition +3

Unsupervised Noisy Tracklet Person Re-identification

no code implementations 16 Jan 2021 Minxian Li, Xiatian Zhu, Shaogang Gong

Extensive comparative experiments demonstrate that the proposed STL model surpasses significantly the state-of-the-art unsupervised learning and one-shot learning re-id methods on three large tracklet person re-id benchmarks.

One-Shot Learning Person Re-Identification

Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering

2 code implementations 8 Dec 2020 Hui Tang, Xiatian Zhu, Ke Chen, Kui Jia, C. L. Philip Chen

To address this issue, we are motivated by a UDA assumption of structural similarity across domains, and propose to directly uncover the intrinsic target discrimination via constrained clustering, where we constrain the clustering solutions using structural source regularization that hinges on the very same assumption.

Constrained Clustering Deep Clustering +3

Egocentric Action Recognition by Video Attention and Temporal Context

no code implementations 3 Jul 2020 Juan-Manuel Perez-Rua, Antoine Toisoul, Brais Martinez, Victor Escorcia, Li Zhang, Xiatian Zhu, Tao Xiang

In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip.

Action Recognition

Incremental Few-Shot Object Detection

no code implementations CVPR 2020 Juan-Manuel Perez-Rua, Xiatian Zhu, Timothy Hospedales, Tao Xiang

To this end we propose OpeN-ended Centre nEt (ONCE), a detector designed for incrementally learning to detect novel class objects with few examples.

Few-Shot Learning Few-Shot Object Detection +3

Intra-Camera Supervised Person Re-Identification

no code implementations 12 Feb 2020 Xiangping Zhu, Xiatian Zhu, Minxian Li, Pietro Morerio, Vittorio Murino, Shaogang Gong

Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data.

Person Re-Identification

Characteristic Regularisation for Super-Resolving Face Images

no code implementations 30 Dec 2019 Zhiyi Cheng, Xiatian Zhu, Shaogang Gong

Extensive evaluations demonstrate the performance superiority of our method over state-of-the-art SR and UDA models on both genuine and artificial LR facial imagery data.

Image Super-Resolution Unsupervised Domain Adaptation

Distribution-Aware Coordinate Representation for Human Pose Estimation

6 code implementations CVPR 2020 Feng Zhang, Xiatian Zhu, Hanbin Dai, Mao Ye, Ce Zhu

Interestingly, we found that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before.

Ranked #2 on Multi-Person Pose Estimation on MS COCO (using extra training data)

Keypoint Detection Multi-Person Pose Estimation

Neural Operator Search

no code implementations 25 Sep 2019 Wei Li, Shaogang Gong, Xiatian Zhu

We address this limitation by additionally exploiting feature self-calibration operations, resulting in a heterogeneous search space.

Neural Architecture Search

Intra-Camera Supervised Person Re-Identification: A New Benchmark

no code implementations 27 Aug 2019 Xiangping Zhu, Xiatian Zhu, Minxian Li, Vittorio Murino, Shaogang Gong

Existing person re-identification (re-id) methods rely mostly on a large set of inter-camera identity labelled training data, requiring a tedious data collection and annotation process and therefore leading to poor scalability in practical re-id applications.

Multi-Label Learning Person Re-Identification

Universal Person Re-Identification

no code implementations 22 Jul 2019 Xu Lan, Xiatian Zhu, Shaogang Gong

Most state-of-the-art person re-identification (re-id) methods depend on supervised model learning with a large set of cross-view identity labelled training data.

Domain Generalization Person Re-Identification +1

Unsupervised Deep Learning by Neighbourhood Discovery

1 code implementation 25 Apr 2019 Jiabo Huang, Qi Dong, Shaogang Gong, Xiatian Zhu

Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by supervisedly learning strong visual feature representations.

Deep Learning Image Classification

Low-Resolution Face Recognition

no code implementations 21 Nov 2018 Zhiyi Cheng, Xiatian Zhu, Shaogang Gong

Whilst recent face-recognition (FR) techniques have made significant progress on recognising constrained high-resolution web images, the same cannot be said on natively unconstrained low-resolution images at large scales.

Face Recognition Super-Resolution

Single-Label Multi-Class Image Classification by Deep Logistic Regression

no code implementations 20 Nov 2018 Qi Dong, Xiatian Zhu, Shaogang Gong

The objective learning formulation is essential for the success of convolutional neural networks.

Attribute Classification +4

Self-Referenced Deep Learning

no code implementations 19 Nov 2018 Xu Lan, Xiatian Zhu, Shaogang Gong

Whilst being able to create stronger target networks compared to the vanilla non-teacher based learning strategy, this scheme additionally needs to train a large teacher model at expensive computational cost.

Deep Learning Knowledge Distillation

Fast Human Pose Estimation

1 code implementation CVPR 2019 Feng Zhang, Xiatian Zhu, Mao Ye

In this work, we investigate the under-studied but practically critical pose model efficiency problem.

Pose Estimation

Vehicle Re-Identification in Context

no code implementations 25 Sep 2018 Aytaç Kanacı, Xiatian Zhu, Shaogang Gong

Existing vehicle re-identification (re-id) evaluation benchmarks consider strongly artificial test scenarios by assuming the availability of high quality images and fine-grained appearance at an almost constant image scale, reminiscent of images required for Automatic Number Plate Recognition, e.g. VeRi-776.

Vehicle Re-Identification

Unsupervised Person Re-identification by Deep Learning Tracklet Association

no code implementations ECCV 2018 Minxian Li, Xiatian Zhu, Shaogang Gong

Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data.

Benchmarking Deep Learning +2

Semi-Supervised Deep Learning with Memory

1 code implementation ECCV 2018 Yanbei Chen, Xiatian Zhu, Shaogang Gong

We consider the semi-supervised multi-class classification problem of learning from sparse labelled and abundant unlabelled training data.

Deep Learning General Classification +2

Deep Association Learning for Unsupervised Video Person Re-identification

1 code implementation 22 Aug 2018 Yanbei Chen, Xiatian Zhu, Shaogang Gong

In this work, to address the video person re-id task, we formulate a novel Deep Association Learning (DAL) scheme, the first end-to-end deep learning method using none of the identity labels in model initialisation and training.

Unsupervised Person Re-Identification Video-Based Person Re-Identification

Person Search by Multi-Scale Matching

no code implementations ECCV 2018 Xu Lan, Xiatian Zhu, Shaogang Gong

In contrast to previous studies, we show that sufficiently reliable person instance cropping is achievable by slightly improved state-of-the-art deep learning object detectors (e.g. Faster-RCNN), and the under-studied multi-scale matching problem in person search is a more severe barrier.

Benchmarking Human Detection +1

Open Logo Detection Challenge

2 code implementations 5 Jul 2018 Hang Su, Xiatian Zhu, Shaogang Gong

In this work, we introduce a more realistic and challenging logo detection setting, called Open Logo Detection.

Person Re-Identification in Identity Regression Space

no code implementations 25 Jun 2018 Hanxiao Wang, Xiatian Zhu, Shaogang Gong, Tao Xiang

Most existing person re-identification (re-id) methods are unsuitable for real-world deployment for two reasons: unscalability to large population sizes and inadaptability over time.

Benchmarking Incremental Learning +2

Knowledge Distillation by On-the-Fly Native Ensemble

3 code implementations NeurIPS 2018 Xu Lan, Xiatian Zhu, Shaogang Gong

Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-running requirements.

Computational Efficiency Image Classification +1

Imbalanced Deep Learning by Minority Class Incremental Rectification

1 code implementation 28 Apr 2018 Qi Dong, Shaogang Gong, Xiatian Zhu

In particular, existing deep learning methods consider mostly either class balanced data or moderately imbalanced data in model training, and ignore the challenge of learning from significantly imbalanced training data.

Attribute Deep Learning +1

Surveillance Face Recognition Challenge

1 code implementation 25 Apr 2018 Zhiyi Cheng, Xiatian Zhu, Shaogang Gong

To facilitate more studies on developing FR models that are effective and robust for low-resolution surveillance facial images, we introduce a new Surveillance Face Recognition Challenge, which we call the QMUL-SurvFace benchmark.

Face Recognition

Scalable Deep Learning Logo Detection

2 code implementations 30 Mar 2018 Hang Su, Shaogang Gong, Xiatian Zhu

Existing logo detection methods usually consider a small number of logo classes and limited images per class with a strong assumption of requiring tedious object bounding box annotations, and are therefore not scalable to real-world dynamic applications.

Deep Learning Incremental Learning

Transferable Joint Attribute-Identity Deep Learning for Unsupervised Person Re-Identification

no code implementations CVPR 2018 Jingya Wang, Xiatian Zhu, Shaogang Gong, Wei Li

Most existing person re-identification (re-id) methods require supervised model learning from a separate large set of pairwise labelled training data for every single camera pair.

Attribute Deep Learning +2

Harmonious Attention Network for Person Re-Identification

1 code implementation CVPR 2018 Wei Li, Xiatian Zhu, Shaogang Gong

Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images.

Person Re-Identification

Class Rectification Hard Mining for Imbalanced Deep Learning

1 code implementation ICCV 2017 Qi Dong, Shaogang Gong, Xiatian Zhu

Recognising detailed facial or clothing attributes in images of people is a challenging task for computer vision, especially when the training data are both in very large scale and extremely imbalanced among different attribute classes.

Attribute Deep Learning

Attribute Recognition by Joint Recurrent Learning of Context and Correlation

no code implementations ICCV 2017 Jingya Wang, Xiatian Zhu, Shaogang Gong, Wei Li

Recognising semantic pedestrian attributes in surveillance images is a challenging task for computer vision, particularly when the imaging quality is poor with complex background clutter and uncontrolled viewing conditions, and the number of labelled training data is small.

Attribute Decoder +2

Deep Reinforcement Learning Attention Selection for Person Re-Identification

no code implementations 10 Jul 2017 Xu Lan, Hanxiao Wang, Shaogang Gong, Xiatian Zhu

Existing person re-identification (re-id) methods assume the provision of accurately cropped person bounding boxes with minimum background noise, mostly by manual cropping.

Deep Reinforcement Learning Person Re-Identification +2

Discovering Visual Concept Structure with Sparse and Incomplete Tags

no code implementations 30 May 2017 Jingya Wang, Xiatian Zhu, Shaogang Gong

As a result, our model is able to discover more accurate semantic correlation between textual tags and visual features, and finally provides favourable visual semantics interpretation even with highly sparse and incomplete tags.

Benchmarking Clustering +1

Person Re-Identification by Deep Joint Learning of Multi-Loss Classification

no code implementations 12 May 2017 Wei Li, Xiatian Zhu, Shaogang Gong

Existing person re-identification (re-id) methods rely mostly on either localised or global feature representation alone.

feature selection General Classification +1

Person Re-Identification by Camera Correlation Aware Feature Augmentation

no code implementations 26 Mar 2017 Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, Jian-Huang Lai

The challenge of person re-identification (re-id) is to match individual images of the same person captured by different non-overlapping camera views against significant and unknown cross-view feature distortion.

Person Re-Identification

Deep Learning Logo Detection with Data Expansion by Synthesising Context

no code implementations 29 Dec 2016 Hang Su, Xiatian Zhu, Shaogang Gong

Logo detection in unconstrained images is challenging, particularly when only very sparse labelled training images are accessible due to high labelling costs.

Benchmarking Deep Learning

Human-In-The-Loop Person Re-Identification

no code implementations 5 Dec 2016 Hanxiao Wang, Shaogang Gong, Xiatian Zhu, Tao Xiang

Current person re-identification (re-id) methods assume that (1) pre-labelled training data are available for every camera pair, and (2) the gallery size for re-identification is moderate.

Ensemble Learning Incremental Learning +1

Person Re-Identification by Unsupervised Video Matching

no code implementations 25 Nov 2016 Xiaolong Ma, Xiatian Zhu, Shaogang Gong, Xudong Xie, Jianming Hu, Kin-Man Lam, Yisheng Zhong

Crucially, this model does not require pairwise labelled training data (i.e. it is unsupervised) and is therefore readily scalable to large scale camera networks of arbitrary camera pairs without the need for exhaustive data annotation for every camera pair.

Benchmarking Dynamic Time Warping +2

Multi-Task Curriculum Transfer Deep Learning of Clothing Attributes

no code implementations 12 Oct 2016 Qi Dong, Shaogang Gong, Xiatian Zhu

Recognising detailed clothing characteristics (fine-grained attributes) in unconstrained images of people in-the-wild is a challenging task for computer vision, especially when there is only limited training data from the wild whilst most data available for model learning are captured in well-controlled environments using fashion models (well lit, no background clutter, frontal view, high-resolution).

Attribute Deep Learning +1

Person Re-Identification by Discriminative Selection in Video Ranking

no code implementations 23 Jan 2016 Taiqing Wang, Shaogang Gong, Xiatian Zhu, Shengjin Wang

Current person re-identification (ReID) methods typically rely on single-frame imagery features, whilst ignoring space-time information from image sequences often available in the practical surveillance scenarios.

Gait Recognition Person Re-Identification

Learning from Multiple Sources for Video Summarisation

no code implementations 13 Jan 2015 Xiatian Zhu, Chen Change Loy, Shaogang Gong

Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished through analysing imagery-based features.

Clustering Video Understanding

Constructing Robust Affinity Graphs for Spectral Clustering

no code implementations CVPR 2014 Xiatian Zhu, Chen Change Loy, Shaogang Gong

Spectral clustering requires robust and meaningful affinity graphs as input in order to form clusters with desired structures that can well support human intuition.

Clustering
