Search Results for author: Gang Hua

Found 111 papers, 49 papers with code

Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior

no code implementations CVPR 2013 Gangqiang Zhao, Junsong Yuan, Gang Hua

We show that such data driven co-occurrence information from bottom-up can conveniently be incorporated in LDA with a Gaussian Markov prior, which combines top down probabilistic topic modeling with bottom up priors in a unified model.

Object Object Discovery +1

Probabilistic Elastic Matching for Pose Variant Face Verification

no code implementations CVPR 2013 Haoxiang Li, Gang Hua, Zhe Lin, Jonathan Brandt, Jianchao Yang

By augmenting each feature with its location, a Gaussian mixture model (GMM) is trained to capture the spatial-appearance distribution of all face images in the training corpus.

Face Recognition Face Verification

Hash-SVM: Scalable Kernel Machines for Large-Scale Visual Classification

no code implementations CVPR 2014 Yadong Mu, Gang Hua, Wei Fan, Shih-Fu Chang

This paper presents a novel algorithm which uses compact hash bits to greatly improve the efficiency of non-linear kernel SVM in very large scale visual classification problems.

Classification General Classification

Semi-supervised Relational Topic Model for Weakly Annotated Image Recognition in Social Media

no code implementations CVPR 2014 Zhenxing Niu, Gang Hua, Xinbo Gao, Qi Tian

In this way, we can efficiently leverage the loosely related tags and build an intermediate-level representation for a collection of weakly annotated images.

Efficient Boosted Exemplar-based Face Detection

no code implementations CVPR 2014 Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Gang Hua

Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved.

Face Detection

Hierarchical-PEP Model for Real-World Face Recognition

no code implementations CVPR 2015 Haoxiang Li, Gang Hua

We apply the PEP model hierarchically to decompose a face image into face parts at different levels of details to build pose-invariant part-based face representations.

Face Recognition Face Verification

Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification

no code implementations CVPR 2015 Dapeng Chen, Zejian Yuan, Gang Hua, Nanning Zheng, Jingdong Wang

We follow the learning-to-rank methodology and learn a similarity function to maximize the difference between the similarity scores of matched and unmatched images for the same person.

Learning-To-Rank Patch Matching +1

A Convolutional Neural Network Cascade for Face Detection

no code implementations CVPR 2015 Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua

To improve localization effectiveness, and reduce the number of candidates at later stages, we introduce a CNN-based calibration stage after each of the detection stages in the cascade.

Face Detection

Multi-Class Multi-Annotator Active Learning With Robust Gaussian Process for Visual Recognition

no code implementations ICCV 2015 Chengjiang Long, Gang Hua

Based on the EP approximation inference, a generalized Expectation Maximization (GEM) algorithm is derived to estimate both the parameters for instances and the quality of each individual annotator.

Active Learning Bayesian Inference +2

Learning Discriminative Reconstructions for Unsupervised Outlier Removal

no code implementations ICCV 2015 Yan Xia, Xudong Cao, Fang Wen, Gang Hua, Jian Sun

We study the problem of automatically removing outliers from noisy data, with application for removing outlier images from an image collection.

Neural Aggregation Network for Video Face Recognition

no code implementations CVPR 2017 Jiaolong Yang, Peiran Ren, Dong-Qing Zhang, Dong Chen, Fang Wen, Hongdong Li, Gang Hua

The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition.

Face Recognition Face Verification

Counting Grid Aggregation for Event Retrieval and Recognition

no code implementations 5 Apr 2016 Zhanning Gao, Gang Hua, Dongqing Zhang, Jianru Xue, Nanning Zheng

Event retrieval and recognition in a large corpus of videos necessitates a holistic fixed-size visual representation at the video clip level that is comprehensive, compact, and yet discriminative.

Retrieval

Ordinal Regression With Multiple Output CNN for Age Estimation

no code implementations CVPR 2016 Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, Gang Hua

To address the non-stationary property of aging patterns, age estimation can be cast as an ordinal regression problem.

Age Estimation Binary Classification +3

A Multi-Level Contextual Model For Person Recognition in Photo Albums

no code implementations CVPR 2016 Haoxiang Li, Jonathan Brandt, Zhe Lin, Xiaohui Shen, Gang Hua

Our new framework enables efficient use of these complementary multi-level contextual cues to improve overall recognition rates on the photo album person recognition task, as demonstrated through state-of-the-art results on a challenging public dataset.

Person Recognition

Supervised Transformer Network for Efficient Face Detection

no code implementations 19 Jul 2016 Dong Chen, Gang Hua, Fang Wen, Jian Sun

For real-time performance, we run the cascaded network only on regions of interests produced from a boosting cascade face detector.

Face Detection Region Proposal +1

Revisiting Deep Intrinsic Image Decompositions

no code implementations CVPR 2018 Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, David Wipf

While invaluable for many computer vision applications, decomposing a natural image into intrinsic reflectance and shading layers represents a challenging, underdetermined inverse problem.

Collaborative Deep Reinforcement Learning for Joint Object Search

no code implementations CVPR 2017 Xiangyu Kong, Bo Xin, Yizhou Wang, Gang Hua

We examine the problem of joint top-down active search of multiple objects under interaction, e.g., a person riding a bicycle, cups held by the table, etc.

Active Object Localization Object +5

StyleBank: An Explicit Representation for Neural Image Style Transfer

1 code implementation CVPR 2017 Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

It also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed.

Incremental Learning Style Transfer

Coherent Online Video Style Transfer

no code implementations ICCV 2017 Dongdong Chen, Jing Liao, Lu Yuan, Nenghai Yu, Gang Hua

Training a feed-forward network for fast neural style transfer of images is proven to be successful.

Image Stylization Video Style Transfer

Visual Attribute Transfer through Deep Image Analogy

5 code implementations 2 May 2017 Jing Liao, Yuan Yao, Lu Yuan, Gang Hua, Sing Bing Kang

We propose a new technique for visual attribute transfer across images that may have very different appearance but have perceptually similar semantic structure.

Attribute

Hidden Talents of the Variational Autoencoder

1 code implementation 16 Jun 2017 Bin Dai, Yu Wang, John Aston, Gang Hua, David Wipf

Variational autoencoders (VAE) represent a popular, flexible form of deep generative model that can be stochastically fit to samples from a given random process using an information-theoretic variational bound on the true underlying distribution.

Dimensionality Reduction

Correlational Gaussian Processes for Cross-Domain Visual Recognition

no code implementations CVPR 2017 Chengjiang Long, Gang Hua

A set of correlational tensors is adopted to model the relationship within a single domain as well as across multiple domains.

Gaussian Processes

Order-Preserving Wasserstein Distance for Sequence Matching

no code implementations CVPR 2017 Bing Su, Gang Hua

We present a new distance measure between sequences that can tackle local temporal distortion and periodic sequences with arbitrary starting points.

A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing

1 code implementation ICCV 2017 Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, David Wipf

This paper proposes a deep neural network structure that exploits edge information in addressing representative low-level vision tasks such as layer separation and image filtering.

image smoothing Reflection Removal +1

Fast, Accurate Thin-Structure Obstacle Detection for Autonomous Mobile Robots

no code implementations 14 Aug 2017 Chen Zhou, Jiaolong Yang, Chunshui Zhao, Gang Hua

This work is devoted to a task that is indispensable for safety yet was largely overlooked in the past -- detecting obstacles that are of very thin structures, such as wires, cables and tree branches.

Self-Driving Cars Visual Odometry

Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding

no code implementations ICCV 2017 Zhenxing Niu, Mo Zhou, Le Wang, Xinbo Gao, Gang Hua

We address the problem of dense visual-semantic embedding that maps not only full sentences and whole images but also phrases within sentences and salient regions within images into a multimodal embedding space.

Sentence

Understanding and Predicting The Attractiveness of Human Action Shot

no code implementations 2 Nov 2017 Bin Dai, Baoyuan Wang, Gang Hua

Selecting attractive photos from a human action shot sequence is quite challenging, because of the subjective nature of the "attractiveness", which is mainly a combined factor of human pose in action and the background.

Semi-supervised FusedGAN for Conditional Image Generation

no code implementations ECCV 2018 Navaneeth Bodla, Gang Hua, Rama Chellappa

We achieve this by fusing two generators: one for unconditional image generation, and the other for conditional image generation, where the two partly share a common latent space thereby disentangling the generation.

Attribute Conditional Image Generation +2

Stereoscopic Neural Style Transfer

no code implementations CVPR 2018 Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua

This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR.

Style Transfer

Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition

no code implementations 19 Mar 2018 Jinliang Zang, Le Wang, Ziyi Liu, Qilin Zhang, Zhenxing Niu, Gang Hua, Nanning Zheng

Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs).

Action Recognition Temporal Action Localization

Stacked Cross Attention for Image-Text Matching

6 code implementations ECCV 2018 Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He

Prior work either simply aggregates the similarity of all possible pairs of regions and words without attending differentially to more and less important words or regions, or uses a multi-step attentional process to capture a limited number of semantic alignments, which is less interpretable.

Image Retrieval Image-text matching +5

Towards Open-Set Identity Preserving Face Synthesis

no code implementations CVPR 2018 Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua

We then recombine the identity vector and the attribute vector to synthesize a new face of the subject with the extracted attribute.

Attribute Face Generation

Decouple Learning for Parameterized Image Operators

1 code implementation ECCV 2018 Qingnan Fan, Dong-Dong Chen, Lu Yuan, Gang Hua, Nenghai Yu, Baoquan Chen

Many different deep networks have been used to approximate, accelerate or improve traditional image operators, such as image smoothing, super-resolution and denoising.

Denoising image smoothing +1

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

1 code implementation ECCV 2018 Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has considerable potential to increase inference speed by leveraging bit-operations, there is still a noticeable gap in prediction accuracy between the quantized model and the full-precision model.

Quantization

Conscious Inference for Object Detection

no code implementations 27 Sep 2018 Jiahuan Zhou, Nikolaos Karianakis, Ying Wu, Gang Hua

Current Convolutional Neural Network (CNN)-based object detection models adopt strictly feedforward inference to predict the final detection results.

6D Pose Estimation using RGB Object +2

A General Decoupled Learning Framework for Parameterized Image Operators

no code implementations 11 Jul 2019 Qingnan Fan, Dong-Dong Chen, Lu Yuan, Gang Hua, Nenghai Yu, Baoquan Chen

To overcome this limitation, we propose a new decoupled learning algorithm to learn from the operator parameters to dynamically adjust the weights of a deep network for image operators, denoted as the base network.

Any-Precision Deep Neural Networks

2 code implementations 17 Nov 2019 Haichao Yu, Haoxiang Li, Honghui Shi, Thomas S. Huang, Gang Hua

When all layers are set to low-bits, we show that the model achieved accuracy comparable to dedicated models trained at the same precision.

Ladder Loss for Coherent Visual-Semantic Embedding

2 code implementations 18 Nov 2019 Mo Zhou, Zhenxing Niu, Le Wang, Zhanning Gao, Qilin Zhang, Gang Hua

For visual-semantic embedding, the existing methods normally treat the relevance between queries and candidates in a bipolar way -- relevant or irrelevant, and all "irrelevant" candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their various proximity to the query.

Retrieval

Calibrated Domain-Invariant Learning for Highly Generalizable Large Scale Re-Identification

1 code implementation 26 Nov 2019 Ye Yuan, Wuyang Chen, Tianlong Chen, Yang Yang, Zhou Ren, Zhangyang Wang, Gang Hua

Many real-world applications, such as city-scale traffic monitoring and control, require large-scale re-identification.

Adversarial Ranking Attack and Defense

3 code implementations ECCV 2020 Mo Zhou, Zhenxing Niu, Le Wang, Qilin Zhang, Gang Hua

In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations.

Adversarial Attack Image Retrieval

gDLS*: Generalized Pose-and-Scale Estimation Given Scale and Gravity Priors

no code implementations CVPR 2020 Victor Fragoso, Joseph DeGol, Gang Hua

Many real-world applications in augmented reality (AR), 3D mapping, and robotics require both fast and accurate estimation of camera poses and scales from multiple images captured by multiple cameras or a single moving camera.

Few-Shot Open-Set Recognition using Meta-Learning

1 code implementation CVPR 2020 Bo Liu, Hao Kang, Haoxiang Li, Gang Hua, Nuno Vasconcelos

It is argued that the classic softmax classifier is a poor solution for open-set recognition, since it tends to overfit on the training classes.

Classification General Classification +3

Improving Person Re-identification with Iterative Impression Aggregation

no code implementations 21 Sep 2020 Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li

Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods.

Person Re-Identification Re-Ranking

Passport-aware Normalization for Deep Model Protection

1 code implementation NeurIPS 2020 Jie Zhang, Dongdong Chen, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

Only when the model IP is suspected to have been stolen is the private passport-aware branch added back for ownership verification.

Model Compression

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud-based Deep Networks

no code implementations 1 Nov 2020 Hang Zhou, Dongdong Chen, Jing Liao, Weiming Zhang, Kejiang Chen, Xiaoyi Dong, Kunlin Liu, Gang Hua, Nenghai Yu

To overcome these shortcomings, this paper proposes a novel label guided adversarial network (LG-GAN) for real-time flexible targeted point cloud attack.

Efficient Semantic Image Synthesis via Class-Adaptive Normalization

1 code implementation 8 Dec 2020 Zhentao Tan, Dongdong Chen, Qi Chu, Menglei Chai, Jing Liao, Mingming He, Lu Yuan, Gang Hua, Nenghai Yu

Spatially-adaptive normalization (SPADE) has recently been remarkably successful in conditional semantic image synthesis. It modulates the normalized activation with spatially-varying transformations learned from semantic layouts, to prevent the semantic information from being washed away.

Image Generation

Practical Order Attack in Deep Ranking

no code implementations 1 Jan 2021 Mo Zhou, Le Wang, Zhenxing Niu, Qilin Zhang, Xu Yinghui, Nanning Zheng, Gang Hua

The objective of this paper is to formalize and practically implement a new adversarial attack against deep ranking systems, i.e., the Order Attack, which covertly alters the relative order of a selected set of candidates according to a permutation vector predefined by the attacker, with only limited interference to other unrelated candidates.

Adversarial Attack Image Retrieval

Deep Model Intellectual Property Protection via Deep Watermarking

1 code implementation 8 Mar 2021 Jie Zhang, Dongdong Chen, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

By jointly training the target model and watermark embedding, the extra barrier can even be absorbed into the target model.

Practical Relative Order Attack in Deep Ranking

2 code implementations ICCV 2021 Mo Zhou, Le Wang, Zhenxing Niu, Qilin Zhang, Yinghui Xu, Nanning Zheng, Gang Hua

In this paper, we formulate a new adversarial attack against deep ranking systems, i.e., the Order Attack, which covertly alters the relative order among a selected set of candidates according to an attacker-specified permutation, with limited interference to other unrelated candidates.

Adversarial Attack

Diverse Semantic Image Synthesis via Probability Distribution Modeling

1 code implementation CVPR 2021 Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Bin Liu, Gang Hua, Nenghai Yu

In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level.

Image-to-Image Translation

Beyond Visual Attractiveness: Physically Plausible Single Image HDR Reconstruction for Spherical Panoramas

no code implementations 24 Mar 2021 Wei Wei, Li Guan, Yue Liu, Hao Kang, Haoxiang Li, Ying Wu, Gang Hua

By the proposed physical regularization, our method can generate HDRs which are not only visually appealing but also physically plausible.

HDR Reconstruction Single-shot HDR Reconstruction

SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction

3 code implementations 4 Apr 2021 Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua

Meanwhile, we use a sparse directed temporal graph to model the motion tendency, thus to facilitate the prediction based on the observed direction.

Pedestrian Trajectory Prediction Trajectory Prediction

E2Style: Improve the Efficiency and Effectiveness of StyleGAN Inversion

2 code implementations 15 Apr 2021 Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Lu Yuan, Gang Hua, Nenghai Yu

This paper studies the problem of StyleGAN inversion, which plays an essential role in enabling the pretrained StyleGAN to be used for real image editing tasks.

Face Parsing

Sparse Pose Trajectory Completion

no code implementations 1 May 2021 Bo Liu, Mandar Dixit, Roland Kwitt, Gang Hua, Nuno Vasconcelos

In the absence of dense pose sampling in image space, these latent space trajectories provide cross-modal guidance for learning.

Novel View Synthesis Object

Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition

no code implementations 1 May 2021 Bo Liu, Haoxiang Li, Hao Kang, Gang Hua, Nuno Vasconcelos

It is shown that, unlike class-balanced sampling, this is an adversarial augmentation strategy.

GistNet: a Geometric Structure Transfer Network for Long-Tailed Recognition

no code implementations ICCV 2021 Bo Liu, Haoxiang Li, Hao Kang, Gang Hua, Nuno Vasconcelos

A new learning algorithm is then proposed for GeometrIc Structure Transfer (GIST), with resort to a combination of loss functions that combine class-balanced and random sampling to guarantee that, while overfitting to the popular classes is restricted to geometric parameters, it is leveraged to transfer class geometry from popular to few-shot classes.

Transfer Learning

Semi-supervised Long-tailed Recognition using Alternate Sampling

no code implementations 1 May 2021 Bo Liu, Haoxiang Li, Hao Kang, Nuno Vasconcelos, Gang Hua

A consistency loss has been introduced to limit the impact from unlabeled data while leveraging them to update the feature embedding.

Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

no code implementations CVPR 2021 Yiding Yang, Zhou Ren, Haoxiang Li, Chunluan Zhou, Xinchao Wang, Gang Hua

In this paper, we propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame, and hence may serve as a robust estimation even in challenging scenarios including occlusion.

Multi-Person Pose Estimation Multi-Person Pose Estimation and Tracking +1

Adversarial Attack and Defense in Deep Ranking

1 code implementation 7 Jun 2021 Mo Zhou, Le Wang, Zhenxing Niu, Qilin Zhang, Nanning Zheng, Gang Hua

In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations.

Adversarial Attack Adversarial Robustness

Video Imprint

no code implementations 7 Jun 2021 Zhanning Gao, Le Wang, Nebojsa Jojic, Zhenxing Niu, Nanning Zheng, Gang Hua

In the proposed framework, a dedicated feature alignment module is incorporated for redundancy removal across frames to produce the tensor representation, i.e., the video imprint.

Language Modelling Retrieval

Learning View Selection for 3D Scenes

no code implementations CVPR 2021 Yifan Sun, QiXing Huang, Dun-Yu Hsiao, Li Guan, Gang Hua

Efficient 3D space sampling to represent an underlying 3D object/scene is essential for 3D vision, robotics, and beyond.

SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction

no code implementations CVPR 2021 Liushuai Shi, Le Wang, Chengjiang Long, Sanping Zhou, Mo Zhou, Zhenxing Niu, Gang Hua

Specifically, the SGCN explicitly models the sparse directed interaction with a sparse directed spatial graph to capture adaptive interaction pedestrians.

Pedestrian Trajectory Prediction Trajectory Prediction

Unlimited Neighborhood Interaction for Heterogeneous Trajectory Prediction

1 code implementation ICCV 2021 Fang Zheng, Le Wang, Sanping Zhou, Wei Tang, Zhenxing Niu, Nanning Zheng, Gang Hua

Specifically, the proposed unlimited neighborhood interaction module generates the fused-features of all agents involved in an interaction simultaneously, which is adaptive to any number of agents and any range of interaction area.

Graph Attention Trajectory Prediction

Poison Ink: Robust and Invisible Backdoor Attack

1 code implementation 5 Aug 2021 Jie Zhang, Dongdong Chen, Qidong Huang, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, Nenghai Yu

As the image structure can keep its semantic meaning during data transformation, such a trigger pattern is inherently robust to data transformations.

Backdoor Attack Data Poisoning

Exploring Structure Consistency for Deep Model Watermarking

no code implementations 5 Aug 2021 Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Zehua Ma, Weiming Zhang, Gang Hua, Nenghai Yu

However, little attention has been devoted to the protection of DNNs in image processing tasks.

Data Augmentation

DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

1 code implementation 12 Sep 2021 Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, Gang Hua

Many previous methods on text-based person retrieval tasks are devoted to learning a latent common space mapping, with the purpose of extracting modality-invariant features from both visual and textual modality.

Person Retrieval Retrieval +2

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection

1 code implementation 26 Nov 2021 Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua

However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks.

Action Detection Activity Detection +2

Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference

1 code implementation NeurIPS 2021 Dongkai Wang, Shiliang Zhang, Gang Hua

Instead of inferring individual keypoints, the Pose-level Inference Network (PINet) directly infers the complete pose cues for a person from his/her visible body parts.

Multi-Person Pose Estimation

Progressive Backdoor Erasing via connecting Backdoor and Adversarial Attacks

no code implementations CVPR 2023 Bingxu Mu, Zhenxing Niu, Le Wang, Xue Wang, Rong Jin, Gang Hua

Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks.

backdoor defense

E^2TAD: An Energy-Efficient Tracking-based Action Detector

1 code implementation 9 Apr 2022 Xin Hu, Zhenyu Wu, Hao-Yu Miao, Siqi Fan, Taiyu Long, Zhenyu Hu, Pengcheng Pi, Yi Wu, Zhou Ren, Zhangyang Wang, Gang Hua

Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent analysis of videos nowadays.

Fine-Grained Action Detection object-detection +3

PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition

no code implementations 16 Sep 2022 Qidong Huang, Xiaoyi Dong, Dongdong Chen, Hang Zhou, Weiming Zhang, Kui Zhang, Gang Hua, Nenghai Yu

Notwithstanding the prominent performance achieved in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations.

Exploring Discrete Diffusion Models for Image Captioning

1 code implementation 21 Nov 2022 Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation

Boosted Dynamic Neural Networks

1 code implementation 30 Nov 2022 Haichao Yu, Haoxiang Li, Gang Hua, Gao Huang, Humphrey Shi

To optimize the model, these prediction heads together with the network backbone are trained on every batch of training data.

DDM-NET: End-to-end learning of keypoint feature Detection, Description and Matching for 3D localization

1 code implementation 8 Dec 2022 Xiangyu Xu, Li Guan, Enrique Dunn, Haoxiang Li, Gang Hua

In this paper, we propose an end-to-end framework that jointly learns keypoint detection, descriptor representation and cross-frame matching for the task of image-based 3D localization.

Keypoint Detection

Sparse Instance Conditioned Multimodal Trajectory Prediction

no code implementations ICCV 2023 Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua

Specifically, SICNet learns comprehensive sparse instances, i.e., representative points of the future trajectory, through a mask generated by a long short-term memory encoder and uses the memory mechanism to store and retrieve such sparse instances.

Future prediction Pedestrian Trajectory Prediction +1

Learning from Noisy Pseudo Labels for Semi-Supervised Temporal Action Localization

1 code implementation ICCV 2023 Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

To this end, we propose a unified framework, termed Noisy Pseudo-Label Learning, to handle both location biases and category errors.

Pseudo Label Temporal Action Localization

Parallel Attention Interaction Network for Few-Shot Skeleton-Based Action Recognition

no code implementations ICCV 2023 Xingyu Liu, Sanping Zhou, Le Wang, Gang Hua

Learning discriminative features from very few labeled samples to identify novel classes has received increasing attention in skeleton-based action recognition.

Action Recognition Skeleton Based Action Recognition

Diversity-Aware Meta Visual Prompting

1 code implementation CVPR 2023 Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with a frozen backbone.

Visual Prompting

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

no code implementations CVPR 2023 Zheng Qin, Sanping Zhou, Le Wang, Jinghai Duan, Gang Hua, Wei Tang

For dense crowds, we design a novel Interaction Module to learn interaction-aware motions from short-term trajectories, which can estimate the complex movement of each target.

motion prediction Multi-Object Tracking

Regularizing Second-Order Influences for Continual Learning

1 code implementation CVPR 2023 Zhicheng Sun, Yadong Mu, Gang Hua

Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge.

Continual Learning

Designing a Better Asymmetric VQGAN for StableDiffusion

2 code implementations 7 Jun 2023 Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.

Image Inpainting

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation 8 Jun 2023 Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity.

Denoising Image Restoration +2

SOAR: Scene-debiasing Open-set Action Recognition

1 code implementation ICCV 2023 Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua

The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.

Open Set Action Recognition Scene Classification

Flexible Visual Recognition by Evidential Modeling of Confusion and Ignorance

no code implementations ICCV 2023 Lei Fan, Bo Liu, Haoxiang Li, Ying Wu, Gang Hua

First, prediction uncertainty should be separately quantified as confusion depicting inter-class uncertainties and ignorance identifying out-of-distribution samples.

Decision Making

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

1 code implementation ICCV 2023 Tianyi Wei, Dongdong Chen, Wenbo Zhou, Jing Liao, Weiming Zhang, Gang Hua, Nenghai Yu

Even though they can enable very fine-grained local control, such interaction modes are inefficient for the editing conditions that can be easily specified by language descriptions or reference images.

Attribute

Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception

no code implementations 23 Nov 2023 Lei Fan, Mingfu Liang, Yunxuan Li, Gang Hua, Ying Wu

Active recognition enables robots to intelligently explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions.

Uncertainty Quantification

Sparse Pedestrian Character Learning for Trajectory Prediction

no code implementations 27 Nov 2023 Yonghao Dong, Le Wang, Sanpin Zhou, Gang Hua, Changyin Sun

Specifically, TSNet learns the negative-removed characters in the sparse character representation stream to improve the trajectory embedding obtained in the trajectory representation stream.

Autonomous Driving Pedestrian Trajectory Prediction +1

UGG: Unified Generative Grasping

1 code implementation 28 Nov 2023 Jiaxin Lu, Hao Kang, Haoxiang Li, Bo Liu, Yiding Yang, QiXing Huang, Gang Hua

Generation-based methods that generate grasping postures conditioned on the object can often produce diverse grasping, but they are insufficient for high grasping success due to lack of discriminative information.

Grasp Generation Object

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

no code implementations 26 Dec 2023 Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).

Novel View Synthesis Representation Learning

Jailbreaking Attack against Multimodal Large Language Model

1 code implementation 4 Feb 2024 Zhenxing Niu, Haodong Ren, Xinbo Gao, Gang Hua, Rong Jin

This paper focuses on jailbreaking attacks against multi-modal large language models (MLLMs), seeking to elicit MLLMs to generate objectionable responses to harmful user queries.

Language Modelling Large Language Model

Deployment Prior Injection for Run-time Calibratable Object Detection

no code implementations 27 Feb 2024 Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua

With a strong alignment between the training and test distributions, object relation as a context prior facilitates object detection.

Object object-detection +1

Recurrent Aligned Network for Generalized Pedestrian Trajectory Prediction

no code implementations 9 Mar 2024 Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua, Changyin Sun

Previous studies have tried to tackle this problem by leveraging a portion of the trajectory data from the target domain to adapt the model.

Domain Adaptation Pedestrian Trajectory Prediction +1

Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes

no code implementations 17 Mar 2024 Kun Xia, Le Wang, Sanping Zhou, Gang Hua, Wei Tang

To this end, we first devise innovative strategies to adaptively select high-quality positive and negative classes from the label space, by modeling both the confidence and rank of a class in relation to those of the target class.

Temporal Action Localization

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

1 code implementation 18 Mar 2024 Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua

We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding.

Referring Video Object Segmentation Semantic Segmentation +2

Transformer based Pluralistic Image Completion with Reduced Information Loss

1 code implementation 31 Mar 2024 Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.

Image Inpainting Quantization
