no code implementations • ECCV 2020 • Wei-Lun Chen, Zhao-Xiang Zhang, Xiaolin Hu, Baoyuan Wu
Decision-based black-box adversarial attacks (decision-based attack) pose a severe threat to current deep neural networks, as they only need the predicted label of the target model to craft adversarial examples.
1 code implementation • 3 Oct 2024 • Xinhao Yao, Hongjin Qian, Xiaolin Hu, Gengze Xu, Yong Liu
We present a theoretical analysis of these phenomena from two perspectives: (i) Generalization, where we demonstrate that fine-tuning only $\mathbf{W}_q$ and $\mathbf{W}_v$ improves generalization bounds and enhances memory efficiency, and (ii) Optimization, where we emphasize that the feature learning of the attention mechanism is efficient, particularly when using distinct learning rates for the matrices, which leads to more effective fine-tuning.
1 code implementation • 2 Oct 2024 • Kai Li, Wendi Sang, Chang Zeng, Runxuan Yang, Guo Chen, Xiaolin Hu
Leveraging SonicSim, we constructed a moving sound source benchmark dataset, SonicSet, using LibriSpeech, the Freesound Dataset 50k (FSD50K), the Free Music Archive (FMA), and 90 scenes from Matterport3D to evaluate speech separation and enhancement models.
no code implementations • 2 Oct 2024 • Mohan Xu, Kai Li, Guo Chen, Xiaolin Hu
Experimental results showed that models trained on EchoSet had better generalization ability than those trained on other datasets to the data collected in the physical world, which validated the practical value of the EchoSet.
no code implementations • 25 Sep 2024 • Qibin Wang, Xiaolin Hu, Weikai Xu, Wei Liu, Jian Luan, Bin Wang
Low-rank adaptation (LoRA) and its variants have recently gained much interest due to their ability to avoid excessive inference costs.
no code implementations • 1 Aug 2024 • Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu
Recently Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples.
1 code implementation • 15 Jul 2024 • Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu
With these annotations, we build part-based methods directly on the standard IN-1K dataset for robust recognition.
1 code implementation • 10 Jul 2024 • Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu
Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes the chain-of-thought-style prompt for style-controllable and content-controllable instruction generation.
no code implementations • 20 Jun 2024 • Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang
Following this, the agent learns to collaborate with others by considering the shared dynamics of the manipulated object during parallel training using Multi-Agent Proximal Policy Optimization (MAPPO).
no code implementations • 17 Jun 2024 • Han Liu, Yupeng Zhang, Bingning Wang, WeiPeng Chen, Xiaolin Hu
Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications.
1 code implementation • 6 Jun 2024 • Xinhao Yao, Xiaolin Hu, Shenzhi Yang, Yong Liu
Pre-trained large language models (LLMs) based on Transformer have demonstrated striking in-context learning (ICL) abilities.
no code implementations • 30 May 2024 • Han Liu, Peng Cui, Bingning Wang, Jun Zhu, Xiaolin Hu
Deep Neural Networks (DNNs) have achieved remarkable success in a variety of tasks, especially when it comes to prediction accuracy.
no code implementations • CVPR 2024 • Xiaopei Zhu, Yuqiu Liu, Zhanhao Hu, Jianmin Li, Xiaolin Hu
In this work, we propose a physical attack method against infrared detectors based on 3D modeling, which is applied to a real car.
1 code implementation • 2 Apr 2024 • Kai Li, Guo Chen, Runxuan Yang, Xiaolin Hu
Existing CNN-based speech separation models face local receptive field limitations and cannot effectively capture long time dependencies.
1 code implementation • CVPR 2024 • Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu
LiDAR-based 3D object detection plays an essential role in autonomous driving.
1 code implementation • 25 Jan 2024 • Samuel Pegg, Kai Li, Xiaolin Hu
TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters.
Ranked #2 on Speech Separation on LRS2
2 code implementations • 31 Oct 2023 • Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu, Xiao Zhang, Gang Wang, Jun Xu
Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher.
1 code implementation • NeurIPS 2023 • Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Xiaolin Hu
To reduce computational costs, these methods resort to submanifold sparse convolutions, which prevent the information exchange among spatially disconnected features.
Ranked #2 on 3D Object Detection on Waymo Open Dataset
1 code implementation • 29 Sep 2023 • Samuel Pegg, Kai Li, Xiaolin Hu
This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.
Ranked #2 on Speech Separation on VoxCeleb2
1 code implementation • 16 Aug 2023 • Kai Li, Runxuan Yang, Fuchun Sun, Xiaolin Hu
Recent research has made significant progress in designing fusion modules for audio-visual speech separation.
Ranked #1 on Speech Separation on LRS3
1 code implementation • 5 Aug 2023 • JianFeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz
This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost.
1 code implementation • CVPR 2023 • Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu
In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble camouflage, one of the typical textures of everyday clothes.
1 code implementation • 31 May 2023 • Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.
1 code implementation • 28 May 2023 • Zhanhao Hu, Jun Zhu, Bo Zhang, Xiaolin Hu
Recent work has found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise to clean inputs.
no code implementations • 27 May 2023 • Xiao Li, Hang Chen, Xiaolin Hu
We argue that using adversarially pre-trained backbone networks is essential for enhancing the adversarial robustness of object detectors.
no code implementations • 20 Apr 2023 • Xiaolin Hu
Natural language understanding is one of the most challenging topics in artificial intelligence.
no code implementations • 10 Mar 2023 • Aminul Huq, Weiyi Zhang, Xiaolin Hu
We merge the capabilities of both supervised and unsupervised approaches in our method to generate new adversarial samples which aid in improving model robustness.
2 code implementations • 13 Feb 2023 • Gang Zhang, Ziyi Li, Chufeng Tang, Jianmin Li, Xiaolin Hu
A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion.
1 code implementation • 31 Jan 2023 • JianFeng Wang, Xiaolin Hu, Thomas Lukasiewicz
In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match.
1 code implementation • CVPR 2024 • Xiao Li, Wei Zhang, Yining Liu, Zhanhao Hu, Bo Zhang, Xiaolin Hu
Previous research has mainly focused on improving adversarial robustness in the fully supervised setting, leaving the challenging domain of zero-shot adversarial robustness an open question.
2 code implementations • 21 Dec 2022 • Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Ranked #3 on Speech Separation on VoxCeleb2
1 code implementation • 4 Dec 2022 • Xiao Li, Ziqi Wang, Bo Zhang, Fuchun Sun, Xiaolin Hu
The first stage of ROCK corresponds to the process of decomposing objects into parts in human vision.
no code implementations • 30 Nov 2022 • Jianjin Xu, Zhaoxiang Zhang, Xiaolin Hu
Second, we train image-to-image translation networks on the synthesized datasets, enabling semantic-conditional image synthesis without human annotations.
no code implementations • 27 Oct 2022 • Yetao Wu, Han Liu, Jie Yan, Xiaolin Hu
After training, the model is used for virtual screening to find potential drugs for Alzheimer's disease (AD) treatment.
1 code implementation • 30 Sep 2022 • Kai Li, Runxuan Yang, Xiaolin Hu
In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of those of Sepformer and CPU inference time only 24% of that of Sepformer.
Ranked #4 on Speech Separation on WHAM!
1 code implementation • 17 Aug 2022 • Xiao Li, Qiongxiu Li, Zhanhao Hu, Xiaolin Hu
We demonstrate that the generalization gap and privacy leakage are less correlated than previous results suggested.
1 code implementation • CVPR 2023 • Chufeng Tang, Lingxi Xie, Xiaopeng Zhang, Xiaolin Hu, Qi Tian
Humans have the ability to recognize visual semantics at unlimited granularity, but existing visual recognition algorithms cannot achieve this goal.
1 code implementation • 23 Jul 2022 • Chufeng Tang, Lingxi Xie, Gang Zhang, Xiaopeng Zhang, Qi Tian, Xiaolin Hu
In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object.
1 code implementation • 3 Jul 2022 • JianFeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou
Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data.
no code implementations • 11 Jun 2022 • Han Liu, Bingning Wang, Ting Yao, Haijin Liang, Jianjin Xu, Xiaolin Hu
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
no code implementations • 12 May 2022 • Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu
We simulated the process from cloth to clothing in the digital world and then designed the adversarial "QR code" pattern.
1 code implementation • CVPR 2022 • Zhanhao Hu, Siyuan Huang, Xiaopei Zhu, Fuchun Sun, Bo Zhang, Xiaolin Hu
Experiments showed that these clothes could fool person detectors in the physical world.
no code implementations • 7 Mar 2022 • Zhanhao Hu, Tao Wang, Xiaolin Hu
Compared with rate-based artificial neural networks, Spiking Neural Networks (SNNs) provide a more biologically plausible model for the brain.
1 code implementation • 22 Feb 2022 • Zhen Zhao, Yuqiu Liu, Gang Zhang, Liang Tang, Xiaolin Hu
This report introduces our solution to the iFLYTEK challenge 2021 cultivated land extraction from high-resolution remote sensing image.
no code implementations • CVPR 2022 • Xiaopei Zhu, Zhanhao Hu, Siyuan Huang, Jianmin Li, Xiaolin Hu
We simulated the process from cloth to clothing in the digital world and then designed the adversarial "QR code" pattern.
no code implementations • 30 Oct 2021 • Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning.
1 code implementation • CVPR 2021 • JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu
Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with good generalization on infrequent classes.
Ranked #19 on Long-tail Learning on Places-LT
1 code implementation • 5 Jun 2021 • JianFeng Wang, Xiaolin Hu
The critical element of RCNN is the recurrent convolutional layer (RCL), which incorporates recurrent connections between neurons in the standard convolutional layer.
1 code implementation • 19 May 2021 • Weiyi Zhang, Shuning Zhao, Le Liu, Jianmin Li, Xingliang Cheng, Thomas Fang Zheng, Xiaolin Hu
In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text.
1 code implementation • CVPR 2021 • Gang Zhang, Xin Lu, Jingru Tan, Jianmin Li, Zhaoxiang Zhang, Quanquan Li, Xiaolin Hu
In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner.
2 code implementations • CVPR 2021 • Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu
Tremendous efforts have been made on instance segmentation but the mask quality is still not satisfactory.
1 code implementation • 2 Mar 2021 • Ge Gao, Mikko Lauri, Xiaolin Hu, Jianwei Zhang, Simone Frintrop
In contrast, this domain gap is considerably smaller and easier to fill for depth information.
1 code implementation • 23 Feb 2021 • Xiao Li, Jianmin Li, Ting Dai, Jie Shi, Jun Zhu, Xiaolin Hu
A detection model based on the classification model EfficientNet-B7 achieved a top-1 accuracy of 53.95%, surpassing previous state-of-the-art classification models trained on ImageNet, suggesting that accurate localization information can significantly boost the performance of classification models on ImageNet-A.
1 code implementation • 12 Feb 2021 • Haoran Chen, Jianmin Li, Simone Frintrop, Xiaolin Hu
We cleaned the MSR-VTT annotations by removing these problems, then tested several typical video captioning models on the cleaned dataset.
2 code implementations • 11 Feb 2021 • Jianjin Xu, Zheyang Xiong, Xiaolin Hu
To ensure temporal consistency between the frames of the stylized video, a common approach is to estimate the optical flow of the pixels in the original video and make the generated pixels match the estimated optical flow.
no code implementations • 20 Jan 2021 • Xiaopei Zhu, Xiao Li, Jianmin Li, Zheyao Wang, Xiaolin Hu
By using a combination method, we successfully hide from the visible light and infrared object detection systems at the same time.
no code implementations • ICCV 2021 • Jiaheng Liu, Yudong Wu, Yichao Wu, Chuming Li, Xiaolin Hu, Ding Liang, Mengyu Wang
To estimate the LID of each face image in the verification process, we propose two types of LID Estimation (LIDE) methods, which are reference-based and learning-based estimation methods, respectively.
5 code implementations • CVPR 2021 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.
Ranked #26 on Object Detection on COCO-O
1 code implementation • NeurIPS 2020 • Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint.
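The restriction of the high-level action space to a $k$-step adjacent region can be illustrated on a small explicit graph. The sketch below is only an illustration under strong assumptions: the environment is a known, undirected state graph, adjacency is computed exactly by breadth-first search, and the names `k_step_region` and `restrict_subgoals` are hypothetical. The abstract does not say how adjacency is obtained in practice.

```python
from collections import deque


def k_step_region(graph, start, k):
    """Return all states reachable from `start` in at most k transitions (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if dist[state] == k:
            continue  # do not expand beyond k steps
        for nxt in graph.get(state, ()):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return set(dist)


def restrict_subgoals(graph, state, candidates, k):
    """Keep only candidate subgoals inside the k-step adjacent region of `state`."""
    region = k_step_region(graph, state, k)
    return [goal for goal in candidates if goal in region]


# Toy chain 0 - 1 - 2 - 3 - 4 (a hypothetical environment).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(restrict_subgoals(chain, 2, [0, 1, 3, 4], k=1))
```

With `k=1` only the immediate neighbours of the current state survive as subgoals; increasing `k` widens the region until it covers the whole goal space.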
7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.
Ranked #105 on Object Detection on COCO test-dev
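The two representations mentioned in this entry can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: it assumes the joint vector stores a localization-quality score (for example an IoU in [0, 1]) at the ground-truth class index, and that each box edge offset is described by a softmax distribution over integer bins `0..n`. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_target(num_classes, gt_class, quality):
    """Class vector whose ground-truth entry holds the quality score (assumed form)."""
    target = [0.0] * num_classes
    target[gt_class] = quality
    return target


def expected_offset(bin_logits):
    """Expected edge offset under a discrete distribution over bins 0..n."""
    probs = softmax(bin_logits)
    return sum(index * p for index, p in enumerate(probs))


print(joint_target(3, 1, 0.8))
print(expected_offset([0.0, 0.0, 0.0]))   # flat distribution over bins 0, 1, 2
print(expected_offset([0.0, 0.0, 10.0]))  # sharply peaked near bin 2
```

A flat distribution signals an uncertain edge, while a sharply peaked one signals a confident edge, even when both yield a similar expected offset.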
1 code implementation • 12 Feb 2020 • Zi Yin, Valentin Yiu, Xiaolin Hu, Liang Tang
Face parsing is an important computer vision task that requires accurate pixel segmentation of facial parts (such as eyes, nose, mouth, etc.).
1 code implementation • 24 Jan 2020 • Ge Gao, Mikko Lauri, Yulong Wang, Xiaolin Hu, Jianwei Zhang, Simone Frintrop
We use depth information represented by point clouds as the input to both deep networks and geometry-based pose refinement and use separate networks for rotation and translation regression.
1 code implementation • 16 Jan 2020 • Haoran Chen, Jianmin Li, Xiaolin Hu
Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence.
1 code implementation • ICCV 2019 • Chufeng Tang, Lu Sheng, Zhao-Xiang Zhang, Xiaolin Hu
To predict the existence of a particular attribute, it is necessary to localize the regions related to the attribute.
Ranked #1 on Pedestrian Attribute Recognition on RAP
no code implementations • 7 Oct 2019 • Yulong Wang, Xiaolin Hu, Hang Su
We also apply extracted subnetworks in visual explanation and adversarial example detection tasks by merely replacing the original full model with class-specific subnetworks.
1 code implementation • 27 Sep 2019 • Yulong Wang, Xiaolu Zhang, Lingxi Xie, Jun Zhou, Hang Su, Bo Zhang, Xiaolin Hu
Network pruning is an important research field aiming at reducing computational costs of neural networks.
2 code implementations • 31 Aug 2019 • Haoran Chen, Ke Lin, Alexander Maye, Jianming Li, Xiaolin Hu
Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video.
3 code implementations • 23 May 2019 • Xiang Li, Xiaolin Hu, Jian Yang
The Convolutional Neural Networks (CNNs) generate the feature representation of complex objects by collecting hierarchical and different parts of semantic sub-features.
Ranked #759 on Image Classification on ImageNet
1 code implementation • ICCV 2019 • Xiao Jin, Baoyun Peng, Yi-Chao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Xiaolin Hu
However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss.
20 code implementations • CVPR 2019 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang
A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.
Ranked #99 on Image Classification on CIFAR-100 (using extra training data)
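The fusion step of the SK unit can be sketched per channel. This is a simplified illustration, not the published architecture: each branch is assumed to be already reduced to one value per channel, the branch summary is their element-wise sum, and the hypothetical per-branch `gains` stand in for the learned layers that would turn the summary into attention logits.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def selective_fuse(branches, gains):
    """Fuse branch features with per-channel softmax attention across branches.

    branches: list of B feature vectors of equal length C (one per kernel size).
    gains:    list of B scalars, hypothetical stand-ins for learned projections.
    Returns (fused, weights), where weights[c] sums to 1 over the B branches.
    """
    num_channels = len(branches[0])
    # Summary of the information in all branches (element-wise sum).
    summary = [sum(branch[c] for branch in branches) for c in range(num_channels)]
    fused, weights = [], []
    for c in range(num_channels):
        attention = softmax([gain * summary[c] for gain in gains])
        weights.append(attention)
        fused.append(sum(a * branch[c] for a, branch in zip(attention, branches)))
    return fused, weights


small_kernel = [1.0, 2.0]  # e.g. features from a small-kernel branch
large_kernel = [3.0, 0.0]  # e.g. features from a large-kernel branch
fused, weights = selective_fuse([small_kernel, large_kernel], gains=[0.5, -0.5])
print(fused, weights)
```

Because the softmax runs across branches rather than across channels, every channel independently chooses how much to draw from each kernel size.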
no code implementations • 15 Mar 2019 • Wei Feng, Wentao Liu, Tong Li, Jing Peng, Chen Qian, Xiaolin Hu
Human-object interactions (HOI) recognition and pose estimation are two closely related tasks.
no code implementations • 28 Feb 2019 • Yingcheng Su, Shunfeng Zhou, Yi-Chao Wu, Tian Su, Ding Liang, Jiaheng Liu, Dixin Zheng, Yingxu Wang, Junjie Yan, Xiaolin Hu
Although deeper and larger neural networks have achieved better performance, the complex network structure and increasing computational cost cannot meet the demands of many resource-constrained applications.
no code implementations • 7 Jun 2018 • Yisu Zhou, Xiaolin Hu, Bo Zhang
It amounts to labeling each pixel with appropriate facial parts such as eyes and nose.
no code implementations • CVPR 2018 • Yulong Wang, Hang Su, Bo Zhang, Xiaolin Hu
Interpretability of a deep neural network aims to explain the rationale behind its decisions and enable the users to understand the intelligent agents, which has become an important issue in practical applications.
5 code implementations • CVPR 2018 • Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu
Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks.
Ranked #7 on Visual Object Tracking on VOT2017/18
1 code implementation • 31 Mar 2018 • Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.
5 code implementations • CVPR 2019 • Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang
Theoretically, we find that Dropout would shift the variance of a specific neural unit when we transfer the state of that network from training to testing.
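The variance shift can be observed numerically with a small simulation. The sketch below assumes inverted dropout (kept activations are scaled by `1 / keep_prob` during training and left untouched at test time) and a zero-mean, unit-variance Gaussian input; both are assumptions made for illustration, and `unit_variances` is a hypothetical helper name.

```python
import random


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def unit_variances(keep_prob, num_samples=200_000, seed=0):
    """Return (train_variance, test_variance) of one unit under inverted dropout."""
    rng = random.Random(seed)
    train, test = [], []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)
        kept = rng.random() < keep_prob
        train.append(x / keep_prob if kept else 0.0)  # training: mask and rescale
        test.append(x)                                # testing: identity
    return variance(train), variance(test)


train_var, test_var = unit_variances(keep_prob=0.5)
print(f"train variance ~ {train_var:.2f}, test variance ~ {test_var:.2f}")
```

Under these assumptions the training-time variance is about `1 / keep_prob` times the test-time variance, so statistics accumulated during training (for example by a following normalization layer) no longer match what the unit produces at test time.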
2 code implementations • 14 Dec 2017 • Jian Wu, Changran Hu, Yulong Wang, Xiaolin Hu, Jun Zhu
In this paper, we present a hierarchical recurrent neural network for melody generation, which consists of three Long Short-Term Memory (LSTM) subnetworks working in a coarse-to-fine manner along time.
2 code implementations • CVPR 2018 • Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, Jun Zhu
First, with HGD as a defense, the target model is more robust to either white-box or black-box adversarial attacks.
1 code implementation • NeurIPS 2017 • Jianfeng Wang, Xiaolin Hu
Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN.
11 code implementations • 22 Nov 2017 • Fangzhou Liao, Ming Liang, Zhe Li, Xiaolin Hu, Sen Song
The model consists of two modules.
7 code implementations • CVPR 2018 • Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li
To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks.
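A momentum-style iterative update against an ensemble can be sketched on toy linear classifiers. Everything specific here is an assumption made for illustration and is not stated in the abstract: the ensemble is formed by averaging logits, the gradient is normalized by its L1 norm before being accumulated, the step size is `eps / steps`, and the models are hypothetical binary logistic regressors given as `(weights, bias)` pairs.

```python
import math


def sign(value):
    return (value > 0) - (value < 0)


def ensemble_logit(x, models):
    """Average the logits of linear models given as (weights, bias) pairs."""
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in models]
    return sum(logits) / len(models)


def loss_gradient(x, label, models):
    """Gradient of binary cross-entropy w.r.t. x for the averaged logit."""
    prob = 1.0 / (1.0 + math.exp(-ensemble_logit(x, models)))
    mean_w = [sum(w[i] for w, _ in models) / len(models) for i in range(len(x))]
    return [(prob - label) * w_i for w_i in mean_w]


def momentum_iterative_attack(x, label, models, eps, steps, mu=1.0):
    """Accumulate normalized gradients with decay `mu` and step along their sign."""
    alpha = eps / steps
    velocity = [0.0] * len(x)
    adv = list(x)
    for _ in range(steps):
        grad = loss_gradient(adv, label, models)
        l1_norm = sum(abs(g) for g in grad) or 1.0  # avoid division by zero
        velocity = [mu * v + g / l1_norm for v, g in zip(velocity, grad)]
        adv = [a + alpha * sign(v) for a, v in zip(adv, velocity)]
        # Keep the perturbation inside the L-infinity ball of radius eps.
        adv = [min(max(a, x_i - eps), x_i + eps) for a, x_i in zip(adv, x)]
    return adv


models = [([2.0, -1.0], 0.0), ([1.0, -2.0], 0.5)]
clean = [1.0, -0.5]
adversarial = momentum_iterative_attack(clean, 1, models, eps=0.3, steps=3)
print(ensemble_logit(clean, models), ensemble_logit(adversarial, models))
```

The accumulated velocity smooths the update direction across iterations, which is the intuition usually given for why momentum helps perturbations transfer between models.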
no code implementations • CVPR 2017 • Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu
Then the scale histogram guides the zoom-in and zoom-out of the image.
1 code implementation • 13 Feb 2017 • Fangzhou Liao, Xi Chen, Xiaolin Hu, Sen Song
In 2016, Kaggle organized a competition to estimate the volume of LV from MRI images.
no code implementations • 14 Dec 2016 • Yi Zhang, Weichao Qiu, Qi Chen, Xiaolin Hu, Alan Yuille
We generate a large synthetic image dataset with automatically computed hazardous regions and analyze algorithms on these regions.
no code implementations • CVPR 2016 • Hongwei Qin, Junjie Yan, Xiu Li, Xiaolin Hu
The cascade has been widely used in face detection, where a classifier with low computational cost is first used to reject most of the background while preserving recall.
no code implementations • NeurIPS 2015 • Ming Liang, Xiaolin Hu, Bo Zhang
We adopt a deep recurrent convolutional neural network (RCNN) for this task, which was originally proposed for object recognition.
no code implementations • CVPR 2015 • Ming Liang, Xiaolin Hu
Inspired by this fact, we propose a recurrent CNN (RCNN) for object recognition by incorporating recurrent connections into each convolutional layer.
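The idea of adding recurrent connections to a convolutional layer can be sketched in one dimension. This is a toy illustration under stated assumptions: a single channel, zero padding, a ReLU nonlinearity, and no normalization. The function names and the unrolling scheme (a fixed feed-forward term plus a recurrent convolution of the previous state) are assumptions rather than a reproduction of the published layer.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so that it can be centred on each position.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def recurrent_conv_layer(inputs, ff_kernel, rec_kernel, steps):
    """Unroll a recurrent convolutional layer for `steps` recurrent iterations."""
    feedforward = conv1d_same(inputs, ff_kernel)  # computed once from the input
    state = [max(0.0, v) for v in feedforward]    # step 0: feed-forward only
    for _ in range(steps):
        recurrent = conv1d_same(state, rec_kernel)  # lateral input from same layer
        state = [max(0.0, f + r) for f, r in zip(feedforward, recurrent)]
    return state


print(recurrent_conv_layer([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.0], steps=2))
```

Each additional step lets a unit integrate context from a wider neighbourhood of the same layer without adding new parameters, since the recurrent kernel is shared across steps.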
no code implementations • CVPR 2014 • Tianlin Shi, Liang Ming, Xiaolin Hu
A body of psychological and physiological evidence suggests that early visual attention works in a coarse-to-fine way, which lays a basis for the reverse hierarchy theory (RHT).