1 code implementation • 25 Apr 2022 • Xiao Tan, Jingbo Gao, Ruolin Li
As deep learning applications, especially computer vision programs, are increasingly deployed in our lives, the security of these applications becomes an urgent concern. One effective way to improve the security of deep learning models is adversarial training, which exposes the model to samples deliberately crafted to attack it. Based on this, we propose a simple architecture for building a model with a certain degree of robustness, which improves the robustness of the trained network by adding an adversarial sample detection network for cooperative training. At the same time, we design a new data sampling strategy that incorporates multiple existing attacks, allowing the model to adapt to many different adversarial attacks within a single training run. We conducted experiments on the CIFAR-10 dataset to test the effectiveness of this design, and the results indicate that it has a positive effect on the robustness of the model. Our code can be found at https://github.com/dowdyboy/simple_structure_for_robust_model.
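For illustration, a minimal PyTorch sketch of cooperative training with an attack pool sampled per batch; the attack set (FGSM/PGD), hyperparameters, and the `model`/`detector`/`optimizer` objects are placeholders, not the released implementation:

```python
# Minimal sketch: adversarial training where each batch's attack is sampled
# from a pool of attacks (here FGSM and PGD). Hyperparameters are illustrative.
import random
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1).detach()
    return x_adv

def train_step(model, detector, optimizer, x, y, attacks=(fgsm, pgd)):
    """One cooperative step: the classifier sees clean and adversarial samples,
    while the detector learns to flag which samples are adversarial."""
    attack = random.choice(attacks)                      # sampling over the attack pool
    x_adv = attack(model, x, y)
    x_all = torch.cat([x, x_adv])
    y_all = torch.cat([y, y])
    is_adv = torch.cat([torch.zeros(len(x)), torch.ones(len(x_adv))]).long().to(x.device)
    loss = F.cross_entropy(model(x_all), y_all) + F.cross_entropy(detector(x_all), is_adv)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```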
no code implementations • 16 Apr 2022 • Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai
Bird's-eye-view (BEV) semantic segmentation is critical for autonomous driving due to its powerful spatial representation ability.
1 code implementation • 8 Apr 2022 • Lin Chen, Huaian Chen, Zhixiang Wei, Xin Jin, Xiao Tan, Yi Jin, Enhong Chen
Such NWD can be coupled with the classifier to serve as a discriminator satisfying the K-Lipschitz constraint without requiring additional weight clipping or a gradient penalty strategy.
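One hedged reading of how such an NWD might be computed from the classifier's own predictions (the exact formulation in the paper may differ; the batch-nuclear-norm surrogate below is an assumption):

```python
# Hedged sketch (assumption, not the paper's code): use the nuclear norm of the
# classifier's softmax prediction matrices as a discrepancy between source and
# target batches, so the classifier itself plays the role of the discriminator.
import torch
import torch.nn.functional as F

def nuclear_wasserstein_discrepancy(logits_src, logits_tgt):
    p_src = F.softmax(logits_src, dim=1)   # (B_s, C) source prediction matrix
    p_tgt = F.softmax(logits_tgt, dim=1)   # (B_t, C) target prediction matrix
    # Batch nuclear norms act as surrogate critic scores for each domain;
    # their difference can be used adversarially between feature extractor
    # and the reused classifier.
    return torch.linalg.matrix_norm(p_src, ord='nuc') - torch.linalg.matrix_norm(p_tgt, ord='nuc')
```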
no code implementations • 25 Mar 2022 • Xiaoqing Ye, Mao Shu, Hanyu Li, Yifeng Shi, YingYing Li, Guangjie Wang, Xiao Tan, Errui Ding
On the other hand, data captured from roadside cameras has strengths over frontal-view data and is believed to facilitate a safer and more intelligent autonomous driving system.
no code implementations • ICCV 2021 • Zhikang Zou, Xiaoqing Ye, Liang Du, Xianhui Cheng, Xiao Tan, Li Zhang, Jianfeng Feng, Xiangyang Xue, Errui Ding
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving, whereas its accuracy is still far from satisfactory.
1 code implementation • 3 Dec 2021 • Zheyuan Zhou, Liang Du, Xiaoqing Ye, Zhikang Zou, Xiao Tan, Li Zhang, Xiangyang Xue, Jianfeng Feng
Monocular 3D object detection aims to predict the object location, dimension and orientation in 3D space alongside the object category given only a monocular image.
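For reference, the prediction target described above (3D location, dimensions, orientation, and category) can be captured by a small, hypothetical container such as the one below:

```python
# Hypothetical container for a single monocular 3D detection, matching the
# outputs described above: 3D location, dimensions, yaw orientation, category.
from dataclasses import dataclass

@dataclass
class Box3D:
    x: float                 # object center, camera coordinates (m)
    y: float
    z: float
    length: float            # object dimensions (m)
    width: float
    height: float
    yaw: float               # heading angle around the vertical axis (rad)
    category: str            # e.g. "Car", "Pedestrian"
    score: float = 1.0       # detection confidence
```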
no code implementations • 9 Aug 2021 • Jie Wu, Wei Zhang, Guanbin Li, Wenhao Wu, Xiao Tan, YingYing Li, Errui Ding, Liang Lin
In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.
1 code implementation • 25 May 2021 • Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, YingYing Li, Mingde Yao, ZiChao Dong, Yifeng Shi
Long-range and short-range temporal modeling are two complementary and crucial aspects of video recognition.
1 code implementation • 9 May 2021 • Yuxiang Zhao, Wenhao Wu, Yue He, YingYing Li, Xiao Tan, Shifeng Chen
In this paper, we propose a straightforward and efficient framework that includes pre-processing, a dynamic track module, and post-processing.
no code implementations • 30 Apr 2021 • Xiao Tan, Dimos V. Dimarogonas
We then provide analytical results on how a certain parameter in the original quadratic program should be chosen to remove the undesired equilibrium points or to confine them in a small neighborhood of the origin.
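For intuition, a toy sketch of the kind of CBF-based quadratic program this line of work analyzes; the scalar-input system, barrier function, and class-K gain `alpha` below are illustrative assumptions, and the QP is solved in closed form rather than with a solver:

```python
# Toy sketch: a scalar-input control barrier function QP
#   min_u (u - u_nom)^2   s.t.  Lfh(x) + Lgh(x) * u + alpha * h(x) >= 0
# solved in closed form. Dynamics, h, and the gain `alpha` are illustrative;
# the cited work studies how such a gain affects the closed-loop equilibria.

def cbf_qp_scalar(u_nom, h, Lfh, Lgh, alpha=1.0):
    """Return the minimally-modified control satisfying the CBF constraint."""
    if Lgh > 0:
        lb = -(Lfh + alpha * h) / Lgh   # constraint becomes u >= lb
        return max(u_nom, lb)
    if Lgh < 0:
        ub = -(Lfh + alpha * h) / Lgh   # constraint becomes u <= ub
        return min(u_nom, ub)
    return u_nom                         # constraint independent of u (assumed feasible)

# Example: single integrator x_dot = u, keep x >= 1 via h(x) = x - 1.
x, u_nom = 1.5, -2.0
u = cbf_qp_scalar(u_nom, h=x - 1.0, Lfh=0.0, Lgh=1.0, alpha=1.0)
print(u)  # -0.5: the nominal command is clipped so the set {x >= 1} stays invariant
```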
no code implementations • 31 Mar 2021 • Xiao Tan, Wenceslao Shaw Cortez, Dimos V. Dimarogonas
Furthermore, the proposed formulation accounts for "performance-critical" control: it guarantees that a subset of the forward invariant set will admit any existing, bounded control law, while still ensuring forward invariance of the set.
no code implementations • ICCV 2021 • Zhi Chen, Xiaoqing Ye, Wei Yang, Zhenbo Xu, Xiao Tan, Zhikang Zou, Errui Ding, Xinming Zhang, Liusheng Huang
Second, we introduce an occlusion-aware distillation (OA Distillation) module, which leverages the predicted depths from StereoNet in non-occluded regions to train our monocular depth estimation network named SingleNet.
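A minimal sketch of such occlusion-aware distillation, assuming dense depth maps and a binary non-occlusion mask (not the authors' code):

```python
# Minimal sketch: distill StereoNet depth into a monocular network only where a
# non-occlusion mask marks the stereo depth as reliable. Shapes are assumptions.
import torch

def occlusion_aware_distillation_loss(mono_depth, stereo_depth, non_occluded_mask):
    """mono_depth, stereo_depth: (B, 1, H, W); non_occluded_mask: (B, 1, H, W) in {0, 1}."""
    diff = torch.abs(mono_depth - stereo_depth) * non_occluded_mask
    return diff.sum() / non_occluded_mask.sum().clamp(min=1.0)
```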
1 code implementation • 14 Dec 2020 • Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang
We argue that the first phase is equivalent to building the k-nearest-neighbor graph, while the second phase can be viewed as spreading messages within the graph.
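A minimal NumPy sketch of the two phases as described, building the k-nearest-neighbor graph and then spreading messages along it (the propagation rule and normalization are illustrative assumptions):

```python
# Sketch of the two phases: (1) build a k-nearest-neighbor graph over the
# gallery features, (2) spread (propagate) features along the graph.
import numpy as np

def knn_graph(feats, k=5):
    sims = feats @ feats.T                      # cosine similarity (features L2-normalized)
    np.fill_diagonal(sims, -np.inf)
    nbrs = np.argsort(-sims, axis=1)[:, :k]     # indices of the k nearest neighbors
    adj = np.zeros_like(sims)
    adj[np.arange(len(feats))[:, None], nbrs] = 1.0
    return adj

def spread_message(feats, adj, alpha=0.5, iters=2):
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    for _ in range(iters):
        feats = (1 - alpha) * feats + alpha * (adj @ feats) / deg
        feats /= np.linalg.norm(feats, axis=1, keepdims=True).clip(min=1e-12)
    return feats

feats = np.random.randn(100, 64)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
refined = spread_message(feats, knn_graph(feats, k=5))
```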
no code implementations • 25 Oct 2020 • Mingyang Qian, Yi Fu, Xiao Tan, YingYing Li, Jinqing Qi, Huchuan Lu, Shilei Wen, Errui Ding
Video segmentation approaches are of great importance for numerous vision tasks, especially video manipulation for entertainment.
1 code implementation • NeurIPS 2020 • Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou
First, we propose to learn robust object representations by aggregating the candidate sound localization results in the single source scenes.
1 code implementation • 18 Sep 2020 • Chaofeng Chen, Xiao Tan, Kwan-Yee K. Wong
We utilize a fully convolutional neural network (FCNN) to create the content image, and propose a style transfer approach to introduce textures and shadings based on a newly proposed pyramid column feature.
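One plausible reading of a pyramid column feature, sketched below: at every spatial location, stack features drawn from several pyramid levels after upsampling them to a common resolution (the level choices and shapes are assumptions, not the paper's):

```python
# Hedged sketch: a "pyramid column" read as the concatenation, at every pixel,
# of feature maps taken from several scales and upsampled to one resolution.
import torch
import torch.nn.functional as F

def pyramid_column_feature(feature_maps, out_size):
    """feature_maps: list of (B, C_i, H_i, W_i) tensors from different pyramid levels.
    Returns (B, sum(C_i), *out_size), where each spatial "column" stacks the
    multi-scale features observed at that location."""
    upsampled = [F.interpolate(f, size=out_size, mode='bilinear', align_corners=False)
                 for f in feature_maps]
    return torch.cat(upsampled, dim=1)

# Example with dummy pyramid levels
levels = [torch.randn(1, 64, 128, 128), torch.randn(1, 128, 64, 64), torch.randn(1, 256, 32, 32)]
column = pyramid_column_feature(levels, out_size=(128, 128))  # (1, 448, 128, 128)
```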
1 code implementation • 3 Jul 2020 • Zhenbo Xu, Wei Zhang, Xiao Tan, Wei Yang, Xiangbo Su, Yuchen Yuan, Hongwu Zhang, Shilei Wen, Errui Ding, Liusheng Huang
In this work, we present PointTrack++, an effective on-line framework for MOTS, which remarkably extends our recently proposed PointTrack framework.
1 code implementation • ECCV 2020 • Zhenbo Xu, Wei Zhang, Xiao Tan, Wei Yang, Huan Huang, Shilei Wen, Errui Ding, Liusheng Huang
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods including 3D tracking methods by large margins (5.4% higher MOTSA and 18 times faster than MOTSFusion) at near real-time speed (22 FPS).
no code implementations • CVPR 2020 • Liang Du, Xiaoqing Ye, Xiao Tan, Jianfeng Feng, Zhenbo Xu, Errui Ding, Shilei Wen
Object detection from 3D point clouds remains a challenging task, though recent studies have pushed the envelope with deep learning techniques.
1 code implementation • 1 Mar 2020 • Zhenbo Xu, Wei Zhang, Xiaoqing Ye, Xiao Tan, Wei Yang, Shilei Wen, Errui Ding, Ajin Meng, Liusheng Huang
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
no code implementations • 9 Feb 2020 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen
In a nutshell, we treat the input frames and the network depth of the computational graph as a two-dimensional grid, and several checkpoints with prediction modules are placed on this grid in advance.
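A schematic sketch of this frames-by-depth checkpoint idea under assumed components (the backbone blocks, checkpoint placement, and confidence-threshold exit rule are illustrative, not the authors' design):

```python
# Schematic sketch: checkpoints on a 2-D (frames, depth) grid. Each checkpoint
# owns a small prediction head; inference walks the grid and exits early once a
# head is confident enough. Backbone blocks and the threshold are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridCheckpointNet(nn.Module):
    def __init__(self, num_classes=10, depth=4, dim=64):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
                                     for _ in range(depth)])
        self.heads = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(depth)])

    def forward(self, frame_feats, threshold=0.9):
        """frame_feats: (T, dim) per-frame features; more frames are consumed as
        depth grows, tracing a path through the (frames, depth) grid."""
        T = frame_feats.shape[0]
        state = frame_feats[0]
        for d, (block, head) in enumerate(zip(self.blocks, self.heads)):
            frames_used = min(T, d + 1)                      # grid coordinate (frames_used, d)
            state = block(state + frame_feats[:frames_used].mean(dim=0))
            probs = F.softmax(head(state), dim=-1)
            if probs.max() > threshold:                      # confident checkpoint: exit early
                return probs, (frames_used, d)
        return probs, (frames_used, d)
```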
1 code implementation • ICCV 2019 • Zhaoyi Yan, Yuchen Yuan, WangMeng Zuo, Xiao Tan, Yezhen Wang, Shilei Wen, Errui Ding
In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e., PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect.
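A rough sketch of one reading of perspective-guided convolution: use a per-row perspective value to set the width of a spatially varying Gaussian smoothing applied before an ordinary shared convolution. The actual PGC formulation in the paper may differ; all shapes and kernel choices below are assumptions.

```python
# Hedged sketch (one reading, not the paper's code): blur each row of the feature
# map with a Gaussian kernel whose width is set by that row's perspective value,
# then a shared convolution can follow on the re-scaled features.
import torch
import torch.nn.functional as F

def gaussian_kernel1d(sigma, radius=4):
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    k = torch.exp(-0.5 * (x / max(sigma, 1e-3)) ** 2)
    return k / k.sum()

def perspective_guided_smoothing(feat, perspective_per_row):
    """feat: (B, C, H, W); perspective_per_row: (H,) values controlling blur width."""
    B, C, H, W = feat.shape
    rows = []
    for i in range(H):
        k = gaussian_kernel1d(float(perspective_per_row[i]))
        k = k.view(1, 1, 1, -1).to(feat).expand(C, 1, 1, -1).contiguous()  # depthwise kernel
        row = feat[:, :, i:i + 1, :]                                        # (B, C, 1, W)
        rows.append(F.conv2d(row, k, padding=(0, k.shape[-1] // 2), groups=C))
    return torch.cat(rows, dim=2)                                           # (B, C, H, W)
```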
1 code implementation • ICCV 2019 • Xiangyun Zhao, Yi Yang, Feng Zhou, Xiao Tan, Yuchen Yuan, Yingze Bao, Ying Wu
Although great progress has been made in object-level recognition, recognizing the attributes of parts remains challenging, since training data for part attribute recognition is usually scarce, especially for internet-scale applications.
no code implementations • ICCV 2019 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen
Video recognition has drawn great research interest, and substantial progress has been made.
1 code implementation • 12 Dec 2018 • Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong
Instead of supervising the network with ground truth sketches, we first perform patch matching in feature space between the input photo and photos in a small reference set of photo-sketch pairs.
Ranked #1 on Face Sketch Synthesis on CUHK
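To make the feature-space patch matching concrete, a small PyTorch sketch under assumed shapes (the patch size, cosine-similarity criterion, and feature extractor are not taken from the paper):

```python
# Sketch: for each patch of the input photo's feature map, find the best-matching
# patch among the reference photos' feature maps by cosine similarity.
import torch
import torch.nn.functional as F

def extract_patches(feat, size=3):
    """feat: (C, H, W) -> (N, C*size*size) flattened patches."""
    patches = F.unfold(feat.unsqueeze(0), kernel_size=size, padding=size // 2)  # (1, C*k*k, N)
    return patches.squeeze(0).t()

def match_patches(query_feat, ref_feats, size=3):
    """Return, per query patch, the index of the best reference image and patch."""
    q = F.normalize(extract_patches(query_feat, size), dim=1)          # (Nq, D)
    best_sim = best_ref = best_idx = None
    for r, ref in enumerate(ref_feats):
        k = F.normalize(extract_patches(ref, size), dim=1)              # (Nr, D)
        sim, idx = (q @ k.t()).max(dim=1)                                # best patch per query
        if best_sim is None:
            best_sim, best_ref, best_idx = sim, torch.full_like(idx, r), idx
        else:
            better = sim > best_sim
            best_sim = torch.where(better, sim, best_sim)
            best_ref = torch.where(better, torch.full_like(idx, r), best_ref)
            best_idx = torch.where(better, idx, best_idx)
    return best_ref, best_idx
```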
no code implementations • ECCV 2018 • Chen Zhu, Xiao Tan, Feng Zhou, Xiao Liu, Kaiyu Yue, Errui Ding, Yi Ma
Specifically, it first summarizes the video by computing a weighted sum of all feature vectors in the feature maps of selected frames using spatio-temporal soft attention, and then predicts which channels to suppress or enhance according to this summary with a learned non-linear transform.
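A compact PyTorch sketch of this summarize-then-gate pattern, with assumed tensor shapes and head sizes (not the paper's exact module):

```python
# Sketch: summarize a clip by attention-weighted summation over space and time,
# then predict per-channel gates from the summary with a small non-linear transform.
import torch
import torch.nn as nn

class AttentionChannelGate(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.attn = nn.Conv3d(channels, 1, kernel_size=1)              # spatio-temporal attention logits
        self.gate = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                  nn.Linear(hidden, channels), nn.Sigmoid())

    def forward(self, feats):
        """feats: (B, C, T, H, W) feature maps of selected frames."""
        B, C, T, H, W = feats.shape
        w = torch.softmax(self.attn(feats).flatten(2), dim=-1)          # (B, 1, T*H*W)
        summary = (feats.flatten(2) * w).sum(dim=-1)                    # (B, C) attention-weighted sum
        gates = self.gate(summary)                                      # channels to enhance or suppress
        return feats * gates.view(B, C, 1, 1, 1)
```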
2 code implementations • 19 Oct 2018 • Yaming Wang, Xiao Tan, Yi Yang, Ziyu Li, Xiao Liu, Feng Zhou, Larry S. Davis
Existing 3D pose datasets of object categories are limited to generic object types and lack fine-grained information.
2 code implementations • 12 Jun 2018 • Yaming Wang, Xiao Tan, Yi Yang, Xiao Liu, Errui Ding, Feng Zhou, Larry S. Davis
The new dataset is available at www.umiacs.umd.edu/~wym/3dpose.html
no code implementations • CVPR 2014 • Xiao Tan, Changming Sun, Tuan D. Pham
By using the hybrid of the local polynomial model and color/intensity based range guidance, the proposed method not only preserves edges but also does a much better job in preserving spatial variation than existing popular filtering methods.
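For intuition only, a tiny 1-D illustration of range-guided filtering in the joint-bilateral style; the paper's hybrid local-polynomial model is more involved, and this is not its implementation.

```python
# Tiny 1-D illustration of range guidance: spatial weights are combined with
# weights from a guide signal so that edges in the guide are preserved.
import numpy as np

def range_guided_filter_1d(signal, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    out = np.empty_like(signal, dtype=float)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        idx = np.arange(lo, hi)
        w_s = np.exp(-0.5 * ((idx - i) / sigma_s) ** 2)                 # spatial weights
        w_r = np.exp(-0.5 * ((guide[idx] - guide[i]) / sigma_r) ** 2)   # range (guide) weights
        w = w_s * w_r
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out
```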