Search Results for author: Zuxuan Wu

Found 46 papers, 15 papers with code

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

no code implementations18 Oct 2021 Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.

Adversarial Attack Translation +1

Self-supervised Learning for Semi-supervised Temporal Language Grounding

no code implementations23 Sep 2021 Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To achieve good performance with limited annotations, we tackle this task in a semi-supervised way and propose a unified Semi-supervised Temporal Language Grounding (STLG) framework.

Contrastive Learning Self-Supervised Learning

Towards Transferable Adversarial Attacks on Vision Transformers

no code implementations9 Sep 2021 Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang

The results of these experiments demonstrate that the proposed dual attack can greatly boost transferability between ViTs and from ViTs to CNNs.

A Multimodal Framework for Video Ads Understanding

no code implementations29 Aug 2021 Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang

There is a growing trend in placing video advertisements on social platforms for online marketing, which demands automatic approaches to understand the contents of advertisements effectively.

Optical Character Recognition Scene Segmentation +1

FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting

no code implementations10 Aug 2021 Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.

Facial Inpainting

Cross-domain Contrastive Learning for Unsupervised Domain Adaptation

no code implementations10 Jun 2021 Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang

In addition, we demonstrate that CDCL is a general framework and can be adapted to the data-free setting, where the source data are unavailable during training, with minimal modification.

Contrastive Learning Self-Supervised Learning +1

Rethinking Pseudo Labels for Semi-Supervised Object Detection

no code implementations1 Jun 2021 Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels.

Classification Image Classification +2

VideoLT: Large-scale Long-tailed Video Recognition

1 code implementation ICCV 2021 Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis

In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.

Image Classification Video Recognition

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

1 code implementation24 Apr 2021 Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids.

3D Object Detection

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

no code implementations20 Apr 2021 Junke Wang, Zuxuan Wu, Jingjing Chen, Yu-Gang Jiang

This demands effective approaches that can detect perceptually convincing Deepfakes generated by advanced manipulation techniques.

DeepFake Detection Face Swapping +1

HMS: Hierarchical Modality Selection for Efficient Video Recognition

no code implementations20 Apr 2021 Zejia Weng, Zuxuan Wu, Hengduo Li, Yu-Gang Jiang

Conventional video recognition pipelines typically fuse multimodal features for improved performance.

Video Recognition

THAT: Two Head Adversarial Training for Improving Robustness at Scale

no code implementations25 Mar 2021 Zuxuan Wu, Tom Goldstein, Larry S. Davis, Ser-Nam Lim

Many variants of adversarial training have been proposed, with most research focusing on problems with relatively few classes.

Deep Video Inpainting Detection

no code implementations26 Jan 2021 Peng Zhou, Ning Yu, Zuxuan Wu, Larry S. Davis, Abhinav Shrivastava, Ser-Nam Lim

This paper studies video inpainting detection, which localizes an inpainted region in a video both spatially and temporally.

Video Inpainting

GTA: Global Temporal Attention for Video Action Understanding

no code implementations15 Dec 2020 Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava

To this end, we introduce Global Temporal Attention (GTA), which performs global temporal attention on top of spatial attention in a decoupled manner.

Action Recognition Action Understanding

Intentonomy: a Dataset and Study towards Human Intent Understanding

1 code implementation CVPR 2021 Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie, Ser-Nam Lim

Based on our findings, we conduct further study to quantify the effect of attending to object and context classes as well as textual information in the form of hashtags when training an intent classifier.

FLAG: Adversarial Data Augmentation for Graph Neural Networks

2 code implementations19 Oct 2020 Kezhi Kong, Guohao Li, Mucong Ding, Zuxuan Wu, Chen Zhu, Bernard Ghanem, Gavin Taylor, Tom Goldstein

Data augmentation helps neural networks generalize better, but it remains an open question how to effectively augment graph data to enhance the performance of GNNs (Graph Neural Networks).

Data Augmentation Graph Classification

Prepare for the Worst: Generalizing across Domain Shifts with Adversarial Batch Normalization

no code implementations18 Sep 2020 Manli Shu, Zuxuan Wu, Micah Goldblum, Tom Goldstein

Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations.

Semantic Segmentation

Recognizing Instagram Filtered Images with Feature De-stylization

no code implementations30 Dec 2019 Zhe Wu, Zuxuan Wu, Bharat Singh, Larry S. Davis

Deep neural networks have been shown to suffer from poor generalization when small perturbations are added (like Gaussian noise), yet little work has been done to evaluate their robustness to more natural image transformations like photo filters.

Style Transfer

Learning from Noisy Anchors for One-stage Object Detection

1 code implementation CVPR 2020 Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis

State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects.

Classification General Classification +1

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

no code implementations NeurIPS 2019 Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.

Video Recognition

ACE: Adapting to Changing Environments for Semantic Segmentation

no code implementations ICCV 2019 Zuxuan Wu, Xin Wang, Joseph E. Gonzalez, Tom Goldstein, Larry S. Davis

However, neural classifiers are often extremely brittle when confronted with domain shift---changes in the input distribution that occur over time.

Meta-Learning Semantic Segmentation

An Analysis of Pre-Training on Object Detection

no code implementations11 Apr 2019 Hengduo Li, Bharat Singh, Mahyar Najibi, Zuxuan Wu, Larry S. Davis

We analyze how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, Flowers-102 etc.

Classification General Classification +3

M2KD: Multi-model and Multi-level Knowledge Distillation for Incremental Learning

no code implementations3 Apr 2019 Peng Zhou, Long Mai, Jianming Zhang, Ning Xu, Zuxuan Wu, Larry S. Davis

Instead of sequentially distilling knowledge only from the last model, we directly leverage all previous model snapshots.

Incremental Learning Knowledge Distillation

The Regretful Navigation Agent for Vision-and-Language Navigation

1 code implementation CVPR 2019 (Oral) 2019 Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Decision Making Vision and Language Navigation +2

Compatible and Diverse Fashion Image Inpainting

no code implementations4 Feb 2019 Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis

The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments.

Fashion Synthesis Image Inpainting

DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation

no code implementations ECCV 2018 Zuxuan Wu, Xintong Han, Yen-Liang Lin, Mustafa Gkhan Uzunbas, Tom Goldstein, Ser Nam Lim, Larry S. Davis

In particular, given an image from the source domain and unlabeled samples from the target domain, the generator synthesizes new images on-the-fly to resemble samples from the target domain in appearance and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage feature statistics of sampled images from the target domain.

Semantic Segmentation

VITON: An Image-based Virtual Try-on Network

5 code implementations CVPR 2018 Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis

We present an image-based VIirtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy.

Virtual Try-on

BlockDrop: Dynamic Inference Paths in Residual Networks

1 code implementation CVPR 2018 Zuxuan Wu, Tushar Nagarajan, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, Rogerio Feris

Very deep convolutional neural networks offer excellent recognition results, yet their computational expense limits their impact for many real-world applications.

Learning Fashion Compatibility with Bidirectional LSTMs

1 code implementation18 Jul 2017 Xintong Han, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis

To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion.

Aggregating Frame-level Features for Large-Scale Video Classification

no code implementations4 Jul 2017 Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.

Classification General Classification +3

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

no code implementations14 Jun 2017 Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, xiangyang xue, Shih-Fu Chang

More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.

General Classification Video Classification

Weakly-Supervised Spatial Context Networks

no code implementations10 Apr 2017 Zuxuan Wu, Larry S. Davis, Leonid Sigal

In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch, within the same image, conditioned on their real-valued relative spatial offset.

Deep Learning for Video Classification and Captioning

1 code implementation22 Sep 2016 Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.

Classification General Classification +2

Evaluating Two-Stream CNN for Video Classification

no code implementations8 Apr 2015 Hao Ye, Zuxuan Wu, Rui-Wei Zhao, Xi Wang, Yu-Gang Jiang, xiangyang xue

In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification.

Classification General Classification +1

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

1 code implementation7 Apr 2015 Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, xiangyang xue

In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos.

Classification General Classification +1

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

no code implementations25 Feb 2015 Yu-Gang Jiang, Zuxuan Wu, Jun Wang, xiangyang xue, Shih-Fu Chang

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.

Cannot find the paper you are looking for? You can Submit a new open access paper.