no code implementations • 28 Oct 2022 • Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo
In this paper, we present a simple and efficient add-on component (termed GrafT) that considers global dependencies and multi-scale information throughout the network, in both high- and low-resolution features.
no code implementations • 3 Aug 2022 • Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne
While accuracy on ImageNet and similar datasets has steadily increased, performance on tasks beyond the classification of natural images remains largely unexplored.
1 code implementation • 13 Jun 2022 • Gaoyuan Zhang, Songtao Lu, Yihua Zhang, Xiangyi Chen, Pin-Yu Chen, Quanfu Fan, Lee Martie, Lior Horesh, Mingyi Hong, Sijia Liu
Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines.
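Below is a minimal sketch of the general pattern (per-worker PGD attack generation followed by gradient averaging via DistributedDataParallel), not the exact DAT algorithm; the attack hyperparameters are illustrative assumptions.

```python
# Sketch: large-batch adversarial training across workers. Each worker
# crafts adversarial examples for its shard; DistributedDataParallel
# all-reduces gradients, yielding one large adversarial batch per step.
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """Craft adversarial examples with projected gradient descent."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()

def train_step(ddp_model, optimizer, x, y):
    x_adv = pgd_perturb(ddp_model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(ddp_model(x_adv), y)
    loss.backward()   # gradients are averaged across machines here
    optimizer.step()
    return loss.item()
```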
no code implementations • 25 Apr 2022 • Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal
In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.
no code implementations • 15 Apr 2022 • Quanfu Fan, Yilai Li, Yuguang Yao, John Cohn, Sijia Liu, Seychelle M. Vos, Michael A. Cianfrocco
Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic bio-molecules.
no code implementations • 29 Sep 2021 • Quanfu Fan, Kaidi Xu, Chun-Fu Chen, Sijia Liu, Gaoyuan Zhang, David Daniel Cox, Xue Lin
Physical adversarial attacks apply carefully crafted adversarial perturbations onto real objects to maliciously alter the prediction of object classifiers or detectors.
1 code implementation • ICLR 2022 • Quanfu Fan, Chun-Fu Chen, Rameswar Panda
We explore a new perspective on video understanding by casting the video recognition problem as an image recognition task.
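One concrete way to realize this is to arrange sampled frames into a single "super image" grid and hand it to a 2D classifier, as in the minimal sketch below; the 3x3 grid and frame count are illustrative assumptions.

```python
# Sketch: fold a clip of T = grid**2 frames into one big image so a
# standard image backbone can perform action recognition.
import torch

def make_super_image(clip: torch.Tensor, grid: int = 3) -> torch.Tensor:
    """clip: (T, C, H, W) with T == grid * grid -> (C, grid*H, grid*W)."""
    t, c, h, w = clip.shape
    assert t == grid * grid, "sample grid**2 frames from the video"
    rows = [torch.cat(list(clip[r * grid:(r + 1) * grid]), dim=-1)
            for r in range(grid)]               # concatenate along width
    return torch.cat(rows, dim=-2)              # stack rows along height

clip = torch.randn(9, 3, 224, 224)              # 9 uniformly sampled frames
super_img = make_super_image(clip)              # (3, 672, 672)
# super_img can now be fed to any image classifier.
```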
3 code implementations • ICLR 2022 • Chun-Fu Chen, Rameswar Panda, Quanfu Fan
The regional-to-local attention proceeds in two steps: first, regional self-attention extracts global information among all regional tokens; then, local self-attention exchanges information between each regional token and its associated local tokens.
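A minimal sketch of that two-step scheme, assuming one regional token plus L local tokens per region (shapes and head counts are illustrative):

```python
# Sketch: two-step regional-to-local attention.
import torch
import torch.nn as nn

class RegionalToLocalAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.regional_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, regional, local):
        # regional: (B, R, D); local: (B, R, L, D)
        b, r, l, d = local.shape
        # Step 1: regional tokens exchange global information.
        regional, _ = self.regional_attn(regional, regional, regional)
        # Step 2: each regional token attends jointly with its local tokens.
        group = torch.cat([regional.reshape(b * r, 1, d),
                           local.reshape(b * r, l, d)], dim=1)
        group, _ = self.local_attn(group, group, group)
        return group[:, :1].reshape(b, r, d), group[:, 1:].reshape(b, r, l, d)
```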
1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris
Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.
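A hedged sketch of such a policy network, assuming cheap per-modality preview features and a straight-through Gumbel-softmax for differentiable use/skip decisions; the architecture is illustrative rather than the paper's exact design.

```python
# Sketch: per-segment modality selection with a lightweight policy head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    def __init__(self, feat_dim: int, num_modalities: int = 3):
        super().__init__()
        # Two logits (use / skip) per modality.
        self.head = nn.Linear(feat_dim * num_modalities, num_modalities * 2)
        self.num_modalities = num_modalities

    def forward(self, cheap_feats: torch.Tensor) -> torch.Tensor:
        # cheap_feats: (B, num_modalities * feat_dim) from low-cost previews.
        logits = self.head(cheap_feats).view(-1, self.num_modalities, 2)
        # Straight-through Gumbel-softmax keeps the selection differentiable.
        decisions = F.gumbel_softmax(logits, tau=1.0, hard=True)
        return decisions[..., 0]  # (B, num_modalities) binary use/skip mask
```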
8 code implementations • ICCV 2021 • Chun-Fu Chen, Quanfu Fan, Rameswar Panda
To this end, we propose a dual-branch transformer to combine image patches (i.e., tokens in a transformer) of different sizes to produce stronger image features.
Ranked #398 on Image Classification on ImageNet
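A rough sketch of the dual-branch idea (two patch sizes, two transformer branches), with the cross-branch fusion of the full model omitted; dimensions and depths are illustrative.

```python
# Sketch: fine-grained and coarse patch tokens processed by two branches.
import torch
import torch.nn as nn

class DualBranchPatches(nn.Module):
    def __init__(self, dim_s=192, dim_l=384):
        super().__init__()
        self.patch_small = nn.Conv2d(3, dim_s, kernel_size=12, stride=12)
        self.patch_large = nn.Conv2d(3, dim_l, kernel_size=16, stride=16)
        enc_s = nn.TransformerEncoderLayer(dim_s, nhead=3, batch_first=True)
        enc_l = nn.TransformerEncoderLayer(dim_l, nhead=6, batch_first=True)
        self.branch_small = nn.TransformerEncoder(enc_s, num_layers=1)
        self.branch_large = nn.TransformerEncoder(enc_l, num_layers=1)

    def forward(self, img):                      # img: (B, 3, 240, 240)
        tok_s = self.patch_small(img).flatten(2).transpose(1, 2)
        tok_l = self.patch_large(img).flatten(2).transpose(1, 2)
        # The full model additionally exchanges information between the
        # two token sets (e.g., via cross-attention) before classification.
        return self.branch_small(tok_s), self.branch_large(tok_l)
```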
1 code implementation • ICLR 2021 • Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly
We further show that our formulation is better at training models that are robust to adversarial attacks.
1 code implementation • CVPR 2021 • Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan
In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets.
1 code implementation • NeurIPS 2019 • Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, David Cox
Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets.
Ranked #72 on Action Recognition on Something-Something V2 (using extra training data)
2 code implementations • 1 Nov 2019 • Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva
Videos capture events that typically contain multiple sequential and simultaneous actions, even in the span of only a few seconds.
1 code implementation • ECCV 2020 • Kaidi Xu, Gaoyuan Zhang, Sijia Liu, Quanfu Fan, Mengshu Sun, Hongge Chen, Pin-Yu Chen, Yanzhi Wang, Xue Lin
To the best of our knowledge, this is the first work that models the effect of deformation when designing physical adversarial examples for non-rigid objects such as T-shirts.
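A loose sketch of optimizing a patch under random deformations: simple affine warps stand in for the paper's cloth-deformation model, and both `apply_patch` (compositing the patch onto the scene) and the detector-score objective are hypothetical placeholders.

```python
# Sketch: expectation-over-deformation optimization of an adversarial patch.
import torch
import torch.nn.functional as F

def random_warp(patch: torch.Tensor) -> torch.Tensor:
    """Apply a small random affine warp to a (1, C, H, W) patch."""
    theta = torch.eye(2, 3).unsqueeze(0)
    theta = theta + 0.05 * torch.randn_like(theta)   # jitter the transform
    grid = F.affine_grid(theta, patch.shape, align_corners=False)
    return F.grid_sample(patch, grid, align_corners=False)

def attack_step(model, scene, patch, apply_patch, lr=0.01):
    """One optimization step averaged over random deformations.
    apply_patch is a hypothetical helper that composites the warped
    patch onto the scene image."""
    patch = patch.detach().requires_grad_(True)
    loss = 0.0
    for _ in range(4):                               # average over warps
        warped = random_warp(patch)
        # Assumes the model returns per-detection confidence scores.
        loss = loss + model(apply_patch(scene, warped)).max()
    loss.backward()
    return (patch - lr * patch.grad.sign()).clamp(0, 1).detach()
```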
no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
no code implementations • 3 Apr 2019 • Kaidi Xu, Sijia Liu, Gaoyuan Zhang, Mengshu Sun, Pu Zhao, Quanfu Fan, Chuang Gan, Xue Lin
It is widely known that convolutional neural networks (CNNs) are vulnerable to adversarial examples: images with imperceptible perturbations crafted to fool classifiers.
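For illustration of the phenomenon itself, here is the classic one-step FGSM perturbation (Goodfellow et al.); it is not the attack studied in this paper.

```python
# Sketch: fast gradient sign method, the simplest adversarial perturbation.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Return x plus an imperceptible perturbation that raises the loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```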
no code implementations • 7 Aug 2018 • Chun-Fu Chen, Quanfu Fan, Marco Pistoia, Gwo Giun Lee
We propose a new method to create compact convolutional neural networks (CNNs) by exploiting sparse convolutions.
1 code implementation • ICLR 2019 • Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, Xue Lin
When generating adversarial examples to attack deep neural networks (DNNs), the Lp norm of the added perturbation is typically used to measure the similarity between the original image and the adversarial example.
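For reference, the usual Lp measurements of a perturbation take only a few lines:

```python
# Sketch: standard Lp norms of an adversarial perturbation.
import torch

def perturbation_norms(x: torch.Tensor, x_adv: torch.Tensor) -> dict:
    delta = (x_adv - x).flatten()
    return {
        "L0":   (delta != 0).sum().item(),   # number of changed pixels
        "L2":   delta.norm(p=2).item(),      # overall perturbation energy
        "Linf": delta.abs().max().item(),    # largest single-pixel change
    }
```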
3 code implementations • ICLR 2019 • Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris
The proposed approach demonstrates improvement of model efficiency and performance on both object recognition and speech recognition tasks, using popular architectures including ResNet and ResNeXt.
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
no code implementations • ICLR 2018 • Chun-Fu (Richard) Chen, Jinwook Oh, Quanfu Fan, Marco Pistoia, Gwo Giun (Chris) Lee
By simply replacing the convolution of a CNN with our sparse-complementary convolution, at the same FLOPs and parameters, we can improve top-1 accuracy on ImageNet by 0.33% and 0.18% for ResNet-101 and ResNet-152, respectively.
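A hedged sketch of the underlying idea: two convolutions with fixed, complementary kernel sparsity patterns whose union covers the full kernel. The checkerboard-like masks below are illustrative, not the paper's actual patterns.

```python
# Sketch: a convolution split into two sparse, complementary parts.
import torch
import torch.nn as nn

class SparseComplementaryConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv_a = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.conv_b = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        mask = torch.zeros(1, 1, k, k)
        mask[..., ::2] = 1.0                        # illustrative pattern
        self.register_buffer("mask_a", mask)
        self.register_buffer("mask_b", 1.0 - mask)  # complementary pattern

    def forward(self, x):
        # Zero out complementary kernel positions before each convolution.
        wa = self.conv_a.weight * self.mask_a
        wb = self.conv_b.weight * self.mask_b
        pad = self.conv_a.padding[0]
        return (nn.functional.conv2d(x, wa, padding=pad) +
                nn.functional.conv2d(x, wb, padding=pad))
```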
1 code implementation • 25 Jul 2016 • Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos
A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection.
Ranked #20 on Face Detection on WIDER Face (Hard)
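A minimal sketch of the multi-scale idea, attaching detection heads at several backbone depths so that objects of different sizes meet receptive fields of appropriate scale; the tiny backbone and head shapes are illustrative assumptions.

```python
# Sketch: detection outputs taken from feature maps at multiple depths.
import torch
import torch.nn as nn

class MultiScaleDetector(nn.Module):
    def __init__(self, num_anchors=3, num_classes=2):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
        out = num_anchors * (num_classes + 4)    # class scores + box offsets
        self.heads = nn.ModuleList([nn.Conv2d(c, out, 1) for c in (32, 64, 128)])

    def forward(self, x):
        f1 = self.stage1(x)                      # high-res: small objects
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)                     # low-res: large objects
        return [head(f) for head, f in zip(self.heads, (f1, f2, f3))]
```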
no code implementations • CVPR 2014 • Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron, Michael W. Mahoney
With the goal of accelerating the training and testing complexity of nonlinear kernel methods, several recent papers have proposed explicit embeddings of the input data into low-dimensional feature spaces, where fast linear methods can instead be used to generate approximate solutions.
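The canonical example of such an embedding is random Fourier features for the RBF kernel (Rahimi and Recht), sketched below; this is not this paper's specific feature map.

```python
# Sketch: random Fourier features approximating the Gaussian (RBF) kernel.
import numpy as np

def random_fourier_features(X, dim=512, gamma=1.0, seed=0):
    """Map X (n, d) to Z (n, dim) so that Z @ Z.T approximates
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=dim)
    return np.sqrt(2.0 / dim) * np.cos(X @ W + b)

# A fast linear model trained on Z then approximates a kernel machine on X.
```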
no code implementations • CVPR 2014 • Yu Cheng, Quanfu Fan, Sharath Pankanti, Alok Choudhary
Based on this idea, we represent a video by a sequence of visual words learnt from the video, and apply the Sequence Memoizer [21] to capture long-range dependencies in a temporal context in the visual sequence.
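A rough sketch of the visual-word step, with k-means standing in for codebook learning and the Sequence Memoizer itself omitted; the descriptor shapes are illustrative.

```python
# Sketch: quantize per-frame descriptors into a sequence of visual words.
import numpy as np
from sklearn.cluster import KMeans

def video_to_visual_words(frame_descriptors: np.ndarray, vocab_size=64):
    """frame_descriptors: (T, D), one descriptor per frame -> (T,) word ids."""
    codebook = KMeans(n_clusters=vocab_size, n_init=10).fit(frame_descriptors)
    return codebook.predict(frame_descriptors)

words = video_to_visual_words(np.random.randn(300, 128))
# A long-range sequence model can then capture temporal dependencies in `words`.
```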