Search Results for author: Boqing Gong

Found 92 papers, 42 papers with code

Spatiotemporal Contrastive Video Representation Learning

4 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Ranked #1 on Self-Supervised Action Recognition on Kinetics-600

Contrastive Learning Data Augmentation +4

76,591

MoViNets: Mobile Video Networks for Efficient Video Recognition

3 code implementations • CVPR 2021 • Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

Ranked #3 on Action Classification on Charades

Action Classification Action Recognition +4

76,589

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

Action Recognition Contrastive Learning +4

76,589

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

76,589

Ranking Neural Checkpoints

1 code implementation • CVPR 2021 • Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Liqiang Wang, Boqing Gong

This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for the transfer learning to a downstream task.

Ranked #6 on Transferability on classification benchmark

Transferability Transfer Learning

32,798

Class-Balanced Distillation for Long-Tailed Visual Recognition

3 code implementations • 12 Apr 2021 • Ahmet Iscen, André Araujo, Boqing Gong, Cordelia Schmid

An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively.

Ranked #11 on Long-tail Learning on iNaturalist 2018

Image Classification Knowledge Distillation +1

32,798

Video Timeline Modeling For News Story Understanding

1 code implementation • NeurIPS 2023 • Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong

In this paper, we present a novel problem, namely video timeline modeling.

32,798

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations • NeurIPS 2021 • Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #3 on Zero-Shot Video Retrieval on YouCook2 (text-to-video Mean Rank metric)

Action Classification Action Recognition In Videos +9

32,792

Adversarial Examples Improve Image Recognition

6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Ranked #4 on Domain Generalization on VizWiz-Classification

Domain Generalization Image Classification

29,735

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

2 code implementations • ICLR 2022 • Xiangning Chen, Cho-Jui Hsieh, Boqing Gong

Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures.

Ranked #6 on Fine-Grained Image Classification on Oxford-IIIT Pets

Domain Generalization Fine-Grained Image Classification +1

9,246

Surrogate Gap Minimization Improves Sharpness-Aware Training

1 code implementation • ICLR 2022 • Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (to derive the perturbed loss) is small.

9,246

Robust and Accurate Object Detection via Adversarial Learning

1 code implementation • CVPR 2021 • Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong

Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.

Ranked #17 on Object Detection on COCO-O

AutoML Data Augmentation +3

6,153

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

2,990

Synthesized Classifiers for Zero-Shot Learning

2 code implementations • CVPR 2016 • Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha

Given semantic descriptions of object classes, zero-shot learning aims to accurately recognize objects of the unseen classes, from which no examples are available at the training stage, by associating them to the seen classes, from which labeled examples are provided.

Ranked #1 on Few-Shot Image Classification on AWA - 0-Shot

Object Zero-Shot Learning

912

Large-Scale Long-Tailed Recognition in an Open World

2 code implementations • CVPR 2019 • Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes.

Ranked #5 on Long-tail learning with class descriptors on ImageNet-LT-d

Classification Few-Shot Learning +4

825

PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation

4 code implementations • CVPR 2020 • Yang Zhang, Zixiang Zhou, Philip David, Xiangyu Yue, Zerong Xi, Boqing Gong, Hassan Foroosh

The need for fine-grained perception in autonomous driving systems has resulted in recently increased research on online semantic segmentation of single-scan LiDAR.

Ranked #11 on Robust 3D Semantic Segmentation on nuScenes-C

Autonomous Driving LIDAR Semantic Segmentation +2

361

End-to-End Learning of Motion Representation for Video Understanding

1 code implementation • CVPR 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks.

Ranked #42 on Action Recognition on UCF101

Action Recognition Optical Flow Estimation +1

290

A Fast and Accurate One-Stage Approach to Visual Grounding

2 code implementations • ICCV 2019 • Zhengyuan Yang, Boqing Gong, Li-Wei Wang, Wenbing Huang, Dong Yu, Jiebo Luo

We propose a simple, fast, and accurate one-stage approach to visual grounding, inspired by the following insight.

Referring Expression Referring Expression Comprehension +1

141

Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes

1 code implementation • ICCV 2017 • Yang Zhang, Philip David, Boqing Gong

Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scenery semantic segmentation.

Ranked #27 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Autonomous Driving Domain Adaptation +4

128

A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes

2 code implementations • 24 Dec 2018 • Yang Zhang, Philip David, Hassan Foroosh, Boqing Gong

Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation.

Ranked #26 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Autonomous Driving Domain Adaptation +4

128

Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach

1 code implementation • ICCV 2019 • Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong

We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains.

Ranked #14 on Image-to-Image Translation on SYNTHIA-to-Cityscapes

Segmentation Semantic Segmentation +2

Smooth Adversarial Training

1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.

Ranked #1 on Adversarial Defense on ImageNet (non-targeted PGD, max perturbation=4)

Adversarial Defense Adversarial Robustness

DHER: Hindsight Experience Replay for Dynamic Goals

1 code implementation • ICLR 2019 • Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang

Dealing with sparse rewards is one of the most important challenges in reinforcement learning (RL), especially when a goal is dynamic (e.g., to grasp a moving object).

Object Tracking Reinforcement Learning (RL)

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

1 code implementation • ICLR 2018 • Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, Liqiang Wang

Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train.

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

1 code implementation • 1 May 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques.

Adversarial Attack

An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

1 code implementation • 13 May 2016 • Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, Fei Sha

Zero-shot learning (ZSL) methods have been studied in the unrealistic setting where test data are assumed to come from unseen classes only.

Few-Shot Learning Generalized Zero-Shot Learning +1

Classifier and Exemplar Synthesis for Zero-Shot Learning

1 code implementation • 16 Dec 2018 • Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha

Zero-shot learning (ZSL) enables solving a task without the need to see its examples.

Denoising Zero-Shot Learning

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

1 code implementation • NeurIPS 2021 • Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size.

Instance Segmentation Long-tailed Object Detection +4

End-to-End Video Captioning with Multitask Reinforcement Learning

1 code implementation • 21 Mar 2018 • Lijun Li, Boqing Gong

Although end-to-end (E2E) learning has led to impressive progress on a variety of visual understanding tasks, it is often impeded by hardware constraints (e.g., GPU memory) and is prone to overfitting.

reinforcement-learning Reinforcement Learning (RL) +2

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

1 code implementation • ICCV 2021 • Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images).

Imputation Instance Segmentation +5

Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation

1 code implementation • 5 Apr 2023 • Zheyuan Zhang, Bin Wang, Lanhong Yao, Ugur Demir, Debesh Jha, Ismail Baris Turkbey, Boqing Gong, Ulas Bagci

In real-world scenarios, however, it is common for models to encounter data from new and different domains to which they were not exposed during training.

Domain Generalization Image Segmentation +2

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

2 code implementations • ICLR 2020 • Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Li-Wei Wang

Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly.

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

1 code implementation • CVPR 2020 • Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.

Ranked #27 on Long-tail Learning on Places-LT

Domain Adaptation Long-tail Learning +1

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

1 code implementation • ICCV 2017 • Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding of the same visual scenes, and even the same set of images (e.g., of COCO).

Language Modelling Multiple-choice +4

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model

1 code implementation • CVPR 2020 • Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

The other is that the number of images used for the knowledge distillation should be small; otherwise, it violates our expectation of reducing the dependence on large-scale datasets.

Active Learning Knowledge Distillation

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

2 code implementations • NeurIPS 2018 • Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence.

2.5D Visual Relationship Detection

1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2.5D relationships among 512K objects from 11K images.

Benchmarking Depth Estimation +2

On Calibrating Semantic Segmentation Models: Analyses and An Algorithm

1 code implementation • CVPR 2023 • Dongdong Wang, Boqing Gong, Liqiang Wang

Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration.

Image Classification Segmentation +1

CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild

1 code implementation • ICLR 2019 • Yang Zhang, Hassan Foroosh, Philip David, Boqing Gong

In particular, we learn a camouflage pattern to hide vehicles from being detected by state-of-the-art convolutional neural network based detectors.

Adversarial Attack Object

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation • 25 Dec 2019 • Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Navigate

Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks

1 code implementation • 18 Sep 2021 • Zihang Zou, Boqing Gong, Liqiang Wang

We study protecting a user's data (images in this work) against a learner's unauthorized use in training neural networks.

medXGAN: Visual Explanations for Medical Classifiers through a Generative Latent Space

1 code implementation • 11 Apr 2022 • Amil Dravid, Florian Schiffers, Boqing Gong, Aggelos K. Katsaggelos

Despite the surge of deep learning in the past decade, some users are skeptical about deploying these models in practice due to their black-box nature.

A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels

no code implementations • 8 Feb 2018 • Yifan Ding, Liqiang Wang, Deliang Fan, Boqing Gong

In the first stage, we identify a small portion of images from the noisy training set of which the labels are correct with a high probability.

Vocal Bursts Valence Prediction

Blind Pre-Processing: A Robust Defense Method Against Adversarial Examples

no code implementations • 5 Feb 2018 • Adnan Siraj Rakin, Zhezhi He, Boqing Gong, Deliang Fan

Blind pre-processing improves the white box attack accuracy of MNIST from 94.3% to 98.7%.

Adversarial Attack

Infinite-Label Learning with Semantic Output Codes

no code implementations • 23 Aug 2016 • Yang Zhang, Rupam Acharyya, Ji Liu, Boqing Gong

We develop a new statistical machine learning paradigm, named infinite-label learning, to annotate a data point with more than one relevant label from a candidate set, which pools both the finite labels observed at training and a potentially infinite number of previously unseen labels.

Multi-Label Learning Zero-Shot Learning

Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach

no code implementations • CVPR 2017 • Aidean Sharghi, Jacob S. Laurel, Boqing Gong

However, one of the main obstacles to the research on video summarization is user subjectivity: users have various preferences over the summaries.

Video Summarization

Improving Facial Attribute Prediction using Semantic Segmentation

no code implementations • CVPR 2017 • Mahdi M. Kalayeh, Boqing Gong, Mubarak Shah

We build our facial attribute prediction model jointly with a deep semantic segmentation network.

Ranked #2 on Facial Attribute Classification on LFWA

Attribute Face Parsing +4

Improved Dropout for Shallow and Deep Learning

no code implementations • NeurIPS 2016 • Zhe Li, Boqing Gong, Tianbao Yang

To exhibit the optimal dropout probabilities, we analyze the shallow learning with multinomial dropout and establish the risk bound for stochastic optimization.

Stochastic Optimization

Query-Focused Extractive Video Summarization

no code implementations • 18 Jul 2016 • Aidean Sharghi, Boqing Gong, Mubarak Shah

The decision to include a shot in the summary depends on the shot's relevance to the user query and importance in the context of the video, jointly.

Video Summarization

Fast Zero-Shot Image Tagging

no code implementations • CVPR 2016 • Yang Zhang, Boqing Gong, Mubarak Shah

The well-known word analogy experiments show that the recent word vectors capture fine-grained linguistic regularities in words by linear vector offsets, but it is unclear how well the simple vector offsets can encode visual regularities over words.

Ranked #5 on Multi-label zero-shot learning on Open Images V4

Multi-label zero-shot learning

Learning Attributes Equals Multi-Source Domain Generalization

no code implementations • CVPR 2016 • Chuang Gan, Tianbao Yang, Boqing Gong

Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.

Attribute Domain Generalization +3

Large-Margin Determinantal Point Processes

no code implementations • 6 Nov 2014 • Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha

Extensive empirical studies validate our contributions, including applications on challenging document and video summarization, where flexibility in modeling the kernel matrix and balancing different errors is indispensable.

Point Processes Video Summarization

How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization

no code implementations • ECCV 2018 • Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong

The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity.

Point Processes Supervised Video Summarization

Defend Deep Neural Networks Against Adversarial Examples via Fixed and Dynamic Quantized Activation Functions

no code implementations • 18 Jul 2018 • Adnan Siraj Rakin, Jin-Feng Yi, Boqing Gong, Deliang Fan

Recent studies have shown that deep neural networks (DNNs) are vulnerable to adversarial attacks.

Quantization

Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy

no code implementations • 20 Jul 2018 • Zhezhi He, Boqing Gong, Deliang Fan

Deep convolutional neural networks have achieved great success in many artificial intelligence applications.

Model Compression Vocal Bursts Intensity Prediction

Improving Sequential Determinantal Point Processes for Supervised Video Summarization

no code implementations • ECCV 2018 • Aidean Sharghi, Ali Borji, Chengtao Li, Tianbao Yang, Boqing Gong

In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary.

Point Processes Supervised Video Summarization

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

no code implementations • 9 Aug 2018 • Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong

The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip.

Facial expression generation Image-to-Image Translation +2

Synthesize Policies for Transfer and Adaptation across Tasks and Environments

no code implementations • NeurIPS 2018 • Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence.

Diverse Sequential Subset Selection for Supervised Video Summarization

no code implementations • NeurIPS 2014 • Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha

Video summarization is a challenging problem with great application potential.

Supervised Video Summarization

Reshaping Visual Datasets for Domain Adaptation

no code implementations • NeurIPS 2013 • Boqing Gong, Kristen Grauman, Fei Sha

By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain.

Domain Adaptation Human Activity Recognition +1

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning

no code implementations • CVPR 2018 • Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas

In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together.

Action Recognition Representation Learning +5

Deep Face Detector Adaptation Without Negative Transfer or Catastrophic Forgetting

no code implementations • CVPR 2018 • Muhammad Abdullah Jamal, Haoxiang Li, Boqing Gong

Arguably, no single face detector fits all real-life scenarios.

Domain Adaptation

NATTACK: A Strong and Universal Gaussian Black-Box Adversarial Attack

no code implementations • ICLR 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

In other words, there is a population of adversarial examples, instead of only one, for any input to a DNN.

Adversarial Attack

Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

no code implementations • 25 Feb 2019 • Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems.

Graph Embedding

Defending Against Adversarial Attacks Using Random Forests

no code implementations • 16 Jun 2019 • Yifan Ding, Liqiang Wang, Huan Zhang, Jin-Feng Yi, Deliang Fan, Boqing Gong

As deep neural networks (DNNs) have become increasingly important and popular, the robustness of DNNs is the key to the safety of both the Internet and the physical world.

Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data

no code implementations • ICCV 2019 • Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong

To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability.

Ranked #14 on Domain Generalization on GTA-to-Avg(Cityscapes,BDD,Mapillary)

Domain Generalization Semantic Segmentation

Open Compound Domain Adaptation

no code implementations • CVPR 2020 • Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather).

Domain Adaptation Facial Expression Recognition +2

When Ensembling Smaller Models is More Efficient than Single Large Models

no code implementations • 1 May 2020 • Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions.

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

no code implementations • CVPR 2021 • Li Yi, Boqing Gong, Thomas Funkhouser

We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.

Semantic Segmentation Unsupervised Domain Adaptation

Improving Object Detection with Selective Self-supervised Self-training

no code implementations • ECCV 2020 • Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets.

Image Classification Image Retrieval +4

A Lazy Approach to Long-Horizon Gradient-Based Meta-Learning

no code implementations • ICCV 2021 • Muhammad Abdullah Jamal, Liqiang Wang, Boqing Gong

Gradient-based meta-learning relates task-specific models to a meta-model by gradients.

Few-Shot Learning

Contrastive Learning for Label-Efficient Semantic Segmentation

no code implementations • 13 Dec 2020 • Xiangyun Zhao, Raviteja Vemulapalli, Philip Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases.

Contrastive Learning Segmentation +1

Large-Scale Meta-Learning with Continual Trajectory Shifting

no code implementations • 14 Feb 2021 • Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

Meta-learning of shared initialization parameters has been shown to be highly effective in solving few-shot learning tasks.

Few-Shot Learning Multi-Task Learning

Adversarially Adaptive Normalization for Single Domain Generalization

no code implementations • CVPR 2021 • Xinjie Fan, Qifei Wang, Junjie Ke, Feng Yang, Boqing Gong, Mingyuan Zhou

As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.

Domain Generalization

Bridging the Gap Between Object Detection and User Intent via Query-Modulation

no code implementations • 18 Jun 2021 • Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard

When interacting with objects through cameras, or pictures, users often have a specific intent.

Object object-detection +2

Federated Multi-Target Domain Adaptation

no code implementations • 17 Aug 2021 • Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).

Domain Adaptation Federated Learning +3

Contrastive Learning for Label Efficient Semantic Segmentation

no code implementations • ICCV 2021 • Xiangyun Zhao, Raviteja Vemulapalli, Philip Andrew Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

Contrastive Learning Segmentation +1

CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization

no code implementations • EMNLP 2021 • Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut

One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role.

Answer Generation Question-Answer-Generation +2

Paper
Add Code

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

Paper
Add Code

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations • 14 Dec 2021 • Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Image Classification Knowledge Distillation +1

Paper
Add Code

Open Long-Tailed Recognition in a Dynamic World

no code implementations • 17 Aug 2022 • Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes).

Active Learning Classification +4

Paper
Add Code

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

no code implementations • 14 Oct 2022 • Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

Our method co-designs an efficient labeling process with semi/weakly supervised learning and is applicable to nearly any 3D semantic segmentation backbone.

3D Semantic Segmentation Autonomous Driving +3

Paper
Add Code

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

Paper
Add Code

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

no code implementations • 5 Apr 2023 • Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su

This paper proposes a method for generating images of customized objects specified by users.

Caption Generation Image Generation +1

Paper
Add Code

Federated Learning of Shareable Bases for Personalization-Friendly Image Classification

no code implementations • 16 Apr 2023 • Hong-You Chen, Jike Zhong, Mingda Zhang, Xuhui Jia, Hang Qi, Boqing Gong, Wei-Lun Chao, Li Zhang

FedBasis learns a small set of shareable "basis" models, which can be linearly combined to form personalized models for clients.

Image Classification Personalized Federated Learning

Paper
Add Code

Identity Encoder for Personalized Diffusion

no code implementations • 14 Apr 2023 • Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

Our approach greatly reduces the overhead of personalized image generation and is better suited to many potential applications.

Image Enhancement Image Generation

Paper
Add Code

Multi-modal Domain Adaptation for REG via Relation Transfer

no code implementations • 23 Sep 2023 • Yifan Ding, Liqiang Wang, Boqing Gong

Domain adaptation, which aims to transfer knowledge between domains, has been well studied in many areas such as image classification and object detection.

Domain Adaptation Image Classification +4

Paper
Add Code

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition Image Generation +4

Paper
Add Code

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

no code implementations • 10 Nov 2023 • Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.

Object object-detection +2

Paper
Add Code

Instruct-Imagen: Image Generation with Multi-modal Instruction

no code implementations • 3 Jan 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

We introduce multi-modal instruction for image generation, a task representation articulating a range of generation intents with precision.

Image Generation Retrieval

Paper
Add Code

Distilling Vision-Language Models on Millions of Videos

no code implementations • 11 Jan 2024 • Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan

Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%.

Language Modelling Retrieval +2

Paper
Add Code

VideoPrism: A Foundational Visual Encoder for Video Understanding

no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

Question Answering Video Question Answering +1

Paper
Add Code
