Search Results for author: Boqing Gong

Found 92 papers, 42 papers with code

Spatiotemporal Contrastive Video Representation Learning

4 code implementations CVPR 2021 Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Contrastive Learning Data Augmentation +4

MoViNets: Mobile Video Networks for Efficient Video Recognition

3 code implementations CVPR 2021 Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.

Action Classification Action Recognition +4

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation6 Jul 2023 Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

Ranking Neural Checkpoints

1 code implementation CVPR 2021 Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Liqiang Wang, Boqing Gong

This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for the transfer learning to a downstream task.

Transferability Transfer Learning

Class-Balanced Distillation for Long-Tailed Visual Recognition

3 code implementations12 Apr 2021 Ahmet Iscen, André Araujo, Boqing Gong, Cordelia Schmid

An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively.

Image Classification Knowledge Distillation +1

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations NeurIPS 2021 Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #3 on Zero-Shot Video Retrieval on YouCook2 (text-to-video Mean Rank metric)

Action Classification Action Recognition In Videos +9

Adversarial Examples Improve Image Recognition

6 code implementations CVPR 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.

Domain Generalization Image Classification

Surrogate Gap Minimization Improves Sharpness-Aware Training

1 code implementation ICLR 2022 Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

Instead, we define a \textit{surrogate gap}, a measure equivalent to the dominant eigenvalue of Hessian at a local minimum when the radius of the neighborhood (to derive the perturbed loss) is small.

Robust and Accurate Object Detection via Adversarial Learning

1 code implementation CVPR 2021 Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong

Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.

AutoML Data Augmentation +3

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation ICCV 2023 Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

Synthesized Classifiers for Zero-Shot Learning

2 code implementations CVPR 2016 Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha

Given semantic descriptions of object classes, zero-shot learning aims to accurately recognize objects of the unseen classes, from which no examples are available at the training stage, by associating them to the seen classes, from which labeled examples are provided.

Object Zero-Shot Learning

Large-Scale Long-Tailed Recognition in an Open World

2 code implementations CVPR 2019 Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set which include head, tail, and open classes.

Classification Few-Shot Learning +4

End-to-End Learning of Motion Representation for Video Understanding

1 code implementation CVPR 2018 Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang

Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks.

Action Recognition Optical Flow Estimation +1

Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach

1 code implementation ICCV 2019 Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong

We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains.

Segmentation Semantic Segmentation +2

Smooth Adversarial Training

1 code implementation25 Jun 2020 Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le

SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82. 2% accuracy and 58. 6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9. 5% for accuracy and 11. 6% for robustness.

Adversarial Defense Adversarial Robustness

DHER: Hindsight Experience Replay for Dynamic Goals

1 code implementation ICLR 2019 Meng Fang, Cheng Zhou, Bei Shi, Boqing Gong, Jia Xu, Tong Zhang

Dealing with sparse rewards is one of the most important challenges in reinforcement learning (RL), especially when a goal is dynamic (e. g., to grasp a moving object).

Object Tracking Reinforcement Learning (RL)

Improving the Improved Training of Wasserstein GANs: A Consistency Term and Its Dual Effect

1 code implementation ICLR 2018 Xiang Wei, Boqing Gong, Zixia Liu, Wei Lu, Liqiang Wang

Despite being impactful on a variety of problems and applications, the generative adversarial nets (GANs) are remarkably difficult to train.

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

1 code implementation1 May 2019 Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques.

Adversarial Attack

An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

1 code implementation13 May 2016 Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, Fei Sha

Zero-shot learning (ZSL) methods have been studied in the unrealistic setting where test data are assumed to come from unseen classes only.

Few-Shot Learning Generalized Zero-Shot Learning +1

Classifier and Exemplar Synthesis for Zero-Shot Learning

1 code implementation16 Dec 2018 Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, Fei Sha

Zero-shot learning (ZSL) enables solving a task without the need to see its examples.

Denoising Zero-Shot Learning

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

1 code implementation NeurIPS 2021 Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size.

Instance Segmentation Long-tailed Object Detection +4

End-to-End Video Captioning with Multitask Reinforcement Learning

1 code implementation21 Mar 2018 Lijun Li, Boqing Gong

Although end-to-end (E2E) learning has led to impressive progress on a variety of visual understanding tasks, it is often impeded by hardware constraints (e. g., GPU memory) and is prone to overfitting.

reinforcement-learning Reinforcement Learning (RL) +2

MosaicOS: A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

1 code implementation ICCV 2021 Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao

Many objects do not appear frequently enough in complex scenes (e. g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e. g., in product images).

Imputation Instance Segmentation +5

Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation

1 code implementation5 Apr 2023 Zheyuan Zhang, Bin Wang, Lanhong Yao, Ugur Demir, Debesh Jha, Ismail Baris Turkbey, Boqing Gong, Ulas Bagci

In real-world scenarios, however, it is common for models to encounter data from new and different domains to which they were not exposed to during training.

Domain Generalization Image Segmentation +2

MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius

2 code implementations ICLR 2020 Runtian Zhai, Chen Dan, Di He, huan zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, Li-Wei Wang

Adversarial training is one of the most popular ways to learn robust models but is usually attack-dependent and time costly.

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

1 code implementation CVPR 2020 Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.

Domain Adaptation Long-tail Learning +1

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

1 code implementation ICCV 2017 Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong

Many seemingly distant annotations (e. g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and even the same set of images (e. g., of COCO).

Language Modelling Multiple-choice +4

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model

1 code implementation CVPR 2020 Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

The other is that the number of images used for the knowledge distillation should be small; otherwise, it violates our expectation of reducing the dependence on large-scale datasets.

Active Learning Knowledge Distillation

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

2 code implementations NeurIPS 2018 Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence.

2.5D Visual Relationship Detection

1 code implementation26 Apr 2021 Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2. 5D relationships among 512K objects from 11K images.

Benchmarking Depth Estimation +2

On Calibrating Semantic Segmentation Models: Analyses and An Algorithm

1 code implementation CVPR 2023 Dongdong Wang, Boqing Gong, Liqiang Wang

Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration.

Image Classification Segmentation +1

CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild

1 code implementation ICLR 2019 Yang Zhang, Hassan Foroosh, Philip David, Boqing Gong

In particular, we learn a camouflage pattern to hide vehicles from being detected by state-of-the-art convolutional neural network based detectors.

Adversarial Attack Object

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

1 code implementation25 Dec 2019 Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper, we attempt to approach the problem of Audio-Visual Embodied Navigation, the task of planning the shortest path from a random starting location in a scene to the sound source in an indoor environment, given only raw egocentric visual and audio sensory data.

Navigate

Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks

1 code implementation18 Sep 2021 Zihang Zou, Boqing Gong, Liqiang Wang

We study protecting a user's data (images in this work) against a learner's unauthorized use in training neural networks.

medXGAN: Visual Explanations for Medical Classifiers through a Generative Latent Space

1 code implementation11 Apr 2022 Amil Dravid, Florian Schiffers, Boqing Gong, Aggelos K. Katsaggelos

Despite the surge of deep learning in the past decade, some users are skeptical to deploy these models in practice due to their black-box nature.

A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels

no code implementations8 Feb 2018 Yifan Ding, Liqiang Wang, Deliang Fan, Boqing Gong

In the first stage, we identify a small portion of images from the noisy training set of which the labels are correct with a high probability.

Vocal Bursts Valence Prediction

Blind Pre-Processing: A Robust Defense Method Against Adversarial Examples

no code implementations5 Feb 2018 Adnan Siraj Rakin, Zhezhi He, Boqing Gong, Deliang Fan

Blind pre-processing improves the white box attack accuracy of MNIST from 94. 3\% to 98. 7\%.

Adversarial Attack

Infinite-Label Learning with Semantic Output Codes

no code implementations23 Aug 2016 Yang Zhang, Rupam Acharyya, Ji Liu, Boqing Gong

We develop a new statistical machine learning paradigm, named infinite-label learning, to annotate a data point with more than one relevant labels from a candidate set, which pools both the finite labels observed at training and a potentially infinite number of previously unseen labels.

Multi-Label Learning Zero-Shot Learning

Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach

no code implementations CVPR 2017 Aidean Sharghi, Jacob S. Laurel, Boqing Gong

However, one of the main obstacles to the research on video summarization is the user subjectivity - users have various preferences over the summaries.

Video Summarization

Improved Dropout for Shallow and Deep Learning

no code implementations NeurIPS 2016 Zhe Li, Boqing Gong, Tianbao Yang

To exhibit the optimal dropout probabilities, we analyze the shallow learning with multinomial dropout and establish the risk bound for stochastic optimization.

Stochastic Optimization

Query-Focused Extractive Video Summarization

no code implementations18 Jul 2016 Aidean Sharghi, Boqing Gong, Mubarak Shah

The decision to include a shot in the summary depends on the shot's relevance to the user query and importance in the context of the video, jointly.

Video Summarization

Fast Zero-Shot Image Tagging

no code implementations CVPR 2016 Yang Zhang, Boqing Gong, Mubarak Shah

The well-known word analogy experiments show that the recent word vectors capture fine-grained linguistic regularities in words by linear vector offsets, but it is unclear how well the simple vector offsets can encode visual regularities over words.

Multi-label zero-shot learning

Learning Attributes Equals Multi-Source Domain Generalization

no code implementations CVPR 2016 Chuang Gan, Tianbao Yang, Boqing Gong

Attributes possess appealing properties and benefit many computer vision problems, such as object recognition, learning with humans in the loop, and image retrieval.

Attribute Domain Generalization +3

Large-Margin Determinantal Point Processes

no code implementations6 Nov 2014 Boqing Gong, Wei-Lun Chao, Kristen Grauman, Fei Sha

Extensive empirical studies validate our contributions, including applications on challenging document and video summarization, where flexibility in modeling the kernel matrix and balancing different errors is indispensable.

Point Processes Video Summarization

Improving Sequential Determinantal Point Processes for Supervised Video Summarization

no code implementations ECCV 2018 Aidean Sharghi, Ali Borji, Chengtao Li, Tianbao Yang, Boqing Gong

In terms of modeling, we design a new probabilistic distribution such that, when it is integrated into SeqDPP, the resulting model accepts user input about the expected length of the summary.

Point Processes Supervised Video Summarization

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation

no code implementations9 Aug 2018 Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong

The recent advances in deep learning have made it possible to generate photo-realistic images by using neural networks and even to extrapolate video frames from an input video clip.

Facial expression generation Image-to-Image Translation +2

Synthesize Policies for Transfer and Adaptation across Tasks and Environments

no code implementations NeurIPS 2018 Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

The ability to transfer in reinforcement learning is key towards building an agent of general artificial intelligence.

Reshaping Visual Datasets for Domain Adaptation

no code implementations NeurIPS 2013 Boqing Gong, Kristen Grauman, Fei Sha

By maximum distinctiveness, we require the underlying distributions of the identified domains to be different from each other; by maximum learnability, we ensure that a strong discriminative model can be learned from the domain.

Domain Adaptation Human Activity Recognition +1

Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning

no code implementations CVPR 2018 Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas

In addition, we also find that a progressive training strategy can foster a better neural network for the video recognition task than blindly pooling the distinct sources of geometry cues together.

Action Recognition Representation Learning +5

Joint Modeling of Dense and Incomplete Trajectories for Citywide Traffic Volume Inference

no code implementations25 Feb 2019 Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang

In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems.

Graph Embedding

Defending Against Adversarial Attacks Using Random Forests

no code implementations16 Jun 2019 Yifan Ding, Liqiang Wang, huan zhang, Jin-Feng Yi, Deliang Fan, Boqing Gong

As deep neural networks (DNNs) have become increasingly important and popular, the robustness of DNNs is the key to the safety of both the Internet and the physical world.

Open Compound Domain Adaptation

no code implementations CVPR 2020 Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e. g., sunny weather) for achieving high performance on the test data in a target domain (e. g., rainy weather).

Domain Adaptation Facial Expression Recognition +2

When Ensembling Smaller Models is More Efficient than Single Large Models

no code implementations1 May 2020 Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e. g., with different initializations) and aggregating their predictions.

Complete & Label: A Domain Adaptation Approach to Semantic Segmentation of LiDAR Point Clouds

no code implementations CVPR 2021 Li Yi, Boqing Gong, Thomas Funkhouser

We study an unsupervised domain adaptation problem for the semantic labeling of 3D point clouds, with a particular focus on domain discrepancies induced by different LiDAR sensors.

Semantic Segmentation Unsupervised Domain Adaptation

Contrastive Learning for Label-Efficient Semantic Segmentation

no code implementations13 Dec 2020 Xiangyun Zhao, Raviteja Vemulapalli, Philip Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases.

Contrastive Learning Segmentation +1

Large-Scale Meta-Learning with Continual Trajectory Shifting

no code implementations14 Feb 2021 Jaewoong Shin, Hae Beom Lee, Boqing Gong, Sung Ju Hwang

Meta-learning of shared initialization parameters has shown to be highly effective in solving few-shot learning tasks.

Few-Shot Learning Multi-Task Learning

Federated Multi-Target Domain Adaptation

no code implementations17 Aug 2021 Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).

Domain Adaptation Federated Learning +3

Contrastive Learning for Label Efficient Semantic Segmentation

no code implementations ICCV 2021 Xiangyun Zhao, Raviteja Vemulapalli, Philip Andrew Mansfield, Boqing Gong, Bradley Green, Lior Shapira, Ying Wu

While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressive results by using large amounts of labeled training data, their performance drops significantly as the amount of labeled data decreases.

Contrastive Learning Segmentation +1

CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization

no code implementations EMNLP 2021 Arjun Akula, Soravit Changpinyo, Boqing Gong, Piyush Sharma, Song-Chun Zhu, Radu Soricut

One challenge in evaluating visual question answering (VQA) models in the cross-dataset adaptation setting is that the distribution shifts are multi-modal, making it difficult to identify if it is the shifts in visual or language features that play a key role.

Answer Generation Question-Answer-Generation +2

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations8 Dec 2021 Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations14 Dec 2021 Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Image Classification Knowledge Distillation +1

Open Long-Tailed Recognition in a Dynamic World

no code implementations17 Aug 2022 Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu

A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes).

Active Learning Classification +4

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

no code implementations14 Oct 2022 Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

Our method co-designs an efficient labeling process with semi/weakly supervised learning and is applicable to nearly any 3D semantic segmentation backbones.

3D Semantic Segmentation Autonomous Driving +3

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations28 Mar 2023 Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

Identity Encoder for Personalized Diffusion

no code implementations14 Apr 2023 Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.

Image Enhancement Image Generation

Multi-modal Domain Adaptation for REG via Relation Transfer

no code implementations23 Sep 2023 Yifan Ding, Liqiang Wang, Boqing Gong

Domain adaptation, which aims to transfer knowledge between domains, has been well studied in many areas such as image classification and object detection.

Domain Adaptation Image Classification +4

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

no code implementations10 Nov 2023 Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e. g., tokens) for both.

Object object-detection +2

Instruct-Imagen: Image Generation with Multi-modal Instruction

no code implementations3 Jan 2024 Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.

Image Generation Retrieval

Cannot find the paper you are looking for? You can Submit a new open access paper.