Search Results for author: Yin Cui

Found 38 papers, 23 papers with code

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

no code implementations 10 May 2023 Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam

IMP achieves competitive performance on a wide range of downstream tasks including image classification, video classification, image-text, and video-text retrieval.

 Ranked #1 on Zero-Shot Action Recognition on UCF101 (using extra training data)

Classification Image Classification +6

Towards Understanding the Effect of Pretraining Label Granularity

no code implementations 29 Mar 2023 Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo

In this paper, we study how pretraining label granularity affects the generalization of deep neural networks in image classification tasks.

Image Classification Transfer Learning

Unified Visual Relationship Detection with Vision and Language Models

no code implementations16 Mar 2023 Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

no code implementations 13 Feb 2023 James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan

In particular, we ask: "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?"

Prompt Engineering Zero-Shot Learning
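The question the paper poses can be made concrete with a small sketch. The scoring criterion below (average maximum class probability on unlabeled images) is a hypothetical stand-in, not necessarily the paper's actual score, and all names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def score_and_ensemble(image_feats, class_feats_per_prompt, top_k=2):
    """Score each prompt by the average confidence it induces on unlabeled
    images, then average the classifiers of the top-k prompts.
    (Confidence-based scoring is an illustrative assumption, not
    necessarily the paper's criterion.)"""
    scores = []
    for class_feats in class_feats_per_prompt:        # one (C, D) classifier per prompt
        probs = softmax(image_feats @ class_feats.T)  # (N, C) class probabilities
        scores.append(probs.max(axis=1).mean())       # mean max-probability = confidence
    top = np.argsort(scores)[-top_k:]                 # indices of the best-scored prompts
    return np.mean([class_feats_per_prompt[i] for i in top], axis=0)
```

The key point matches the abstract: scoring uses only unlabeled images, so no labeled validation data is needed.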

Train-Once-for-All Personalization

no code implementations CVPR 2023 Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang

We study the problem of how to train a "personalization-friendly" model such that given only the task descriptions, the model can be adapted to different end-users' needs, e.g., for accurately classifying different subsets of objects.

MovieCLIP: Visual Scene Recognition in Movies

1 code implementation 20 Oct 2022 Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Haoyang Zhang, Yin Cui, Kree Cole-McLaughlin, Huisheng Wang, Shrikanth Narayanan

Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes.

Genre classification Scene Recognition

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

1 code implementation 30 Sep 2022 Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova

We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.

Knowledge Distillation object-detection +1

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

no code implementations 15 Jul 2022 Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition.

Optical Flow Estimation Video Classification +1

Surrogate Gap Minimization Improves Sharpness-Aware Training

1 code implementation ICLR 2022 Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

Instead, we define a "surrogate gap", a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to derive the perturbed loss) is small.
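As a quick numerical illustration of that claim, the sketch below estimates the surrogate gap h(theta) = max over ||d|| <= rho of L(theta + d) - L(theta) at the minimum of a quadratic and checks that it scales with the curvature (the Hessian eigenvalue). This is illustrative only, not the authors' implementation:

```python
import numpy as np

def surrogate_gap(loss, theta, rho, n_dirs=1000, seed=0):
    """Numerically estimate h(theta) = max_{||d|| <= rho} L(theta + d) - L(theta)
    by sampling random directions on the rho-sphere (an illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_dirs, theta.size))
    d = rho * d / np.linalg.norm(d, axis=1, keepdims=True)  # points on the rho-sphere
    return max(loss(theta + di) for di in d) - loss(theta)

# At the minimum of the quadratic L = 0.5 * a * theta^2, the gap is
# 0.5 * a * rho^2, i.e. proportional to the curvature a (the Hessian eigenvalue).
for a in (1.0, 4.0):
    gap = surrogate_gap(lambda t, a=a: 0.5 * a * (t ** 2).sum(), np.zeros(1), rho=0.1)
    print(a, gap)
```

Sharper minima (larger a) yield proportionally larger gaps, which is why minimizing the gap steers training toward flatter minima.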

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

1 code implementation 22 Dec 2021 Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin

We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.

Image Segmentation Semantic Segmentation

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations 14 Dec 2021 Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Image Classification Knowledge Distillation +1

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations 8 Dec 2021 Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

Revisiting 3D ResNets for Video Recognition

1 code implementation 3 Sep 2021 Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello

A recent work by Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition.

Action Classification Contrastive Learning +1

Federated Multi-Target Domain Adaptation

no code implementations 17 Aug 2021 Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).

Domain Adaptation Federated Learning +3

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

2 code implementations NeurIPS 2021 Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

Ranked #8 on Action Classification on Moments in Time (using extra training data)

Action Classification Action Recognition In Videos +8

Spatiotemporal Contrastive Video Representation Learning

3 code implementations CVPR 2021 Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Contrastive Learning Data Augmentation +4
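The contrastive objective described in the snippet above is the standard InfoNCE form; here is a minimal numpy sketch (generic, not the authors' code) in which matching rows of z1 and z2 are embeddings of two augmented clips from the same video, and all other pairs in the batch act as negatives:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss: pull matching rows of z1/z2 together,
    push all other in-batch pairs apart. Generic sketch only."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize embeddings
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # (B, B) cosine similarities
    m = logits.max(axis=1, keepdims=True)                 # stabilize log-sum-exp
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    idx = np.arange(len(z1))
    return -log_prob[idx, idx].mean()                     # positives on the diagonal
```

The loss is smallest when each clip's embedding is most similar to its own positive, exactly the "pulled together / pushed away" behavior the abstract describes.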

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Data Augmentation object-detection +2

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

5 code implementations ECCV 2020 Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie

In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes).

Fine-Grained Visual Categorization Fine-Grained Visual Recognition +3

Measuring Dataset Granularity

1 code implementation 21 Dec 2019 Yin Cui, Zeqi Gu, Dhruv Mahajan, Laurens van der Maaten, Serge Belongie, Ser-Nam Lim

We also investigate the interplay between dataset granularity with a variety of factors and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.

Few-Shot Learning

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

5 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

General Classification Image Classification +5

The iMaterialist Fashion Attribute Dataset

1 code implementation 13 Jun 2019 Sheng Guo, Weilin Huang, Xiao Zhang, Prasanna Srikhanta, Yin Cui, Yuan Li, Matthew R. Scott, Hartwig Adam, Serge Belongie

The dataset is constructed from over one million fashion images with a label space that includes 8 groups of 228 fine-grained attributes in total.

General Classification Image Classification +1

Class-Balanced Loss Based on Effective Number of Samples

8 code implementations CVPR 2019 Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie

We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss.

Image Classification Long-tail Learning
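The re-weighting scheme has a closed form: the paper defines the effective number of samples for a class with n samples as (1 - beta^n) / (1 - beta) and weights the loss by its inverse. A minimal numpy sketch (the beta value and the normalization convention are illustrative):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Per-class loss weights from the effective number of samples:
    effective_num = (1 - beta^n) / (1 - beta), weight = 1 / effective_num,
    normalized so the weights sum to the number of classes."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(n)

# Long-tailed distribution: the rarest class receives the largest weight.
print(class_balanced_weights([10000, 1000, 10]))
```

As beta approaches 1 the scheme tends toward inverse-frequency weighting, and as beta approaches 0 it tends toward no re-weighting, which is what makes the effective number a useful interpolation.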

Learning to Evaluate Image Captioning

1 code implementation CVPR 2018 Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie

To address these two challenges, we propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions.

Data Augmentation Image Captioning

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning

1 code implementation CVPR 2018 Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie

We propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure.

Fine-Grained Image Classification Fine-Grained Visual Categorization +1
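The paper computes Earth Mover's Distance over deep features of source and target datasets; the toy sketch below shows only the metric's core idea in 1-D, where EMD between equal-size samples reduces to the mean absolute difference of sorted values (the paper's actual setting uses higher-dimensional features and per-class weights):

```python
import numpy as np

def emd_1d(a, b):
    """Earth Mover's Distance between two equal-size 1-D samples:
    for sorted samples it is the mean absolute difference of order
    statistics. Toy illustration of the distance used for domain
    similarity; not the paper's feature-space computation."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

src = np.array([0.0, 1.0, 2.0])
tgt_near = np.array([0.1, 1.1, 2.1])
tgt_far = np.array([5.0, 6.0, 7.0])
# A more similar domain yields a smaller EMD, hence (per the paper's
# finding) a better pre-training source for transfer learning.
print(emd_1d(src, tgt_near), emd_1d(src, tgt_far))
```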

The iNaturalist Species Classification and Detection Dataset

12 code implementations CVPR 2018 Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie

Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories.

General Classification Image Classification

Kernel Pooling for Convolutional Neural Networks

no code implementations CVPR 2017 Yin Cui, Feng Zhou, Jiang Wang, Xiao Liu, Yuanqing Lin, Serge Belongie

We demonstrate how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner.

Face Recognition Fine-Grained Visual Categorization +2
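The Taylor-expansion idea behind approximating a Gaussian RBF kernel "up to a given order" can be sketched for unit-norm inputs (an assumption made here for simplicity), where k(x, y) = exp(-2*gamma) * exp(2*gamma * x.y); each (x.y)^p term is the inner product of p-fold tensor-product feature maps, which the paper compresses into compact maps. The sketch below checks only the truncation, not the compact maps:

```python
import math
import numpy as np

def rbf_taylor(x, y, gamma=0.5, order=6):
    """Truncated Taylor approximation of the Gaussian RBF kernel for
    unit-norm x, y: k(x, y) = exp(-2*gamma) * sum_p (2*gamma*x.y)^p / p!.
    Each p-th term corresponds to an explicit p-fold tensor-product
    feature map (here evaluated directly, without compression)."""
    s = float(x @ y)
    return math.exp(-2 * gamma) * sum((2 * gamma * s) ** p / math.factorial(p)
                                      for p in range(order + 1))

x = np.array([1.0, 0.0])
y = np.array([0.6, 0.8])                          # both unit-norm
exact = math.exp(-0.5 * float(np.sum((x - y) ** 2)))  # RBF with gamma = 0.5
print(exact, rbf_taylor(x, y))                    # close for order 6
```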

Learning Deep Representations for Ground-to-Aerial Geolocalization

no code implementations CVPR 2015 Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays

Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data).

Face Verification

Building A Large Concept Bank for Representing Events in Video

no code implementations 29 Mar 2014 Yin Cui, Dong Liu, Jiawei Chen, Shih-Fu Chang

In this paper, we propose to build Concept Bank, the largest concept library consisting of 4,876 concepts specifically designed to cover 631 real-world events.

Event Detection Retrieval
