no code implementations • 10 May 2023 • Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam
IMP achieves competitive performance on a wide range of downstream tasks including image classification, video classification, image-text, and video-text retrieval.
Ranked #1 on Zero-Shot Action Recognition on UCF101 (using extra training data)
no code implementations • 29 Mar 2023 • Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo
In this paper, we study how pretraining label granularity affects the generalization of deep neural networks in image classification tasks.
no code implementations • 16 Mar 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
Tasks: Human-Object Interaction Detection, Relationship Detection, +2 more
no code implementations • 13 Feb 2023 • James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan
In particular, we ask "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?".
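A hypothetical sketch of one way to answer that question (the confidence-based scoring rule, `top_k`, and all names here are illustrative assumptions, not the paper's exact method): score each prompt's zero-shot classifier by the average max-softmax confidence it produces on unlabeled images, then average the text embeddings of the top-scoring prompts into a single classifier.

```python
import torch
import torch.nn.functional as F

def ensemble_prompts(image_feats, prompt_text_feats, top_k=8):
    # Score each prompt by the average max-softmax confidence its
    # zero-shot classifier yields on a pool of unlabeled images.
    scores = []
    for text_feats in prompt_text_feats:  # list of (C, D) tensors, one per prompt
        logits = F.normalize(image_feats, dim=-1) @ F.normalize(text_feats, dim=-1).t()
        scores.append(logits.softmax(dim=-1).max(dim=-1).values.mean())
    # Average the text embeddings of the top-k prompts into one classifier.
    top = torch.topk(torch.stack(scores), k=min(top_k, len(scores))).indices
    ensemble = torch.stack([prompt_text_feats[i] for i in top]).mean(dim=0)
    return F.normalize(ensemble, dim=-1)  # (C, D) zero-shot classifier weights
```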
no code implementations • CVPR 2023 • Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang
We study the problem of how to train a "personalization-friendly" model such that given only the task descriptions, the model can be adapted to different end-users' needs, e.g., for accurately classifying different subsets of objects.
1 code implementation • 20 Oct 2022 • Digbalay Bose, Rajat Hebbar, Krishna Somandepalli, Haoyang Zhang, Yin Cui, Kree Cole-McLaughlin, Huisheng Wang, Shrikanth Narayanan
Longform media such as movies have complex narrative structures, with events spanning a rich variety of ambient visual scenes.
1 code implementation • 30 Sep 2022 • Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova
We present F-VLM, a simple open-vocabulary object detection method built upon Frozen Vision and Language Models.
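As a rough illustration of scoring detections with a frozen VLM, here is a hedged sketch; the geometric-mean fusion, the weight `alpha`, and all tensor shapes are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def fused_scores(region_feats, text_embs, det_scores, alpha=0.35):
    # Cosine similarity of frozen-VLM region features to category text
    # embeddings, fused with the detector's own class scores.
    sims = F.normalize(region_feats, dim=-1) @ F.normalize(text_embs, dim=-1).t()
    vlm_scores = sims.softmax(dim=-1)                  # (R, C)
    # Geometric-mean fusion of detector and VLM scores.
    return det_scores.pow(1.0 - alpha) * vlm_scores.pow(alpha)
```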
no code implementations • 15 Jul 2022 • Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui
Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition.
Ranked #1 on Zero-Shot Action Recognition on HMDB51
1 code implementation • ICLR 2022 • Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu
Instead, we define a surrogate gap, a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to derive the perturbed loss) is small.
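A minimal PyTorch sketch of estimating such a gap h(w) = f_p(w) - f(w), assuming a SAM-style ascent point w + rho * g/||g|| and a generic `model`/`loss_fn` interface (all names and the value of `rho` are illustrative):

```python
import torch

def surrogate_gap(model, loss_fn, inputs, targets, rho=0.05):
    # h(w) = f_p(w) - f(w): perturbed loss at the ascent point minus
    # the unperturbed loss; rho is the neighborhood radius.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)             # f(w)
    grads = torch.autograd.grad(loss, params)
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=(rho / norm).item())       # ascend to w + rho * g/||g||
        perturbed = loss_fn(model(inputs), targets)    # f_p(w)
        for p, g in zip(params, grads):
            p.sub_(g, alpha=(rho / norm).item())       # restore the weights
    return (perturbed - loss).item()
```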
1 code implementation • 22 Dec 2021 • Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin
We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.
no code implementations • 14 Dec 2021 • Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown
The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.
1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui
The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.
1 code implementation • 3 Sep 2021 • Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello
A recent work by Bello et al. shows that training and scaling strategies may be more significant than model architectures for visual recognition.
Ranked #37 on Action Classification on Kinetics-600
no code implementations • 17 Aug 2021 • Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang
We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).
2 code implementations • 25 Jun 2021 • Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie
In this paper, we propose and explore the problem of image translation for data augmentation.
no code implementations • 18 Jun 2021 • Marco Fornoni, Chaochao Yan, Liangchen Luo, Kimberly Wilber, Alex Stark, Yin Cui, Boqing Gong, Andrew Howard
When interacting with objects through cameras or pictures, users often have a specific intent.
4 code implementations • ICLR 2022 • Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui
On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.
Ranked #1 on Open Vocabulary Object Detection on LVIS v1.0
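A minimal sketch of the open-vocabulary classification step this result rests on: detector region embeddings are scored against text embeddings of category names, so novel categories only require new text. The learned background embedding and temperature `tau` are assumed details here.

```python
import torch
import torch.nn.functional as F

def open_vocab_logits(region_embs, text_embs, background_emb, tau=0.01):
    # Classify region embeddings against category-name text embeddings;
    # a learned background embedding handles "no object".
    classes = torch.cat([background_emb[None], text_embs])            # (1+C, D)
    sims = F.normalize(region_embs, dim=-1) @ F.normalize(classes, dim=-1).t()
    return sims / tau                                                 # (R, 1+C) logits
```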
2 code implementations • NeurIPS 2021 • Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong
We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.
Ranked #8 on Action Classification on Moments in Time (using extra training data)
5 code implementations • CVPR 2021 • Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph
Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
Ranked #2 on Semantic Segmentation on PASCAL VOC 2012 val
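For intuition, a minimal NumPy sketch of the core copy-paste operation, assuming boolean instance masks and omitting the paper's scale jittering and blending details:

```python
import numpy as np

def copy_paste(src_img, src_masks, dst_img, dst_masks):
    # Paste all source instances (boolean masks) onto the destination
    # image and remove newly occluded pixels from the destination masks.
    src_masks = np.asarray(src_masks, dtype=bool)      # (K, H, W)
    pasted = dst_img.copy()
    region = np.any(src_masks, axis=0)                 # union of source instances
    pasted[region] = src_img[region]
    updated = [m & ~region for m in dst_masks]
    return pasted, list(src_masks) + updated
```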
no code implementations • ECCV 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc Le, Xiaodan Song
Furthermore, SpineNet is built with a uniform resource distribution over operations.
1 code implementation • 9 Sep 2020 • Yi Zhou, Yin Cui, Xiaoke Xu, Jidong Suo, Xiaoming Liu
Detecting small floating objects in sea clutter is challenging for surface radar.
3 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui
Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
Ranked #1 on Self-Supervised Action Recognition on Kinetics-600
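A minimal sketch of the contrastive objective described above, assuming a standard InfoNCE form with in-batch negatives (temperature and symmetrization are assumed details):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(z1, z2, temperature=0.1):
    # z1, z2: (N, D) embeddings of two augmented clips per video.
    # Matching rows are positives; all other rows are negatives.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (N, N)
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetric InfoNCE over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```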
2 code implementations • NeurIPS 2020 • Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le
For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 val
5 code implementations • ECCV 2020 • Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie
In this work we explore the task of instance segmentation with attribute localization, which unifies instance segmentation (detect and segment each object instance) and fine-grained visual attribute categorization (recognize one or multiple attributes).
Tasks: Fine-Grained Visual Categorization, Fine-Grained Visual Recognition, +3 more
1 code implementation • 21 Dec 2019 • Yin Cui, Zeqi Gu, Dhruv Mahajan, Laurens van der Maaten, Serge Belongie, Ser-Nam Lim
We also investigate the interplay between dataset granularity with a variety of factors and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
5 code implementations • CVPR 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.
Ranked #8 on Image Classification on iNaturalist
1 code implementation • 13 Jun 2019 • Sheng Guo, Weilin Huang, Xiao Zhang, Prasanna Srikhanta, Yin Cui, Yuan Li, Matthew R. Scott, Hartwig Adam, Serge Belongie
The dataset is constructed from over one million fashion images with a label space that includes 8 groups of 228 fine-grained attributes in total.
8 code implementations • CVPR 2019 • Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie
We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss.
Ranked #2 on Long-tail Learning on EGTEA
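The effective number of samples has the closed form E_n = (1 - beta^n) / (1 - beta), and each class is re-weighted by 1 / E_n. A minimal NumPy sketch (beta = 0.9999 is one representative choice from the paper's sweep):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    # Effective number of samples per class: E_n = (1 - beta**n) / (1 - beta).
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights / weights.sum() * len(n)
```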
1 code implementation • ECCV 2018 • Guandao Yang, Yin Cui, Serge Belongie, Bharath Hariharan
It is expensive to label images with 3D structure or precise camera pose.
1 code implementation • CVPR 2018 • Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
To address these two challenges, we propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions.
1 code implementation • CVPR 2018 • Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie
We propose a measure to estimate domain similarity via Earth Mover's Distance and demonstrate that transfer learning benefits from pre-training on a source domain that is similar to the target domain by this measure.
Ranked #27 on Fine-Grained Image Classification on CUB-200-2011
Tasks: Fine-Grained Image Classification, Fine-Grained Visual Categorization, +1 more
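A hedged sketch of such a domain-similarity measure, assuming per-class feature centroids with class weights and the POT optimal-transport library; the scaling constant `gamma` is an assumption, not the paper's exact value.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def domain_similarity(src_feats, src_weights, tgt_feats, tgt_weights, gamma=0.01):
    # Earth Mover's Distance between per-class feature centroids of the
    # source (S, D) and target (T, D) domains, mapped to a similarity.
    cost = np.linalg.norm(src_feats[:, None] - tgt_feats[None], axis=-1)  # (S, T)
    a = np.asarray(src_weights, dtype=np.float64); a /= a.sum()
    b = np.asarray(tgt_weights, dtype=np.float64); b /= b.sum()
    return np.exp(-gamma * ot.emd2(a, b, cost))
```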
12 code implementations • CVPR 2018 • Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie
Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories.
Ranked #7 on Image Classification on iNaturalist
no code implementations • CVPR 2017 • Yin Cui, Feng Zhou, Jiang Wang, Xiao Liu, Yuanqing Lin, Serge Belongie
We demonstrate how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner.
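For unit-norm inputs the Gaussian RBF factors as exp(-||x - y||^2 / (2 s^2)) = exp(-1/s^2) * exp(<x, y>/s^2), so truncating the Taylor series of the second factor gives an order-p approximation. A minimal sketch of that truncation (the compact sketching of the tensor-product feature maps is omitted here):

```python
import numpy as np
from math import exp, factorial

def rbf_taylor(x, y, sigma=1.0, order=4):
    # Order-p Taylor approximation of the Gaussian RBF on unit-norm inputs:
    # each term (<x, y>/s^2)**i / i! corresponds to an explicit order-i
    # tensor-product feature map.
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    dot = float(x @ y) / sigma ** 2
    taylor = sum(dot ** i / factorial(i) for i in range(order + 1))
    return exp(-1.0 / sigma ** 2) * taylor
```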
1 code implementation • WWW 2017 • Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, Deborah Estrin
Metric learning algorithms produce distance metrics that capture the important relationships among data.
Ranked #1 on Recommendation Systems on MovieLens 20M (Recall@100 metric)
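A minimal sketch of a collaborative-metric-learning triplet loss of the kind this entry alludes to: users and items live in one metric space, and each observed item must sit closer to its user than a sampled negative by an assumed margin.

```python
import torch
import torch.nn.functional as F

def cml_triplet_loss(user_emb, pos_item_emb, neg_item_emb, margin=0.5):
    # Squared distances in the shared user-item embedding space, (N,) each.
    d_pos = (user_emb - pos_item_emb).pow(2).sum(dim=-1)
    d_neg = (user_emb - neg_item_emb).pow(2).sum(dim=-1)
    # Hinge: positives must be closer than negatives by at least `margin`.
    return F.relu(margin + d_pos - d_neg).mean()
```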
no code implementations • CVPR 2016 • Yin Cui, Feng Zhou, Yuanqing Lin, Serge Belongie
To demonstrate the effectiveness of the proposed framework, we bootstrap a fine-grained flower dataset with 620 categories from Instagram images.
no code implementations • CVPR 2015 • Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays
Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data).
no code implementations • 29 Mar 2014 • Yin Cui, Dong Liu, Jiawei Chen, Shih-Fu Chang
In this paper, we propose to build Concept Bank, the largest concept library consisting of 4,876 concepts specifically designed to cover 631 real-world events.