1 code implementation • 27 Feb 2024 • MingJie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu
We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger).
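The phenomenon lends itself to a short sketch: scan one layer's hidden states for entries whose magnitude dwarfs the typical activation. The function name and the threshold ratio below are illustrative, not the paper's exact criterion.

```python
import numpy as np

def find_massive_activations(hidden: np.ndarray, ratio: float = 1000.0):
    """Flag activations whose magnitude vastly exceeds the typical value.

    `hidden` is a (tokens, features) array of one layer's hidden states;
    `ratio` is an illustrative cutoff relative to the median magnitude.
    """
    mags = np.abs(hidden)
    typical = np.median(mags)                 # typical activation magnitude
    rows, cols = np.nonzero(mags > ratio * typical)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy example: plant one huge activation among small ones.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16)) * 0.1
h[3, 5] = 1e4                                 # a "massive" outlier
print(find_massive_activations(h))            # → [(3, 5)]
```

On real LLM hidden states one would run this per layer; the paper's observation is that such outliers are very few and appear at fixed feature dimensions.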
1 code implementation • arXiv preprint 2024 • Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.
1 code implementation • 25 Jan 2024 • Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He
In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation.
1 code implementation • 20 Oct 2023 • Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator.
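The linear-model case can be made concrete: causal linear attention maintains a running sum of outer products v_s k_s^T and applies that fixed-size state to each query. A minimal numpy sketch (variable names are mine, not the paper's):

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention: out_t = (sum_{s<=t} v_s k_s^T) q_t.

    The running matrix S is the fixed-size "fast weight" state that the
    inner loop reduces to when the inner-loop learner is a linear model.
    Q, K: (T, d_k); V: (T, d_v).
    """
    S = np.zeros((V.shape[1], K.shape[1]))    # running sum of v_s k_s^T
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(V[t], K[t])             # fold step t into the state
        out[t] = S @ Q[t]                     # read out with the query
    return out
```

Unlike softmax attention, the state here has constant size in the sequence length, which is what makes the inner-loop view attractive.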
no code implementations • 11 Jul 2023 • Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.
1 code implementation • CVPR 2023 • Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach
In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data.
1 code implementation • 8 Jun 2023 • Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.
10 code implementations • CVPR 2023 • Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.
Ranked #45 on Semantic Segmentation on ADE20K
no code implementations • ICCV 2023 • Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang
Performing 3D dense captioning and visual grounding requires a common and shared understanding of the underlying multimodal relationships.
1 code implementation • 23 Nov 2022 • Minghao Xu, Yuanfan Guo, Yi Xu, Jian Tang, Xinlei Chen, Yuandong Tian
We study EurNets in two important domains for image and protein structure modeling.
1 code implementation • 13 Oct 2022 • Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen
Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.
1 code implementation • 15 Sep 2022 • Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.
1 code implementation • CVPR 2022 • Xiao Wang, Haoqi Fan, Yuandong Tian, Daisuke Kihara, Xinlei Chen
Many recent self-supervised frameworks for visual representation learning are based on certain forms of Siamese networks.
no code implementations • 10 Mar 2022 • Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu
In this work, we propose LoopITR, which combines them in the same network for joint learning.
1 code implementation • CVPR 2022 • Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
2 code implementations • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollar, Kaiming He, Ross Girshick
The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
49 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
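The masking recipe is simple enough to sketch in a few lines of numpy (patchification and the ViT encoder/decoder are omitted; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mae_mask(patches: np.ndarray, mask_ratio: float = 0.75):
    """MAE-style random masking: keep a small visible subset of patches.

    `patches` is (num_patches, dim). Returns the visible patches plus the
    index sets; the encoder sees only the visible subset, and the
    reconstruction loss is computed only on the masked indices.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]
    return patches[keep_idx], keep_idx, mask_idx

def masked_mse(pred: np.ndarray, target: np.ndarray, mask_idx: np.ndarray):
    """Reconstruction loss on masked patches only, as in MAE."""
    diff = pred[mask_idx] - target[mask_idx]
    return float(np.mean(diff ** 2))

patches = rng.normal(size=(16, 8))            # 16 toy patches of dim 8
visible, keep_idx, mask_idx = mae_mask(patches)
```

With the default 75% mask ratio, only 4 of the 16 patches reach the encoder, which is what makes MAE pre-training cheap relative to full-image encoders.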
2 code implementations • 11 Oct 2021 • Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian
Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image.
1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra
In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than CNNs, which leads to early saturation in training and inferior convergence.
Ranked #7 on Neural Architecture Search on ImageNet
no code implementations • 17 May 2021 • Xinlei Chen, Moosa Moghimi Haji, Omid Ardakanian
With the emergence of cost effective battery storage and the decline in the solar photovoltaic (PV) levelized cost of energy (LCOE), the number of behind-the-meter solar PV systems is expected to increase steadily.
8 code implementations • ICCV 2021 • Xinlei Chen, Saining Xie, Kaiming He
In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
5 code implementations • 12 Feb 2021 • Yuandong Tian, Xinlei Chen, Surya Ganguli
While contrastive approaches of self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, using an extra learnable predictor and a stop-gradient operation.
no code implementations • CVPR 2021 • Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
Ranked #6 on Visual Question Answering (VQA) on A-OKVQA
26 code implementations • CVPR 2021 • Xinlei Chen, Kaiming He
Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing.
Ranked #94 on Self-Supervised Image Classification on ImageNet
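The symmetric loss and the stop-gradient can be sketched as follows (in plain numpy the stop-gradient is only a comment, since no autodiff is involved; in a framework the target `z` would be detached):

```python
import numpy as np

def neg_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Negative cosine similarity D(p, z); z is treated as a constant
    target -- this is where the stop-gradient would be applied, so no
    gradient flows back through z."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return float(-np.dot(p, z))

def simsiam_loss(p1, z2, p2, z1) -> float:
    """Symmetric SimSiam loss: each view's predictor output p is pulled
    toward the stop-gradient projection z of the other view."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

v = np.ones(4)
print(simsiam_loss(v, v, v, v))   # perfectly aligned views → -1.0
```

The paper's finding is that without the stop-gradient on `z`, this objective admits collapsed (constant) solutions; with it, training avoids them.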
2 code implementations • 1 Oct 2020 • Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli
We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).
no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh
We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.
1 code implementation • 17 Jun 2020 • Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord
First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts.
1 code implementation • ICLR 2021 • Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen
This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g., a question or a category).
Ranked #1 on Object Counting on HowMany-QA
36 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.
Ranked #3 on Contrastive Learning on imagenet-1k
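At the heart of both MoCo and SimCLR is the InfoNCE objective: a softmax cross-entropy over one positive key and a set of negatives. A minimal sketch (the temperature follows MoCo's common default; everything else is illustrative):

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query.

    q, k_pos: (dim,) L2-normalized query and positive key;
    k_negs: (num_neg, dim) L2-normalized negative keys;
    tau: softmax temperature (0.07 is a common MoCo default).
    """
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))           # positive key sits at index 0

q = np.array([1.0, 0.0])
k_pos = q.copy()                              # positive key aligned with q
k_negs = np.array([[0.0, 1.0], [0.0, -1.0]]) # negatives orthogonal to q
print(info_nce(q, k_pos, k_negs))             # near 0: the positive dominates
```

Driving this loss down pulls the query toward its positive key and away from the negatives; MoCo's contribution is maintaining a large, consistent queue of negative keys via a momentum encoder.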
1 code implementation • CVPR 2020 • Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas
Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images.
Ranked #2 on 3D Object Detection on SUN-RGBD (using extra training data)
2 code implementations • CVPR 2020 • Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
Popularized as 'bottom-up' attention, bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA).
Ranked #18 on Visual Question Answering (VQA) on VQA v2 test-std
7 code implementations • CVPR 2019 • Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach
We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.
Ranked #3 on Visual Question Answering (VQA) on VizWiz 2018
no code implementations • ICCV 2019 • Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan Yuille
Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention.
no code implementations • 9 Apr 2019 • Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra
Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.
1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.
2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár
To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.
Ranked #90 on Instance Segmentation on COCO test-dev
no code implementations • CVPR 2019 • Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh
Despite significant progress in Visual Question Answering over the years, the robustness of today's VQA models leaves much to be desired.
2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.
2 code implementations • CVPR 2019 • Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.
9 code implementations • 26 Jul 2018 • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh
We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.
Ranked #10 on Visual Question Answering (VQA) on A-OKVQA
no code implementations • CVPR 2018 • Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta
The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module.
2 code implementations • ACL 2019 • Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
The game involves two players: a Teller and a Drawer.
36 code implementations • ICCV 2017 • Xinlei Chen, Abhinav Gupta
On the other hand, modeling object-object relationships requires spatial reasoning -- not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.
1 code implementation • 21 Feb 2017 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
48 code implementations • 7 Feb 2017 • Xinlei Chen, Abhinav Gupta
We adapted the joint-training scheme of the Faster R-CNN framework from Caffe to TensorFlow as a baseline implementation for object detection.
no code implementations • 21 Sep 2016 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
1 code implementation • 14 Apr 2016 • Gunnar A. Sigurdsson, Xinlei Chen, Abhinav Gupta
What does a typical visit to Paris look like?
1 code implementation • NAACL 2016 • Jiwei Li, Xinlei Chen, Eduard Hovy, Dan Jurafsky
While neural networks have been successfully applied to many NLP tasks, the resulting vector-based models are very difficult to interpret.
no code implementations • CVPR 2015 • Xinlei Chen, Alan Ritter, Abhinav Gupta, Tom Mitchell
We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP).
no code implementations • CVPR 2015 • Xinlei Chen, C. Lawrence Zitnick
Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
no code implementations • ICCV 2015 • Xinlei Chen, Abhinav Gupta
Specifically inspired by curriculum learning, we present a two-step approach for CNN training.
18 code implementations • 1 Apr 2015 • Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick
In this paper we describe the Microsoft COCO Caption dataset and evaluation server.
no code implementations • 20 Nov 2014 • Xinlei Chen, C. Lawrence Zitnick
Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
no code implementations • CVPR 2014 • Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images.