1 code implementation • 27 Feb 2024 • MingJie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu
We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger).
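The phenomenon lends itself to a short sketch: scan one layer's hidden states for entries whose magnitude dwarfs the typical activation. The function name and the threshold ratio below are illustrative, not the paper's exact criterion.

```python
import numpy as np

def find_massive_activations(hidden: np.ndarray, ratio: float = 1000.0):
    """Flag activations whose magnitude vastly exceeds the typical value.

    `hidden` is a (tokens, features) array of one layer's hidden states;
    `ratio` is an illustrative cutoff relative to the median magnitude.
    """
    mags = np.abs(hidden)
    typical = np.median(mags)                 # typical activation magnitude
    rows, cols = np.nonzero(mags > ratio * typical)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy example: plant one huge activation among small ones.
rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16)) * 0.1
h[3, 5] = 1e4                                 # a "massive" outlier
print(find_massive_activations(h))            # → [(3, 5)]
```

On real LLM hidden states one would run this per layer; the paper's observation is that such outliers are very few and appear at fixed feature dimensions.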
1 code implementation • arXiv preprint 2024 • Adrien Bardes, Quentin Garrido, Jean Ponce, Xinlei Chen, Michael Rabbat, Yann LeCun, Mahmoud Assran, Nicolas Ballas
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision.
1 code implementation • 25 Jan 2024 • Xinlei Chen, Zhuang Liu, Saining Xie, Kaiming He
In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation.
1 code implementation • 20 Oct 2023 • Yu Sun, Xinhao Li, Karan Dalal, Chloe Hsu, Sanmi Koyejo, Carlos Guestrin, Xiaolong Wang, Tatsunori Hashimoto, Xinlei Chen
Our inner loop turns out to be equivalent to linear attention when the inner-loop learner is only a linear model, and to self-attention when it is a kernel estimator.
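The linear-model case can be made concrete: causal linear attention maintains a running sum of outer products v_s k_s^T and applies that fixed-size state to each query. A minimal numpy sketch (variable names are mine, not the paper's):

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention: out_t = (sum_{s<=t} v_s k_s^T) q_t.

    The running matrix S is the fixed-size "fast weight" state that the
    inner loop reduces to when the inner-loop learner is a linear model.
    Q, K: (T, d_k); V: (T, d_v).
    """
    S = np.zeros((V.shape[1], K.shape[1]))    # running sum of v_s k_s^T
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(V[t], K[t])             # fold step t into the state
        out[t] = S @ Q[t]                     # read out with the query
    return out
```

Unlike softmax attention, the state here has constant size in the sequence length, which is what makes the inner-loop view attractive.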
no code implementations • 11 Jul 2023 • Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders.
1 code implementation • CVPR 2023 • Corentin Dancette, Spencer Whitehead, Rishabh Maheshwary, Ramakrishna Vedantam, Stefan Scherer, Xinlei Chen, Matthieu Cord, Marcus Rohrbach
In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data.
1 code implementation • 8 Jun 2023 • Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning.
10 code implementations • CVPR 2023 • Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation.
Ranked #45 on Semantic Segmentation on ADE20K
no code implementations • ICCV 2023 • Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang
Performing 3D dense captioning and visual grounding requires a common and shared understanding of the underlying multimodal relationships.
1 code implementation • 23 Nov 2022 • Minghao Xu, Yuanfan Guo, Yi Xu, Jian Tang, Xinlei Chen, Yuandong Tian
We study EurNets in two important domains for image and protein structure modeling.
1 code implementation • 13 Oct 2022 • Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen
Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains.
1 code implementation • 15 Sep 2022 • Yossi Gandelsman, Yu Sun, Xinlei Chen, Alexei A. Efros
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision.
1 code implementation • CVPR 2022 • Xiao Wang, Haoqi Fan, Yuandong Tian, Daisuke Kihara, Xinlei Chen
Many recent self-supervised frameworks for visual representation learning are based on certain forms of Siamese networks.
no code implementations • 10 Mar 2022 • Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu
In this work, we propose LoopITR, which combines them in the same network for joint learning.
1 code implementation • CVPR 2022 • Yutong Bai, Xinlei Chen, Alexander Kirillov, Alan Yuille, Alexander C. Berg
In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
2 code implementations • 22 Nov 2021 • Yanghao Li, Saining Xie, Xinlei Chen, Piotr Dollar, Kaiming He, Ross Girshick
The complexity of object detection methods can make this benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
49 code implementations • CVPR 2022 • Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
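The masking recipe is simple enough to sketch in a few lines of numpy (patchification and the ViT encoder/decoder are omitted; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mae_mask(patches: np.ndarray, mask_ratio: float = 0.75):
    """MAE-style random masking: keep a small visible subset of patches.

    `patches` is (num_patches, dim). Returns the visible patches plus the
    index sets; the encoder sees only the visible subset, and the
    reconstruction loss is computed only on the masked indices.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx, mask_idx = perm[:n_keep], perm[n_keep:]
    return patches[keep_idx], keep_idx, mask_idx

def masked_mse(pred: np.ndarray, target: np.ndarray, mask_idx: np.ndarray):
    """Reconstruction loss on masked patches only, as in MAE."""
    diff = pred[mask_idx] - target[mask_idx]
    return float(np.mean(diff ** 2))

patches = rng.normal(size=(16, 8))            # 16 toy patches of dim 8
visible, keep_idx, mask_idx = mae_mask(patches)
```

With the default 75% mask ratio, only 4 of the 16 patches reach the encoder, which is what makes MAE pre-training cheap relative to full-image encoders.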
2 code implementations • 11 Oct 2021 • Xiang Wang, Xinlei Chen, Simon S. Du, Yuandong Tian
Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image.
1 code implementation • ICLR 2022 • Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra
In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with that of the supernet more severely in ViTs than CNNs, which leads to early saturation in training and inferior convergence.
Ranked #7 on Neural Architecture Search on ImageNet
no code implementations • 17 May 2021 • Xinlei Chen, Moosa Moghimi Haji, Omid Ardakanian
With the emergence of cost effective battery storage and the decline in the solar photovoltaic (PV) levelized cost of energy (LCOE), the number of behind-the-meter solar PV systems is expected to increase steadily.
8 code implementations • ICCV 2021 • Xinlei Chen, Saining Xie, Kaiming He
In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
5 code implementations • 12 Feb 2021 • Yuandong Tian, Xinlei Chen, Surya Ganguli
While contrastive approaches of self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, using an extra learnable predictor and a stop-gradient operation.
no code implementations • CVPR 2021 • Kenneth Marino, Xinlei Chen, Devi Parikh, Abhinav Gupta, Marcus Rohrbach
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
Ranked #6 on Visual Question Answering (VQA) on A-OKVQA
26 code implementations • CVPR 2021 • Xinlei Chen, Kaiming He
Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing.
Ranked #94 on Self-Supervised Image Classification on ImageNet
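The symmetric loss and the stop-gradient can be sketched as follows (in plain numpy the stop-gradient is only a comment, since no autodiff is involved; in a framework the target `z` would be detached):

```python
import numpy as np

def neg_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """Negative cosine similarity D(p, z); z is treated as a constant
    target -- this is where the stop-gradient would be applied, so no
    gradient flows back through z."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return float(-np.dot(p, z))

def simsiam_loss(p1, z2, p2, z1) -> float:
    """Symmetric SimSiam loss: each view's predictor output p is pulled
    toward the stop-gradient projection z of the other view."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

v = np.ones(4)
print(simsiam_loss(v, v, v, v))   # perfectly aligned views → -1.0
```

The paper's finding is that without the stop-gradient on `z`, this objective admits collapsed (constant) solutions; with it, training avoids them.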
2 code implementations • 1 Oct 2020 • Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli
We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).
no code implementations • ECCV 2020 • Medhini Narasimhan, Erik Wijmans, Xinlei Chen, Trevor Darrell, Dhruv Batra, Devi Parikh, Amanpreet Singh
We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.
1 code implementation • 17 Jun 2020 • Corentin Dancette, Remi Cadene, Xinlei Chen, Matthieu Cord
First, we propose the Modifying Count Distribution (MCD) protocol, which penalizes models that over-rely on statistical shortcuts.
1 code implementation • ICLR 2021 • Duy-Kien Nguyen, Vedanuj Goswami, Xinlei Chen
This paper focuses on visual counting, which aims to predict the number of occurrences given a natural image and a query (e.g., a question or a category).
Ranked #1 on Object Counting on HowMany-QA
36 code implementations • 9 Mar 2020 • Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He
Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR.
Ranked #3 on Contrastive Learning on imagenet-1k
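At the heart of both MoCo and SimCLR is the InfoNCE objective: a softmax cross-entropy over one positive key and a set of negatives. A minimal sketch (the temperature follows MoCo's common default; everything else is illustrative):

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query.

    q, k_pos: (dim,) L2-normalized query and positive key;
    k_negs: (num_neg, dim) L2-normalized negative keys;
    tau: softmax temperature (0.07 is a common MoCo default).
    """
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))           # positive key sits at index 0

q = np.array([1.0, 0.0])
k_pos = q.copy()                              # positive key aligned with q
k_negs = np.array([[0.0, 1.0], [0.0, -1.0]]) # negatives orthogonal to q
print(info_nce(q, k_pos, k_negs))             # near 0: the positive dominates
```

Driving this loss down pulls the query toward its positive key and away from the negatives; MoCo's contribution is maintaining a large, consistent queue of negative keys via a momentum encoder.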
1 code implementation • CVPR 2020 • Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas
Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images.
Ranked #2 on 3D Object Detection on SUN-RGBD (using extra training data)
2 code implementations • CVPR 2020 • Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
Popularized as 'bottom-up' attention, bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA).
Ranked #18 on Visual Question Answering (VQA) on VQA v2 test-std
7 code implementations • CVPR 2019 • Amanpreet Singh, Vivek Natarajan, Meet Shah, Yu Jiang, Xinlei Chen, Dhruv Batra, Devi Parikh, Marcus Rohrbach
We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.
Ranked #3 on Visual Question Answering (VQA) on VizWiz 2018
no code implementations • ICCV 2019 • Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan Yuille
Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention.
no code implementations • 9 Apr 2019 • Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra
Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded.
1 code implementation • CVPR 2019 • Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra
To address this, we propose a modular architecture composed of a program generator, a controller, a navigator, and a VQA module.
2 code implementations • ICCV 2019 • Xinlei Chen, Ross Girshick, Kaiming He, Piotr Dollár
To formalize this, we treat dense instance segmentation as a prediction task over 4D tensors and present a general framework called TensorMask that explicitly captures this geometry and enables novel operators on 4D tensors.
Ranked #90 on Instance Segmentation on COCO test-dev
no code implementations • CVPR 2019 • Meet Shah, Xinlei Chen, Marcus Rohrbach, Devi Parikh
Despite significant progress in Visual Question Answering over the years, the robustness of today's VQA models leaves much to be desired.
2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.
2 code implementations • CVPR 2019 • Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach
Our dataset, ActivityNet-Entities, augments the challenging ActivityNet Captions dataset with 158k bounding box annotations, each grounding a noun phrase.
9 code implementations • 26 Jul 2018 • Yu Jiang, Vivek Natarajan, Xinlei Chen, Marcus Rohrbach, Dhruv Batra, Devi Parikh
We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.
Ranked #10 on Visual Question Answering (VQA) on A-OKVQA
no code implementations • CVPR 2018 • Xinlei Chen, Li-Jia Li, Li Fei-Fei, Abhinav Gupta
The framework consists of two core modules: a local module that uses spatial memory to store previous beliefs with parallel updates; and a global graph-reasoning module.
2 code implementations • ACL 2019 • Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
The game involves two players: a Teller and a Drawer.
36 code implementations • ICCV 2017 • Xinlei Chen, Abhinav Gupta
On the other hand, modeling object-object relationships requires spatial reasoning -- not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns.
1 code implementation • 21 Feb 2017 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
48 code implementations • 7 Feb 2017 • Xinlei Chen, Abhinav Gupta
We adapted the joint-training scheme of the Faster R-CNN framework from Caffe to TensorFlow as a baseline implementation for object detection.
no code implementations • 21 Sep 2016 • Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation.
1 code implementation • 14 Apr 2016 • Gunnar A. Sigurdsson, Xinlei Chen, Abhinav Gupta
What does a typical visit to Paris look like?
1 code implementation • NAACL 2016 • Jiwei Li, Xinlei Chen, Eduard Hovy, Dan Jurafsky
While neural networks have been successfully applied to many NLP tasks, the resulting vector-based models are very difficult to interpret.
no code implementations • CVPR 2015 • Xinlei Chen, Alan Ritter, Abhinav Gupta, Tom Mitchell
We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP).
no code implementations • CVPR 2015 • Xinlei Chen, C. Lawrence Zitnick
Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
no code implementations • ICCV 2015 • Xinlei Chen, Abhinav Gupta
Specifically inspired by curriculum learning, we present a two-step approach for CNN training.
18 code implementations • 1 Apr 2015 • Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. Lawrence Zitnick
In this paper we describe the Microsoft COCO Caption dataset and evaluation server.
no code implementations • 20 Nov 2014 • Xinlei Chen, C. Lawrence Zitnick
Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
no code implementations • CVPR 2014 • Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta
In this paper, we propose to enrich these knowledge bases by automatically discovering objects and their segmentations from noisy Internet images.