Search Results for author: Tete Xiao

Found 17 papers, 11 papers with code

Masked Visual Pre-training for Motor Control

no code implementations 11 Mar 2022 Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik

This paper shows that self-supervised visual pre-training from real-world images is effective for learning motor control tasks from pixels.

Early Convolutions Help Transformers See Better

1 code implementation NeurIPS 2021 Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick

To test whether this atypical design choice causes an issue, we analyze the optimization behavior of ViT models with their original patchify stem versus a simple counterpart where we replace the ViT stem by a small number of stacked stride-two 3×3 convolutions.

Region Similarity Representation Learning

1 code implementation ICCV 2021 Tete Xiao, Colorado J Reed, Xiaolong Wang, Kurt Keutzer, Trevor Darrell

We present Region Similarity Representation Learning (ReSim), a new approach to self-supervised representation learning for localization-based tasks such as object detection and segmentation.

Instance Segmentation Object Detection +3

Reducing Class Collapse in Metric Learning with Easy Positive Sampling

no code implementations 28 Sep 2020 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, prevalent embedding losses in metric learning, e.g., triplet loss, tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

What Should Not Be Contrastive in Contrastive Learning

no code implementations ICLR 2021 Tete Xiao, Xiaolong Wang, Alexei A. Efros, Trevor Darrell

Recent self-supervised contrastive methods have been able to produce impressive transferable visual representations by learning to be invariant to different data augmentations.

Contrastive Learning

Rethinking preventing class-collapsing in metric learning with margin-based losses

no code implementations ICCV 2021 Elad Levi, Tete Xiao, Xiaolong Wang, Trevor Darrell

We theoretically prove and empirically show that under reasonable noise assumptions, margin-based losses tend to project all samples of a class with various modes onto a single point in the embedding space, resulting in a class collapse that usually renders the space ill-sorted for classification or retrieval.

Image Retrieval Metric Learning

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

1 code implementation CVPR 2020 Joanna Materzynska, Tete Xiao, Roei Herzig, Huijuan Xu, Xiaolong Wang, Trevor Darrell

Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations.

Action Recognition

Reasoning About Human-Object Interactions Through Dual Attention Networks

no code implementations ICCV 2019 Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.

Human-Object Interaction Detection

Acquisition of Localization Confidence for Accurate Object Detection

4 code implementations ECCV 2018 Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang

The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes.

General Classification Object Detection

Unified Perceptual Parsing for Scene Understanding

19 code implementations ECCV 2018 Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun

In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image.

Scene Understanding Semantic Segmentation

Learning Visually-Grounded Semantics from Contrastive Adversarial Samples

1 code implementation COLING 2018 Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun

Beginning with an insightful adversarial attack on VSE embeddings, we show the limitation of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively.

Adversarial Attack Image Captioning

CrowdHuman: A Benchmark for Detecting Human in a Crowd

1 code implementation 30 Apr 2018 Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun

There are a total of $470K$ human instances from the train and validation subsets, and $\sim 22.6$ persons per image, with various kinds of occlusions in the dataset.

Ranked #5 on Pedestrian Detection on Caltech (using extra training data)

Human Detection Object Detection +1

Repulsion Loss: Detecting Pedestrians in a Crowd

2 code implementations CVPR 2018 Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen

In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.

Ranked #7 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection

MegDet: A Large Mini-Batch Object Detector

6 code implementations CVPR 2018 Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun

The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new networks, new frameworks, or novel loss designs.

Object Detection

What Can Help Pedestrian Detection?

no code implementations CVPR 2017 Jiayuan Mao, Tete Xiao, Yuning Jiang, Zhimin Cao

Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods.

Pedestrian Detection

Semantic Understanding of Scenes through the ADE20K Dataset

21 code implementations 18 Aug 2016 Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Semantic Segmentation
