Search Results for author: Tsung-Yi Lin

Found 37 papers, 23 papers with code

Vision Transformer for NeRF-Based View Synthesis from a Single Input Image

no code implementations 12 Jul 2022 Kai-En Lin, Lin Yen-Chen, Wei-Sheng Lai, Tsung-Yi Lin, Yi-Chang Shih, Ravi Ramamoorthi

Existing approaches condition on local image features to reconstruct a 3D object, but often render blurry predictions at viewpoints that are far away from the source view.

Novel View Synthesis

A Unified Sequence Interface for Vision Tasks

no code implementations 15 Jun 2022 Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton

Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.

Image Captioning Instance Segmentation +4
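The unified interface above rests on expressing every task output as discrete tokens. As a rough illustration of the idea (not the paper's code; the bin count and helper names are assumptions), a bounding box can be quantized into coordinate tokens and decoded back:

```python
import numpy as np

def box_to_tokens(box, image_size, num_bins=1000):
    """Quantize [y_min, x_min, y_max, x_max] pixel coords into discrete tokens
    (a Pix2Seq-style scheme; the bin count here is illustrative)."""
    coords = np.asarray(box, dtype=float) / image_size
    return np.clip((coords * (num_bins - 1)).round().astype(int), 0, num_bins - 1)

def tokens_to_box(tokens, image_size, num_bins=1000):
    """Invert the quantization; reconstruction error is at most half a bin."""
    return np.asarray(tokens, dtype=float) / (num_bins - 1) * image_size
```

With 1000 bins on a 512-pixel image, the round trip loses at most ~0.26 pixels per coordinate, which is why a single token vocabulary can serve detection alongside captioning.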

NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

no code implementations 3 Mar 2022 Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola

In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors.

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

no code implementations 22 Dec 2021 Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin

We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.

Semantic Segmentation

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

no code implementations 17 Dec 2021 Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification Instance Segmentation +5

Multi-Task Self-Training for Learning General Representations

no code implementations ICCV 2021 Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.

Multi-Task Learning Surface Normal Estimation

Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image

no code implementations ICCV 2021 Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

3D perception of object shapes from RGB image input is fundamental towards semantic scene understanding, grounding image-based perception in our spatially 3-dimensional real-world environments.

Scene Understanding

Learning Open-World Object Proposals without Learning to Classify

2 code implementations 15 Aug 2021 Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

In this paper, we identify that the problem is that the binary classifiers in existing proposal methods tend to overfit to the training categories.

object-detection Object Discovery +3

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations 1 Jul 2021 Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

Single Image Texture Translation for Data Augmentation

1 code implementation 25 Jun 2021 Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie

In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation.

Data Augmentation Few-Shot Image Classification +2

Revisiting ResNets: Improved Training and Scaling Strategies

4 code implementations NeurIPS 2021 Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Action Classification Document Image Classification +2

Bottleneck Transformers for Visual Recognition

12 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

Image Classification Instance Segmentation +2

INeRF: Inverting Neural Radiance Fields for Pose Estimation

1 code implementation 10 Dec 2020 Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin

We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF.

Pose Estimation
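iNeRF's core idea, analysis-by-synthesis pose estimation, can be sketched in miniature. The `render` function below is a toy stand-in for a trained NeRF (the real method backpropagates through volume rendering, whereas this sketch uses numerical gradients); every name and detail here is illustrative:

```python
import numpy as np

def render(pose):
    # Toy stand-in for a NeRF render: a smooth vector-valued function of pose.
    return np.array([np.sin(pose[0]), np.cos(pose[1]), pose[0] * pose[1]])

def invert_pose(target_image, init_pose, lr=0.05, steps=500, eps=1e-4):
    """iNeRF-style loop: minimize photometric error over the camera pose
    by gradient descent (central-difference gradients for this toy example)."""
    pose = init_pose.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for k in range(pose.size):
            d = np.zeros_like(pose)
            d[k] = eps
            f_plus = np.sum((render(pose + d) - target_image) ** 2)
            f_minus = np.sum((render(pose - d) - target_image) ** 2)
            grad[k] = (f_plus - f_minus) / (2 * eps)
        pose -= lr * grad
    return pose
```

The optimization drives the rendered view toward the observed image; in the paper, the recovered poses are accurate enough to serve as extra supervision for NeRF itself.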

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

no code implementations ECCV 2020 Wei-cheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses.

Object Recognition

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Data Augmentation object-detection +2

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

no code implementations 11 Apr 2020 Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames.

Semantic Segmentation

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

5 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

General Classification Image Classification +5

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

2 code implementations CVPR 2020 Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le

We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.

Object Detection

Learning Data Augmentation Strategies for Object Detection

6 code implementations ECCV 2020 Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy.

Ranked #62 on Object Detection on COCO test-dev (using extra training data)

Image Augmentation Image Classification +2

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation ICCV 2019 Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation Semantic Segmentation

Class-Balanced Loss Based on Effective Number of Samples

7 code implementations CVPR 2019 Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang song, Serge Belongie

We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss.

Image Classification Long-tail Learning
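The effective number of samples has a closed form, E_n = (1 - beta^n) / (1 - beta), so the re-weighting scheme is only a few lines. A minimal sketch (normalizing the weights to sum to the number of classes follows a common convention; the helper name is an assumption):

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class loss weights proportional to the inverse effective number
    of samples, E_n = (1 - beta^n) / (1 - beta)."""
    samples_per_class = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(samples_per_class) / weights.sum()
```

As beta approaches 1, E_n approaches n and the scheme recovers plain inverse-frequency weighting; smaller beta discounts the marginal value of additional samples, which is the paper's point about data overlap.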

DropBlock: A regularization method for convolutional networks

6 code implementations NeurIPS 2018 Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.

Image Classification Object Detection
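DropBlock's twist on dropout is that dropped units form contiguous squares, so spatially correlated activations cannot leak around the mask. A minimal single-channel NumPy sketch (the seed-probability formula follows the paper's gamma derivation; the rescaling by the kept fraction is a common convention, not necessarily the reference implementation):

```python
import numpy as np

def drop_block(x, block_size=3, drop_prob=0.1, rng=None):
    """DropBlock on one feature map x of shape (H, W): zero out contiguous
    block_size x block_size squares instead of independent units."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x.shape
    valid_h, valid_w = h - block_size + 1, w - block_size + 1
    # Seed probability chosen so the expected dropped fraction is ~drop_prob.
    gamma = drop_prob / (block_size ** 2) * (h * w) / (valid_h * valid_w)
    mask = np.ones((h, w))
    seeds = rng.random((valid_h, valid_w)) < gamma
    for i, j in zip(*np.nonzero(seeds)):
        mask[i:i + block_size, j:j + block_size] = 0.0
    kept = mask.mean()
    return x * mask / max(kept, 1e-8)  # rescale to preserve expected activation
```

Setting `block_size=1` recovers ordinary dropout, which is the ablation axis the paper explores.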

Focal Loss for Dense Object Detection

216 code implementations ICCV 2017 Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Dense Object Detection Long-tail Learning +4
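The focal loss itself is essentially one line: it down-weights the cross-entropy of well-classified examples by a factor of (1 - p_t)^gamma. A binary-classification sketch in NumPy (the alpha balancing term follows the paper; the array-based signature is illustrative, not the RetinaNet code):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for binary classification.
    p: predicted probability of the positive class; y: labels in {0, 1}."""
    p_t = np.where(y == 1, p, 1.0 - p)          # prob assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With gamma = 2, an easy negative classified at 0.9 confidence contributes orders of magnitude less loss than a hard example at 0.1, which is how the dense detector avoids being swamped by easy background anchors.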

A MultiPath Network for Object Detection

1 code implementation 7 Apr 2016 Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár

To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment that improve localization.

Instance Segmentation object-detection +1

Learning to Refine Object Segments

2 code implementations 29 Mar 2016 Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Semantic Segmentation

Learning Deep Representations for Ground-to-Aerial Geolocalization

no code implementations CVPR 2015 Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays

Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data).

Face Verification

Microsoft COCO: Common Objects in Context

26 code implementations 1 May 2014 Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation Object Localization +3

Cross-View Image Geolocalization

no code implementations CVPR 2013 Tsung-Yi Lin, Serge Belongie, James Hays

On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground level query photographs is complex.
