Search Results for author: Tsung-Yi Lin

Found 42 papers, 28 papers with code

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

no code implementations CVPR 2024 Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui

VFC consists of three steps: 1) proposal, where image-to-text captioning models propose multiple initial captions; 2) verification, where a large language model (LLM) utilizes tools such as object detection and VQA models to fact-check proposed captions; 3) captioning, where an LLM generates the final caption by summarizing caption proposals and the fact check verification results.

Caption Generation, Hallucination +7
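The three-step pipeline reads naturally as plain control flow. A minimal sketch, where `captioners`, `llm_verify`, and `llm_summarize` are hypothetical stand-ins for the actual captioning models, the tool-using LLM verifier, and the summarizing LLM:

```python
# Hypothetical sketch of the VFC pipeline; the callables stand in for
# real image-to-text models, a tool-using LLM, and a summarizing LLM.
def visual_fact_checker(image, captioners, llm_verify, llm_summarize):
    # 1) Proposal: several captioning models propose initial captions.
    proposals = [captioner(image) for captioner in captioners]
    # 2) Verification: an LLM fact-checks each proposal using tools
    #    such as object detection and VQA models, producing evidence.
    evidence = [llm_verify(image, caption) for caption in proposals]
    # 3) Captioning: an LLM summarizes proposals plus verification results.
    return llm_summarize(proposals, evidence)

# Toy stand-ins, just to exercise the control flow:
caption = visual_fact_checker(
    image="img.png",
    captioners=[lambda im: "a cat on a mat", lambda im: "a dog on a mat"],
    llm_verify=lambda im, cap: {"caption": cap, "supported": "cat" in cap},
    llm_summarize=lambda caps, ev: next(e["caption"] for e in ev if e["supported"]),
)
```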

ATT3D: Amortized Text-to-3D Object Synthesis

no code implementations ICCV 2023 Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.

Image to 3D, Object +1

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

no code implementations 27 Apr 2023 Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis.

Motion Synthesis

Magic3D: High-Resolution Text-to-3D Content Creation

1 code implementation CVPR 2023 Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.

Text to 3D, Vocal Bursts Intensity Prediction

Optimizing Anchor-based Detectors for Autonomous Driving Scenes

no code implementations 11 Aug 2022 Xianzhi Du, Wei-Chih Hung, Tsung-Yi Lin

This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in the scenes of autonomous driving.

Autonomous Driving

Vision Transformer for NeRF-Based View Synthesis from a Single Input Image

1 code implementation 12 Jul 2022 Kai-En Lin, Lin Yen-Chen, Wei-Sheng Lai, Tsung-Yi Lin, Yi-Chang Shih, Ravi Ramamoorthi

Existing approaches condition on local image features to reconstruct a 3D object, but often render blurry predictions at viewpoints that are far away from the source view.

Novel View Synthesis

A Unified Sequence Interface for Vision Tasks

1 code implementation 15 Jun 2022 Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton

Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.

Image Captioning, Instance Segmentation +2
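As a flavor of what "a sequence of discrete tokens" means for a spatial task: in the authors' related Pix2Seq line of work, box coordinates are quantized into a fixed number of bins so they can share a vocabulary with text-like tokens. A sketch under that assumption (the bin count and helper name are illustrative):

```python
def quantize_box(box, image_size, n_bins=1000):
    """Map continuous box coordinates (xmin, ymin, xmax, ymax) to discrete
    token ids in [0, n_bins), so a detection target becomes a token sequence."""
    return [min(int(coord / image_size * n_bins), n_bins - 1) for coord in box]

# On a 640x640 image, each coordinate lands in one of 1000 shared bins.
tokens = quantize_box((32.0, 64.0, 320.0, 480.0), image_size=640)  # [50, 100, 500, 750]
```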

NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

no code implementations 3 Mar 2022 Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola

In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors.

Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

1 code implementation 22 Dec 2021 Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin

We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.

Image Segmentation, Segmentation +1

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

3 code implementations 17 Dec 2021 Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification, Instance Segmentation +6

Multi-Task Self-Training for Learning General Representations

no code implementations ICCV 2021 Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.

Multi-Task Learning, Partially Labeled Datasets +1

Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image

no code implementations ICCV 2021 Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

3D perception of object shapes from RGB image input is fundamental towards semantic scene understanding, grounding image-based perception in our spatially 3-dimensional real-world environments.

Retrieval, Scene Understanding

Learning Open-World Object Proposals without Learning to Classify

3 code implementations 15 Aug 2021 Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

In this paper, we identify that the problem is that the binary classifiers in existing proposal methods tend to overfit to the training categories.

Object, object-detection +4

Learning to See before Learning to Act: Visual Pre-training for Manipulation

no code implementations 1 Jul 2021 Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin

With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.

Transfer Learning

Simple Training Strategies and Model Scaling for Object Detection

1 code implementation 30 Jun 2021 Xianzhi Du, Barret Zoph, Wei-Chih Hung, Tsung-Yi Lin

We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors.

Instance Segmentation, Object +3

Revisiting ResNets: Improved Training and Scaling Strategies

3 code implementations NeurIPS 2021 Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.

Action Classification, Document Image Classification +2

Bottleneck Transformers for Visual Recognition

13 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

Image Classification, Instance Segmentation +3

INeRF: Inverting Neural Radiance Fields for Pose Estimation

1 code implementation 10 Dec 2020 Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin

We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF.

Object Pose Estimation

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve

no code implementations ECCV 2020 Wei-cheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses.

Image to 3D, Object +3

Rethinking Pre-training and Self-training

2 code implementations NeurIPS 2020 Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.

Data Augmentation, Object +4

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

no code implementations 11 Apr 2020 Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames.

Segmentation, Semantic Segmentation

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

13 code implementations CVPR 2020 Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

Decoder, General Classification +6

MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

2 code implementations CVPR 2020 Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le

We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.

object-detection, Object Detection

ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors

1 code implementation ICCV 2019 Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin

However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.

Instance Segmentation, Object +1

Class-Balanced Loss Based on Effective Number of Samples

8 code implementations CVPR 2019 Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang song, Serge Belongie

We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss.

Image Classification, Long-tail Learning
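The scheme weights each class by the inverse of its effective number of samples, E_n = (1 - beta^n) / (1 - beta). A minimal NumPy sketch; normalizing the weights to sum to the number of classes is one common choice, not necessarily the paper's exact setup:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class loss weights proportional to 1 / E_n, where
    E_n = (1 - beta**n) / (1 - beta) is the effective number of samples."""
    n = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes (one common choice).
    return weights / weights.sum() * len(n)

# Tail classes receive much larger weights than the head class:
w = class_balanced_weights([10_000, 100, 10])
```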

DropBlock: A regularization method for convolutional networks

6 code implementations NeurIPS 2018 Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.

Image Classification, Object Detection
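Because independent zeros let information leak through correlated neighbors, DropBlock zeroes contiguous square regions instead. A minimal single-channel NumPy sketch; the seed rate gamma follows the paper's approximation, but border handling is simplified:

```python
import numpy as np

def drop_block(x, block_size=3, drop_prob=0.1, rng=None):
    """Zero contiguous block_size x block_size regions of a 2D feature map,
    then rescale surviving activations to preserve the expected magnitude."""
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = x.shape
    # Seed rate gamma, chosen so roughly drop_prob of units end up zeroed
    # (the paper's approximation, ignoring block overlap).
    valid = (h - block_size + 1) * (w - block_size + 1)
    gamma = drop_prob * h * w / (block_size ** 2) / valid
    seeds = rng.random((h - block_size + 1, w - block_size + 1)) < gamma
    mask = np.ones_like(x, dtype=np.float64)
    for i, j in zip(*np.nonzero(seeds)):
        mask[i:i + block_size, j:j + block_size] = 0.0
    kept = mask.mean()
    return x * mask / max(kept, 1e-6)

x = np.ones((8, 8))
out = drop_block(x, block_size=3, drop_prob=0.2)
```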

Focal Loss for Dense Object Detection

230 code implementations ICCV 2017 Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.

Dense Object Detection, Knowledge Distillation +5
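Concretely, the focal loss down-weights easy examples via the modulating factor (1 - p_t)^gamma: FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). A minimal binary-case NumPy version with the paper's defaults gamma = 2, alpha = 0.25:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    where p is the predicted probability of the positive class."""
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, well-classified negative contributes almost nothing, while a
# hard positive at the same score keeps a large loss:
easy = focal_loss(np.array([0.1]), np.array([0]))  # true-class probability 0.9
hard = focal_loss(np.array([0.1]), np.array([1]))  # true-class probability 0.1
```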

A MultiPath Network for Object Detection

1 code implementation 7 Apr 2016 Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár

To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment that improve localization.

Instance Segmentation, Object +2

Learning to Refine Object Segments

2 code implementations 29 Mar 2016 Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár

In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.

Object, Semantic Segmentation

Learning Deep Representations for Ground-to-Aerial Geolocalization

no code implementations CVPR 2015 Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays

Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data).

Face Verification

Microsoft COCO: Common Objects in Context

36 code implementations 1 May 2014 Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár

We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.

Instance Segmentation, Object +5

Cross-View Image Geolocalization

no code implementations CVPR 2013 Tsung-Yi Lin, Serge Belongie, James Hays

On the other hand, there is no shortage of visual and geographic data that densely covers the Earth: we examine overhead imagery and land cover survey data, but the relationship between this data and ground-level query photographs is complex.
