no code implementations • 6 Jun 2023 • Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas
Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.
no code implementations • 27 Apr 2023 • Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis.
1 code implementation • CVPR 2023 • Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.
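Since this entry is about optimizing a NeRF against a frozen text-to-image diffusion model, a minimal sketch of the Score Distillation Sampling (SDS) gradient at the heart of that recipe may help. `eps_model`, `alphas_cumprod`, and the timestep range are stand-in assumptions for illustration, not the paper's actual interface.

```python
import torch

def sds_grad(rendered, eps_model, text_emb, alphas_cumprod):
    """Hypothetical SDS gradient for a batch of rendered images (B, C, H, W)."""
    b = rendered.shape[0]
    t = torch.randint(20, 980, (b,), device=rendered.device)  # random timestep
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(rendered)
    noisy = a.sqrt() * rendered + (1 - a).sqrt() * noise      # forward diffusion
    with torch.no_grad():                                     # denoiser stays frozen
        eps_pred = eps_model(noisy, t, text_emb)
    return (1 - a) * (eps_pred - noise)  # injected into the NeRF via backward()
```

The gradient would be applied with `rendered.backward(gradient=sds_grad(...))`, so only the 3D representation's weights update while the diffusion model stays frozen.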
no code implementations • 11 Aug 2022 • Xianzhi Du, Wei-Chih Hung, Tsung-Yi Lin
This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in the scenes of autonomous driving.
1 code implementation • 12 Jul 2022 • Kai-En Lin, Lin Yen-Chen, Wei-Sheng Lai, Tsung-Yi Lin, Yi-Chang Shih, Ravi Ramamoorthi
Existing approaches condition on local image features to reconstruct a 3D object, but often render blurry predictions at viewpoints that are far away from the source view.
1 code implementation • 15 Jun 2022 • Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey Hinton
Despite the differences among these tasks, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.
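As a concrete illustration of the "sequence of discrete tokens" interface, here is a hedged sketch of how a detection box might be quantized into tokens; the bin count and vocabulary layout are illustrative assumptions, not the paper's exact scheme.

```python
NUM_BINS = 1000  # number of coordinate quantization bins (an assumption)

def box_to_tokens(box, class_id, image_size):
    """Turn (xmin, ymin, xmax, ymax) plus a class id into discrete tokens."""
    h, w = image_size
    xmin, ymin, xmax, ymax = box
    coords = [ymin / h, xmin / w, ymax / h, xmax / w]  # normalize to [0, 1]
    tokens = [min(int(c * NUM_BINS), NUM_BINS - 1) for c in coords]
    return tokens + [NUM_BINS + class_id]  # class tokens follow the coordinate bins

print(box_to_tokens((10, 20, 110, 220), class_id=3, image_size=(256, 256)))
```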
no code implementations • 3 Mar 2022 • Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola
In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors.
1 code implementation • 22 Dec 2021 • Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin
We propose OpenSeg to address the above issue while still making use of scalable image-level supervision of captions.
3 code implementations • 17 Dec 2021 • Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou
In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.
no code implementations • ICCV 2021 • Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin
The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.
no code implementations • ICCV 2021 • Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
3D perception of object shapes from RGB image input is fundamental to semantic scene understanding, grounding image-based perception in our spatially three-dimensional real-world environments.
2 code implementations • 15 Aug 2021 • Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo
In this paper, we identify the problem: the binary classifiers in existing proposal methods tend to overfit to the training categories.
Ranked #2 on Open World Object Detection on COCO VOC to non-VOC
no code implementations • 1 Jul 2021 • Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin
With just a small amount of robotic experience, we can further fine-tune the affordance model to achieve better results.
1 code implementation • 30 Jun 2021 • Xianzhi Du, Barret Zoph, Wei-Chih Hung, Tsung-Yi Lin
We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors.
Ranked #53 on Object Detection on COCO minival
2 code implementations • 25 Jun 2021 • Boyi Li, Yin Cui, Tsung-Yi Lin, Serge Belongie
In this paper, we propose and explore the problem of image translation for data augmentation.
4 code implementations • ICLR 2022 • Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui
On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.
Ranked #2 on Open Vocabulary Object Detection on Objects365
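A hedged sketch of the open-vocabulary classification step this line of work relies on: detector region embeddings are scored against text embeddings of category names, so adding a novel category only requires a new text prompt. Names, shapes, and the temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def classify_regions(region_emb, text_emb, temperature=0.01):
    """region_emb: (R, D) region features; text_emb: (C, D) embedded class names."""
    region_emb = F.normalize(region_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = region_emb @ text_emb.t() / temperature  # cosine similarities
    return logits.softmax(dim=-1)                     # per-region class probabilities
```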
3 code implementations • NeurIPS 2021 • Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x to 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.
Ranked #1 on Document Image Classification on AIP
13 code implementations • CVPR 2021 • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.
Ranked #50 on Instance Segmentation on COCO minival
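The core BoTNet move is replacing the spatial 3x3 convolution in the final ResNet bottlenecks with global multi-head self-attention. A simplified sketch is below; it omits the paper's relative position encodings, and the dimensions are assumptions.

```python
import torch.nn as nn

class BoTBlockSketch(nn.Module):
    """Self-attention over the flattened spatial grid; `dim` must be divisible by `heads`."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)         # (B, H*W, C) token grid
        out, _ = self.attn(seq, seq, seq)          # all-pairs spatial attention
        return out.transpose(1, 2).view(b, c, h, w)
```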
5 code implementations • CVPR 2021 • Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph
Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
Ranked #1 on Object Detection on PASCAL VOC 2007 (using extra training data)
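The augmentation itself is simple enough to sketch: instances are lifted from one image by their masks and pasted onto another. This minimal version skips the blending and box/mask bookkeeping the full method performs.

```python
import numpy as np

def copy_paste(target_img, source_img, source_mask):
    """target_img, source_img: (H, W, 3) arrays of equal size; source_mask: (H, W) bool."""
    out = target_img.copy()
    out[source_mask] = source_img[source_mask]  # overwrite pixels under the instance mask
    return out
```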
1 code implementation • 10 Dec 2020 • Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin
We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF.
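iNeRF's pose estimation can be summarized in a few lines: treat the camera pose as the only trainable parameter and minimize photometric error between the NeRF rendering and the observed image. `render_nerf` and the pose parameterization below are stand-in assumptions.

```python
import torch

def estimate_pose(render_nerf, observed, init_pose, steps=100, lr=1e-2):
    """Gradient-based pose refinement against a frozen, pretrained NeRF."""
    pose = init_pose.clone().requires_grad_(True)  # e.g. a 6-DoF pose vector
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (render_nerf(pose) - observed).abs().mean()  # photometric L1
        loss.backward()
        opt.step()
    return pose.detach()
```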
no code implementations • ECCV 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc Le, Xiaodan Song
Furthermore, SpineNet is built with a uniform resource distribution over operations.
no code implementations • 18 Aug 2020 • Wei-Chih Hung, Henrik Kretzschmar, Tsung-Yi Lin, Yuning Chai, Ruichi Yu, Ming-Hsuan Yang, Dragomir Anguelov
Robust multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
no code implementations • ECCV 2020 • Wei-cheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses.
2 code implementations • NeurIPS 2020 • Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le
For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2012 val
no code implementations • 11 Apr 2020 • Ankita Pasad, Ariel Gordon, Tsung-Yi Lin, Anelia Angelova
We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames.
5 code implementations • CVPR 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.
Ranked #9 on Image Classification on iNaturalist
2 code implementations • CVPR 2020 • Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adam, Quoc V. Le
We propose MnasFPN, a mobile-friendly search space for the detection head, and combine it with latency-aware architecture search to produce efficient object detection models.
Ranked #234 on Object Detection on COCO test-dev
6 code implementations • ECCV 2020 • Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
Importantly, the best policy found on COCO may be transferred unchanged to other detection datasets and models to improve predictive accuracy.
Ranked #83 on Object Detection on COCO test-dev
5 code implementations • CVPR 2019 • Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le
Here we aim to learn a better architecture of feature pyramid network for object detection.
1 code implementation • ICCV 2019 • Wei-cheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin
However, it is difficult and costly to segment objects in novel categories because a large number of mask annotations is required.
8 code implementations • CVPR 2019 • Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie
We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss.
Ranked #2 on Long-tail Learning on EGTEA
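The re-weighting scheme follows the paper's effective number of samples, E_n = (1 - beta^n) / (1 - beta). The snippet below computes class weights from it; the beta value and the mean-one normalization are conventional choices rather than mandated by the paper.

```python
def class_balanced_weights(samples_per_class, beta=0.999):
    """Weight each class by the inverse of its effective number of samples."""
    weights = [(1 - beta) / (1 - beta ** n) for n in samples_per_class]
    total = sum(weights)
    return [w * len(weights) / total for w in weights]  # normalize to mean 1

# Rarer classes receive larger weights:
print(class_balanced_weights([1000, 100, 10]))
```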
6 code implementations • NeurIPS 2018 • Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le
This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout.
Ranked #729 on Image Classification on ImageNet
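Because units are spatially correlated, DropBlock zeroes contiguous regions instead of independent activations. The sketch below conveys the mechanism; it collapses the paper's gamma computation into a plain seed probability, so treat the rate as an assumption, and apply it only during training.

```python
import torch
import torch.nn.functional as F

def drop_block(x, seed_prob=0.05, block_size=3):
    """Drop block_size x block_size regions of a (B, C, H, W) feature map."""
    seeds = (torch.rand_like(x) < seed_prob).float()  # block centers
    mask = F.max_pool2d(seeds, block_size, stride=1,
                        padding=block_size // 2)      # grow seeds into blocks
    keep = 1.0 - mask
    return x * keep * keep.numel() / keep.sum().clamp(min=1.0)  # rescale survivors
```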
228 code implementations • ICCV 2017 • Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Ranked #3 on Long-tail Learning on EGTEA
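The loss itself is compact: FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), which down-weights easy, well-classified examples. A minimal binary version with the paper's default alpha and gamma:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss; targets are 0/1 floats with the same shape as logits."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)             # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()       # down-weight easy examples
```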
1 code implementation • WWW 2017 • Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, Deborah Estrin
Metric learning algorithms produce distance metrics that capture the important relationships among data.
Ranked #1 on Recommendation Systems on MovieLens 20M (Recall@100 metric)
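A hedged sketch of the kind of pairwise hinge objective collaborative metric learning builds on: observed user-item pairs are pulled closer than sampled negatives by a margin. Variable names and the squared-distance choice are illustrative assumptions.

```python
import torch

def metric_hinge_loss(user_emb, pos_item_emb, neg_item_emb, margin=1.0):
    """All inputs are (B, D) embedding batches."""
    d_pos = ((user_emb - pos_item_emb) ** 2).sum(-1)  # distance to observed item
    d_neg = ((user_emb - neg_item_emb) ** 2).sum(-1)  # distance to sampled negative
    return torch.clamp(margin + d_pos - d_neg, min=0).mean()
```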
85 code implementations • CVPR 2017 • Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
Feature pyramids are a basic component in recognition systems for detecting objects at different scales.
Ranked #3 on Pedestrian Detection on TJU-Ped-traffic
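The paper's construction is a top-down pathway with lateral connections: each backbone level is projected with a 1x1 convolution, coarser levels are upsampled and summed in, and a 3x3 convolution smooths each merged map. A compact sketch with assumed channel sizes:

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNSketch(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats: backbone maps ordered fine to coarse
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # merge coarse into fine
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, laterals)]
```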
1 code implementation • 7 Apr 2016 • Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár
To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment that improve localization.
Ranked #102 on Instance Segmentation on COCO test-dev
2 code implementations • 29 Mar 2016 • Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollár
In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach.
Ranked #4 on Region Proposal on COCO test-dev
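A loose sketch of one top-down refinement stage in the spirit the abstract describes: a coarse mask encoding is fused with same-resolution skip features from the bottom-up pass, then upsampled. Channel sizes and the fusion choice are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefineStageSketch(nn.Module):
    def __init__(self, mask_ch=32, skip_ch=64):
        super().__init__()
        self.merge = nn.Conv2d(mask_ch + skip_ch, mask_ch, 3, padding=1)

    def forward(self, mask_feat, skip_feat):      # same spatial size
        x = torch.cat([mask_feat, skip_feat], dim=1)
        x = F.relu(self.merge(x))
        return F.interpolate(x, scale_factor=2, mode="nearest")  # step up in resolution
```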
no code implementations • CVPR 2015 • Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays
Most approaches predict the location of a query image by matching to ground-level images with known locations (e.g., street-view data).
18 code implementations • 1 Apr 2015 • Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick
In this paper we describe the Microsoft COCO Caption dataset and evaluation server.
34 code implementations • 1 May 2014 • Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
no code implementations • CVPR 2013 • Tsung-Yi Lin, Serge Belongie, James Hays
On the other hand, there is no shortage of visual and geographic data that densely covers the Earth. We examine overhead imagery and land cover survey data, but the relationship between this data and ground-level query photographs is complex.