no code implementations • 16 Jan 2025 • Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip Torr, Yoon Kim, Marzyeh Ghassemi
The benchmark consists of two core tasks designed to evaluate negation understanding in diverse multimodal settings: Retrieval with Negation and Multiple Choice Questions with Negated Captions.
1 code implementation • 20 Dec 2024 • Shobhita Sundaram, Julia Chae, Yonglong Tian, Sara Beery, Phillip Isola
Modern vision models excel at general-purpose downstream tasks.
no code implementations • 17 Oct 2024 • Lijie Fan, Tianhong Li, Siyang Qin, Yuanzhen Li, Chen Sun, Michael Rubinstein, Deqing Sun, Kaiming He, Yonglong Tian
Models based on continuous tokens achieve significantly better visual quality than those using discrete tokens.
1 code implementation • 17 Jun 2024 • Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He
In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space.
Ranked #8 on Image Generation on ImageNet 256x256
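For concreteness, here is a minimal sketch of a per-token diffusion loss on continuous tokens, conditioned on an autoregressive backbone's output. This is an illustration under my own assumptions, not the paper's code: `TokenDenoiser`, `diffusion_loss`, and all dimensions are made up for the example.

```python
import torch
import torch.nn as nn

class TokenDenoiser(nn.Module):
    """Small MLP that predicts the noise added to one continuous token,
    conditioned on the AR model's output vector z and the timestep t."""
    def __init__(self, token_dim=16, cond_dim=64, hidden=256, num_steps=1000):
        super().__init__()
        self.t_embed = nn.Embedding(num_steps, hidden)
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def forward(self, x_t, z, t):
        return self.net(torch.cat([x_t, z, self.t_embed(t)], dim=-1))

def diffusion_loss(denoiser, x0, z, alphas_bar):
    """Sample a timestep, noise the clean token x0, and regress the noise.
    This replaces the categorical cross-entropy used with discrete tokens."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_bar.shape[0], (b,))
    a = alphas_bar[t].unsqueeze(-1)              # \bar{alpha}_t per sample
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps   # forward diffusion step
    return ((denoiser(x_t, z, t) - eps) ** 2).mean()

# toy usage: 8 tokens of dim 16, conditioning vectors from an AR backbone
alphas_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
den = TokenDenoiser()
loss = diffusion_loss(den, torch.randn(8, 16), torch.randn(8, 64), alphas_bar)
loss.backward()
```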
1 code implementation • 11 Feb 2024 • Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun
As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data.
1 code implementation • 5 Jan 2024 • Jiawei Yang, Katie Z Luo, Jiefeng Li, Congyue Deng, Leonidas Guibas, Dilip Krishnan, Kilian Q Weinberger, Yonglong Tian, Yue Wang
In the second stage, we train a lightweight transformer block to predict clean features from raw ViT outputs, leveraging the derived estimates of the clean features as supervision.
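A minimal sketch of that second stage, assuming a PyTorch setup; `FeatureDenoiser` is an illustrative name, and the clean-feature targets are assumed to come from the first stage described in the paper.

```python
import torch
import torch.nn as nn

class FeatureDenoiser(nn.Module):
    """Lightweight transformer block mapping raw (artifact-laden) ViT patch
    features to denoised features, supervised by first-stage estimates."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, raw_feats):                 # (B, N_patches, dim)
        return self.out(self.block(raw_feats))

denoiser = FeatureDenoiser()
raw = torch.randn(2, 196, 768)           # raw ViT outputs (14x14 patches)
clean_target = torch.randn(2, 196, 768)  # stage-1 clean-feature estimates
loss = nn.functional.mse_loss(denoiser(raw), clean_target)
loss.backward()
```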
2 code implementations • CVPR 2024 • Yonglong Tian, Lijie Fan, KaiFeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola
We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions, without any real data.
1 code implementation • CVPR 2024 • Lijie Fan, KaiFeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian
Our findings also suggest that scaling synthetic data can be particularly effective in scenarios such as: (1) when there is a limited supply of real images for a supervised problem (e.g., fewer than 0.5 million images in ImageNet), (2) when the evaluation dataset diverges significantly from the training data, i.e., an out-of-distribution scenario, or (3) when synthetic data is used in conjunction with real images, as demonstrated in the training of CLIP models.
no code implementations • 5 Oct 2023 • Tianhong Li, Sangnie Bhardwaj, Yonglong Tian, Han Zhang, Jarred Barber, Dina Katabi, Guillaume Lajoie, Huiwen Chang, Dilip Krishnan
We demonstrate image generation and captioning performance on par with state-of-the-art text-to-image and image-to-text models, using orders of magnitude less paired image-text data (only 3M pairs).
2 code implementations • NeurIPS 2023 • Yilun Xu, Mingyang Deng, Xiang Cheng, Yonglong Tian, Ziming Liu, Tommi Jaakkola
Restart not only outperforms the previous best SDE results, but also accelerates the sampling speed by 10-fold / 2-fold on CIFAR-10 / ImageNet $64 \times 64$.
2 code implementations • NeurIPS 2023 • Yonglong Tian, Lijie Fan, Phillip Isola, Huiwen Chang, Dilip Krishnan
We investigate the potential of learning visual representations using synthetic images generated by text-to-image models.
1 code implementation • NeurIPS 2023 • Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian
During training, LaCLIP randomly selects either the original texts or the rewritten versions as text augmentations for each image.
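The selection step is simple enough to sketch directly. A hedged illustration, assuming the rewritten captions have already been produced by an LLM (as the paper describes); `sample_caption` and the example captions are illustrative, not the paper's code.

```python
import random

def sample_caption(original: str, rewrites: list[str]) -> str:
    """LaCLIP-style text augmentation: pick, uniformly at random, either the
    original caption or one of its LLM-rewritten variants for this image."""
    return random.choice([original] + rewrites)

# toy usage with hypothetical rewrites
caption = sample_caption(
    "a dog catching a frisbee",
    ["a puppy leaps to grab a flying disc",
     "dog mid-air catching a frisbee in a park"],
)
```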
1 code implementation • 8 Feb 2023 • Yilun Xu, Ziming Liu, Yonglong Tian, Shangyuan Tong, Max Tegmark, Tommi Jaakkola
The new models reduce to PFGM when $D{=}1$ and to diffusion models when $D{\to}\infty$.
Ranked #1 on Image Generation on FFHQ 64x64 - 4x upscaling
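A hedged sketch of the interpolation that sentence refers to, as I recall the PFGM++ formulation (the kernel and the $r = \sigma\sqrt{D}$ alignment should be checked against the paper):

```latex
% Data x \in \mathbb{R}^N is perturbed via D augmented dimensions at radius r:
p_r(y \mid x) \;\propto\; \bigl(\lVert y - x \rVert_2^2 + r^2\bigr)^{-\frac{N+D}{2}},
\qquad r = \sigma\sqrt{D}.
% D = 1 recovers PFGM's heavy-tailed electric-field perturbation, while as
% D \to \infty the kernel converges to the Gaussian \mathcal{N}(y;\, x, \sigma^2 I_N)
% used by diffusion models.
```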
1 code implementation • 20 Oct 2022 • Lirui Wang, Kaiqing Zhang, Yunzhu Li, Yonglong Tian, Russ Tedrake
Decentralized learning has been advocated and widely deployed to make efficient use of distributed datasets, with an extensive focus on supervised learning (SL) problems.
no code implementations • 22 Mar 2022 • Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal
We show that randomized serialization of the segments significantly improves performance and yields a mix of spatially long (across-segment) and spatially short (within-segment) predictions, both of which are effective for feature learning.
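A rough illustration of such a serialization scheme, under my own reading of the sentence above; `randomized_serialization` and the fixed segment length are illustrative choices, not the paper's code.

```python
import random

def randomized_serialization(tokens, segment_len=4, seed=None):
    """Split a patch-token sequence into contiguous segments, then visit the
    segments in random order. Autoregressive targets then mix spatially short
    (within-segment) and long (across-segment) dependencies."""
    rng = random.Random(seed)
    segments = [tokens[i:i + segment_len]
                for i in range(0, len(tokens), segment_len)]
    rng.shuffle(segments)                     # randomize segment order
    return [tok for seg in segments for tok in seg]

# toy usage: 16 patch indices serialized in 4-token segments
print(randomized_serialization(list(range(16)), segment_len=4, seed=0))
```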
no code implementations • CVPR 2022 • Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao
Transformers have recently been adapted from the natural language processing community as a promising substitute for convolution-based neural networks in visual learning tasks.
1 code implementation • 21 Jun 2021 • Jindong Gu, Wei Liu, Yonglong Tian
While large self-supervised models have rivalled the performance of their supervised counterparts, small models still struggle.
1 code implementation • ICLR 2022 • Ali Jahanian, Xavier Puig, Yonglong Tian, Phillip Isola
We investigate this question in the setting of learning general-purpose visual representations from a black-box generative model rather than directly from data.
no code implementations • ICCV 2021 • Yonglong Tian, Olivier J. Henaff, Aaron van den Oord
Self-supervised learning holds promise in leveraging large amounts of unlabeled data; however, much of its progress has thus far been limited to highly curated pre-training datasets such as ImageNet.
no code implementations • ICCV 2021 • Chen Sun, Arsha Nagrani, Yonglong Tian, Cordelia Schmid
We focus on contrastive methods for self-supervised video representation learning.
no code implementations • 17 Dec 2020 • Tianhong Li, Lijie Fan, Yuan Yuan, Hao He, Yonglong Tian, Rogerio Feris, Piotr Indyk, Dina Katabi
However, contrastive learning is susceptible to feature suppression, i.e., it may discard important information relevant to the task of interest and learn irrelevant features.
1 code implementation • NeurIPS 2020 • Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola
Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning.
Ranked #2 on Contrastive Learning on ImageNet-1k
24 code implementations • NeurIPS 2020 • Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state-of-the-art performance in the unsupervised training of deep image models.
Ranked #2 on Class Incremental Learning on CIFAR-100
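A minimal sketch of a supervised contrastive loss in the spirit of this paper, where all samples sharing an anchor's label act as positives. This simplified version uses a single view per image (the paper batches two augmented views) and the names are illustrative.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: pull together embeddings of same-class
    samples, push apart different classes.
    features: (B, d) embeddings; labels: (B,) integer class ids."""
    feats = F.normalize(features, dim=1)
    logits = feats @ feats.t() / temperature
    eye = torch.eye(len(feats), dtype=torch.bool)
    logits = logits.masked_fill(eye, float('-inf'))  # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # mean log-likelihood over each anchor's positives, then over anchors
    per_anchor = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_anchor[pos.any(1)].mean()  # skip anchors with no positives

# toy usage
z = torch.randn(8, 128, requires_grad=True)
y = torch.randint(0, 3, (8,))
supcon_loss(z, y).backward()
```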
3 code implementations • ECCV 2020 • Yonglong Tian, Yue Wang, Dilip Krishnan, Joshua B. Tenenbaum, Phillip Isola
The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test time tasks with limited data and low computational cost.
4 code implementations • ICLR 2020 • Yonglong Tian, Dilip Krishnan, Phillip Isola
We demonstrate that this objective ignores important structural knowledge of the teacher network.
Ranked #14 on Knowledge Distillation on CIFAR-100
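To make the contrast with plain KL-based distillation concrete, here is a simplified contrastive distillation sketch. Note this is not the paper's exact objective (CRD uses a memory buffer and an NCE estimator); it is an in-batch InfoNCE variant under my own assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_z, teacher_z, temperature=0.07):
    """Simplified contrastive distillation: each student embedding should
    match its own teacher embedding (positive) against the other samples in
    the batch (negatives), transferring relational structure rather than
    only per-sample outputs."""
    s = F.normalize(student_z, dim=1)
    t = F.normalize(teacher_z, dim=1)
    logits = s @ t.t() / temperature      # (B, B); diagonal = positives
    targets = torch.arange(s.shape[0])
    return F.cross_entropy(logits, targets)

# toy usage: both networks' features projected to a shared dimension
loss = contrastive_distill_loss(torch.randn(16, 128, requires_grad=True),
                                torch.randn(16, 128))
loss.backward()
```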
no code implementations • 28 Sep 2019 • Lu Mi, Hao Wang, Yonglong Tian, Hao He, Nir Shavit
Uncertainty estimation is an essential step in evaluating the robustness of deep learning models in computer vision, especially when they are applied in risk-sensitive areas.
8 code implementations • ECCV 2020 • Yonglong Tian, Dilip Krishnan, Phillip Isola
We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.
Ranked #48 on Self-Supervised Action Recognition on UCF101
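A minimal sketch of a multiview contrastive objective in this spirit, summed over all view pairs so that more views contribute more signal (as the sentence above notes). Function names are illustrative; the paper's views are, e.g., different color channels of the same scene.

```python
import torch
import torch.nn.functional as F

def pairwise_nce(z1, z2, temperature=0.07):
    """Two-view contrastive loss: embeddings of the same scene under
    different views attract; embeddings of different scenes repel."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.shape[0])
    # symmetric: view-1 anchors against view-2, and vice versa
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def cmc_loss(views, temperature=0.07):
    """Average the pairwise contrastive loss over all pairs of views."""
    total, pairs = 0.0, 0
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            total = total + pairwise_nce(views[i], views[j], temperature)
            pairs += 1
    return total / pairs

# toy usage: three views of an 8-sample batch
cmc_loss([torch.randn(8, 64, requires_grad=True) for _ in range(3)]).backward()
```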
1 code implementation • ICLR 2019 • Hao He, Hao Wang, Guang-He Lee, Yonglong Tian
Probabilistic modelling is a principled framework for model aggregation, which has been a primary mechanism to combat mode collapse in the context of Generative Adversarial Networks (GANs).
Ranked #28 on Image Generation on STL-10
no code implementations • ICLR 2019 • Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts.
no code implementations • Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2018 • Yonglong Tian, Guang-He Lee, Hao He, Chen-Yu Hsu, Dina Katabi
Falls are the top reason for fatal and non-fatal injuries among seniors.
Ranked #2 on RF-based Pose Estimation on RF-MMD
no code implementations • SIGCOMM 2018 • Ming-Min Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, Rumen Hristov, Zachary Kabelac, Dina Katabi, Antonio Torralba
It maintains this accuracy even in the presence of multiple people, and in new environments that it has not seen in the training set.
5 code implementations • ICML 2018 • Keyulu Xu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, Stefanie Jegelka
Furthermore, combining the JK framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models' performance.
Ranked #14 on Node Classification on PPI
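A minimal sketch of the jumping-knowledge idea in its concat variant (the paper also proposes max-pooling and LSTM-attention aggregators): each GNN layer's node representations are collected and combined at the final layer, letting every node draw on a different effective neighborhood range. Names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class JKConcat(nn.Module):
    """Jumping-knowledge aggregation (concat variant): fuse the node
    representations produced by every GNN layer into one output."""
    def __init__(self, layer_dims, out_dim):
        super().__init__()
        self.proj = nn.Linear(sum(layer_dims), out_dim)

    def forward(self, layer_reps):            # list of (N_nodes, d_l) tensors
        return self.proj(torch.cat(layer_reps, dim=-1))

# toy usage: representations from 3 GNN layers for 5 nodes
jk = JKConcat(layer_dims=[16, 16, 16], out_dim=8)
out = jk([torch.randn(5, 16) for _ in range(3)])
```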
no code implementations • CVPR 2018 • Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi
Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls, despite never being trained on such scenarios.
no code implementations • ICCV 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Third, each part detector in DeepParts is a strong detector that can detect pedestrians by observing only a part of a proposal.
no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang
In this paper, we propose deformable deep convolutional neural networks for generic object detection.
no code implementations • CVPR 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Rather than expensively annotating scene attributes, we transfer attribute information from existing scene segmentation datasets to the pedestrian dataset by proposing a novel deep model that learns high-level features from multiple tasks and multiple data sources.
Ranked #30 on Pedestrian Detection on Caltech
no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang
In the proposed new deep architecture, a new deformation-constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraints and penalties.
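A loose sketch of the def-pooling idea as described in that sentence: a part may shift from its anchor position, but each placement pays a geometric penalty, and the layer pools the best penalized score. The paper's layer has more structure (multiple learned penalty maps per part); everything below is an illustrative simplification.

```python
import torch

def def_pooling(score_map, penalty):
    """Deformation-constrained pooling sketch: for each part channel, take
    the max over candidate placements of (part score - deformation penalty).
    score_map, penalty: (B, C, H, W); returns (B, C) pooled part scores."""
    penalized = score_map - penalty
    return penalized.flatten(2).max(dim=2).values

# toy usage: quadratic penalty growing with distance from the anchor (center)
B, C, H, W = 2, 4, 7, 7
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
dist2 = ((ys - H // 2) ** 2 + (xs - W // 2) ** 2).float()
scores = def_pooling(torch.randn(B, C, H, W), 0.1 * dist2.expand(B, C, H, W))
```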
no code implementations • CVPR 2014 • Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang
In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection.