Search Results for author: Andreas Veit

Found 30 papers, 11 papers with code

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

2 code implementations • 30 Nov 2023 • Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar

The proposed metric is an unbiased estimator that makes no assumptions about the probability distribution of the embeddings and is sample efficient.

Image Generation
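For context, here is a minimal sketch of the kind of estimator the abstract describes: an unbiased, distribution-free MMD² estimate between two sets of image embeddings. The Gaussian kernel and bandwidth are assumptions for illustration; this is not the paper's exact CMMD implementation.

```python
# Hedged sketch: unbiased MMD^2 with an RBF kernel between embedding sets.
import numpy as np

def rbf_kernel(a, b, sigma=10.0):
    # Pairwise squared Euclidean distances -> Gaussian kernel values.
    d2 = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2_unbiased(x, y, sigma=10.0):
    m, n = len(x), len(y)
    kxx, kyy = rbf_kernel(x, x, sigma), rbf_kernel(y, y, sigma)
    kxy = rbf_kernel(x, y, sigma)
    # Excluding the diagonal terms is what makes the estimator unbiased.
    return ((kxx.sum() - np.trace(kxx)) / (m * (m - 1))
            + (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
            - 2.0 * kxy.mean())

real = np.random.randn(256, 768)  # stand-ins for real/generated embeddings
fake = np.random.randn(256, 768)
print(mmd2_unbiased(real, fake))
```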

Large Language Models with Controllable Working Memory

no code implementations • 9 Nov 2022 • Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.

Counterfactual World Knowledge

When does mixup promote local linearity in learned representations?

no code implementations • 28 Oct 2022 • Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers?

Representation Learning
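For reference, a minimal sketch of the standard mixup objective whose linearity the paper studies; the Beta(α, α) mixing distribution follows the usual mixup recipe, and the parameter value is a placeholder.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng(0)):
    # Train on convex combinations of example pairs; using the same lambda
    # for inputs and labels is what encourages linear behaviour between
    # training points.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```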

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

no code implementations • 14 Aug 2022 • Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.

Generalization Bounds Image Classification +4
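A rough sketch of the training loop the abstract suggests, assuming PyTorch-style models. Here `generator`, `teacher`, and `student` are hypothetical stand-ins, and the temperature-free KL objective is a common distillation choice rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def tgt_step(generator, teacher, student, optimizer, batch_size=64):
    # Draw synthetic inputs from a pretrained generative model instead of
    # streaming a large volume of real data, then distill the teacher.
    x_synth = generator.sample(batch_size)
    with torch.no_grad():
        t_logits = teacher(x_synth)
    s_logits = student(x_synth)
    loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1), reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```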

Leveraging redundancy in attention with Reuse Transformers

1 code implementation • 13 Oct 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.
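The paper's premise is that attention scores are often similar across layers and can therefore be reused. A minimal single-head sketch of that idea (no masking or multi-head details; those are omitted for brevity):

```python
import torch

def attention(q, k, v, reused_probs=None):
    # If an earlier layer's attention probabilities are supplied, skip the
    # pairwise dot products entirely and only project fresh values.
    if reused_probs is None:
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        reused_probs = scores.softmax(dim=-1)
    return reused_probs @ v, reused_probs

q = k = v = torch.randn(2, 16, 64)
out1, probs = attention(q, k, v)      # layer i computes attention
out2, _ = attention(q, k, v, probs)   # layer i+1 reuses it for free
```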

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

no code implementations • 16 Jun 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.

RankDistil: Knowledge Distillation for Ranking

no code implementations • AISTATS 2021 • Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation
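One generic way to distill a ranking teacher is sketched below: a temperature-scaled listwise KL over per-query document scores. The paper studies several distillation losses for ranking, so this is illustrative rather than its exact objective.

```python
import torch
import torch.nn.functional as F

def listwise_distill_loss(student_scores, teacher_scores, tau=1.0):
    # scores: (num_queries, num_docs). Match the student's softmax
    # distribution over each query's documents to the teacher's.
    t = F.softmax(teacher_scores / tau, dim=-1)
    s = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau ** 2
```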

Understanding Robustness of Transformers for Image Classification

no code implementations • ICCV 2021 • Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit

We find that when pre-trained with a sufficient amount of data, ViT models are at least as robust as the ResNet counterparts on a broad range of perturbations.

Classification General Classification +1

On the Reproducibility of Neural Network Predictions

no code implementations • 5 Feb 2021 • Srinadh Bhojanapalli, Kimberly Wilber, Andreas Veit, Ankit Singh Rawat, Seungyeon Kim, Aditya Menon, Sanjiv Kumar

By analyzing the relationship between churn and prediction confidences, we pursue an approach with two components for churn reduction.

Data Augmentation Image Classification
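"Churn" here is the fraction of examples on which two independently trained copies of the same model disagree; a minimal sketch of the measurement:

```python
import numpy as np

def churn(preds_run_a, preds_run_b):
    # Disagreement rate between two training runs on the same test set.
    a, b = np.asarray(preds_run_a), np.asarray(preds_run_b)
    return float(np.mean(a != b))

print(churn([0, 1, 2, 1], [0, 2, 2, 1]))  # -> 0.25
```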

Improving Calibration in Deep Metric Learning With Cross-Example Softmax

no code implementations • 17 Nov 2020 • Andreas Veit, Kimberly Wilber

Triplet-based methods capture top-k relevancy, where all top-k scoring documents are assumed to be relevant to a given query. Pairwise contrastive models capture threshold relevancy, where all documents scoring higher than some threshold are assumed to be relevant.

Image Retrieval Metric Learning +1
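A hedged sketch of the cross-example idea: instead of normalizing each query's scores separately, normalize every positive against all query-document pairs in the batch, so that scores are comparable across queries and a single threshold becomes meaningful. The temperature and exact normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def cross_example_softmax_loss(q_emb, d_emb, pos_idx, tau=0.05):
    q = F.normalize(q_emb, dim=-1)
    d = F.normalize(d_emb, dim=-1)
    sims = q @ d.T / tau                       # all query-document pairs
    pos = sims[torch.arange(len(q)), pos_idx]  # each query's positive
    # Normalize over the WHOLE batch of pairs, not per query, so scores
    # are globally calibrated (threshold relevancy).
    return (torch.logsumexp(sims.reshape(-1), dim=0) - pos).mean()
```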

Coping with Label Shift via Distributionally Robust Optimisation

1 code implementation • ICLR 2021 • Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match.
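For context, the classical correction this setting calls for is sketched below: reweight training losses by estimated test/train label-probability ratios. The paper's contribution is a distributionally robust version that tolerates errors in these estimated weights; this sketch shows only the baseline idea.

```python
import numpy as np

def label_shift_weighted_loss(per_example_losses, labels,
                              train_prior, test_prior):
    # Importance weights w_y = p_test(y) / p_train(y), applied per example.
    w = (np.asarray(test_prior) / np.asarray(train_prior))[labels]
    return float(np.mean(w * np.asarray(per_example_losses)))
```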

Long-tail learning via logit adjustment

3 code implementations • ICLR 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning
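The post-hoc form of logit adjustment is easy to state; a minimal sketch follows (tau = 1 gives the standard balanced correction, and the paper also proposes a loss-based variant that applies the same prior offset during training).

```python
import numpy as np

def logit_adjust(logits, class_priors, tau=1.0):
    # Subtract tau * log(prior) so rare classes are no longer penalized
    # for their low base rate at prediction time.
    return logits - tau * np.log(np.asarray(class_priors))

logits = np.array([2.0, 1.5])   # head class vs. tail class
priors = np.array([0.99, 0.01])
print(logit_adjust(logits, priors).argmax())  # tail class wins post-adjustment
```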

Doubly-stochastic mining for heterogeneous retrieval

no code implementations • 23 Apr 2020 • Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Retrieval Stochastic Optimization

Why are Adaptive Methods Good for Attention Models?

no code implementations • NeurIPS 2020 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across important tasks, such as attention models.
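A minimal sketch of the clipped-SGD update the abstract refers to, using global gradient-norm clipping; the learning rate and threshold are placeholders, and `params` is assumed to be a list of tensors whose gradients are already populated.

```python
import torch

def clipped_sgd_step(params, lr=0.1, max_norm=1.0):
    # Rescale the whole gradient when its norm exceeds max_norm; clipping
    # is what counteracts the heavy-tailed gradient noise the paper links
    # to SGD's poor performance on attention models.
    torch.nn.utils.clip_grad_norm_(params, max_norm)
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.add_(p.grad, alpha=-lr)
```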

Why ADAM Beats SGD for Attention Models

no code implementations • 25 Sep 2019 • Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra

While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning, adaptive methods like Adam have been observed to outperform SGD across important tasks, such as attention models.

How To Backdoor Federated Learning

3 code implementations • 2 Jul 2018 • Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, Vitaly Shmatikov

An attacker selected in a single round of federated learning can cause the global model to immediately reach 100% accuracy on the backdoor task.

Anomaly Detection Data Poisoning +2
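The single-round attack relies on "model replacement": because the server averages participant updates, the attacker can scale its update so the average lands on the backdoored model. A sketch under the simplifying assumption of plain 1/n averaging:

```python
def malicious_update(global_w, backdoored_w, num_participants):
    # With n participants averaged with weight 1/n, scaling the malicious
    # delta by n makes the aggregated model approximately equal to the
    # attacker's backdoored model after a single round.
    n = num_participants
    return {k: global_w[k] + n * (backdoored_w[k] - global_w[k])
            for k in global_w}
```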

Semantic Segmentation with Scarce Data

no code implementations • 2 Jul 2018 • Isay Katsman, Rohun Tripathi, Andreas Veit, Serge Belongie

Semantic segmentation is a challenging vision problem that usually necessitates the collection of large amounts of finely annotated data, which is often quite expensive to obtain.

Segmentation Semantic Segmentation

Learning to Evaluate Image Captioning

1 code implementation • CVPR 2018 • Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie

To address these two challenges, we propose a novel learning based discriminative evaluation metric that is directly trained to distinguish between human and machine-generated captions.

8k Data Augmentation +2

Convolutional Networks with Adaptive Inference Graphs

2 code implementations • ECCV 2018 • Andreas Veit, Serge Belongie

In this work, we propose convolutional networks with adaptive inference graphs (ConvNet-AIG) that adaptively define their network topology conditioned on the input image.
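A simplified sketch of an input-conditioned gated residual block: ConvNet-AIG uses a small gating network with a Gumbel-softmax for hard, differentiable execution decisions, whereas this sketch substitutes a soft sigmoid gate for brevity.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        # Tiny gating head: decide per image whether to run the block.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x):
        g = self.gate(x).view(-1, 1, 1, 1)  # ~0 skips the block, ~1 runs it
        return x + g * self.body(x)
```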

Separating Self-Expression and Visual Content in Hashtag Supervision

1 code implementation • CVPR 2018 • Andreas Veit, Maximilian Nickel, Serge Belongie, Laurens van der Maaten

The variety, abundance, and structured nature of hashtags make them an interesting data source for training vision models.

Retrieval

Deep Learning is Robust to Massive Label Noise

no code implementations • ICLR 2018 • David Rolnick, Andreas Veit, Serge Belongie, Nir Shavit

Deep neural networks trained on large supervised datasets have led to impressive results in image classification and other tasks.

Image Classification

Residual Networks Behave Like Ensembles of Relatively Shallow Networks

2 code implementations • NeurIPS 2016 • Andreas Veit, Michael Wilber, Serge Belongie

Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training.
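The "ensemble of paths" view is easy to see by unrolling two residual blocks; a toy sketch follows. Note the final split of f2(x + f1(x)) into separate paths is exact here only because the stand-in branches are linear.

```python
def f1(x): return 0.1 * x   # stand-in residual branches
def f2(x): return 0.2 * x   # (linear, for illustration only)

x = 1.0
h1 = x + f1(x)        # block 1
out = h1 + f2(h1)     # block 2

# Unrolled: 2 blocks -> 2^2 = 4 paths of depth 0, 1, 1, and 2.
paths = x + f1(x) + f2(x) + f2(f1(x))
assert abs(out - paths) < 1e-12
```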

Conditional Similarity Networks

4 code implementations • CVPR 2017 • Andreas Veit, Serge Belongie, Theofanis Karaletsos

A main reason for this is that contradicting notions of similarity cannot be captured in a single embedding space.
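A sketch of the paper's core mechanism: a shared embedding plus a learned mask per notion of similarity, so different, possibly contradictory similarities are measured in different subspaces. Dimensions and initialization here are placeholders.

```python
import torch
import torch.nn as nn

class ConditionalSimilarity(nn.Module):
    def __init__(self, embed_dim=64, num_notions=4):
        super().__init__()
        # One learnable mask per similarity notion over a shared embedding.
        self.masks = nn.Parameter(torch.rand(num_notions, embed_dim))

    def distance(self, e1, e2, notion):
        m = torch.relu(self.masks[notion])   # keep masks non-negative
        return ((e1 - e2) * m).pow(2).sum(dim=-1)
```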

Household Electricity Demand Forecasting -- Benchmarking State-of-the-Art Methods

no code implementations • 1 Apr 2014 • Andreas Veit, Christoph Goebel, Rohit Tidke, Christoph Doblander, Hans-Arno Jacobsen

The increasing use of renewable energy sources with variable output, such as solar photovoltaic and wind power generation, calls for Smart Grids that effectively manage flexible loads and energy storage.

Benchmarking
