no code implementations • ECCV 2020 • Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
This process enables incrementally improving the model by processing multiple learning episodes, each representing a different learning task, even with few training examples.
no code implementations • 29 Oct 2024 • Yingheng Wang, Zichen Wang, Gil Sadeh, Luca Zancato, Alessandro Achille, George Karypis, Huzefa Rangwala
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design.
no code implementations • 12 Jul 2024 • Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
Specifically, we introduce a characterization of compositional structures in terms of "interaction decompositions," and we establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
no code implementations • 8 Jul 2024 • Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto
Recent hybrid architectures have combined eidetic and fading memory, but with limitations that do not allow the designer or the learning process to seamlessly modulate the two, nor to extend the eidetic memory span.
no code implementations • 12 Jun 2024 • Benjamin Biggs, Arjun Seshadri, Yang Zou, Achin Jain, Aditya Golatkar, Yusheng Xie, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data.
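A minimal sketch of the weight-averaging idea behind this kind of compartmentalization, assuming identical architectures and a uniform average (the function and variable names are illustrative; the paper's exact recipe may weight shards differently):

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average parameters of models trained on different data
    shards (the 'soup' operation). Assumes floating-point parameters
    and identical architectures across shards."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Hypothetical usage: shard_models is a list of diffusion denoisers
# trained on disjoint data shards; the soup loads into a fresh model.
# soup = average_state_dicts([m.state_dict() for m in shard_models])
# merged_model.load_state_dict(soup)
```

Because averaging happens purely in weight space, shards can be added or removed by re-averaging, without retraining from scratch.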
no code implementations • 6 Jun 2024 • Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, Mani Srivastava
Protecting the digital identities of human faces from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor.
no code implementations • 30 Apr 2024 • Benet Oriol Sabat, Alessandro Achille, Matthew Trager, Stefano Soatto
We propose NeRF-Insert, a NeRF editing framework that allows users to make high-quality local edits with a flexible level of control.
no code implementations • CVPR 2024 • Aditya Golatkar, Alessandro Achille, Luca Zancato, Yu-Xiang Wang, Ashwin Swaminathan, Stefano Soatto
To reduce the risk of leaking private information contained in the retrieved set, we introduce Copy-Protected generation with Retrieval (CPR), a new method for RAG with strong copyright protection guarantees in a mixed-private setting for diffusion models. CPR allows the output of diffusion models to be conditioned on a set of retrieved images, while guaranteeing that unique identifiable information about those examples is not exposed in the generated outputs.
no code implementations • CVPR 2024 • Alessandro Favero, Luca Zancato, Matthew Trager, Siddharth Choudhary, Pramuditha Perera, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
In particular, we show that as more tokens are generated, the reliance on the visual prompt decreases, and this behavior strongly correlates with the emergence of hallucinations.
no code implementations • CVPR 2024 • Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
1 code implementation • 23 Oct 2023 • Tian Yu Liu, Matthew Trager, Alessandro Achille, Pramuditha Perera, Luca Zancato, Stefano Soatto
We propose to extract meaning representations from autoregressive language models by considering the distribution of all possible trajectories extending an input text.
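A rough sketch of the idea under illustrative assumptions: texts that induce similar distributions over continuations are assigned similar "meanings," probed here with a few hand-picked trajectories rather than the full distribution the paper considers (model choice and probe strings are hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Log-probability the LM assigns to `continuation` following `prompt`."""
    ids = tok(prompt + continuation, return_tensors="pt").input_ids
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = lm(ids).logits
    logp = torch.log_softmax(logits[0, :-1], dim=-1)      # predicts ids[0, 1:]
    tgt = ids[0, 1:]
    token_lp = logp[torch.arange(tgt.numel()), tgt]
    return token_lp[n_prompt - 1:].sum().item()           # continuation tokens only

# Two paraphrases should score probe continuations similarly.
probes = [" Therefore, water is", " In other words,", " This is false because"]
for text in ["Water freezes at 0 degrees Celsius.", "Ice forms at zero Celsius."]:
    print([round(continuation_logprob(text, p), 1) for p in probes])
```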
1 code implementation • 23 Aug 2023 • Michael Kleinman, Alessandro Achille, Stefano Soatto
Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations.
no code implementations • 2 Aug 2023 • Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time.
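A minimal sketch of inference-time composition in its simplest form, mixing per-source noise predictions with fixed weights (names are hypothetical, and the fixed weights stand in for the paper's more principled mixing rule):

```python
import torch

def composed_noise_prediction(models, x_t, t, weights):
    """Mix noise predictions of diffusion models trained on separate
    data sources into a single prediction at denoising step t."""
    eps = torch.stack([m(x_t, t) for m in models])            # (k, *x_t.shape)
    w = torch.tensor(weights, dtype=eps.dtype).view(-1, *([1] * x_t.dim()))
    return (w * eps).sum(dim=0)
```

Since each source keeps its own model, removing a source at inference time amounts to dropping its term from the mixture.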
no code implementations • 6 Jun 2023 • Chethan Parameshwara, Alessandro Achille, Xiaolong Li, Jiawei Mo, Matthew Trager, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto
We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion.
no code implementations • 1 Jun 2023 • Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto
We investigate whether prompts learned independently for different tasks can be later combined through prompt algebra to obtain a model that supports composition of tasks.
no code implementations • ICCV 2023 • Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto
We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model.
no code implementations • NeurIPS 2023 • Marco Fumero, Florian Wenzel, Luca Zancato, Alessandro Achille, Emanuele Rodolà, Stefano Soatto, Bernhard Schölkopf, Francesco Locatello
Recovering the latent factors of variation of high dimensional data has so far focused on simple synthetic settings.
no code implementations • 7 Apr 2023 • Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto
One potential fix for training corpus data defects is model disgorgement -- the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model.
no code implementations • CVPR 2023 • Luca Zancato, Alessandro Achille, Tian Yu Liu, Matthew Trager, Pramuditha Perera, Stefano Soatto
Second, we apply ${\rm T^3AR}$ for test-time adaptation and show that exploiting a pool of external images at test-time leads to more robust representations over existing methods on DomainNet-126 and VISDA-C, especially when few adaptation data are available (up to 8%).
1 code implementation • 3 Mar 2023 • Kaiwen Gui, Alexander M. Dalzell, Alessandro Achille, Martin Suchara, Frederic T. Chong
When our protocol is compiled into CNOT and arbitrary single-qubit gates, it prepares an $N$-dimensional state in depth $O(\log(N))$ and spacetime allocation (a metric that accounts for the fact that oftentimes some ancilla qubits need not be active for the entire circuit) $O(N)$, which are both optimal.
no code implementations • CVPR 2023 • Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto
The PPL improves the performance estimation on average by 37% across 16 classification and 33% across 10 detection datasets, compared to the power law.
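For context, the power-law baseline referenced here fits validation error as a function of training-set size on a few pilot runs and extrapolates. A minimal sketch of that baseline with SciPy (the functional form is the standard assumption; the data points are hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # err(n) ~ a * n^(-b) + c: error decays polynomially with data size n
    return a * np.power(n, -b) + c

# Hypothetical pilot measurements: (training-set size, validation error)
sizes = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
errors = np.array([0.42, 0.35, 0.30, 0.27, 0.25])

params, _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.5, 0.1), maxfev=10000)
print("predicted error at 64k samples:", power_law(64_000, *params))
```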
no code implementations • ICCV 2023 • Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto
These vectors can be seen as "ideal words" for generating concepts directly within the embedding space of the model.
no code implementations • 15 Feb 2023 • Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, Giovanni Paolini, Stefano Soatto
During inference, models can be assembled based on arbitrary selections of data sources, which we call "à-la-carte learning".
no code implementations • CVPR 2023 • Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera, Giovanni Paolini, Stefano Soatto
During inference, models can be assembled based on arbitrary selections of data sources, which we call à-la-carte learning.
no code implementations • 23 Nov 2022 • Tian Yu Liu, Aditya Golatkar, Stefano Soatto, Alessandro Achille
We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally, by integrating it along the vector field of "generalist" models.
1 code implementation • CVPR 2023 • Michael Kleinman, Alessandro Achille, Stefano Soatto
We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training.
no code implementations • 25 Jul 2022 • Alessandro Achille, Stefano Soatto
We revisit the classic signal-to-symbol barrier in light of the remarkable ability of deep neural networks to generate realistic synthetic data.
no code implementations • 1 Jul 2022 • Mohamad Rida Rammal, Alessandro Achille, Aditya Golatkar, Suhas Diggavi, Stefano Soatto
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
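For orientation, bounds in this family control the expected generalization gap via a conditional mutual information between the learned weights and sample-identity variables. Schematically, in the style of Steinke and Zakynthinou's CMI (the paper's leave-one-out variant changes how the conditioning set is constructed, so this is not its exact bound):

$$\mathbb{E}\left[\mathrm{gen}(W, S)\right] \;\le\; \sqrt{\frac{2\, I(W;\, U \mid \tilde{Z})}{n}},$$

where $W$ are the learned weights, $\tilde{Z}$ is a supersample of $2n$ examples, $U$ indicates which half was used for training, and $n$ is the training-set size.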
1 code implementation • NeurIPS 2023 • Michael Kleinman, Alessandro Achille, Stefano Soatto, Jonathan Kao
We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each.
1 code implementation • CVPR 2022 • Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto
TAPS solves a joint optimization problem which determines which layers to share with the base model and the value of the task-specific weights.
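A minimal sketch of the per-layer share-or-specialize choice, using a straight-through gate as an illustrative relaxation (TAPS's actual objective and parametrization may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskAdaptiveLinear(nn.Module):
    """Layer that either reuses frozen base weights or adds a
    task-specific residual, chosen by a learned gate (sketch only)."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)                     # shared, frozen
        self.delta = nn.Parameter(torch.zeros_like(base.weight))   # task-specific
        self.logit = nn.Parameter(torch.zeros(()))                 # gate score

    def forward(self, x):
        soft = torch.sigmoid(self.logit)
        gate = (soft > 0.5).float() + soft - soft.detach()  # straight-through
        return F.linear(x, self.base.weight + gate * self.delta, self.base.bias)
```

A sparsity penalty on the gate scores would push most layers to stay shared with the base model, keeping the per-task parameter overhead small.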
no code implementations • 30 Mar 2022 • Simone Bombari, Alessandro Achille, Zijian Wang, Yu-Xiang Wang, Yusheng Xie, Kunwar Yashraj Singh, Srikar Appalaraju, Vijay Mahadevan, Stefano Soatto
While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning.
no code implementations • CVPR 2022 • Aditya Golatkar, Alessandro Achille, Yu-Xiang Wang, Aaron Roth, Michael Kearns, Stefano Soatto
AdaMix incorporates few-shot training, or cross-modal zero-shot learning, on public data prior to private fine-tuning, to improve the trade-off.
no code implementations • NeurIPS 2021 • Julian Zilly, Alessandro Achille, Andrea Censi, Emilio Frazzoli
In particular, we show that, when using weight decay, weights in successive layers of a deep network may become "mutually frozen".
no code implementations • ICLR 2022 • Yonatan Dukler, Alessandro Achille, Giovanni Paolini, Avinash Ravichandran, Marzia Polito, Stefano Soatto
A learning task is a function from a training set to the validation error, which can be represented by a trained deep neural network (DNN).
no code implementations • 29 Sep 2021 • Luca Zancato, Alessandro Achille, Giovanni Paolini, Alessandro Chiuso, Stefano Soatto
After modeling the signals, we use an anomaly detection system based on the classic CUMSUM algorithm and a variational approximation of the $f$-divergence to detect both isolated point anomalies and change-points in statistics of the signals.
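A minimal sketch of the classic one-sided CUSUM statistic the entry builds on (drift and threshold values are illustrative; the paper replaces the plain residual with a variational $f$-divergence estimate):

```python
import numpy as np

def cusum(residuals, drift=0.5, threshold=5.0):
    """One-sided CUSUM: accumulate positive deviations beyond `drift`
    and flag a change when the statistic crosses `threshold`."""
    s, alarms = 0.0, []
    for t, r in enumerate(residuals):
        s = max(0.0, s + r - drift)
        if s > threshold:
            alarms.append(t)
            s = 0.0  # reset after an alarm
    return alarms

# Synthetic check: a mean shift at t=100 should trigger an alarm soon after.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 50)])
print(cusum(x))
```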
no code implementations • ICLR Workshop Neural_Compression 2021 • Michael Kleinman, Alessandro Achille, Stefano Soatto, Jonathan Kao
We introduce the Redundant Information Neural Estimator (RINE), a method that allows efficient estimation of the component of information about a target variable that is common to a set of sources, previously referred to as the "redundant information." We show that existing definitions of the redundant information can be recast in terms of an optimization over a family of deterministic or stochastic functions.
no code implementations • 29 Jan 2021 • Aditya Deshpande, Alessandro Achille, Avinash Ravichandran, Hao Li, Luca Zancato, Charless Fowlkes, Rahul Bhotika, Stefano Soatto, Pietro Perona
Since all model selection algorithms in the literature have been tested on different use-cases and never compared directly, we introduce a new comprehensive benchmark for model selection comprising: i) a model zoo of single- and multi-domain models, and ii) many target tasks.
no code implementations • 26 Jan 2021 • Orchid Majumder, Avinash Ravichandran, Subhransu Maji, Alessandro Achille, Marzia Polito, Stefano Soatto
In this work we investigate the complementary roles of these two sources of information by combining instance-discriminative contrastive learning and supervised learning in a single framework called Supervised Momentum Contrastive learning (SUPMOCO).
1 code implementation • ICLR 2021 • Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
We define a notion of information that an individual sample provides to the training of a neural network, and we specialize it to measure both how much a sample informs the final weights and how much it informs the function computed by the weights.
2 code implementations • ICLR 2021 • Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking.
Ranked #3 on Relation Classification on TACRED
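The augmented-language idea can be seen in a toy sketch: entities and relations are written inline with bracket markup in the decoded string, and structure is recovered by parsing it back out (the markup below is illustrative of the style, not the exact TANL grammar):

```python
import re

# Toy augmented output for joint entity/relation extraction: each
# bracketed span carries an entity type and optional relation attributes.
decoded = "[ Tolkien | person ] wrote [ The Lord of the Rings | book | author = Tolkien ]"

for span in re.finditer(r"\[ (.*?) \]", decoded):
    head, *attrs = [p.strip() for p in span.group(1).split("|")]
    print(head, "->", attrs)
```

Casting many structured prediction tasks into this one text-to-text format is what lets a single translation model handle all of them.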
no code implementations • 1 Jan 2021 • Kamal Gupta, Vijay Mahadevan, Alessandro Achille, Justin Lazarow, Larry S. Davis, Abhinav Shrivastava
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.
no code implementations • CVPR 2021 • Aditya Golatkar, Alessandro Achille, Avinash Ravichandran, Marzia Polito, Stefano Soatto
We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of remaining information after forgetting.
no code implementations • CVPR 2021 • Alessandro Achille, Aditya Golatkar, Avinash Ravichandran, Marzia Polito, Stefano Soatto
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization.
no code implementations • ICLR 2021 • Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan C. Kao
We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training.
no code implementations • NeurIPS 2020 • Luca Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
no code implementations • 22 Jul 2020 • Matteo Terzi, Alessandro Achille, Marco Maggipinto, Gian Antonio Susto
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility.
2 code implementations • ICCV 2021 • Kamal Gupta, Justin Lazarow, Alessandro Achille, Larry Davis, Vijay Mahadevan, Abhinav Shrivastava
Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.
1 code implementation • ECCV 2020 • Aditya Golatkar, Alessandro Achille, Stefano Soatto
We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network.
no code implementations • 11 Feb 2020 • Qing Liu, Orchid Majumder, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto
The majority of modern meta-learning methods for few-shot classification operate in two phases: a meta-training phase, where the meta-learner learns a generic representation by solving multiple few-shot tasks sampled from a large dataset, and a testing phase, where the meta-learner leverages its learned internal representation for a specific few-shot task involving classes not seen during meta-training.
1 code implementation • 19 Dec 2019 • Joël Seytre, Jon Wu, Alessandro Achille
We present a detector for curved text in natural images.
2 code implementations • CVPR 2020 • Aditya Golatkar, Alessandro Achille, Stefano Soatto
We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network.
no code implementations • 25 Sep 2019 • Alessandro Achille, Stefano Soatto
We relate this to the Information in the Weights, and use this result to show that models of low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs.
no code implementations • 2 Aug 2019 • Cuong V. Nguyen, Alessandro Achille, Michael Lam, Tal Hassner, Vijay Mahadevan, Stefano Soatto
As an application, we apply our procedure to study two properties of a task sequence: (1) total complexity and (2) sequential heterogeneity.
no code implementations • NeurIPS 2019 • Aditya Golatkar, Alessandro Achille, Stefano Soatto
Deep neural networks (DNNs), however, challenge this view: We show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularization.
no code implementations • 29 May 2019 • Alessandro Achille, Giovanni Paolini, Stefano Soatto
We establish a novel relation between the information in the weights and the effective information in the activations, and use this result to show that models with low (information) complexity not only generalize better, but are bound to learn invariant representations of future inputs.
no code implementations • ICLR 2019 • Alessandro Achille, Matteo Rovere, Stefano Soatto
Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training.
no code implementations • 5 Apr 2019 • Alessandro Achille, Giovanni Paolini, Glen Mbeng, Stefano Soatto
Our framework is the first to measure complexity in a way that accounts for the effect of the optimization scheme, which is critical in Deep Learning.
1 code implementation • ICCV 2019 • Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless Fowlkes, Stefano Soatto, Pietro Perona
We demonstrate that this embedding is capable of predicting task similarities that match our intuition about semantic and taxonomic relations between different visual tasks (e.g., tasks based on classifying different types of plants are similar). We also demonstrate the practical value of this framework for the meta-task of selecting a pre-trained feature extractor for a new task.
no code implementations • 4 Oct 2018 • Alessandro Achille, Glen Mbeng, Stefano Soatto
We compute the transition probability between two learning tasks, and show that it decomposes into two factors.
1 code implementation • NeurIPS 2018 • Alessandro Achille, Tom Eccles, Loic Matthey, Christopher P. Burgess, Nick Watters, Alexander Lerchner, Irina Higgins
Intelligent behaviour in the real world requires the ability to acquire new knowledge from an ongoing sequence of experiences while preserving and reusing past knowledge.
1 code implementation • 24 Nov 2017 • Alessandro Achille, Matteo Rovere, Stefano Soatto
Deficits that do not affect low-level statistics, such as vertical flipping of the images, have no lasting effect on performance and can be overcome with further training.
no code implementations • 9 Nov 2017 • Alessandro Achille, Stefano Soatto
Again, this can be finitely parametrized using a deep neural network, and some applications are already beginning to emerge.
no code implementations • 5 Jun 2017 • Alessandro Achille, Stefano Soatto
Using established principles from Statistics and Information Theory, we show that invariance to nuisance factors in a deep neural network is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations.
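This equivalence is often summarized by an Information Bottleneck Lagrangian: writing $z$ for the representation, $x$ for the input, and $y$ for the task variable, training trades off sufficiency against minimality,

$$\mathcal{L} \;=\; H(y \mid z) \;+\; \beta\, I(z; x),$$

so that decreasing $I(z;x)$ at fixed task performance $H(y \mid z)$ yields invariance to nuisance factors (notation schematic; see the paper for the precise cross-entropy form).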
1 code implementation • 4 Nov 2016 • Alessandro Achille, Stefano Soatto
The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties.