Search Results for author: Samira Abnar

Found 19 papers, 9 papers with code

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

no code implementations13 Oct 2023 Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).

Retrieval
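
For context on the entry above: adaptive computation in transformers is often illustrated with an ACT-style halting mechanism that applies a shared layer a variable number of times per input. The sketch below is a generic, hypothetical illustration of that idea in PyTorch, not the architecture studied in the paper; the layer choice, the halting unit, and the hyper-parameters are placeholders.

    import torch
    import torch.nn as nn

    class AdaptiveDepthBlock(nn.Module):
        """Generic ACT-style halting loop (illustrative only): keep applying
        the same layer until each token's accumulated halting probability
        reaches 1 - eps, then return the halting-weighted average of states."""

        def __init__(self, d_model: int, max_steps: int = 8, eps: float = 0.01):
            super().__init__()
            self.layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.halt = nn.Linear(d_model, 1)   # per-token halting probability
            self.max_steps = max_steps
            self.eps = eps

        def forward(self, x):                   # x: (batch, seq, d_model)
            halted = torch.zeros(x.shape[:2], device=x.device)  # cumulative halt prob
            out = torch.zeros_like(x)                            # weighted state average
            for _ in range(self.max_steps):
                x = self.layer(x)
                p = torch.sigmoid(self.halt(x)).squeeze(-1)      # (batch, seq)
                # tokens that would exceed the budget spend their remainder and stop
                remainder = (1.0 - halted).clamp(min=0.0)
                w = torch.where(halted + p > 1.0 - self.eps, remainder, p)
                out = out + w.unsqueeze(-1) * x
                halted = halted + w
                if bool((halted >= 1.0 - self.eps).all()):
                    break
            return out

Calling AdaptiveDepthBlock(d_model=64) on a (batch, seq, 64) tensor returns representations in which some tokens effectively receive fewer computation steps than others, which is the general behaviour the paper's notion of adaptivity refers to.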

Diffusion Probabilistic Fields

no code implementations1 Mar 2023 Peiye Zhuang, Samira Abnar, Jiatao Gu, Alex Schwing, Joshua M. Susskind, Miguel Ángel Bautista

Diffusion probabilistic models have quickly become a major approach for generative modeling of images, 3D geometry, video and other domains.

Denoising

GAUDI: A Neural Architect for Immersive 3D Scene Generation

1 code implementation27 Jul 2022 Miguel Angel Bautista, Pengsheng Guo, Samira Abnar, Walter Talbott, Alexander Toshev, Zhuoyuan Chen, Laurent Dinh, Shuangfei Zhai, Hanlin Goh, Daniel Ulbricht, Afshin Dehghan, Josh Susskind

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.

Image Generation · Scene Generation

Exploring the Limits of Large Scale Pre-training

no code implementations ICLR 2022 Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

Recent developments in large-scale machine learning suggest that, by properly scaling up data, model size, and training time, improvements in pre-training would transfer favorably to most downstream tasks.

Scale Efficiently: Insights from Pretraining and Finetuning Transformers

no code implementations ICLR 2022 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

no code implementations29 Sep 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It has been shown that, given (a) access to samples from intermediate distributions and (b) samples annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.

Domain Adaptation

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

3 code implementations22 Sep 2021 Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

The key findings of this paper are as follows: (1) aside from model size alone, model shape matters for downstream fine-tuning; (2) scaling protocols operate differently at different compute regions; (3) the widely adopted T5-base and T5-large sizes are Pareto-inefficient.

Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

1 code implementation10 Jun 2021 Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

It has been shown that, given (a) access to samples from intermediate distributions and (b) samples annotated with the amount of change from the source distribution, self-training can be successfully applied to gradually shifted samples to adapt the model toward the target distribution.

Domain Adaptation
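
Both entries for this paper build on the gradual self-training recipe: train on the labeled source data, then repeatedly pseudo-label the next, slightly more shifted batch and retrain on it. The sketch below is a minimal illustration of that recipe under a scikit-learn-style fit/predict interface; the function name and interface are assumptions, and this is not the released code.

    from sklearn.base import clone

    def gradual_self_train(model, X_source, y_source, shifted_batches):
        """Minimal gradual self-training sketch: adapt a source-trained model
        through a sequence of increasingly shifted, unlabeled batches.

        shifted_batches: list of unlabeled arrays ordered from least to most
        shifted relative to the source distribution (assumption (b) above).
        """
        model.fit(X_source, y_source)
        for X_shift in shifted_batches:          # assumption (a): intermediate samples
            pseudo_labels = model.predict(X_shift)
            model = clone(model)                 # fresh model, same hyper-parameters
            model.fit(X_shift, pseudo_labels)    # retrain on the pseudo-labeled batch
        return model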

Long Range Arena: A Benchmark for Efficient Transformers

5 code implementations8 Nov 2020 Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

In recent months, a wide spectrum of efficient, fast Transformers has been proposed to tackle this problem, more often than not claiming model quality superior or comparable to vanilla Transformer models.

Ranked #18 on Long-range modeling on LRA (Pathfinder metric)

16k · Benchmarking +1

Transferring Inductive Biases through Knowledge Distillation

1 code implementation31 May 2020 Samira Abnar, Mostafa Dehghani, Willem Zuidema

Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time.

Knowledge Distillation
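
The inductive-bias transfer above relies on the standard knowledge-distillation objective, sketched generically in PyTorch below: a temperature-softened KL term toward the teacher blended with the usual cross-entropy. The temperature and mixing weight are illustrative defaults, and the paper's exact training setup may differ.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, targets,
                          temperature: float = 2.0, alpha: float = 0.5):
        """Generic knowledge-distillation loss: blend cross-entropy on the hard
        labels with a KL term pulling the student toward the teacher's softened
        output distribution."""
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
        kd = kd * temperature ** 2               # standard gradient-scale correction
        ce = F.cross_entropy(student_logits, targets)
        return alpha * kd + (1.0 - alpha) * ce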

Quantifying Attention Flow in Transformers

7 code implementations ACL 2020 Samira Abnar, Willem Zuidema

Across the layers of a Transformer, information originating from different input tokens becomes increasingly mixed, which makes raw attention weights unreliable as explanation probes.
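
The remedy proposed in this paper, attention rollout, recursively combines the attention maps of successive layers while mixing in the identity matrix to account for residual connections. The NumPy sketch below follows that description; the head-averaging and the 0.5/0.5 residual mixing match the paper, while the input format is an assumption made here.

    import numpy as np

    def attention_rollout(attentions):
        """Attention rollout: propagate attention through layers by
        matrix-multiplying per-layer maps, mixing in the identity to model
        residual connections.

        attentions: list of arrays of shape (num_heads, seq_len, seq_len),
                    one per layer, ordered from the first layer upward.
        """
        rollout = None
        for layer_attn in attentions:
            attn = layer_attn.mean(axis=0)                    # average over heads
            attn = 0.5 * attn + 0.5 * np.eye(attn.shape[-1])  # residual connection
            attn = attn / attn.sum(axis=-1, keepdims=True)    # re-normalize rows
            rollout = attn if rollout is None else attn @ rollout
        return rollout  # rollout[i, j]: estimated contribution of input token j to position i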

Blackbox Meets Blackbox: Representational Similarity & Stability Analysis of Neural Language Models and Brains

1 code implementation WS 2019 Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema

In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.

Blackbox meets blackbox: Representational Similarity and Stability Analysis of Neural Language Models and Brains

1 code implementation4 Jun 2019 Samira Abnar, Lisa Beinborn, Rochelle Choenni, Willem Zuidema

In this paper, we define and apply representational stability analysis (ReStA), an intuitive way of analyzing neural language models.
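
Both entries above build on representational similarity analysis, which ReStA applies between two instances of the same model. The sketch below shows the basic second-order comparison in an illustrative form (the dissimilarity and correlation metrics are assumptions, and this is not the released code): compute pairwise dissimilarities over the same stimuli in each representation space, then correlate the two dissimilarity structures.

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    def rsa_score(reps_a, reps_b):
        """Second-order similarity: build a dissimilarity vector over all
        stimulus pairs in each representation space, then correlate the two.

        reps_a, reps_b: arrays of shape (n_stimuli, dim_a) and (n_stimuli, dim_b).
        """
        rdm_a = pdist(reps_a, metric="correlation")   # condensed dissimilarity matrix
        rdm_b = pdist(reps_b, metric="correlation")
        rho, _ = spearmanr(rdm_a, rdm_b)
        return rho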

Robust Evaluation of Language-Brain Encoding Experiments

1 code implementation4 Apr 2019 Lisa Beinborn, Samira Abnar, Rochelle Choenni

Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli.
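
A typical language-brain encoding experiment of the kind evaluated in this paper fits a regularized linear map from stimulus features to voxel responses and scores it with cross-validation. The ridge-regression sketch below illustrates that common setup under simplifying assumptions (plain k-fold splits, uniform R^2 averaging, illustrative regularization strength); the paper itself is concerned with making this evaluation robust, so its choices are more careful than this.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    def encoding_score(stimulus_features, brain_responses, alpha=1.0, folds=5):
        """Fit a ridge-regression encoding model from stimulus features
        (e.g. word embeddings) to brain responses and report cross-validated R^2.

        stimulus_features: (n_stimuli, n_features)
        brain_responses:   (n_stimuli, n_voxels)
        """
        model = Ridge(alpha=alpha)
        scores = cross_val_score(model, stimulus_features, brain_responses,
                                 cv=folds, scoring="r2")
        return scores.mean()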

Incremental Reading for Question Answering

no code implementations15 Jan 2019 Samira Abnar, Tania Bedrax-Weiss, Tom Kwiatkowski, William W. Cohen

Current state-of-the-art question answering models reason over an entire passage, not incrementally.

Continual Learning · Question Answering

Experiential, Distributional and Dependency-based Word Embeddings have Complementary Roles in Decoding Brain Activity

no code implementations WS 2018 Samira Abnar, Rasyan Ahmed, Max Mijnheer, Willem Zuidema

We evaluate 8 different word embedding models on their usefulness for predicting the neural activation patterns associated with concrete nouns.

Word Embeddings
