2 code implementations • 2 Oct 2024 • Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, Hossein Hajimirsadeghi
The introduction of Transformers in 2017 reshaped the landscape of deep learning.
1 code implementation • 22 May 2024 • Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori
To address this, we introduce a new, efficient method of computing attention's many-to-many RNN output based on the parallel prefix scan algorithm (a minimal sketch follows this entry).
Ranked #66 on Time Series Forecasting on ETTh1 (336) Multivariate
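As a hedged illustration of the prefix-scan view mentioned in this entry, the sketch below computes attention outputs over every prefix of a sequence using numerically stable running statistics; the function name and the single-query setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prefix_attention(q, K, V):
    """q: (d,) query; K: (N, d) keys; V: (N, d_v) values.
    Returns (N, d_v): the attention output over every prefix of the sequence."""
    scores = K @ q                        # s_i = <k_i, q>
    m = -np.inf                           # running max score (numerical stability)
    u = 0.0                               # running sum of exp(s_i - m)
    w = np.zeros(V.shape[1])              # running sum of exp(s_i - m) * v_i
    outs = np.zeros((len(V), V.shape[1]))
    for i, s in enumerate(scores):
        m_new = max(m, s)
        scale = np.exp(m - m_new)         # rescale old statistics to the new max
        u = u * scale + np.exp(s - m_new)
        w = w * scale + np.exp(s - m_new) * V[i]
        m = m_new
        outs[i] = w / u                   # softmax-weighted average of v_1..v_i
    return outs
```

Because combining the (max, normaliser, weighted-sum) statistics is associative, the same per-prefix outputs can be produced with a parallel prefix scan in logarithmic depth rather than this sequential loop, which is the efficiency the entry refers to.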
1 code implementation • 29 Sep 2023 • Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
In this work, we propose Tree Cross Attention (TCA), a module based on Cross Attention that retrieves information from only a logarithmic, O(log N), number of tokens when performing inference.
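A rough sketch of the general retrieval idea, not TCA itself: build a balanced binary tree over the tokens, descend from the root keeping one node summary per level, and attend only over those O(log N) summaries. Mean aggregation and greedy dot-product descent below are illustrative stand-ins for the paper's learned aggregation and retrieval policy.

```python
import numpy as np

def build_tree(tokens):
    """tokens: (N, d) array. Returns a list of levels; level 0 is the leaves,
    the last level is the root, and node i aggregates children 2i and 2i+1."""
    levels = [tokens]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [(cur[i] + cur[i + 1]) / 2 if i + 1 < len(cur) else cur[i]
               for i in range(0, len(cur), 2)]
        levels.append(np.stack(nxt))
    return levels

def tree_retrieve(query, levels):
    """Greedily descend from the root to a leaf, keeping the sibling summary at
    each level, so only O(log N) vectors are returned for cross-attention."""
    selected, idx = [], 0
    for level in reversed(levels[:-1]):           # walk from just below the root to the leaves
        left, right = 2 * idx, 2 * idx + 1
        if right >= len(level) or level[left] @ query >= level[right] @ query:
            idx, sibling = left, right
        else:
            idx, sibling = right, left
        if sibling < len(level):
            selected.append(level[sibling])       # summary of the unexplored branch
    selected.append(levels[0][idx])               # plus the chosen leaf token
    return np.stack(selected)
```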
no code implementations • 21 Jun 2023 • Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
Modern foundation model architectures rely on attention mechanisms to effectively capture context.
1 code implementation • 23 May 2023 • Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
Leveraging the update operation, we propose Constant Memory Attention Block (CMAB), a novel attention block that (i) is permutation invariant, (ii) computes its output in constant memory, and (iii) performs constant computation updates.
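A hedged sketch of what constant-memory cross-attention can look like, assuming per-latent running softmax statistics; this illustrates the property, not the paper's CMAB block. Memory stays fixed at the number of latent vectors no matter how much data has been absorbed, and the result does not depend on how the inputs are grouped into batches.

```python
import numpy as np

class ConstantMemoryCrossAttention:
    def __init__(self, latents, d_v):
        self.Q = latents                            # (L, d) fixed latent queries
        self.m = np.full(len(latents), -np.inf)     # running max score per latent
        self.u = np.zeros(len(latents))             # running softmax normaliser
        self.w = np.zeros((len(latents), d_v))      # running weighted value sum

    def update(self, K, V):
        """Absorb a new batch of keys/values; cost is O(L * batch), memory stays O(L)."""
        s = self.Q @ K.T                            # (L, batch) attention scores
        m_new = np.maximum(self.m, s.max(axis=1))
        scale = np.exp(self.m - m_new)              # rescale old statistics to the new max
        self.u = self.u * scale + np.exp(s - m_new[:, None]).sum(axis=1)
        self.w = self.w * scale[:, None] + np.exp(s - m_new[:, None]) @ V
        self.m = m_new

    def read(self):
        return self.w / self.u[:, None]             # (L, d_v) attention output per latent
```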
1 code implementation • 27 Jan 2023 • Wonho Bae, Mohamed Osama Ahmed, Frederick Tung, Gabriel L. Oliveira
In this work, we propose to train TPPs in a meta learning framework, where each sequence is treated as a different task, via a novel framing of TPPs as neural processes (NPs).
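Purely illustrative, assuming a simple prefix split: the meta-learning framing treats each event sequence as its own task, with part of the sequence serving as observed context and the rest as targets. How the paper actually forms context and target sets may differ.

```python
import numpy as np

def sequence_to_task(event_times, context_frac=0.5):
    """Treat one event sequence as a task: earlier events form the observed
    context, later events are the targets the model must predict."""
    event_times = np.sort(np.asarray(event_times))
    n_ctx = int(len(event_times) * context_frac)
    return event_times[:n_ctx], event_times[n_ctx:]
```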
no code implementations • 19 Nov 2022 • Mahmoud Salem, Mohamed Osama Ahmed, Frederick Tung, Gabriel Oliveira
This commonly encountered operational context calls for principled techniques for training ML models with the option to abstain from predicting when uncertain.
1 code implementation • 15 Nov 2022 • Leo Feng, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
We demonstrate that LBANPs can trade off computational cost and performance according to the number of latent vectors.
1 code implementation • 17 Jun 2022 • Leo Feng, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Amir Abdi
We tackle the problem of Selective Classification where the objective is to achieve the best performance on a predetermined ratio (coverage) of the dataset.
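A minimal post-hoc sketch of selection at a fixed coverage (not the paper's training method): choose a confidence threshold on held-out data so the model answers on exactly the target fraction of inputs and abstains on the rest.

```python
import numpy as np

def selective_predict(confidences, predictions, coverage=0.8):
    """Keep the most confident `coverage` fraction of predictions; abstain on the rest."""
    threshold = np.quantile(confidences, 1.0 - coverage)
    accept = confidences >= threshold
    return np.where(accept, predictions, -1), accept   # -1 marks abstention
```

The paper's contribution concerns how to train for a predetermined coverage; this snippet only shows the evaluation-time selection step.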
no code implementations • 17 May 2022 • Joao Monteiro, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Greg Mori
We study settings where gradient penalties are used alongside risk minimization with the goal of obtaining predictors satisfying different notions of monotonicity.
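A hedged sketch of one generic gradient penalty for monotonicity (not necessarily the exact formulations studied here): penalise negative partial derivatives of the prediction along inputs that should be monotonically increasing, and add the penalty to the usual risk term.

```python
import torch

def monotonicity_penalty(model, x, monotone_dims):
    """Mean squared hinge on negative gradients along the monotone input dimensions."""
    x = x.clone().requires_grad_(True)
    y = model(x).sum()
    grads = torch.autograd.grad(y, x, create_graph=True)[0]    # (batch, d)
    violation = torch.relu(-grads[:, monotone_dims])            # > 0 where monotonicity is violated
    return (violation ** 2).mean()

# illustrative total objective: task_loss + lam * monotonicity_penalty(model, x, dims)
```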
no code implementations • 29 Sep 2021 • Joao Monteiro, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Greg Mori
We study the setting where risk minimization is performed over general classes of models and consider two cases where monotonicity is treated as either a requirement to be satisfied everywhere or a useful property.
no code implementations • 18 Oct 2019 • Nazanin Mehrasa, Ruizhi Deng, Mohamed Osama Ahmed, Bo Chang, JiaWei He, Thibaut Durand, Marcus Brubaker, Greg Mori
Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature.
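For context, the textbook log-likelihood that TPP models maximise (a standard identity, not specific to this work), with conditional intensity \lambda^*(t) and events t_1 < ... < t_n observed on [0, T]:

```latex
\log p\bigl(\{t_i\}_{i=1}^{n}\bigr)
  = \sum_{i=1}^{n} \log \lambda^{*}(t_i) - \int_{0}^{T} \lambda^{*}(t)\,\mathrm{d}t
```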
no code implementations • 7 Apr 2019 • Ramy Hussein, Mohamed Osama Ahmed, Rabab Ward, Z. Jane Wang, Levin Kuhlmann, Yi Guo
Traditional PCA is not a reliable method for iEEG data reduction in seizure prediction.
no code implementations • 10 Oct 2018 • Mohamed Osama Ahmed, Sharan Vaswani, Mark Schmidt
Indeed, in a particular setting, we prove that using the Lipschitz information yields the same or a better bound on the regret compared to using Bayesian optimization on its own.
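A hedged sketch of the general idea of combining the two (not the paper's exact algorithm): with a known Lipschitz constant L, every candidate satisfies f(x) <= min_i f(x_i) + L ||x - x_i||, so a GP upper confidence bound can be clipped by this deterministic bound before choosing the next evaluation.

```python
import numpy as np

def lipschitz_clipped_ucb(candidates, X_obs, y_obs, gp_mean, gp_std, L, beta=2.0):
    """candidates: (M, d); X_obs, y_obs: past evaluations; gp_mean, gp_std: (M,) GP posterior."""
    ucb = gp_mean + beta * gp_std                                     # usual GP-UCB score
    dists = np.linalg.norm(candidates[:, None, :] - X_obs[None, :, :], axis=-1)   # (M, n)
    lip_upper = np.min(y_obs[None, :] + L * dists, axis=1)            # Lipschitz upper bound on f
    return np.minimum(ucb, lip_upper)                                 # acquisition = tighter of the two
```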
no code implementations • NeurIPS 2015 • Reza Harikandeh, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.
no code implementations • 5 Nov 2015 • Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods.
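For reference, a textbook SVRG sketch (the generic baseline, not the specific practical strategies analysed in the paper): periodically compute a full gradient at a snapshot and use it as a control variate for the stochastic steps.

```python
import numpy as np

def svrg(grad_i, w0, n, step, epochs=10, inner_steps=None, rng=None):
    """grad_i(w, i): gradient of the i-th loss term; n: number of terms."""
    rng = np.random.default_rng() if rng is None else rng
    inner_steps = n if inner_steps is None else inner_steps
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)  # snapshot gradient
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced direction: unbiased, with variance shrinking as w -> w_snap
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= step * g
    return w
```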
no code implementations • 16 Apr 2015 • Mark Schmidt, Reza Babanezhad, Mohamed Osama Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar
We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs).
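A generic SAG sketch; the per-example gradient oracle would be the gradient of one training sequence's CRF negative log-likelihood, which is omitted here. SAG stores the last gradient seen for every example and steps with the average of the stored gradients.

```python
import numpy as np

def sag(grad_i, w0, n, step, iters, rng=None):
    """grad_i(w, i): gradient of the i-th example's loss; n: number of examples."""
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    memory = np.zeros((n,) + w.shape)      # last stored gradient per example
    grad_sum = np.zeros_like(w)            # running sum of stored gradients
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(w, i)
        grad_sum += g - memory[i]          # replace example i's stored contribution
        memory[i] = g
        w -= step * grad_sum / n           # step with the average stored gradient
    return w
```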