Search Results for author: Behrooz Ghorbani

Found 18 papers, 6 papers with code

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

no code implementations NeurIPS 2023 Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani

In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance.

Language Modelling Machine Translation +3
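
For context, a standard baseline in imbalanced multilingual training is static temperature-based sampling, which work like this examines alternatives to. A minimal, hypothetical sketch (not the paper's method; the temperature value is illustrative):

```python
import numpy as np

def sampling_rates(sizes, T=5.0):
    # Temperature-based sampling: draw task i with probability proportional
    # to its data share p_i ** (1/T). T=1 mirrors the raw imbalance;
    # larger T flattens the distribution toward uniform.
    p = np.asarray(sizes, dtype=float)
    p /= p.sum()
    q = p ** (1.0 / T)
    return q / q.sum()

print(sampling_rates([1_000_000, 10_000]))  # a high- vs low-resource pair
```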

Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation

1 code implementation 17 May 2023 Markus Freitag, Behrooz Ghorbani, Patrick Fernandes

Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions.

Machine Translation
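
A minimal sketch of the two ingredients in the title, assuming generic next-token probabilities and a pluggable `utility` function (the paper pairs MBR with neural utility functions; the names and defaults here are illustrative):

```python
import numpy as np

def epsilon_sample_step(probs, eps=0.02, rng=np.random.default_rng(0)):
    # Epsilon sampling: drop tokens whose probability falls below eps,
    # renormalize, and sample from the truncated distribution.
    p = np.where(probs >= eps, probs, 0.0)
    p /= p.sum()
    return rng.choice(len(p), p=p)

def mbr_decode(candidates, utility):
    # Monte Carlo MBR: score each sampled candidate by its average utility
    # against the other samples (acting as pseudo-references), keep the best.
    def expected_utility(c):
        others = [r for r in candidates if r is not c]
        return sum(utility(c, r) for r in others) / len(others)
    return max(candidates, key=expected_utility)
```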

Scaling Laws for Multilingual Neural Machine Translation

no code implementations 19 Feb 2023 Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat

Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models.

Machine Translation Translation
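
The flavor of that joint formulation can be sketched as a per-language-pair power law in an effective parameter count (hedged: the notation below is mine, not the paper's exact ansatz):

```latex
% Loss of language pair i in a multilingual model with N total parameters
% trained with mixture weights w: a power law in the effective parameters
% f_i(w) N allocated to pair i, decaying toward an irreducible floor.
L_i(N, w) \approx \beta_i \bigl( f_i(w)\, N \bigr)^{-p_i} + L_i^{\infty}
```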

Binarized Neural Machine Translation

1 code implementation NeurIPS 2023 Yichi Zhang, Ankush Garg, Yuan Cao, Łukasz Lew, Behrooz Ghorbani, Zhiru Zhang, Orhan Firat

In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind.

Binarization Machine Translation +2
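
A rough sketch of the standard ingredient behind such schemes, weight binarization trained with a straight-through estimator (illustrative; BMT's exact treatment of scales and activations may differ):

```python
import numpy as np

def binarize_forward(w):
    # Forward: replace weights by their sign, rescaled by the mean absolute
    # value so the binarized tensor preserves the original magnitude.
    scale = np.abs(w).mean()
    return scale * np.sign(w)

def binarize_backward(grad_out, w):
    # Straight-through estimator: backpropagate through sign() as if it
    # were the identity, zeroing the gradient where |w| > 1.
    return grad_out * (np.abs(w) <= 1.0)
```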

Do Current Multi-Task Optimization Methods in Deep Learning Even Help?

no code implementations 23 Sep 2022 Derrick Xin, Behrooz Ghorbani, Ankush Garg, Orhan Firat, Justin Gilmer

Recent research has proposed a series of specialized optimization algorithms for deep multi-task models.
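
One well-known family of such algorithms is "gradient surgery" in the PCGrad style; a minimal sketch of the pairwise projection (illustrative of the methods under evaluation, not a proposal of this paper):

```python
import numpy as np

def pcgrad_pair(g1, g2):
    # If the two task gradients conflict (negative inner product),
    # remove from g1 its component along g2.
    dot = g1 @ g2
    if dot < 0:
        g1 = g1 - (dot / (g2 @ g2)) * g2
    return g1
```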

Data Scaling Laws in NMT: The Effect of Noise and Architecture

no code implementations 4 Feb 2022 Yamini Bansal, Behrooz Ghorbani, Ankush Garg, Biao Zhang, Maxim Krikun, Colin Cherry, Behnam Neyshabur, Orhan Firat

In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).

Language Modelling Machine Translation +1
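
A hedged sketch of fitting such a data scaling law, using a saturating power law L(D) = beta * D^(-p) + L_inf on synthetic numbers (the paper's exact functional form and fitting protocol may differ):

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(D, beta, p, L_inf):
    # Loss decays as a power of the dataset size toward a floor L_inf.
    return beta * np.power(D, -p) + L_inf

D = np.array([1e6, 2e6, 4e6, 8e6, 1.6e7])      # training set sizes (synthetic)
L = np.array([3.10, 2.80, 2.60, 2.45, 2.35])   # held-out losses (synthetic)
(beta, p, L_inf), _ = curve_fit(scaling_law, D, L, p0=[100.0, 0.3, 2.0])
print(f"fitted exponent p = {p:.3f}, loss floor L_inf = {L_inf:.3f}")
```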

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

no code implementations 1 Feb 2022 Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia, Jonathan Shen, Orhan Firat

Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that utilize separate layer stacks for input and output processing.

Language Modelling Machine Translation +2
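
A rough illustration of the structural difference between the two paradigms (hypothetical token layout, not the paper's exact setup):

```python
# A decoder-only LM sees source and target as one concatenated stream in a
# single stack; an encoder-decoder model routes them through separate stacks.
src, tgt = ["Guten", "Tag"], ["Good", "day"]

lm_input = src + ["<sep>"] + tgt            # single stack, causal attention
encdec_input = {
    "encoder": src,                         # bidirectional encoder stack
    "decoder": ["<bos>"] + tgt,             # causal decoder with cross-attention
}
```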

A Loss Curvature Perspective on Training Instability in Deep Learning

no code implementations 8 Oct 2021 Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.

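A minimal sketch of one quantity tracked in such studies, the top Hessian eigenvalue, via power iteration on Hessian-vector products (finite-difference HVPs keep the sketch self-contained; autodiff HVPs are the practical choice):

```python
import numpy as np

def top_hessian_eigenvalue(grad_fn, w, iters=50, eps=1e-4):
    # Power iteration: repeatedly apply the Hessian (approximated here by a
    # central finite difference of the gradient) and renormalize.
    v = np.random.default_rng(0).standard_normal(w.size)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
        lam = v @ hv                      # Rayleigh quotient estimate
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return lam
```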

When Do Neural Networks Outperform Kernel Methods?

1 code implementation NeurIPS 2020 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance.

Image Classification
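
A minimal sketch of the RKHS side of that comparison, kernel ridge regression with an RBF kernel (illustrative hyperparameters; the paper's kernels and tasks may differ):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X_tr, y_tr, X_te, lam=1e-3, gamma=1.0):
    # Solve (K + lam * I) alpha = y, then predict with the cross-kernel.
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y_tr)
    return rbf_kernel(X_te, X_tr, gamma) @ alpha
```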

Limitations of Lazy Training of Two-layers Neural Network

1 code implementation NeurIPS 2019 Song Mei, Theodor Misiakiewicz, Behrooz Ghorbani, Andrea Montanari

We study the supervised learning problem under either of the following two models: (1) Feature vectors ${\boldsymbol x}_i$ are $d$-dimensional Gaussians and responses are $y_i = f_*({\boldsymbol x}_i)$ for $f_*$ an unknown quadratic function; (2) Feature vectors ${\boldsymbol x}_i$ are distributed as a mixture of two $d$-dimensional centered Gaussians, and $y_i$'s are the corresponding class labels.
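
A hedged sketch of sampling data under the two models (the specific quadratic and the mixture covariances are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 1000

# Model (1): Gaussian features, responses from an unknown quadratic f_*.
A = rng.standard_normal((d, d)); A = (A + A.T) / 2    # symmetric quadratic form
X1 = rng.standard_normal((n, d))
y1 = np.einsum("ni,ij,nj->n", X1, A, X1)

# Model (2): mixture of two centered Gaussians; labels are the components.
labels = rng.integers(0, 2, size=n)
scales = np.where(labels == 0, 1.0, 1.5)[:, None]     # illustrative covariances
X2 = scales * rng.standard_normal((n, d))
y2 = 2 * labels - 1
```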

The Effect of Network Depth on the Optimization Landscape

no code implementations 28 May 2019 Behrooz Ghorbani, Ying Xiao, Shankar Krishnan

It is well-known that deeper neural networks are harder to train than shallower ones.

Linearized two-layers neural networks in high dimension

no code implementations 27 Apr 2019 Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Both approaches studied here, the random features (RF) model and the neural tangent (NT) model, can be regarded as randomized approximations of kernel ridge regression (with respect to different kernels), and enjoy universal approximation properties when the number of neurons $N$ diverges, for a fixed dimension $d$.

regression
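
A minimal sketch of one of the two linearizations, random-features (RF) regression: ridge regression on top of $N$ random ReLU features (hyperparameters illustrative):

```python
import numpy as np

def rf_fit(X, y, N=512, lam=1e-3, rng=np.random.default_rng(0)):
    # First layer is random and frozen; only the linear readout is trained.
    W = rng.standard_normal((X.shape[1], N)) / np.sqrt(X.shape[1])
    Phi = np.maximum(X @ W, 0.0)                       # random ReLU features
    theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
    return W, theta

def rf_predict(X, W, theta):
    return np.maximum(X @ W, 0.0) @ theta
```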

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

1 code implementation 29 Jan 2019 Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

To understand the dynamics of optimization in deep neural networks, we develop a tool to study the evolution of the entire Hessian spectrum throughout the optimization process.
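
A compact sketch of the underlying estimator, Lanczos-based spectrum estimation driven by matrix-vector products (`matvec` stands in for a Hessian-vector product; a dense matrix works for testing):

```python
import numpy as np

def lanczos_spectrum(matvec, dim, k=30, rng=np.random.default_rng(0)):
    # Run k Lanczos steps, then read off Ritz values and weights from the
    # tridiagonal matrix: an estimate of the eigenvalue density as seen
    # from one random probe vector (stochastic Lanczos quadrature).
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    alphas, betas, V = [], [], [v]
    for _ in range(k):
        w = matvec(V[-1])
        alpha = V[-1] @ w
        w = w - alpha * V[-1] - (betas[-1] * V[-2] if betas else 0.0)
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        if beta < 1e-10:
            break
        betas.append(beta)
        V.append(w / beta)
    m = len(alphas)
    T = np.diag(alphas) + np.diag(betas[:m - 1], 1) + np.diag(betas[:m - 1], -1)
    vals, vecs = np.linalg.eigh(T)
    return vals, vecs[0] ** 2     # Ritz values and their quadrature weights
```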

An Instability in Variational Inference for Topic Models

no code implementations 2 Feb 2018 Behrooz Ghorbani, Hamid Javadi, Andrea Montanari

Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics even when the underlying data contain no actual topic structure.

Topic Models Variational Inference
