MAML, or Model-Agnostic Meta-Learning, is a model- and task-agnostic meta-learning algorithm that trains a model's parameters such that a small number of gradient updates will lead to fast learning on a new task.
Consider a model represented by a parametrized function $f_{\theta}$ with parameters $\theta$. When adapting to a new task $\mathcal{T}_{i}$, the model’s parameters $\theta$ become $\theta'_{i}$. With MAML, the updated parameter vector $\theta'_{i}$ is computed using one or more gradient descent updates on task $\mathcal{T}_{i}$. For example, when using one gradient update,
$$ \theta'_{i} = \theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right) $$
The step size $\alpha$ may be fixed as a hyperparameter or meta-learned. The model parameters are trained by optimizing for the performance of $f_{\theta'_{i}}$ with respect to $\theta$ across tasks sampled from $p\left(\mathcal{T}\right)$. More concretely, the meta-objective is as follows:
$$ \min_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right) = \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta - \alpha\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta}\right)}\right) $$
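As a concrete illustration, here is a minimal sketch of the inner-loop adaptation in PyTorch (an assumed framework choice; `adapt`, `x_support`, and `y_support` are hypothetical names, not from the paper). It performs the single gradient update above and returns $\theta'_{i}$ as a dictionary of tensors:

```python
import torch
from torch.func import functional_call  # PyTorch >= 2.0

def adapt(model, loss_fn, x_support, y_support, alpha):
    """One inner-loop step: theta'_i = theta - alpha * grad L_{T_i}(f_theta)."""
    theta = dict(model.named_parameters())
    # Task loss L_{T_i}(f_theta) on the task's training (support) data.
    loss = loss_fn(functional_call(model, theta, (x_support,)), y_support)
    # create_graph=True keeps this update differentiable in theta, which the
    # meta-objective min_theta sum_i L_{T_i}(f_{theta'_i}) requires.
    grads = torch.autograd.grad(loss, list(theta.values()), create_graph=True)
    return {name: param - alpha * grad
            for (name, param), grad in zip(theta.items(), grads)}
```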
Note that the meta-optimization is performed over the model parameters $\theta$, whereas the objective is computed using the updated model parameters $\theta'$. In effect, MAML aims to optimize the model parameters such that one or a small number of gradient steps on a new task will produce maximally effective behavior on that task. The meta-optimization across tasks is performed via stochastic gradient descent (SGD), such that the model parameters $\theta$ are updated as follows:
$$ \theta \leftarrow \theta - \beta\nabla_{\theta} \sum_{\mathcal{T}_{i} \sim p\left(\mathcal{T}\right)} \mathcal{L}_{\mathcal{T}_{i}}\left(f_{\theta'_{i}}\right) $$
where $\beta$ is the meta step size.
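Combining the pieces, here is a sketch of the full meta-training loop under the same assumptions. It reuses `adapt` and the imports from the sketch above; `sample_tasks` is a hypothetical task sampler standing in for draws from $p(\mathcal{T})$, and the model, loss, and step sizes are illustrative, not the paper's settings:

```python
# Continues the sketch above (same imports; reuses adapt()).
model = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
loss_fn = torch.nn.MSELoss()
alpha, beta = 0.01, 0.001            # inner step size, meta step size

# Meta-optimization via SGD, matching the update rule above.
meta_opt = torch.optim.SGD(model.parameters(), lr=beta)

for meta_step in range(10_000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    # sample_tasks is a hypothetical helper yielding per-task
    # (x_support, y_support, x_query, y_query) tensors drawn from p(T).
    for x_s, y_s, x_q, y_q in sample_tasks(meta_batch_size=25):
        theta_prime = adapt(model, loss_fn, x_s, y_s, alpha)  # theta'_i
        # L_{T_i}(f_{theta'_i}), evaluated on held-out data from the task.
        meta_loss = meta_loss + loss_fn(
            functional_call(model, theta_prime, (x_q,)), y_q)
    meta_loss.backward()  # gradient flows through the inner updates
    meta_opt.step()       # theta <- theta - beta * grad of summed task losses
```

Because the inner step was taken with `create_graph=True`, `meta_loss.backward()` differentiates through the adaptation and therefore includes the second-order terms of the MAML gradient; the first-order approximation (FOMAML) drops them by detaching the inner gradients.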
Source: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (Finn et al., 2017)
Usage across tasks (papers using MAML):

Task | Papers | Share
---|---|---
Meta-Learning | 170 | 36.64%
Few-Shot Learning | 60 | 12.93%
Image Classification | 18 | 3.88%
Reinforcement Learning | 16 | 3.45%
General Classification | 16 | 3.45%
Few-Shot Image Classification | 13 | 2.80%
Federated Learning | 8 | 1.72%
Continual Learning | 6 | 1.29%
Text Classification | 6 | 1.29%