Language Models

LLaMA

Introduced by Touvron et al. in LLaMA: Open and Efficient Foundation Language Models

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. It is based on the transformer architecture, incorporating various improvements that were proposed after the original design. The main differences from the original architecture are listed below; an illustrative sketch of these components follows.

  • The RMSNorm normalizing function is used to improve training stability, normalizing the input of each transformer sub-layer instead of the output.
  • The ReLU non-linearity is replaced by the SwiGLU activation function to improve performance.
  • Absolute positional embeddings are removed; instead, rotary positional embeddings (RoPE) are added at each layer of the network.
Source: LLaMA: Open and Efficient Foundation Language Models
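
As a rough illustration of the three changes above, here is a minimal PyTorch-style sketch. The module and function names (RMSNorm, SwiGLU, apply_rope), tensor shapes, and defaults such as eps=1e-6 and base=10000 are assumptions for this example, not the official LLaMA implementation.

```python
# Illustrative sketch of RMSNorm, SwiGLU, and RoPE; not the official LLaMA code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square layer normalization, applied to sub-layer inputs (pre-norm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by the reciprocal RMS of the features; no mean subtraction, no bias.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


class SwiGLU(nn.Module):
    """Gated feed-forward block: silu(x W1) * (x W3), projected back with W2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embeddings: rotate adjacent feature pairs of the query/key
    vectors by a position-dependent angle. x: (batch, seq_len, n_heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq, dim/2)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]   # pair up adjacent features
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In a full transformer block these would combine roughly as follows: RMSNorm is applied to the input of the attention and feed-forward sub-layers, apply_rope is applied to the query and key projections inside attention, and SwiGLU stands in for the standard two-layer MLP.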

Tasks


Task | Papers | Share
Language Modelling | 98 | 13.61%
Large Language Model | 57 | 7.92%
Question Answering | 32 | 4.44%
Quantization | 25 | 3.47%
Text Generation | 25 | 3.47%
Instruction Following | 23 | 3.19%
In-Context Learning | 22 | 3.06%
Retrieval | 20 | 2.78%
Code Generation | 18 | 2.50%

