LIMA

Introduced by Zhou et al. in LIMA: Less Is More for Alignment

The LIMA dataset is a valuable resource used in natural language processing (NLP) research. Let me provide you with some details:

  1. Origin and Purpose:
  2. The LIMA dataset is derived from the LLaMa language model, which has an impressive 65 billion parameters.
  3. It serves as a fine-tuned version of the LLaMa model, specifically adjusted using approximately 1,000 prompts and responses.

  4. Performance and Applications:

  5. LIMA demonstrates remarkable performance by learning to follow specific response formats from just a handful of examples in the training data.
  6. The dataset covers a wide range of tasks, including complex queries such as planning trip itineraries and speculating about alternate history.
  7. Interestingly, the model tends to generalize well to unseen tasks that were not part of the training data.

  8. License:

  9. The licensing of the LIMA dataset depends on the source data it was derived from:
    • If the source data has a stricter license than CC BY-NC-SA, the LIMA dataset follows the same restrictions.
    • Otherwise, it adheres to the CC BY-NC-SA license.

(1) GAIR/lima · Datasets at Hugging Face. https://huggingface.co/datasets/GAIR/lima. (2) GAIR/lima at main - Hugging Face. https://huggingface.co/datasets/GAIR/lima/tree/main. (3) 日本語LIMAデータセットlima-jaを作成したので公開します. https://zanote.net/ai/lima-ja/. (4) Paper page - LIMA: Less Is More for Alignment - Hugging Face. https://huggingface.co/papers/2305.11206. (5) undefined. https://huggingface.co/datasets/GAIR/lima/.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages