Search Results

Mistral 7B

6 code implementations · 10 Oct 2023

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Answerability Prediction · Arithmetic Reasoning · +11

LoRA: Low-Rank Adaptation of Large Language Models

68 code implementations · ICLR 2022

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling · Mathematical Reasoning · +1
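The LoRA update described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a single linear layer, with the pre-trained weight frozen and a trainable rank-r product B·A added on top (the names `W`, `A`, `B`, `alpha` follow common LoRA convention).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # rank r is much smaller than the layer dims

# Frozen pre-trained weight: never updated during fine-tuning
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors; B starts at zero so the injected
# update B @ A is initially zero and the model output is unchanged
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x, alpha=8.0):
    """y = x W^T + (alpha / r) * x (B A)^T  -- only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x)
# With B = 0, the low-rank path contributes nothing yet
assert np.allclose(y, x @ W.T)
```

The parameter saving is the point: the frozen layer has d_out × d_in weights, while the trainable factors have only r × (d_in + d_out).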

Cut Your Losses in Large-Vocabulary Language Models

2 code implementations · 13 Nov 2024

We implement a custom kernel that performs the matrix multiplications and the log-sum-exp reduction over the vocabulary in flash memory, making global memory consumption for the cross-entropy computation negligible.
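The computation their kernel fuses is ordinary cross-entropy over the vocabulary. A reference (unfused) version is sketched below so the savings are visible: the `(batch, vocab)` logits tensor materialized here is exactly what the paper's kernel avoids storing in global memory (names and shapes are illustrative, not the paper's API).

```python
import numpy as np

def cross_entropy_reference(h, E, targets):
    """Unfused cross-entropy: h (batch, d) hidden states,
    E (vocab, d) classifier matrix, targets (batch,) token ids."""
    logits = h @ E.T                        # (batch, vocab): the large tensor
    m = logits.max(axis=1, keepdims=True)   # shift for a stable log-sum-exp
    lse = m.squeeze(1) + np.log(np.exp(logits - m).sum(axis=1))
    correct = logits[np.arange(len(targets)), targets]
    return (lse - correct).mean()           # mean of -log softmax(target)

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 16))
E = rng.standard_normal((100, 16))
t = rng.integers(0, 100, size=4)
loss = cross_entropy_reference(h, E, t)
assert loss > 0.0  # -log p(target) is positive for any non-degenerate softmax
```

A fused kernel instead streams blocks of `E` through fast on-chip memory, accumulating the running max and log-sum-exp, so the full logits matrix never exists at once.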

SqueezeLLM: Dense-and-Sparse Quantization

3 code implementations · 13 Jun 2023

When applied to the LLaMA models, our 3-bit quantization reduces the perplexity gap from the FP16 baseline by up to 2.1x compared to state-of-the-art methods with the same memory requirement.

Quantization
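The "dense-and-sparse" idea can be illustrated with a toy sketch. This is not the paper's exact scheme (SqueezeLLM uses sensitivity-aware non-uniform quantization); it only shows the decomposition: pull the largest-magnitude outlier weights into a sparse full-precision part, then quantize the well-behaved dense remainder to a few bits.

```python
import numpy as np

def dense_and_sparse_quantize(w, bits=3, outlier_frac=0.005):
    """Toy dense-and-sparse split: outliers stay full precision,
    the remainder is uniformly quantized to 2**bits levels."""
    k = max(1, int(len(w) * outlier_frac))
    outlier_idx = np.argsort(np.abs(w))[-k:]

    dense = w.copy()
    dense[outlier_idx] = 0.0              # outliers leave the dense part

    lo, hi = dense.min(), dense.max()     # narrow range once outliers are gone
    scale = (hi - lo) / (2 ** bits - 1)
    dense_dq = np.round((dense - lo) / scale) * scale + lo  # quantize-dequantize

    w_hat = dense_dq
    w_hat[outlier_idx] = w[outlier_idx]   # sparse part restores outliers exactly
    return w_hat

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
w[:3] = [8.0, -9.0, 7.5]                  # inject a few large outliers
w_hat = dense_and_sparse_quantize(w)
assert np.allclose(w_hat[:3], w[:3])      # outliers survive at full precision
```

Removing the outliers before quantization is what matters: without the sparse part, the grid would have to span [-9, 8] and the 8 levels would be far too coarse for the bulk of the weights.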

ORPO: Monolithic Preference Optimization without Reference Model

4 code implementations · 12 Mar 2024

While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence.


MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

3 code implementations · 24 Oct 2023

We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.

QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models

2 code implementations · 26 Sep 2023

Recent years have witnessed the rapid development of large language models (LLMs).

Quantization

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

3 code implementations · 2 Sep 2023

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior.

Aurora: Activating Chinese Chat Capability for Mixtral-8x7B Sparse Mixture-of-Experts through Instruction-Tuning

1 code implementation · 22 Dec 2023

This work pioneers instruction fine-tuning on a sparse mixture-of-experts model, marking a significant step in enhancing the capabilities of this model architecture.

Instruction Following · Mixture-of-Experts · +1

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

3 code implementations · 24 Apr 2024

To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed.

Query-focused Summarization · Question Answering · +2