Long-Context Understanding

38 papers with code • 3 benchmarks • 1 dataset


Most implemented papers

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

lm-sys/fastchat NeurIPS 2023

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

GLM-130B: An Open Bilingual Pre-trained Model

thudm/glm-130b 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

RULER: What's the Real Context Size of Your Long-Context Language Models?

hsiehjackson/ruler 9 Apr 2024

Despite achieving nearly perfect accuracy in the vanilla NIAH test, almost all models exhibit large performance drops as the context length increases.
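The "vanilla NIAH" (needle-in-a-haystack) test mentioned here hides a single factoid in a long span of distractor text and asks the model to retrieve it. A minimal sketch of that prompt-construction step is below; the function name, filler text, and question wording are illustrative choices, not taken from the RULER implementation.

```python
def build_niah_prompt(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Build a needle-in-a-haystack prompt: repeat `filler` n_filler times,
    insert `needle` at relative position `depth` (0.0 = start, 1.0 = end),
    and append a retrieval question."""
    sentences = [filler] * n_filler
    pos = int(depth * len(sentences))
    sentences.insert(pos, needle)
    context = " ".join(sentences)
    question = "What is the magic number mentioned in the text above?"
    return f"{context}\n\n{question}"

# Example: bury the needle halfway through ~200 filler sentences.
prompt = build_niah_prompt(
    needle="The magic number is 7412.",
    filler="The sky was clear and the grass was green.",
    n_filler=200,
    depth=0.5,
)
```

A full harness would sweep `n_filler` (context length) and `depth`, send each prompt to the model under test, and score whether the answer contains the needle's factoid; RULER's finding is that accuracy on this simple setup does not predict performance on its harder long-context tasks.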

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

salesforce/lavis NeurIPS 2023

Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.

CogVLM: Visual Expert for Pretrained Language Models

thudm/cogvlm 6 Nov 2023

We introduce CogVLM, a powerful open-source visual language foundation model.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

thudm/longbench 28 Aug 2023

In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long-context capabilities.

InternLM2 Technical Report

internlm/internlm 26 Mar 2024

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

FABLES: Evaluating faithfulness and content selection in book-length summarization

mungg/fables 1 Apr 2024

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.

Gated Delta Networks: Improving Mamba2 with Delta Rule

NVlabs/GatedDeltaNet 9 Dec 2024

Linear Transformers have gained attention as efficient alternatives to standard Transformers, but their performance in retrieval and long-context tasks has been limited.
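The delta rule referenced here updates a recurrent memory matrix by first erasing the value currently bound to a key, then writing the new one; the "gated" variant additionally decays the whole memory each step. A minimal NumPy sketch of one plausible form of this update is below, assuming a state of shape `(d_v, d_k)` and scalar gate `alpha` and write-strength `beta`; this is an illustration of the general mechanism, not the paper's implementation.

```python
import numpy as np

def gated_delta_step(S: np.ndarray, k: np.ndarray, v: np.ndarray,
                     alpha: float, beta: float) -> np.ndarray:
    """One recurrent step of a gated delta-rule memory update.

    S     : (d_v, d_k) memory matrix
    k, v  : key (d_k,) and value (d_v,) for this timestep
    alpha : decay gate in [0, 1] applied to the existing memory
    beta  : write strength in [0, 1] for the delta-rule update
    """
    # Delta rule: subtract the value currently retrieved by k, then write v,
    # with the retained memory scaled by the gate alpha.
    return alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)

# Writing a (key, value) pair into an empty memory, then reading with the
# same unit-norm key, recovers the value.
S = np.zeros((2, 3))
k = np.array([1.0, 0.0, 0.0])
v = np.array([2.0, 3.0])
S = gated_delta_step(S, k, v, alpha=1.0, beta=1.0)
```

With `alpha < 1` old associations fade over time, which is the Mamba2-style decay; with `alpha = 1` the update reduces to the plain delta rule.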