Long-Context Understanding
38 papers with code • 3 benchmarks • 1 dataset
Most implemented papers
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.
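A minimal sketch of the pairwise LLM-as-a-judge setup this line of work popularized, assuming an illustrative prompt template and a hypothetical `judge` callable (neither is MT-Bench's exact wording or API):

```python
# Hedged sketch of a pairwise LLM-as-a-judge comparison. The instructions and
# verdict format below are illustrative assumptions, not the benchmark's template.
def build_pairwise_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "Please act as an impartial judge and evaluate the two AI assistant "
        "responses to the user question below. Consider helpfulness, relevance, "
        "accuracy, and level of detail, then output your verdict: '[[A]]' if "
        "assistant A is better, '[[B]]' if assistant B is better, '[[C]]' for a tie.\n\n"
        f"[User Question]\n{question}\n\n"
        f"[Assistant A]\n{answer_a}\n\n"
        f"[Assistant B]\n{answer_b}\n"
    )

# verdict = judge(build_pairwise_judge_prompt(q, a, b))  # `judge` is a hypothetical LLM call
```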
GLM-130B: An Open Bilingual Pre-trained Model
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
RULER: What's the Real Context Size of Your Long-Context Language Models?
Despite achieving nearly perfect accuracy on the vanilla needle-in-a-haystack (NIAH) test, almost all models exhibit large performance drops as the context length increases.
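A minimal sketch of a vanilla needle-in-a-haystack probe of the kind RULER extends, assuming illustrative filler text, a synthetic needle, and a hypothetical model call; lengths are approximated in words rather than tokens for simplicity:

```python
# Hedged sketch of a NIAH-style retrieval probe: hide one "needle" sentence in
# long filler text and ask the model to recover it at increasing context lengths.
# Filler, needle, and question are illustrative assumptions, not RULER's data.
import random

FILLER = "The grass is green. The sky is blue. The sun is yellow. "
NEEDLE = "The special magic number mentioned in this text is 7481. "
QUESTION = "What is the special magic number mentioned in the text?"

def build_niah_prompt(approx_words: int, words_per_chunk: int = 12) -> str:
    """Pad with filler to roughly `approx_words` words, insert the needle at a random depth."""
    n_chunks = max(1, approx_words // words_per_chunk)
    chunks = [FILLER] * n_chunks
    chunks.insert(random.randrange(len(chunks) + 1), NEEDLE)
    return "".join(chunks) + "\n" + QUESTION

# Sweep context lengths to see where retrieval starts to degrade.
for length in (1_000, 8_000, 32_000, 128_000):
    prompt = build_niah_prompt(length)
    # answer = long_context_model.generate(prompt)  # hypothetical model call
    # correct = "7481" in answer
```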
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence.
CogVLM: Visual Expert for Pretrained Language Models
We introduce CogVLM, a powerful open-source visual language foundation model.
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling more rigorous evaluation of long-context capabilities.
InternLM2 Technical Report
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
FABLES: Evaluating faithfulness and content selection in book-length summarization
While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.
Gated Delta Networks: Improving Mamba2 with Delta Rule
Linear Transformers have gained attention as efficient alternatives to standard Transformers, but their performance in retrieval and long-context tasks has been limited.
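A minimal NumPy sketch of a gated delta-rule update on a linear-attention memory state, in the spirit of the recurrence the paper builds on; the shapes, the gate value alpha, the writing strength beta, and the initialization are illustrative assumptions rather than the paper's parameterization:

```python
# Hedged sketch: gated delta-rule recurrence for a linear-attention state matrix.
# At each step the state is decayed by a gate, the old value bound to the current
# key is partially erased, and the new key-value association is written in.
import numpy as np

d_k, d_v, seq_len = 4, 4, 8
rng = np.random.default_rng(0)

S = np.zeros((d_v, d_k))            # recurrent memory state (value x key)
for t in range(seq_len):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)          # unit-norm key, common in delta-rule variants
    v = rng.standard_normal(d_v)
    q = rng.standard_normal(d_k)
    alpha = 0.95                    # assumed gate: decays the whole state (Mamba2-style)
    beta = 0.5                      # assumed writing strength for the delta-rule update

    # Gated delta rule: S <- alpha * (S - beta * (S k) k^T) + beta * v k^T
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)

    out = S @ q                     # read-out for the current query
```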