Tiny QA Benchmark++
1 paper with code • 1 benchmark • 1 dataset
Ultra-lightweight evaluation suite and Python package designed to expose critical failures in Large Language Model (LLM) systems within seconds
Most implemented papers
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Tiny QA Benchmark++ (TQB++) is an ultra-lightweight, multilingual smoke-test suite designed to give large-language-model (LLM) pipelines a unit-test-style safety-net dataset that runs in seconds at minimal cost.
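To illustrate the smoke-test idea described above, the sketch below runs a handful of QA items through a model callable and fails fast when exact-match accuracy drops below a threshold. It is a minimal, hypothetical example, not the TQB++ API: the inline items, the `answer_fn` stub, and the 0.8 threshold are assumptions made for illustration.

```python
# Minimal smoke-test sketch (hypothetical; not the TQB++ package API).
# It checks a few QA pairs against a model callable and fails fast,
# mirroring the "unit-test style safety net" idea: seconds, not hours.
from typing import Callable, Dict, List

# Tiny inline QA items standing in for a TQB++ core set (assumed format).
ITEMS: List[Dict[str, str]] = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many days are in a week?", "answer": "7"},
]


def exact_match(prediction: str, answer: str) -> bool:
    """Case-insensitive containment check, a deliberately crude metric."""
    return answer.strip().lower() in prediction.strip().lower()


def smoke_test(answer_fn: Callable[[str], str], threshold: float = 0.8) -> None:
    """Raise if accuracy over the tiny set falls below the threshold."""
    hits = sum(
        exact_match(answer_fn(item["question"]), item["answer"]) for item in ITEMS
    )
    accuracy = hits / len(ITEMS)
    if accuracy < threshold:
        raise AssertionError(f"LLM smoke test failed: accuracy={accuracy:.2f}")
    print(f"Smoke test passed: accuracy={accuracy:.2f}")


if __name__ == "__main__":
    # Replace this stub with a real LLM call inside a pipeline's CI job.
    smoke_test(lambda q: "Paris" if "France" in q else "7")
```

Because the item set is tiny, such a check can run on every commit or deployment, catching gross regressions (broken prompts, bad routing, truncated outputs) long before a full benchmark run would.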