SeaEval is a benchmark for evaluating multilingual foundation models (FMs). These models have demonstrated impressive generalization and adaptability across a range of downstream tasks. SeaEval goes beyond standard accuracy metrics to investigate how well models understand and reason with natural language, as well as how well they comprehend cultural practices, nuances, and values¹.
Here are some key aspects of the SeaEval benchmark:
Multilingual Context: SeaEval spans multiple languages, allowing researchers to assess model performance across diverse linguistic contexts.
Cultural Reasoning: In addition to traditional NLP tasks, SeaEval evaluates how well models comprehend cultural nuances and practices. This is crucial for applications in multicultural scenarios.
Brittleness Analysis: The benchmark probes the brittleness of foundation models with respect to semantics and multilinguality. For instance, many models change their answers when an instruction is merely paraphrased, and answer the same factual question inconsistently when it is asked in different languages¹.
Empirical Results: SeaEval provides empirical results across classic NLP tasks, reasoning, and cultural comprehension.
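To make the brittleness idea concrete, here is a minimal sketch (not SeaEval's actual implementation) of a cross-lingual consistency metric: the fraction of parallel question pairs for which a model returns the same answer label in both languages. The function name and the toy answer lists are illustrative assumptions.

```python
def cross_lingual_consistency(answers_a, answers_b):
    """Fraction of parallel question pairs for which a model gives
    the same answer in both languages (illustrative metric sketch,
    not SeaEval's actual scoring code)."""
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must be parallel")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)

# Toy example: a model's answers to the same four factual questions
# asked in two languages (answer labels, not free text).
lang_1 = ["Paris", "H2O", "1969", "Everest"]
lang_2 = ["Paris", "H2O", "1945", "Everest"]
print(cross_lingual_consistency(lang_1, lang_2))  # 0.75
```

A fully consistent model would score 1.0 on factual, scientific, and commonsense questions, since the correct answer does not depend on the query language; scores below that indicate the kind of multilingual brittleness the benchmark surfaces.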
In summary, SeaEval serves as a launchpad for thorough investigations and evaluations of multilingual and multicultural scenarios, emphasizing the need for more generalizable semantic representations and enhanced multilingual contextualization¹.
(1) SeaEval for Multilingual Foundation Models. arXiv. https://arxiv.org/html/2309.04766v2
(2) M3Exam: A Multilingual, Multimodal, Multilevel Benchmark. arXiv. https://arxiv.org/abs/2306.05179
(3) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v1.pdf
(4) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v2.pdf
(5) SeaEval (GitHub repository). https://github.com/SeaEval/SeaEval