SeaEval is a benchmark designed for evaluating multilingual foundation models (FMs). These large language models (LLMs) have demonstrated impressive generalizability and adaptability across various downstream tasks. The SeaEval benchmark goes beyond standard accuracy metrics and investigates how well these models understand and reason with natural language, as well as their comprehension of cultural practices, nuances, and values¹.

Here are some key aspects of the SeaEval benchmark:

  1. Multilingual Context: SeaEval spans multiple languages, allowing researchers to assess model performance across diverse linguistic contexts.

  2. Cultural Reasoning: In addition to traditional NLP tasks, SeaEval evaluates how well models comprehend cultural nuances and practices. This is crucial for applications in multicultural scenarios.

  3. Brittleness Analysis: The benchmark explores the brittleness of foundation models in terms of semantics and multilinguality. For instance:

    • Some models exhibit varied behavior when given paraphrased instructions.
    • Exposure bias (e.g., positional bias, majority label bias) remains a challenge.
    • Models are expected to give consistent responses to semantically equivalent queries across languages, yet most surprisingly exhibit inconsistency on factual, scientific, and commonsense knowledge questions.
    • Multilingually-trained models have not yet achieved "balanced multilingual" capabilities.

  4. Empirical Results: SeaEval provides empirical results across classic NLP tasks, reasoning, and cultural comprehension.
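The cross-lingual consistency idea in point 3 can be sketched as a simple agreement rate over language pairs. This is an illustrative metric and the `answers_by_language` input format is an assumption for demonstration, not SeaEval's official scoring (the paper defines its own consistency measures):

```python
from itertools import combinations


def cross_lingual_consistency(answers_by_language: dict[str, str]) -> float:
    """Fraction of language pairs giving the same answer to one
    semantically equivalent question (illustrative metric only)."""
    langs = list(answers_by_language)
    pairs = list(combinations(langs, 2))
    if not pairs:  # a single language trivially agrees with itself
        return 1.0
    agree = sum(answers_by_language[a] == answers_by_language[b] for a, b in pairs)
    return agree / len(pairs)


# Example: a model answers the same factual question in three languages.
answers = {"en": "Paris", "zh": "Paris", "id": "London"}
print(cross_lingual_consistency(answers))  # 1 of 3 pairs agree -> 0.333...
```

A fully "balanced multilingual" model would score 1.0 on such a check for every knowledge question, which, per the benchmark's findings, current models do not achieve.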

In summary, SeaEval serves as a launchpad for thorough investigations and evaluations of multilingual and multicultural scenarios, emphasizing the need for more generalizable semantic representations and enhanced multilingual contextualization¹.

(1) SeaEval for Multilingual Foundation Models. arXiv. https://arxiv.org/html/2309.04766v2
(2) M3Exam: A Multilingual, Multimodal, Multilevel Benchmark …. arXiv. https://arxiv.org/abs/2306.05179
(3) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v1.pdf
(4) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v2.pdf
(5) SeaEval repository. https://github.com/SeaEval/SeaEval
