SeaEval is a benchmark for evaluating multilingual foundation models (FMs). These models have demonstrated impressive generalization and adaptability across a range of downstream tasks. SeaEval goes beyond standard accuracy metrics to investigate how well models understand and reason with natural language, as well as how well they comprehend cultural practices, nuances, and values¹.
Here are some key aspects of the SeaEval benchmark:
Multilingual Context: SeaEval spans multiple languages, allowing researchers to assess model performance across diverse linguistic contexts.
Cultural Reasoning: In addition to traditional NLP tasks, SeaEval evaluates how well models comprehend cultural nuances and practices. This is crucial for applications in multicultural scenarios.
Brittleness Analysis: The benchmark probes the brittleness of foundation models with respect to semantics and multilinguality. For instance, many models change their answers when an instruction is merely paraphrased, and answer the same factual question inconsistently when it is asked in different languages¹.
Empirical Results: SeaEval provides empirical results across classic NLP tasks, reasoning, and cultural comprehension.
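To make the brittleness idea concrete, here is a minimal sketch (not SeaEval's actual implementation) of a cross-lingual consistency metric: the fraction of parallel question pairs for which a model returns the same answer label in both languages. The function name and the toy answer lists are illustrative assumptions.

```python
def cross_lingual_consistency(answers_a, answers_b):
    """Fraction of parallel question pairs for which a model gives
    the same answer in both languages (illustrative metric sketch,
    not SeaEval's actual scoring code)."""
    if len(answers_a) != len(answers_b):
        raise ValueError("answer lists must be parallel")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)

# Toy example: a model's answers to the same four factual questions
# asked in two languages (answer labels, not free text).
lang_1 = ["Paris", "H2O", "1969", "Everest"]
lang_2 = ["Paris", "H2O", "1945", "Everest"]
print(cross_lingual_consistency(lang_1, lang_2))  # 0.75
```

A fully consistent model would score 1.0 on factual, scientific, and commonsense questions, since the correct answer does not depend on the query language; scores below that indicate the kind of multilingual brittleness the benchmark surfaces.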
In summary, SeaEval serves as a launchpad for thorough investigations and evaluations of multilingual and multicultural scenarios, emphasizing the need for more generalizable semantic representations and enhanced multilingual contextualization¹.
(1) SeaEval for Multilingual Foundation Models. arXiv. https://arxiv.org/html/2309.04766v2
(2) M3Exam: A Multilingual, Multimodal, Multilevel Benchmark. arXiv. https://arxiv.org/abs/2306.05179
(3) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v1.pdf
(4) SeaEval for Multilingual Foundation Models: From Cross-Lingual …. arXiv. https://arxiv.org/pdf/2309.04766v2.pdf
(5) SeaEval (GitHub repository). https://github.com/SeaEval/SeaEval