no code implementations • Findings (EMNLP) 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains across a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks that place strict quality requirements on the output, such as commonsense generation and ad keyword generation.
no code implementations • NAACL (ACL) 2022 • Haonan Li, Yameng Huang, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models (PLMs) have dramatically improved performance for many natural language processing (NLP) tasks in domains such as finance and healthcare.
1 code implementation • NAACL 2022 • Haonan Li, Martin Tomko, Maria Vasardani, Timothy Baldwin
Raw questions and contexts are extracted from the Natural Questions dataset.
no code implementations • 13 Jan 2025 • Zhengzhong Liu, Bowen Tan, Hongyi Wang, Willie Neiswanger, Tianhua Tao, Haonan Li, Fajri Koto, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Liqun Ma, Liping Tang, Nikhil Ranjan, Yonghao Zhuang, Guowei He, Renxi Wang, Mingkai Deng, Robin Algayres, Yuanzhi Li, Zhiqiang Shen, Preslav Nakov, Eric Xing
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360.
no code implementations • 24 Dec 2024 • Haonan Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, Zhenxuan Zhang, Yilin Geng, Shom Lin, Renxi Wang, Artem Shelmanov, Xiangyu Qi, Yuxia Wang, Donghai Hong, Youliang Yuan, Meng Chen, Haoqin Tu, Fajri Koto, Tatsuki Kuribayashi, Cong Zeng, Rishabh Bhardwaj, Bingchen Zhao, Yawen Duan, Yi Liu, Emad A. Alghamdi, Yaodong Yang, Yinpeng Dong, Soujanya Poria, PengFei Liu, Zhengzhong Liu, Xuguang Ren, Eduard Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, Timothy Baldwin
To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety.
no code implementations • 18 Nov 2024 • Shengkun Tang, Liqun Ma, Haonan Li, MingJie Sun, Zhiqiang Shen
The typical selective state-space model (SSM) of Mamba addresses several limitations of Transformers, such as quadratic computational complexity with sequence length and significant inference-time memory requirements due to the key-value cache.
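As a minimal illustration of why a state-space recurrence sidesteps the key-value cache (a toy linear SSM with random placeholder matrices, not Mamba's learned, input-dependent parameters): at inference time the model carries only a fixed-size hidden state, so memory does not grow with sequence length.

```python
import numpy as np

# Toy linear state-space recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# Unlike attention with a KV cache (memory grows with sequence length),
# inference here needs only the fixed-size state h, regardless of t.
d_state, d_in = 16, 8
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(d_state, d_state))  # placeholder dynamics
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_in, d_state))

h = np.zeros(d_state)
for t in range(1000):                  # arbitrary sequence length
    x_t = rng.normal(size=d_in)        # stand-in for a token embedding
    h = A @ h + B @ x_t                # constant-memory state update
    y_t = C @ h                        # per-step output
print(h.shape)                         # (16,) -- state size is independent of t
```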
1 code implementation • 4 Oct 2024 • Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li
As large language models (LLMs) advance, their inability to autonomously execute tasks by directly interacting with external tools remains a critical limitation.
1 code implementation • 2 Oct 2024 • Haonan Li, Xudong Han, Hao Wang, Yuxia Wang, Minghan Wang, Rui Xing, Yilin Geng, Zenan Zhai, Preslav Nakov, Timothy Baldwin
We introduce Loki, an open-source tool designed to address the growing problem of misinformation.
1 code implementation • 28 Jun 2024 • Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen
To address this problem, we propose $\texttt{Web2Code}$, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs.
no code implementations • 23 May 2024 • Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, Xiangru Tang, Kevin A. Wang, Genta Indra Winata, François Yvon, Andy Zou
Third, we present the Language Model Evaluation Harness (lm-eval): an open source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues.
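For context, a minimal evaluation run through lm-eval's Python API might look as follows; the model and task names are placeholders, and the exact entry point and arguments may differ across library versions.

```python
# Sketch of lm-eval's Python API (v0.4+); check the library docs for
# the current interface before relying on these argument names.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["hellaswag"],                           # placeholder task
)
print(results["results"]["hellaswag"])             # per-task metric dict
```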
no code implementations • 12 May 2024 • Haonan Li, Patrick P. K. Chen, Yitong Zhou
Moreover, dealing with complex scenes and occlusions in monocular images poses entirely new challenges.
1 code implementation • 31 Mar 2024 • Lizhi Lin, Honglin Mu, Zenan Zhai, Minghan Wang, Yuxia Wang, Renxi Wang, Junjie Gao, Yixuan Zhang, Wanxiang Che, Timothy Baldwin, Xudong Han, Haonan Li
Generative models are rapidly gaining popularity and being integrated into everyday applications, raising concerns over their safe use as various vulnerabilities are exposed.
1 code implementation • 15 Mar 2024 • Rocktim Jyoti Das, Simeon Emilov Hristov, Haonan Li, Dimitar Iliyanov Dimitrov, Ivan Koychev, Preslav Nakov
Solving the problems in the dataset requires advanced perception and joint reasoning over the text and the visual content of the image.
1 code implementation • 7 Mar 2024 • Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov
Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.
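A hedged sketch of the general idea (not the paper's specific estimator): score each atomic claim by the mean negative log-probability of its tokens, and flag high-uncertainty claims for fact-checking. The claims, log-probabilities, and threshold below are made-up illustrations.

```python
def claim_uncertainty(token_logprobs):
    """Mean negative log-probability of a claim's tokens: higher = less confident."""
    return -sum(token_logprobs) / len(token_logprobs)

# Suppose an LLM API returned per-token log-probs for each atomic claim
# extracted from its output (values below are fabricated for illustration).
claims = {
    "Canberra is the capital of Australia": [-0.1, -0.2, -0.05, -0.1],
    "It was founded in 1820":               [-1.3, -2.1, -1.8, -2.5],
}
THRESHOLD = 1.0  # illustrative cut-off; would be tuned on held-out data
for text, lps in claims.items():
    u = claim_uncertainty(lps)
    print(f"{u:.2f}  {'CHECK' if u > THRESHOLD else 'ok':5s}  {text}")
```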
1 code implementation • 20 Feb 2024 • Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin
The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models.
1 code implementation • 19 Feb 2024 • Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin
Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs.
1 code implementation • 18 Feb 2024 • Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin
However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents.
1 code implementation • 4 Jan 2024 • Haonan Li, Martin Tomko, Timothy Baldwin
To overcome this, we propose treating the QA task as a dense vector retrieval problem: we encode questions and POIs separately, and retrieve the most relevant POIs for a question via embedding-space similarity.
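As an illustrative sketch of such a dual-encoder retriever (not the paper's trained encoders), using the sentence-transformers library; the model name and POI strings below are placeholders.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Off-the-shelf encoder as a stand-in for separately trained question/POI encoders.
model = SentenceTransformer("all-MiniLM-L6-v2")

pois = [
    "Flinders Street Station, railway station, Melbourne CBD",
    "Queen Victoria Market, open-air market, Melbourne",
    "Royal Botanic Gardens, park, South Yarra",
]
question = "Where can I buy fresh produce near the city centre?"

poi_vecs = model.encode(pois, normalize_embeddings=True)        # index side
q_vec = model.encode([question], normalize_embeddings=True)[0]  # query side
scores = poi_vecs @ q_vec                                       # cosine similarity
print(pois[int(np.argmax(scores))])                             # top-1 POI
```

In practice the POI vectors would be pre-computed once and stored in an approximate nearest-neighbour index, so each query costs a single encoder pass plus a lookup.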
1 code implementation • 17 Dec 2023 • Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin
Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks.
1 code implementation • 11 Dec 2023 • Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, Preslav Nakov, Tim Baldwin, Eric P. Xing
The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers.
1 code implementation • 14 Oct 2023 • Yixuan Zhang, Haonan Li
To bridge this gap, we present ACLUE, an evaluation benchmark designed to assess the capability of language models in comprehending ancient Chinese.
1 code implementation • 7 Oct 2023 • Fajri Koto, Nurul Aisyah, Haonan Li, Timothy Baldwin
In this work, we introduce IndoMMLU, the first multi-task language understanding benchmark for Indonesian culture and languages, which consists of questions from primary school to university entrance exams in Indonesia.
no code implementations • 30 Aug 2023 • Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing
We release two open versions of the model -- the foundation Jais model, and an instruction-tuned Jais-chat variant -- with the aim of promoting research on Arabic LLMs.
1 code implementation • 25 Aug 2023 • Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin
With the rapid evolution of large language models (LLMs), new and hard-to-predict harmful capabilities are emerging.
no code implementations • 1 Aug 2023 • Haonan Li, Yu Hao, Yizhuo Zhai, Zhiyun Qian
By carefully designing the framework and the prompts, we are able to overcome a number of challenges, including bug-specific modeling, the large problem scope, and the non-deterministic nature of LLMs.
1 code implementation • 15 Jun 2023 • Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin
As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.
1 code implementation • 24 May 2023 • Haonan Li, Fajri Koto, Minghao Wu, Alham Fikri Aji, Timothy Baldwin
However, research on multilingual instruction tuning has been limited due to the scarcity of high-quality instruction-response datasets across different languages.
1 code implementation • 18 Oct 2022 • Shuai Fan, Chen Lin, Haonan Li, Zhenghao Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan
Most existing pre-trained language representation models (PLMs) are sub-optimal for sentiment analysis tasks, as they capture sentiment information at the word level while under-representing sentence-level information.
1 code implementation • 10 Nov 2021 • Xiangru Lian, Binhang Yuan, XueFeng Zhu, Yulong Wang, Yongjun He, Honghuan Wu, Lei Sun, Haodong Lyu, Chengjun Liu, Xing Dong, Yiqiao Liao, Mingnan Luo, Congfei Zhang, Jingru Xie, Haonan Li, Lei Chen, Renjie Huang, Jianying Lin, Chengchun Shu, Xuezhong Qiu, Zhishan Liu, Dongying Kong, Lei Yuan, Hai Yu, Sen yang, Ce Zhang, Ji Liu
Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system called Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm.
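A single-process toy illustration of the hybrid idea (all gradients are random stand-ins, not a real model): embedding rows are updated asynchronously per worker, tolerating staleness, while dense gradients are averaged across workers, emulating an AllReduce, before one synchronous update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, vocab, emb_dim = 4, 100, 8
embeddings = rng.normal(size=(vocab, emb_dim))   # sparse part: async updates
dense_w = rng.normal(size=(emb_dim,))            # dense part: sync updates
lr = 0.01

for step in range(100):
    dense_grads = []
    for w in range(n_workers):
        ids = rng.integers(0, vocab, size=16)          # worker's sparse ids
        emb_grad = rng.normal(size=(16, emb_dim))      # stand-in gradients
        # Async: each worker writes to the shared embedding table immediately;
        # only the sparsely touched rows are affected, so staleness is tolerable.
        np.add.at(embeddings, ids, -lr * emb_grad)
        dense_grads.append(rng.normal(size=emb_dim))   # stand-in dense gradient
    # Sync: dense gradients are averaged across workers (emulating AllReduce)
    # before a single update, keeping the dense network consistent everywhere.
    dense_w -= lr * np.mean(dense_grads, axis=0)
```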
1 code implementation • COLING 2020 • Haonan Li, Maria Vasardani, Martin Tomko, Timothy Baldwin
Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources.
no code implementations • SEMEVAL 2019 • Haonan Li, Minghan Wang, Timothy Baldwin, Martin Tomko, Maria Vasardani
This paper describes our submission to SemEval-2019 Task 12 on toponym resolution over scientific articles.