no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
1 code implementation • 28 Oct 2024 • Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin
Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants.
no code implementations • 21 Oct 2024 • Da Ju, Song Jiang, Andrew Cohen, Aaron Foss, Sasha Mitts, Arman Zharmagambetov, Brandon Amos, Xian Li, Justine T Kao, Maryam Fazel-Zarandi, Yuandong Tian
In this paper, we propose To the Globe (TTG), a real-time demo system that takes natural language requests from users, translates them into symbolic form via a fine-tuned Large Language Model, and produces optimal travel itineraries with Mixed Integer Linear Programming solvers.
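As a hedged illustration of the optimization step, the sketch below selects stops for an itinerary with a small MILP via PuLP; all attraction data, limits, and variable names are invented for the example, not taken from TTG.

```python
# Hypothetical sketch of the itinerary-optimization step: after the LLM has
# translated a user request into symbolic form, a small MILP picks stops
# under time and budget limits. All values below are invented placeholders.
import pulp

attractions = {
    "museum":    {"hours": 3, "cost": 25, "rating": 4.6},
    "park":      {"hours": 2, "cost": 0,  "rating": 4.2},
    "food_tour": {"hours": 4, "cost": 80, "rating": 4.8},
    "gallery":   {"hours": 2, "cost": 15, "rating": 4.0},
}
max_hours, max_budget = 8, 100

prob = pulp.LpProblem("itinerary", pulp.LpMaximize)
pick = {a: pulp.LpVariable(a, cat="Binary") for a in attractions}

# Objective: maximize the total rating of the selected stops.
prob += pulp.lpSum(attractions[a]["rating"] * pick[a] for a in attractions)
# Constraints: respect the user's time and budget limits.
prob += pulp.lpSum(attractions[a]["hours"] * pick[a] for a in attractions) <= max_hours
prob += pulp.lpSum(attractions[a]["cost"] * pick[a] for a in attractions) <= max_budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([a for a in attractions if pick[a].value() == 1])
```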
2 code implementations • 21 Oct 2024 • Kamal Al-Sabahi, Kang Yang, Wangwang Liu, Guanyu Jiang, Xian Li, Ming Yang
For this reason, attention has shifted to non-autoregressive and sequence-tagging models.
no code implementations • 10 Sep 2024 • Kuan Wang, Alexander Bukharin, Haoming Jiang, Qingyu Yin, Zhengyang Wang, Tuo Zhao, Jingbo Shang, Chao Zhang, Bing Yin, Xian Li, Jianshu Chen, Shiyang Li
However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow the complex roles and rules specified by developers, a.k.a.
no code implementations • 8 Aug 2024 • Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li
We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs).
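A minimal sketch of the two-step idea, assuming a generic `generate(prompt)` callable standing in for any LLM (this is an illustration, not the paper's code):

```python
# Sketch of instruction back-and-forth translation. `generate` is a stand-in
# for any LLM inference call; the prompts here are illustrative assumptions.
def back_and_forth(web_document: str, generate) -> dict:
    # Backward step: synthesize an instruction the document could answer.
    instruction = generate(
        f"Write an instruction that the following text answers:\n{web_document}"
    )
    # Forward step: rewrite the document into a clean, direct response to
    # that instruction, staying grounded in the original source.
    response = generate(
        f"Instruction: {instruction}\nRewrite this text as a high-quality "
        f"response to the instruction:\n{web_document}"
    )
    return {"instruction": instruction, "response": response}
```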
no code implementations • 5 Aug 2024 • Tianlu Wang, Ilia Kulikov, Olga Golovneva, Ping Yu, Weizhe Yuan, Jane Dwivedi-Yu, Richard Yuanzhe Pang, Maryam Fazel-Zarandi, Jason Weston, Xian Li
Model-based evaluation is at the heart of successful model development -- as a reward model for training, and as a replacement for human evaluation.
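As a toy illustration of model-based evaluation, a pairwise LLM-as-a-judge call could look like the sketch below; the template and `generate` callable are assumptions for the example, not the paper's setup.

```python
# Minimal pairwise LLM-as-a-judge sketch (illustrative only).
JUDGE_TEMPLATE = """Compare two responses to the same prompt.
Prompt: {prompt}
Response A: {a}
Response B: {b}
Answer with exactly "A" or "B" for the better response."""

def judge(generate, prompt: str, a: str, b: str) -> str:
    verdict = generate(JUDGE_TEMPLATE.format(prompt=prompt, a=a, b=b))
    return "A" if verdict.strip().startswith("A") else "B"
```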
no code implementations • 31 Jul 2024 • Kewei Cheng, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Binxuan Huang, Ruirui Li, Shiyang Li, Zheng Li, Yifan Gao, Xian Li, Bing Yin, Yizhou Sun
To investigate the true inductive reasoning capabilities of LLMs, we propose a novel framework, SolverLearner.
1 code implementation • 12 Mar 2024 • Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li
We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.
Ranked #38 on Common Sense Reasoning on WinoGrande
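One way to combine separately trained domain experts is to mount their feed-forward blocks as experts behind a learned router; the PyTorch sketch below shows that idea under invented names and shapes, and is not the paper's implementation.

```python
# Hedged sketch: mix feed-forward blocks from separately finetuned domain
# models into one MoE layer, with a router learned in a finetuning stage.
import torch
import torch.nn as nn

class MixedExpertsFFN(nn.Module):
    def __init__(self, expert_ffns: list, d_model: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)  # one FFN per domain model
        self.router = nn.Linear(d_model, len(expert_ffns))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                          # [batch, seq, n_experts]
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)  # tokens sent to expert e
                out = out + mask * weights[..., k:k+1] * expert(x)
        return out

# Usage (hypothetical): moe = MixedExpertsFFN([m.ffn for m in domain_models], 4096)
```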
no code implementations • 21 Feb 2024 • Rui Zhou, Xian Li, Ying Fang, Xiaofei Li
In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3
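For orientation, a log-mel front end of the kind such denoising networks consume can be computed with torchaudio; the parameters below are generic assumptions, not the Mel-FullSubNet configuration.

```python
# Generic log-mel-spectrogram front end (illustrative parameters only).
import torch
import torchaudio

wave, sr = torch.randn(1, 16000), 16000       # 1 s of dummy audio at 16 kHz
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=512, hop_length=128, n_mels=80
)(wave)                                        # -> [1, 80, n_frames]
log_mel = torch.log(mel + 1e-6)                # log compression, as is common
print(log_mel.shape)
```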
1 code implementation • 7 Feb 2024 • Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian McAuley
We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently.
3 code implementations • 18 Jan 2024 • Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston
We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal.
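A conceptual sketch of such a loop, in which the model judges its own generations to build preference pairs for training; every callable here is a stand-in, not the paper's API.

```python
# Conceptual self-rewarding iteration: generate candidates, self-judge them,
# then train on the resulting preferences (e.g. with DPO) and repeat.
def self_rewarding_iteration(model, prompts, generate, judge_score, dpo_update):
    preference_pairs = []
    for prompt in prompts:
        candidates = [generate(model, prompt) for _ in range(4)]
        # The model scores its own candidates (LLM-as-a-Judge).
        ranked = sorted(candidates, key=lambda c: judge_score(model, prompt, c))
        preference_pairs.append((prompt, ranked[-1], ranked[0]))  # (chosen, rejected)
    return dpo_update(model, preference_pairs)
```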
no code implementations • 23 Oct 2023 • Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li
Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria.
no code implementations • 23 Oct 2023 • Xian Li, Hongguang Shi, Yunfei Wang, Yeqin Zhang, Xubin Li, Cam-Tu Nguyen
Specifically, the recommendation module predicts the long-term recommendation target based on the conversational context and the user history.
1 code implementation • 20 Sep 2023 • Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models.
1 code implementation • 15 Sep 2023 • Nian Shao, Xian Li, Xiaofei Li
In this work, we study fine-tuning methods for pretrained models on sound event detection (SED).
Ranked #1 on Sound Event Detection on DESED (using extra training data)
2 code implementations • 11 Aug 2023 • Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis
We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions.
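A hedged sketch of this label-then-filter idea: infer instructions for unlabeled text, then keep only the pairs the model itself rates highly. The callables and threshold are assumptions for illustration, not the paper's code.

```python
# Sketch: self-augment web text with instructions, then self-curate by score.
def build_instruction_data(web_texts, generate, rate_quality, threshold=4.5):
    dataset = []
    for text in web_texts:
        # Self-augmentation: infer the instruction this text would answer.
        instruction = generate(f"What instruction would have this answer?\n{text}")
        # Self-curation: keep only pairs the model rates as high quality
        # (e.g. on a 1-5 scale), so noisy pairs never enter training.
        if rate_quality(instruction, text) >= threshold:
            dataset.append({"instruction": instruction, "output": text})
    return dataset
```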
no code implementations • 4 Jul 2023 • Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang
We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations.
2 code implementations • 7 Jun 2023 • Xian Li, Nian Shao, Xiaofei Li
In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.
Ranked #10 on Audio Classification on AudioSet
no code implementations • 1 Jun 2023 • Hejie Cui, Rongmei Lin, Nasser Zalmout, Chenwei Zhang, Jingbo Shang, Carl Yang, Xian Li
Information extraction, e.g., attribute value extraction, has been extensively studied and is typically formulated based only on text.
1 code implementation • 26 May 2023 • Liyan Xu, Chenwei Zhang, Xian Li, Jingbo Shang, Jinho D. Choi
We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention.
no code implementations • 23 May 2023 • Zeyu Leo Liu, Tim Dettmers, Xi Victoria Lin, Veselin Stoyanov, Xian Li
Large and sparse feed-forward layers (S-FFN) such as Mixture-of-Experts (MoE) have proven effective in scaling up Transformer model size for pretraining large language models.
no code implementations • 9 May 2023 • Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples.
1 code implementation • 22 Dec 2022 • Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov
To this end, we create OPT-IML Bench: a large benchmark for Instruction Meta-Learning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, and prepare an evaluation framework to measure three types of model generalizations: to tasks from fully held-out categories, to held-out tasks from seen categories, and to held-out instances from seen tasks.
Ranked #26 on Natural Language Inference on RTE
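The three generalization settings can be pictured with a toy split like the one below; the task and category names are invented, and the real benchmark consolidates 2000 tasks.

```python
# Toy construction of the three OPT-IML-style evaluation splits.
tasks = {
    "summarization": ["xsum", "cnn_dm"],
    "qa": ["squad", "triviaqa"],
    "nli": ["rte", "mnli"],
}
held_out_category = "nli"        # 1) tasks from fully held-out categories
held_out_task = "triviaqa"       # 2) held-out task from a seen category

train_tasks, eval_tasks = [], []
for category, members in tasks.items():
    for task in members:
        if category == held_out_category or task == held_out_task:
            eval_tasks.append(task)      # evaluated but never trained on
        else:
            train_tasks.append(task)
# 3) held-out instances: additionally split each training task's examples,
#    e.g. examples[:-100] for training and examples[-100:] for evaluation.
print(train_tasks, eval_tasks)
```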
no code implementations • 14 Sep 2022 • LiAn Yang, Ming Xiao, Xian Li, Ya-lan Wang
Mutations in the STK11/LKB1 and NF1 genes have been found in ADC and SQC and are often associated with drug resistance and poor prognosis; however, STK11/NF1 co-mutation has not been reported, and more cases are needed to reveal the association.
no code implementations • EACL 2021 • Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li
Recent work in multilingual translation advances translation quality beyond bilingual baselines using deep Transformer models with increased capacity.
no code implementations • 25 May 2022 • Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next.
Cultural Vocal Bursts Intensity Prediction • Few-Shot Learning +1
6 code implementations • IEEE Transactions on Image Processing 2021 • Yaoyao Zhong, Weihong Deng, Jiani Hu, Dongyue Zhao, Xian Li, Dongchao Wen
Deep face recognition has achieved great success due to large-scale training databases and rapidly developing loss functions.
Ranked #2 on Face Verification on CALFW
no code implementations • NAACL 2022 • Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
10 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs
1 code implementation • 29 Apr 2022 • Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han
Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arise from constantly changing data.
4 code implementations • 26 Apr 2022 • Xian Li, Xiaofei Li
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data and then transfers it to a specific problem with a limited amount of labeled data.
Ranked #2 on Spoken Command Recognition on Speech Command v2
no code implementations • ACL 2022 • Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino
Two pre-training configurations for speech translation and recognition, respectively, are presented to alleviate subtask interference.
no code implementations • 14 Mar 2022 • Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li
All-MLP architectures have attracted increasing interest as an alternative to attention-based models.
Ranked #17 on Question Answering on StoryCloze
2 code implementations • 20 Dec 2021 • Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li
Large-scale generative language models such as GPT-3 are competitive few-shot learners.
no code implementations • 20 Dec 2021 • Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning.
1 code implementation • 26 Nov 2021 • Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer
In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks.
1 code implementation • EMNLP 2021 • Chunting Zhou, Daniel Levy, Xian Li, Marjan Ghazvininejad, Graham Neubig
Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models.
no code implementations • ACL 2021 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.
no code implementations • ACL (IWSLT) 2021 • Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal
In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.
no code implementations • ACL 2021 • Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel
Pretraining and multitask learning are widely used to improve speech-to-text translation performance.
no code implementations • 30 Jun 2021 • Xian Li
Procedural fairness has been a public concern; it leads to controversy when decisions are made with respect to protected classes, such as race, social status, and disability.
no code implementations • NeurIPS 2021 • Hongyu Gong, Yun Tang, Juan Pino, Xian Li
We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.
no code implementations • ACL 2021 • Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab
Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU?
no code implementations • 15 Apr 2021 • Hongyu Gong, Xian Li, Dmitriy Genzel
Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters to improve the positive transfer and mitigate the interference.
no code implementations • NeurIPS 2021 • Xian Li, Hongyu Gong
We show that the common training method of upsampling low-resource languages cannot robustly optimize the population loss, risking either underfitting high-resource languages or overfitting low-resource ones.
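The upsampling baseline the abstract critiques is often implemented as temperature-based sampling; a minimal sketch with invented corpus sizes:

```python
# Temperature-based sampling over language corpora: T=1 is proportional to
# corpus size, larger T moves toward uniform (upweighting low-resource data).
sizes = {"en": 1_000_000, "fr": 200_000, "sw": 5_000}  # sentences per language
T = 5.0

total = sum(sizes.values())
scaled = {lang: (n / total) ** (1.0 / T) for lang, n in sizes.items()}
z = sum(scaled.values())
probs = {lang: s / z for lang, s in scaled.items()}
print(probs)  # low-resource languages get upweighted, but not to uniform
```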
no code implementations • ICCV 2021 • Yaobin Zhang, Weihong Deng, Yaoyao Zhong, Jiani Hu, Xian Li, Dongyue Zhao, Dongchao Wen
The training of a deep face recognition system usually faces the interference of label noise in the training data.
1 code implementation • ACL 2021 • Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li
The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.
1 code implementation • 29 Dec 2020 • Yilun Zhou, Adithya Renduchintala, Xian Li, Sida Wang, Yashar Mehdad, Asish Ghoshal
Active learning (AL) algorithms may achieve better performance with less data because the model guides the data selection process.
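A textbook uncertainty-sampling loop illustrates the general AL setting the paper studies; this generic sketch, with synthetic data, is not the paper's algorithm.

```python
# Generic pool-based active learning with uncertainty sampling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic labels

labeled = list(range(20))                        # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):                               # five acquisition rounds
    model = LogisticRegression().fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])[:, 1]
    # Query the pool point the current model is least certain about.
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(query)
    pool.remove(query)

print("accuracy:", model.score(X, y))
```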
no code implementations • 24 Oct 2020 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from a pretrained speech encoder and text decoder.
no code implementations • 23 Oct 2020 • Xiaogang Zhu, Shigang Liu, Xian Li, Sheng Wen, Jun Zhang, Camtepe Seyit, Yang Xiang
Fuzzing is one of the most effective techniques for identifying potential software vulnerabilities.