1 code implementation • 25 Jan 2024 • Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao
We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples.
1 code implementation • 25 Oct 2023 • Zichong Li, Yanbo Xu, Simiao Zuo, Haoming Jiang, Chao Zhang, Tuo Zhao, Hongyuan Zha
We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time.
no code implementations • 20 Oct 2023 • Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles
In Evoke, there are two instances of the same LLM: one acts as a reviewer (LLM-Reviewer) and scores the current prompt; the other acts as an author (LLM-Author) and edits the prompt by considering the edit history and the reviewer's feedback.
no code implementations • 30 Jun 2023 • Simiao Zuo, Pengfei Tang, Xinyu Hu, Qiang Lou, Jian Jiao, Denis Charles
For model-free enhancement, we collect unlabeled web queries to augment domain knowledge, and we collect web search results to enrich the information of ads queries.
1 code implementation • 5 Jun 2023 • Alexander Bukharin, Tianyi Liu, Shengjie Wang, Simiao Zuo, Weihao Gao, Wen Yan, Tuo Zhao
To address this issue, we propose a multi-stage computational framework, ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
1 code implementation • 15 Dec 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao
Specifically, we augment an SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.
1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.
no code implementations • 15 Sep 2022 • Simiao Zuo, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin, Tuo Zhao
Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges.
no code implementations • 15 Sep 2022 • Simiao Zuo, Tianyi Liu, Tuo Zhao, Hongyuan Zha
Point process models are of great importance in real-world applications.
no code implementations • 15 Sep 2022 • Simiao Zuo, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang, Tuo Zhao
The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network.
1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.
1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.
1 code implementation • ICLR 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao
While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.
no code implementations • 16 Sep 2021 • Zhigen Zhao, Simiao Zuo, Tuo Zhao, Ye Zhao
Recent advancement in combining trajectory optimization with function approximation (especially neural networks) shows promise in learning complex control policies for diverse tasks in robot systems.
1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization can improve model generalization in many natural language processing tasks.
no code implementations • Findings (NAACL) 2022 • Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha
In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels.
1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.
1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.
no code implementations • ICLR 2021 • Yujia Xie, Yixiu Mao, Simiao Zuo, Hongteng Xu, Xiaojing Ye, Tuo Zhao, Hongyuan Zha
Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and limited to linear regression models.
1 code implementation • NAACL 2021 • Yue Yu, Simiao Zuo, Haoming Jiang, Wendi Ren, Tuo Zhao, Chao Zhang
To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision.
Ranked #1 on Word Sense Disambiguation on Words in Context
3 code implementations • ICML 2020 • Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha
Modern data acquisition routinely produces massive amounts of event sequence data in various domains, such as social media, healthcare, and financial markets.
no code implementations • ICLR 2019 • Simiao Zuo, Jialin Wu
There have long been debates on how we could interpret neural networks and understand the decisions our models make.