1 code implementation • ICML 2020 • chengyu dong, Liyuan Liu, Zichao Li, Jingbo Shang
Serving as a crucial factor, the depth of residual networks balances model capacity, performance, and training efficiency.
no code implementations • 2 Dec 2024 • Yonghao Dang, Liyuan Liu, Hui Kang, Ping Ye, Jianqin Yin
Moreover, MamKPD achieves state-of-the-art results on the MPII dataset and competitive results on the AP-10K dataset while saving 85% of the parameters compared to ViTPose.
no code implementations • 17 Nov 2024 • Haoran Gao, Xichuan Zhou, Yingcheng Lin, Min Tian, Liyuan Liu, Cong Shi
The prevalence of artificial intelligence-of-things calls for more energy-efficient edge computing paradigms, such as neuromorphic agents leveraging brain-inspired spiking neural network (SNN) models based on spatiotemporally sparse binary spikes.
1 code implementation • 8 Oct 2024 • Yufan Zhuang, Chandan Singh, Liyuan Liu, Jingbo Shang, Jianfeng Gao
Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities on textual data.
no code implementations • 4 Oct 2024 • Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen
Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific.
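To make the second family of approaches concrete, below is a minimal sketch of a recency-plus-heavy-hitter token eviction policy for a single layer's KV cache; the function name, budgets, and scoring heuristic are illustrative assumptions, not the method proposed in this paper.

```python
import torch

def evict_kv_cache(keys, values, attn_scores, keep_recent=128, keep_heavy=128):
    """Illustrative token-eviction policy (not this paper's method).

    keys, values: [seq_len, num_heads, head_dim] cached tensors for one layer.
    attn_scores: [seq_len] cumulative attention each cached token has received.
    Keeps the most recent tokens plus the most-attended ("heavy hitter") tokens.
    """
    seq_len = keys.shape[0]
    if seq_len <= keep_recent + keep_heavy:
        return keys, values

    recent_idx = torch.arange(seq_len - keep_recent, seq_len)
    # Rank the older tokens by accumulated attention and keep only the top ones.
    older_scores = attn_scores[: seq_len - keep_recent]
    heavy_idx = torch.topk(older_scores, keep_heavy).indices
    keep_idx = torch.cat([heavy_idx, recent_idx]).sort().values
    return keys[keep_idx], values[keep_idx]
```

Such per-layer policies are exactly where the inter-layer dependencies mentioned above get lost, since each layer decides its evictions independently.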
no code implementations • 18 Sep 2024 • Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules.
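The sketch below illustrates generic top-k expert routing, i.e., the sparse computation pattern described above; the module names, dimensions, and routing details are assumptions for illustration and not the architecture studied in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative sketch only)."""

    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):  # x: [num_tokens, d_model]
        gate_logits = self.router(x)                        # [num_tokens, num_experts]
        weights, expert_idx = gate_logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the k selected experts run on each token (sparse computation).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```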
no code implementations • 16 Sep 2024 • Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng
However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
no code implementations • 22 Apr 2024 • Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
Ranked #5 on MMR total on MRR-Benchmark (using extra training data)
1 code implementation • 22 Apr 2024 • Yonghao Dang, Jianqin Yin, Liyuan Liu, Pengxiang Ding, Yuan Sun, Yanzhu Hu
Multi-person pose estimation (MPPE) presents a formidable yet crucial challenge in computer vision.
1 code implementation • 6 Feb 2024 • Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao
Decision trees are renowned for their ability to achieve high predictive performance while remaining interpretable, especially on tabular data.
1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao
In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
no code implementations • 11 Oct 2023 • chengyu dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu
Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model.
2 code implementations • 3 Oct 2023 • Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao
In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs).
no code implementations • 1 Oct 2023 • Liyuan Liu, Jianfeng Gao, Weizhu Chen
One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability.
no code implementations • 14 Jun 2022 • chengyu dong, Liyuan Liu, Jingbo Shang
How to conduct teacher training for knowledge distillation is still an open problem.
no code implementations • 15 Feb 2022 • Sha Li, Liyuan Liu, Yiqing Xie, Heng Ji, Jiawei Han
Our framework decomposes event detection into an identification task and a localization task.
no code implementations • 7 Oct 2021 • chengyu dong, Liyuan Liu, Jingbo Shang
We show that label noise exists in adversarial training.
no code implementations • 29 Sep 2021 • Zichao Li, Liyuan Liu, chengyu dong, Jingbo Shang
While this phenomenon is commonly explained as overfitting, we observe that it is a twin process: not only does the model catastrophically overfit to one type of perturbation, but the perturbation also deteriorates into random noise.
1 code implementation • 21 Jun 2021 • Tao Chen, Haochen Shi, Liyuan Liu, Siliang Tang, Jian Shao, Zhigang Chen, Yueting Zhuang
In this paper, we propose collaborative adversarial training to improve data utilization, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels.
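For context, a generic embedding-space adversarial training step (FGSM-style, not the collaborative scheme proposed here) can be sketched as follows; all names and the perturbation budget are illustrative assumptions.

```python
import torch

def adversarial_training_step(model, loss_fn, embeddings, labels, epsilon=1e-2):
    """Generic embedding-space adversarial training step (illustration only).

    Perturbs the input embeddings along the sign of the loss gradient and
    returns the sum of the clean and adversarial losses.
    """
    embeddings = embeddings.detach().requires_grad_(True)
    clean_loss = loss_fn(model(embeddings), labels)
    grad, = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)
    perturbation = epsilon * grad.sign()      # worst-case direction, fixed budget
    adv_loss = loss_fn(model(embeddings + perturbation), labels)
    return clean_loss + adv_loss
```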
1 code implementation • 17 Jun 2021 • Liyuan Liu, Jialu Liu, Jiawei Han
Multi-head attention plays a crucial role in the recent success of Transformer models, leading to consistent performance improvements over conventional attention in various applications.
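For reference, a textbook multi-head attention block (standard scaled dot-product attention, not the variant analyzed in this work) can be sketched as below; dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Standard multi-head attention for reference (illustrative sketch)."""

    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: [batch, seq_len, d_model]
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split each projection into independent heads.
        q, k, v = (z.view(b, t, self.num_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        ctx = scores.softmax(dim=-1) @ v
        return self.out(ctx.transpose(1, 2).reshape(b, t, -1))
```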
2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang
Training a conventional neural tagger on silver labels usually risks overfitting to phrase surface names.
Ranked #1 on Phrase Tagging on KPTimes
1 code implementation • 15 Feb 2021 • chengyu dong, Liyuan Liu, Jingbo Shang
Specifically, we first propose a strategy to measure data quality based on the learning behaviors of the data during adversarial training, and find that low-quality data may not be useful and may even be detrimental to adversarial robustness.
no code implementations • NAACL 2021 • Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han
Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively -- starting from an inferior but low-cost model and gradually growing it to increase the computational complexity.
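One simple instance of this progressive-growth idea is depth stacking, where a trained shallow encoder is duplicated to initialize a deeper one; the sketch below is a generic illustration under that assumption, not the schedule studied in this paper.

```python
import copy
import torch.nn as nn

def grow_by_stacking(shallow_encoder: nn.ModuleList) -> nn.ModuleList:
    """Initialize a model twice as deep by duplicating a trained layer stack.

    The lower half reuses the trained blocks; the upper half starts from deep
    copies of the same weights (one common progressive-growth recipe).
    """
    doubled = list(shallow_encoder) + [copy.deepcopy(layer) for layer in shallow_encoder]
    return nn.ModuleList(doubled)

# Example: grow a 3-layer Transformer encoder into a 6-layer one.
shallow = nn.ModuleList(nn.TransformerEncoderLayer(d_model=256, nhead=4) for _ in range(3))
deep = grow_by_stacking(shallow)
assert len(deep) == 6
```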
2 code implementations • 15 Oct 2020 • Zichao Li, Liyuan Liu, chengyu dong, Jingbo Shang
Our goal is to understand why the robustness drops after conducting adversarial training for too long.
4 code implementations • 18 Aug 2020 • Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
We explore the application of very deep Transformer models for Neural Machine Translation (NMT).
Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)
no code implementations • 1 May 2020 • Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han
While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may only cover a part of them.
2 code implementations • EMNLP 2020 • Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Transformers have proved effective in many NLP tasks.
Ranked #5 on Machine Translation on WMT2014 English-French
no code implementations • 27 Dec 2019 • Mingxin Zhao, Li Cheng, Xu Yang, Peng Feng, Liyuan Liu, Nanjian Wu
Meanwhile, we propose a joint loss function and a training method.
1 code implementation • ACL 2020 • Ouyu Lan, Xiao Huang, Bill Yuchen Lin, He Jiang, Liyuan Liu, Xiang Ren
Its performance is largely influenced by the annotation quality and quantity in supervised learning scenarios, and obtaining ground truth labels is often costly.
1 code implementation • IJCNLP 2019 • Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han
Therefore, we manually correct these label mistakes and form a cleaner test set.
Ranked #6 on Named Entity Recognition (NER) on CoNLL++ (using extra training data)
1 code implementation • ACL 2020 • Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han
In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.
1 code implementation • 14 Aug 2019 • Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han
Our model neither requires the conversion from character sequences to word sequences, nor assumes that a tokenizer can correctly detect all word boundaries.
21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
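As a reminder of what this warmup heuristic looks like in practice, here is a minimal linear-warmup sketch using PyTorch's stock scheduler; the toy model, warmup length, and learning rate are arbitrary assumptions, and this is not the rectified optimizer derived in the paper.

```python
import torch

# A toy model and optimizer; the warmup length (4000 steps) is arbitrary.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Linear warmup: scale the learning rate from near zero up to its base value,
# then keep it constant, implemented with PyTorch's LambdaLR scheduler.
warmup_steps = 4000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

# During training, call scheduler.step() once after each optimizer.step().
```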
no code implementations • WS 2019 • Liyuan Liu, Jingbo Shang, Jiawei Han
This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com.
1 code implementation • ACL 2019 • Ying Lin, Liyuan Liu, Heng Ji, Dong Yu, Jiawei Han
We design a set of word frequency-based reliability signals to indicate the quality of each word embedding.
1 code implementation • IJCNLP 2019 • Qinyuan Ye, Liyuan Liu, Maosen Zhang, Xiang Ren
In this paper, we study what limits the performance of DS-trained neural models, conduct thorough analyses, and identify a factor that can greatly influence performance: shifted label distribution.
1 code implementation • 27 Dec 2018 • Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, ShiLiang Pu, Fei Wu, Xiang Ren
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train a relation extractor without human annotations.
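As a toy illustration of distant supervision (the knowledge-base format and names below are assumptions, not this paper's pipeline), sentences are labeled by matching entity pairs against known relation triples; the second example shows how this labeling can be noisy.

```python
# Toy knowledge base: (head, tail) -> relation
KB = {("Paris", "France"): "capital_of", ("Amazon", "Seattle"): "headquartered_in"}

def distant_label(sentence, head, tail):
    """Assign a relation label if both entities appear in the sentence and the
    pair is in the knowledge base; otherwise return "NA" (no relation)."""
    if head in sentence and tail in sentence:
        return KB.get((head, tail), "NA")
    return "NA"

print(distant_label("Paris is the capital of France.", "Paris", "France"))  # capital_of
print(distant_label("Paris Hilton visited France.", "Paris", "France"))     # capital_of (noisy!)
```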
1 code implementation • EMNLP 2018 • Jingbo Shang, Liyuan Liu, Xiang Ren, Xiaotao Gu, Teng Ren, Jiawei Han
Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.
1 code implementation • EMNLP 2018 • Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han
Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), which have brought significant improvements to various applications.
Ranked #49 on Named Entity Recognition (NER) on CoNLL 2003 (English)
no code implementations • 9 Mar 2018 • Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han
We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking.
1 code implementation • 21 Dec 2017 • Carl Yang, Mengxiong Liu, Zongyi Wang, Liyuan Liu, Jiawei Han
Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner.
Subjects: Social and Information Networks; Physics and Society
3 code implementations • 13 Sep 2017 • Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, Jiawei Han
In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.
Ranked #13 on Part-Of-Speech Tagging on Penn Treebank
1 code implementation • EMNLP 2017 • Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han
These annotations, referred to as heterogeneous supervision, often conflict with each other, which poses a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance.