Search Results for author: Liyuan Liu

Found 43 papers, 26 papers with code

Towards Adaptive Residual Network Training: A Neural-ODE Perspective

1 code implementation ICML 2020 Chengyu Dong, Liyuan Liu, Zichao Li, Jingbo Shang

The depth of a residual network is a crucial factor that balances model capacity, performance, and training efficiency.

MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection

no code implementations 2 Dec 2024 Yonghao Dang, Liyuan Liu, Hui Kang, Ping Ye, Jianqin Yin

Moreover, MamKPD achieves state-of-the-art results on the MPII dataset and competitive results on the AP-10K dataset while saving 85% of the parameters compared to ViTPose.

Animal Pose Estimation Keypoint Detection +1

STOP: Spatiotemporal Orthogonal Propagation for Weight-Threshold-Leakage Synergistic Training of Deep Spiking Neural Networks

no code implementations 17 Nov 2024 Haoran Gao, Xichuan Zhou, Yingcheng Lin, Min Tian, Liyuan Liu, Cong Shi

The rise of the artificial intelligence of things calls for more energy-efficient edge computing paradigms, such as neuromorphic agents that leverage brain-inspired spiking neural network (SNN) models based on spatiotemporally sparse binary spikes.

Edge-computing

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy

no code implementations 4 Oct 2024 Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen

Existing approaches to mitigating this issue include: (1) efficient attention variants integrated at upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; and (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific.

Low-rank compression

GRIN: GRadient-INformed MoE

no code implementations 18 Sep 2024 Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen

Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules.

HellaSwag HumanEval +5
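
The expert routing mentioned in the GRIN abstract above can be illustrated with a minimal top-k gating sketch. The dimensions, expert count, and routing function below are assumptions for illustration only, not the paper's gradient-informed method:

```python
import torch
import torch.nn.functional as F

# Minimal top-k expert routing sketch (illustrative; sizes and k are assumptions).
d_model, n_experts, k = 16, 8, 2
router = torch.nn.Linear(d_model, n_experts)  # per-token routing scores
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    # x: (tokens, d_model). Each token activates only its top-k experts,
    # so most expert parameters stay untouched for any given token.
    gates = F.softmax(router(x), dim=-1)          # (tokens, n_experts)
    topk_vals, topk_idx = gates.topk(k, dim=-1)   # sparse selection per token
    out = torch.zeros_like(x)
    for slot in range(k):
        for e in range(n_experts):
            mask = topk_idx[:, slot] == e         # tokens routed to expert e in this slot
            if mask.any():
                weight = topk_vals[mask][:, slot].unsqueeze(-1)
                out[mask] += weight * experts[e](x[mask])
    return out

tokens = torch.randn(4, d_model)
print(moe_forward(tokens).shape)  # torch.Size([4, 16])
```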

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

no code implementations 16 Sep 2024 Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng

However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations 22 Apr 2024 Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Ranked #5 on MMR total on MRR-Benchmark (using extra training data)

Language Modeling Language Modelling +3

Learning a Decision Tree Algorithm with Transformers

1 code implementation 6 Feb 2024 Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

Decision trees are renowned for their ability to achieve high predictive performance while remaining interpretable, especially on tabular data.

Meta-Learning

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation 3 Nov 2023 Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.

Fast-ELECTRA for Efficient Pre-training

no code implementations 11 Oct 2023 Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model.

Language Modeling Language Modelling

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

2 code implementations 3 Oct 2023 Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao

In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs).
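
For context on why the KV cache's memory footprint matters during generation, here is a generic sketch of a per-layer KV cache with a simple recency-based eviction rule. This is only an illustration of the problem setting, not the adaptive compression policy proposed in the paper, and all sizes are assumptions:

```python
import torch

# Generic KV cache illustration; layer/head/window sizes are assumed values.
n_layers, n_heads, head_dim, window = 32, 32, 128, 1024

cache = [{"k": torch.empty(0, n_heads, head_dim),
          "v": torch.empty(0, n_heads, head_dim)} for _ in range(n_layers)]

def append_and_evict(layer: int, k: torch.Tensor, v: torch.Tensor) -> None:
    # Append the new token's key/value, then drop the oldest entries so the
    # cache never exceeds `window` tokens per layer (simple recency eviction).
    cache[layer]["k"] = torch.cat([cache[layer]["k"], k])[-window:]
    cache[layer]["v"] = torch.cat([cache[layer]["v"], v])[-window:]

# Simulate decoding one token: each layer stores one (n_heads, head_dim) K/V pair.
for layer in range(n_layers):
    append_and_evict(layer,
                     torch.randn(1, n_heads, head_dim),
                     torch.randn(1, n_heads, head_dim))

bytes_per_token = n_layers * 2 * n_heads * head_dim * 4  # fp32 keys + values
print(f"KV cache grows by about {bytes_per_token / 1e6:.1f} MB per generated token")
```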

Sparse Backpropagation for MoE Training

no code implementations 1 Oct 2023 Liyuan Liu, Jianfeng Gao, Weizhu Chen

One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability.

Machine Translation

Perturbation Deterioration: The Other Side of Catastrophic Overfitting

no code implementations 29 Sep 2021 Zichao Li, Liyuan Liu, Chengyu Dong, Jingbo Shang

While this phenomenon is commonly explained as overfitting, we observe that it is a twin process: not only does the model catastrophically overfit to one type of perturbation, but the perturbation also deteriorates into random noise.

Empower Distantly Supervised Relation Extraction with Collaborative Adversarial Training

1 code implementation 21 Jun 2021 Tao Chen, Haochen Shi, Liyuan Liu, Siliang Tang, Jian Shao, Zhigang Chen, Yueting Zhuang

In this paper, we propose collaborative adversarial training to improve the data utilization, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels.

Relation Relation Extraction

Multi-head or Single-head? An Empirical Comparison for Transformer Training

1 code implementation 17 Jun 2021 Liyuan Liu, Jialu Liu, Jiawei Han

Multi-head attention plays a crucial role in the recent success of Transformer models, leading to consistent performance improvements over conventional attention in various applications.

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

2 code implementations 28 May 2021 Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.

Keyphrase Extraction Language Modelling +3

Data Quality Matters For Adversarial Training: An Empirical Study

1 code implementation 15 Feb 2021 Chengyu Dong, Liyuan Liu, Jingbo Shang

Specifically, we first propose a strategy to measure data quality based on the learning behaviors of the data during adversarial training, and find that low-quality data may be not only unhelpful but even detrimental to adversarial robustness.

Adversarial Robustness

On the Transformer Growth for Progressive BERT Training

no code implementations NAACL 2021 Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han

Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively: starting from an inferior but low-cost model and gradually growing it to increase its computational complexity.

Language Modeling Language Modelling

Overfitting or Underfitting? Understand Robustness Drop in Adversarial Training

2 code implementations 15 Oct 2020 Zichao Li, Liyuan Liu, Chengyu Dong, Jingbo Shang

Our goal is to understand why the robustness drops after conducting adversarial training for too long.

Very Deep Transformers for Neural Machine Translation

4 code implementations 18 Aug 2020 Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

We explore the application of very deep Transformer models for Neural Machine Translation (NMT).

Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)

Decoder Machine Translation +2

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

no code implementations 1 May 2020 Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a subset of them.

named-entity-recognition Named Entity Recognition +1

Learning to Contextually Aggregate Multi-Source Supervision for Sequence Labeling

1 code implementation ACL 2020 Ouyu Lan, Xiao Huang, Bill Yuchen Lin, He Jiang, Liyuan Liu, Xiang Ren

Sequence labeling performance is largely influenced by annotation quality and quantity in supervised learning scenarios, and obtaining ground-truth labels is often costly.

Facet-Aware Evaluation for Extractive Summarization

1 code implementation ACL 2020 Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.

Extractive Summarization Sentence +1

Raw-to-End Name Entity Recognition in Social Media

1 code implementation 14 Aug 2019 Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han

Our model neither requires converting character sequences to word sequences nor assumes that a tokenizer can correctly detect all word boundaries.

named-entity-recognition Named Entity Recognition +1

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modeling +4
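
The warmup heuristic referenced in the abstract above can be sketched in a few lines of PyTorch. The model, schedule length, and optimizer settings are illustrative assumptions, and this shows plain linear warmup rather than the rectified Adam variant the paper derives:

```python
import torch

# Toy model and optimizer; sizes and hyperparameters are illustrative.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps = 1000  # assumed warmup length

def warmup_lr(step: int) -> float:
    # Linearly scale the learning rate from near zero up to its base value,
    # then hold it constant; this is the common heuristic whose stabilizing
    # effect on adaptive optimizers the paper analyzes.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lr)

for step in range(2000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 10)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()
```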

Arabic Named Entity Recognition: What Works and What's Next

no code implementations WS 2019 Liyuan Liu, Jingbo Shang, Jiawei Han

This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com.

Ensemble Learning Feature Engineering +4

Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction

1 code implementation IJCNLP 2019 Qinyuan Ye, Liyuan Liu, Maosen Zhang, Xiang Ren

In this paper, we study what limits the performance of DS-trained neural models, conduct thorough analyses, and identify a factor that greatly influences performance: shifted label distribution.

Relation Relation Extraction

Cross-relation Cross-bag Attention for Distantly-supervised Relation Extraction

1 code implementation 27 Dec 2018 Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, ShiLiang Pu, Fei Wu, Xiang Ren

Distant supervision leverages knowledge bases to automatically label instances, allowing us to train a relation extractor without human annotations.

Relation Relation Extraction +1

Learning Named Entity Tagger using Domain-Specific Dictionary

1 code implementation EMNLP 2018 Jingbo Shang, Liyuan Liu, Xiang Ren, Xiaotao Gu, Teng Ren, Jiawei Han

Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.

named-entity-recognition Named Entity Recognition +1

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

1 code implementation EMNLP 2018 Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), bringing significant improvements to various applications.

Language Modeling Language Modelling +1

Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

no code implementations 9 Mar 2018 Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han

We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking.

Graph Clustering with Dynamic Embedding

1 code implementation 21 Dec 2017 Carl Yang, Mengxiong Liu, Zongyi Wang, Liyuan Liu, Jiawei Han

Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner.

Social and Information Networks Physics and Society

Empower Sequence Labeling with Task-Aware Neural Language Model

3 code implementations 13 Sep 2017 Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, Jiawei Han

In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.

Language Modeling Language Modelling +6

Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach

1 code implementation EMNLP 2017 Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han

These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance.

Relation Relation Extraction +1
