Search Results for author: Wenyu Du

Found 11 papers, 8 papers with code

Unlocking Continual Learning Abilities in Language Models

1 code implementation25 Jun 2024 Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

To address this limitation, we introduce $\textbf{MIGU}$ ($\textbf{M}$agn$\textbf{I}$tude-based $\textbf{G}$radient $\textbf{U}$pdating for continual learning), a rehearsal-free and task-label-free method that updates only the model parameters associated with large output magnitudes in LMs' linear layers.

Continual Learning Inductive Bias
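The snippet above describes updating only the parameters tied to large-magnitude outputs of a linear layer. A minimal numpy sketch of that idea is below; the function name, the mean-absolute-activation statistic, and the `keep_ratio` threshold are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def migu_update(W, b, x, grad_W, grad_b, lr=0.1, keep_ratio=0.5):
    """Masked SGD step: update only rows of W whose output magnitude is large.

    W: (out_features, in_features) weight, b: (out_features,) bias,
    x: (batch, in_features) inputs, grad_W / grad_b: gradients of the loss.
    """
    out = x @ W.T + b                        # (batch, out_features)
    magnitude = np.abs(out).mean(axis=0)     # per-unit mean |activation|
    k = max(1, int(keep_ratio * magnitude.size))
    keep = np.argsort(magnitude)[-k:]        # indices of largest-magnitude units
    mask = np.zeros_like(magnitude)
    mask[keep] = 1.0
    W_new = W - lr * grad_W * mask[:, None]  # update only the kept rows
    b_new = b - lr * grad_b * mask
    return W_new, b_new
```

Because the mask zeroes the update for small-magnitude units, those parameters are left untouched, which is what makes the method rehearsal-free and task-label-free.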

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

no code implementations24 May 2024 Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu

For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6% speedup.
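The snippet above describes growing a model depthwise by stacking. A toy sketch of that initialization step, assuming layers are held in a plain list, is below; the function name and `growth_factor` parameter are hypothetical, and the paper's actual growth operator may differ in detail.

```python
import copy

def grow_by_stacking(layers, growth_factor=2):
    """Initialize a deeper model by tiling the trained layer stack.

    A shallow stack [L0, L1] with growth_factor=2 yields [L0, L1, L0, L1],
    each repeated layer being an independent copy so it can train separately.
    """
    return [copy.deepcopy(layer) for _ in range(growth_factor) for layer in layers]
```

The deeper model then continues pre-training from this stacked initialization rather than from scratch.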

m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers

1 code implementation26 Feb 2024 Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu

Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k.

Knowledge Distillation

f-Divergence Minimization for Sequence-Level Knowledge Distillation

1 code implementation27 Jul 2023 Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou

Experiments across four datasets show that our methods outperform existing KD approaches, and that our symmetric distilling losses can better force the student to learn from the teacher distribution.

Knowledge Distillation
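The snippet above mentions symmetric distilling losses from the f-divergence family. As one concrete instance, a Jeffreys-style loss averages the forward and reverse KL between teacher and student distributions; this sketch is an assumption about the general shape of such a loss, not the paper's exact objective.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for probability vectors, with a small eps for stability."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def symmetric_kd_loss(teacher_probs, student_probs):
    """Jeffreys-style symmetric divergence: 0.5 * (KL(p||q) + KL(q||p))."""
    return 0.5 * (kl(teacher_probs, student_probs)
                  + kl(student_probs, teacher_probs))
```

Unlike forward KL alone, a symmetric loss penalizes the student both for missing teacher modes and for placing mass where the teacher does not.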

Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

1 code implementation18 Jan 2023 Jinyang Li, Binyuan Hui, Reynold Cheng, Bowen Qin, Chenhao Ma, Nan Huo, Fei Huang, Wenyu Du, Luo Si, Yongbin Li

Recently, the pre-trained text-to-text transformer model, namely T5, though not specialized for text-to-SQL parsing, has achieved state-of-the-art performance on standard benchmarks targeting domain generalization.

Domain Generalization Inductive Bias +2

Optimizing Stock Option Forecasting with the Assembly of Machine Learning Models and Improved Trading Strategies

no code implementations29 Nov 2022 Zheng Cao, Raymond Guo, Wenyu Du, Jiayi Gao, Kirill V. Golubnichiy

This paper introduces key aspects of applying Machine Learning (ML) models, improved trading strategies, and the Quasi-Reversibility Method (QRM) to optimize stock option forecasting and trading results.

Decision Making

Application of Convolutional Neural Networks with Quasi-Reversibility Method Results for Option Forecasting

no code implementations25 Aug 2022 Zheng Cao, Wenyu Du, Kirill V. Golubnichiy

Following results from the paper Quasi-Reversibility Method and Neural Network Machine Learning to Solution of Black-Scholes Equations (published in the AMS Contemporary Mathematics journal), we create and evaluate new empirical mathematical models for the Black-Scholes equation to analyze data for 92,846 companies.

End-to-End AMR Coreference Resolution

1 code implementation ACL 2021 Qiankun Fu, Linfeng Song, Wenyu Du, Yue Zhang

Although parsing to Abstract Meaning Representation (AMR) has become very popular and AMR has been shown effective on many sentence-level downstream tasks, little work has studied how to generate AMRs that can represent multi-sentence information.

Abstract Meaning Representation coreference-resolution +2

Linguistic Dependencies and Statistical Dependence

1 code implementation EMNLP 2021 Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency?

GPSP: Graph Partition and Space Projection based Approach for Heterogeneous Network Embedding

1 code implementation7 Mar 2018 Wenyu Du, Shuai Yu, Min Yang, Qiang Qu, Jia Zhu

Finally, we concatenate the projective vectors from bipartite subnetworks with the ones learned from homogeneous subnetworks to form the final representation of the heterogeneous network.

Clustering General Classification +2
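The snippet above describes forming the final heterogeneous-network representation by concatenating per-node vectors from the bipartite and homogeneous subnetworks. A minimal sketch of that fusion step is below; the function name and array layout (one row per node) are assumptions for illustration.

```python
import numpy as np

def gpsp_fuse(bipartite_vecs, homogeneous_vecs):
    """Concatenate per-node embeddings learned from the two subnetwork types.

    Both inputs are (num_nodes, dim) arrays with rows aligned by node; the
    result has one row per node with the two embeddings side by side.
    """
    return np.concatenate([bipartite_vecs, homogeneous_vecs], axis=1)
```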
