Search Results for author: Yuhan Chen

Found 19 papers, 9 papers with code

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

1 code implementation • 28 Mar 2024 • Ang Lv, Kaiyi Zhang, Yuhan Chen, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

In this paper, we investigate in depth the mechanisms that Transformer-based language models employ in factual recall tasks.

AS-ES Learning: Towards Efficient CoT Learning in Small Models

no code implementations • 4 Mar 2024 • Nuwa Xi, Yuhan Chen, Sendong Zhao, Haochun Wang, Bing Qin, Ting Liu

Chain-of-Thought (CoT) serves as a critical emerging ability in LLMs, especially when it comes to logical reasoning.

Data Augmentation, Logical Reasoning

PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks

1 code implementation • 1 Feb 2024 • Sifan Wang, Bowen Li, Yuhan Chen, Paris Perdikaris

While physics-informed neural networks (PINNs) have become a popular deep learning framework for tackling forward and inverse problems governed by partial differential equations (PDEs), their performance is known to degrade when larger and deeper neural network architectures are employed.
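
The excerpt names the problem (degradation with depth) rather than the fix, but the "residual adaptive" idea in the title can be sketched roughly as follows. This is only an illustration under my own assumptions (the block structure, the zero-initialized gate alpha, and all layer sizes are hypothetical, not the authors' reference implementation): a residual block whose skip path is gated by a trainable scalar initialized at zero lets a deep PINN start out as an effectively shallow network and deepen as training proceeds.

```python
import torch
import torch.nn as nn

class AdaptiveResidualBlock(nn.Module):
    """Hypothetical residual block with a trainable, zero-initialized gate.

    With alpha = 0 the block reduces to the identity, so a deep stack of
    such blocks initially behaves like a shallow network; alpha is learned
    jointly with all other parameters.
    """
    def __init__(self, width: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
        )
        self.alpha = nn.Parameter(torch.zeros(1))  # gate starts closed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.body(x)

# Example trunk for a PINN: inputs (x, t), output u(x, t).
trunk = nn.Sequential(
    nn.Linear(2, 128), nn.Tanh(),
    *[AdaptiveResidualBlock(128) for _ in range(6)],
    nn.Linear(128, 1),
)
print(trunk(torch.randn(16, 2)).shape)  # torch.Size([16, 1])
```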

LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering

no code implementations • 29 Jan 2024 • Yuhan Chen, Lumei Su, Lihua Chen, Zhiwei Lin

Experiments were conducted under constrained computational and memory resources, evaluating the proposed method's performance on benchmark datasets including GQA, CLEVR, and VizWiz-VQA-Grounding.

Language Modelling, Large Language Model +5

Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

1 code implementation • 12 Jan 2024 • Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples.

In-Context Learning, Zero-Shot Learning

Analyzing the Inherent Response Tendency of LLMs: Real-World Instructions-Driven Jailbreak

no code implementations • 7 Dec 2023 • Yanrui Du, Sendong Zhao, Ming Ma, Yuhan Chen, Bing Qin

The jailbreak idea of our method is "Inherent Response Tendency Analysis", which identifies real-world instructions that can inherently induce LLMs to generate affirmative responses; the corresponding jailbreak strategy, "Real-World Instructions-Driven Jailbreak", strategically splices the real-world instructions identified by this analysis around the malicious instruction.

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

1 code implementation • 7 Dec 2023 • Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan

Specifically, crucial information in the context may be overlooked by the model when it falls in a trough zone of the attention waveform, leading to decreased performance.

Trajectory Planning

Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse

1 code implementation • 13 Nov 2023 • Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan

Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension.

Denoising, Language Modelling

Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

no code implementations • 20 Oct 2023 • Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, MuZhen Cai, Bing Qin

Through extensive experiments on five reasoning datasets from the ERASER benchmark, we demonstrate that our framework not only establishes a more reliable link between the generated rationale and the model's decision but also achieves competitive results in task performance and rationale quality.

Decision Making

From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

1 code implementation • 11 Sep 2023 • Yuhan Chen, Nuwa Xi, Yanrui Du, Haochun Wang, Jianyu Chen, Sendong Zhao, Bing Qin

Furthermore, our method shows a sustained improvement as the volume of pseudo data increases, revealing the great potential of pseudo data in advancing low-resource cross-modal molecule discovery.

Descriptive, Domain Adaptation +2

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

1 code implementation • 8 Sep 2023 • Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

To address this challenge, we propose knowledge-tuning, which leverages structured medical knowledge bases for the LLMs to grasp domain knowledge efficiently and facilitate reliable response generation.
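
The excerpt does not describe how the knowledge base is consulted; purely as an illustrative sketch (the knowledge_base dict, retrieve_entries, and the prompt template below are my assumptions, not the paper's pipeline), one simple way to ground responses in a structured medical KB is to retrieve matching entries for a query and prepend them to the prompt:

```python
# Minimal sketch of retrieval-grounded prompting over a structured KB.
# The KB schema, matching rule, and prompt wording are illustrative only.
knowledge_base = {
    "hypertension": "Hypertension: persistently elevated arterial blood pressure; "
                    "managed with lifestyle changes and antihypertensive drugs.",
    "metformin": "Metformin: first-line oral agent for type 2 diabetes; "
                 "common adverse effects are gastrointestinal.",
}

def retrieve_entries(query: str, kb: dict, top_k: int = 2) -> list:
    """Return KB entries whose key term appears in the query (naive matching)."""
    hits = [text for term, text in kb.items() if term in query.lower()]
    return hits[:top_k]

def build_prompt(query: str, kb: dict) -> str:
    """Prepend retrieved structured knowledge to the user question."""
    context = "\n".join(retrieve_entries(query, kb))
    return f"Known medical facts:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What should I know about metformin?", knowledge_base))
```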

Domain Adaptation, Hallucination +2

DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

no code implementations • 29 Jun 2023 • Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts.

Data Augmentation, Dialogue Generation +2

LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity

1 code implementation • 7 May 2023 • Yuhan Chen, Yihong Luo, Jing Tang, Liang Yang, Siya Qiu, Chuan Wang, Xiaochun Cao

Motivated by this, we propose to use local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module.
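
Taken at face value, the sentence says a per-node similarity score between a node and its neighbours decides how different feature channels are fused. A minimal sketch of that idea follows; the cosine-similarity definition of LocalSim, the two-channel (low/high-frequency) fusion, and every name below are illustrative assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

def local_sim(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """Per-node LocalSim: mean cosine similarity to neighbours (illustrative)."""
    x_norm = nn.functional.normalize(x, dim=1)
    sim = x_norm @ x_norm.t()              # pairwise cosine similarities
    deg = adj.sum(dim=1).clamp(min=1)
    return (adj * sim).sum(dim=1) / deg    # average over each node's neighbours

class LocalSimFusion(nn.Module):
    """Plug-and-play fusion: node-level weights derived from LocalSim."""
    def __init__(self):
        super().__init__()
        self.gate = nn.Linear(1, 1)        # maps LocalSim to a mixing weight

    def forward(self, h_low, h_high, x, adj):
        s = local_sim(x, adj).unsqueeze(1)     # (N, 1) similarity scores
        w = torch.sigmoid(self.gate(s))        # (N, 1) per-node fusion weights
        return w * h_low + (1.0 - w) * h_high  # mix low/high-frequency channels
```

Here h_low and h_high stand for, e.g., neighbour-aggregated and raw node features; a node with high LocalSim (a homophilous neighbourhood) would lean on the smoothed channel.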

Node Classification

Neural Symplectic Form: Learning Hamiltonian Equations on General Coordinate Systems

no code implementations • NeurIPS 2021 • Yuhan Chen, Takashi Matsubara, Takaharu Yaguchi

In this study, we propose a model that learns the symplectic form from data using neural networks, thereby providing a method for learning Hamiltonian equations from data represented in general coordinate systems, which are not limited to the generalized coordinates and the generalized momenta.
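
To make the one-sentence summary concrete, here is a rough sketch of one way "learning the symplectic form" could be realized; the sign convention, network sizes, and use of torch.autograd are my assumptions rather than the authors' code. A 1-form theta and a Hamiltonian H are parameterized by neural networks, the 2-form is the antisymmetrized Jacobian of theta, and the vector field comes from solving the resulting linear system against the gradient of H.

```python
import torch
import torch.nn as nn

def mlp(dim_in: int, dim_out: int, width: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Linear(dim_in, width), nn.Tanh(),
        nn.Linear(width, width), nn.Tanh(),
        nn.Linear(width, dim_out),
    )

class NeuralSymplecticForm(nn.Module):
    """Sketch: learn a 1-form theta(x) and a Hamiltonian H(x); the 2-form
    is d(theta), i.e. the antisymmetrized Jacobian of theta."""
    def __init__(self, dim: int):
        super().__init__()
        self.theta = mlp(dim, dim)        # learned 1-form
        self.hamiltonian = mlp(dim, 1)    # learned Hamiltonian

    def vector_field(self, x: torch.Tensor) -> torch.Tensor:
        x = x.detach().requires_grad_(True)
        jac = torch.autograd.functional.jacobian(self.theta, x, create_graph=True)
        omega = jac.t() - jac             # omega_ij = d_i theta_j - d_j theta_i
        grad_h = torch.autograd.grad(self.hamiltonian(x).sum(), x, create_graph=True)[0]
        # Hamilton's equations w.r.t. the learned form: omega @ dx/dt = grad H
        return torch.linalg.solve(omega, grad_h)

model = NeuralSymplecticForm(dim=4)        # even dimension keeps omega generically invertible
print(model.vector_field(torch.randn(4)))  # predicted time derivative
```

Training (not shown) would fit the predicted time derivatives to observed trajectory data.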

KAM Theory Meets Statistical Learning Theory: Hamiltonian Neural Networks with Non-Zero Training Loss

no code implementations • 22 Feb 2021 • Yuhan Chen, Takashi Matsubara, Takaharu Yaguchi

To apply the KAM theory, we provide a generalization error bound for Hamiltonian neural networks by deriving an estimate of the covering number of the gradient of the multi-layer perceptron, which is the key ingredient of the model.
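
For readers unfamiliar with this style of argument, covering-number generalization bounds typically follow the textbook template below; this is background only, not the paper's actual bound, which specifically concerns the covering number of the network's gradient in the KAM setting.

```latex
% Standard template: loss bounded in [0,1], an \epsilon-cover of size
% N(\epsilon) in sup norm, n i.i.d. samples; with probability at least 1-\delta,
\sup_{f \in \mathcal{F}} \Big( R(f) - \hat{R}_n(f) \Big)
  \;\le\; 2\epsilon \;+\; \sqrt{\frac{\ln N(\epsilon) + \ln(1/\delta)}{2n}}
```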

Learning Theory

Bayesian graphical compositional regression for microbiome data

2 code implementations • 13 Dec 2017 • Jialiang Mao, Yuhan Chen, Li Ma

An important task in microbiome studies is to test for and characterize differences in microbiome composition across groups of samples.

Methodology, Applications, Computation
