Search Results for author: Yufei Huang

Found 56 papers, 31 papers with code

dyAb: Flow Matching for Flexible Antibody Design with AlphaFold-driven Pre-binding Antigen

1 code implementation · 1 Mar 2025 · Cheng Tan, Yijie Zhang, Zhangyang Gao, Yufei Huang, Haitao Lin, Lirong Wu, Fandi Wu, Mathieu Blanchette, Stan Z. Li

The development of therapeutic antibodies heavily relies on accurate predictions of how antigens will interact with antibodies.

Deep EEG Super-Resolution: Upsampling EEG Spatial Resolution with Generative Adversarial Networks

No code implementations · 12 Feb 2025 · Isaac Corley, Yufei Huang

The proposed SR EEG by GAN is a promising approach to improve the spatial resolution of low-density EEG headsets.

EEG Super-Resolution

A Simple yet Effective DDG Predictor is An Unsupervised Antibody Optimizer and Explainer

1 code implementation · 10 Feb 2025 · Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Guojiang Zhao, Zhifeng Gao, Stan Z. Li

For the target antibody, we propose a novel Mutation Explainer to learn mutation preferences, which accounts for the marginal benefit of each mutation per residue.

Large Language Model Safety: A Holistic Survey

1 code implementation · 23 Dec 2024 · Dan Shi, Tianhao Shen, Yufei Huang, Zhigen Li, Yongqi Leng, Renren Jin, Chuang Liu, Xinwei Wu, Zishan Guo, Linhao Yu, Ling Shi, Bojian Jiang, Deyi Xiong

The rapid development and deployment of large language models (LLMs) have introduced a new frontier in artificial intelligence, marked by unprecedented capabilities in natural language understanding and generation.

Language Modeling

Relation-Aware Equivariant Graph Networks for Epitope-Unknown Antibody Design and Specificity Optimization

No code implementations · 14 Dec 2024 · Lirong Wu, Haitao Lin, Yufei Huang, Zhangyang Gao, Cheng Tan, Yunfan Liu, Tailin Wu, Stan Z. Li

Antibodies are Y-shaped proteins that protect the host by binding to specific antigens, and their binding is mainly determined by the Complementarity-Determining Regions (CDRs) in the antibody.

Relation, Specificity

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

1 code implementation · 10 Dec 2024 · Jinyi Hu, Shengding Hu, Yuxuan Song, Yufei Huang, Mingxuan Wang, Hao Zhou, Zhiyuan Liu, Wei-Ying Ma, Maosong Sun

The analysis of the trade-off between autoregressive modeling and diffusion demonstrates the potential of ACDiT to be used in long-horizon visual generation tasks.

Denoising, Image Generation

MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction

1 code implementation · 4 Nov 2024 · Cheng Tan, Zhenxiao Cao, Zhangyang Gao, Lirong Wu, Siyuan Li, Yufei Huang, Jun Xia, Bozhen Hu, Stan Z. Li

Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome, regulating protein attributes and interactions that are crucial for biological processes.

Configurable Foundation Models: Building LLMs from a Modular Perspective

No code implementations · 4 Sep 2024 · Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, GuanYu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

We first formalize modules into emergent bricks (functional neuron partitions that emerge during the pre-training phase) and customized bricks (bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs).

Computational Efficiency, Mixture-of-Experts

CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

1 code implementation · 19 Aug 2024 · Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Tao Liu, Deyi Xiong

These help us curate CMoralEval, which encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each with instances drawn from different data sources.

Diversity, Language Modeling

FastFiD: Improve Inference Efficiency of Open Domain Question Answering via Sentence Selection

1 code implementation · 12 Aug 2024 · Yufei Huang, Xu Han, Maosong Sun

Open Domain Question Answering (ODQA) has been advancing rapidly in recent times, driven by significant developments in dense passage retrieval and pretrained language models.

Answer Generation, Decoder

Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation

1 code implementation · 20 Jul 2024 · Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li

To bridge the gaps between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptrons (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP.

Knowledge Distillation

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

1 code implementation · 16 Jun 2024 · Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

To broaden the scope, we have adapted these models to a range of tasks essential in drug design, which are considered sub-tasks within the graph fill-in-the-blank tasks.

Drug Design, Fairness

GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models

1 code implementation · 1 Jun 2024 · Zicheng Liu, Jiahui Li, Siyuan Li, Zelin Zang, Cheng Tan, Yufei Huang, Yajing Bai, Stan Z. Li

The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their use across a spectrum of downstream applications.

Benchmarking

UniIF: Unified Molecule Inverse Folding

No code implementations · 29 May 2024 · Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, Stan Z. Li

We perform this unification at two levels: 1) Data-Level: We propose a unified block graph data form for all molecules, including the local frame building and geometric feature initialization.

All, Drug Discovery

Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

1 code implementation · 16 May 2024 · Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial.

Prompt Learning

VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

No code implementations · 13 May 2024 · Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li

In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.

Quantization

Deep Lead Optimization: Leveraging Generative AI for Structural Modification

No code implementations · 30 Apr 2024 · Odin Zhang, Haitao Lin, Hui Zhang, Huifeng Zhao, Yufei Huang, Yuansheng Huang, Dejun Jiang, Chang-Yu Hsieh, Peichen Pan, Tingjun Hou

Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; conversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.

Drug Design

OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

No code implementations · 18 Mar 2024 · Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong

In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval keeps pace with the development of Chinese LLMs and can even provide cutting-edge benchmark datasets to guide their development.

Benchmarking, Mathematical Reasoning

Deep Geometry Handling and Fragment-wise Molecular 3D Graph Generation

No code implementations · 15 Mar 2024 · Odin Zhang, Yufei Huang, Shichen Cheng, Mengyao Yu, Xujun Zhang, Haitao Lin, Yundian Zeng, Mingyang Wang, Zhenxing Wu, Huifeng Zhao, Zaixi Zhang, Chenqing Hua, Yu Kang, Sunliang Cui, Peichen Pan, Chang-Yu Hsieh, Tingjun Hou

Most earlier 3D structure-based molecular generation approaches follow an atom-wise paradigm, incrementally adding atoms to a partially built molecular fragment within protein pockets.

Graph Generation

Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks

1 code implementation · 3 Mar 2024 · Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li

In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a given task pool based on their compatibility, and (2) weigh: how to weigh the selected tasks based on their importance.

Graph Representation Learning

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

No code implementations · 23 Feb 2024 · Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.

Memorization, Multi-Task Learning

MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

1 code implementation · 22 Feb 2024 · Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li

In addition, microenvironments defined in previous work are largely based on experimentally assayed physicochemical properties, for which the "vocabulary" is usually extremely small.

Computational Efficiency, Prediction

Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

No code implementations · 18 Feb 2024 · Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan Z. Li

Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging.

Drug Design, Molecular Docking

PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

1 code implementation · 13 Feb 2024 · Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.

Drug Discovery

Understanding YTHDF2-mediated mRNA Degradation By m6A-BERT-Deg

1 code implementation · 15 Jan 2024 · Ting-He Zhang, Sumin Jo, Michelle Zhang, Kai Wang, Shou-Jiang Gao, Yufei Huang

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing.

SRNI-CAR: A comprehensive dataset for analyzing the Chinese automotive market

No code implementations · 19 Dec 2023 · Ruixin Ding, Bowei Chen, James M. Wilson, Zhi Yan, Yufei Huang

The automotive industry plays a critical role in the global economy, and particularly important is the expanding Chinese automobile market due to its immense scale and influence.

Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning

No code implementations · 14 Nov 2023 · Xidong Wu, Wan-Yi Lin, Devin Willmott, Filipe Condessa, Yufei Huang, Zhenzhen Li, Madan Ravi Ganesh

Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.

Federated Learning

Evaluating Large Language Models: A Comprehensive Survey

1 code implementation · 30 Oct 2023 · Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong

We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs.

Survey

Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction

No code implementations · 14 Oct 2023 · Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan Z. Li

To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures.

Graph Structure Learning, Prediction

Large Language Model Alignment: A Survey

No code implementations · 26 Sep 2023 · Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

We also envision bridging the gap between the AI alignment research community and researchers focused on exploring the capabilities of LLMs, toward LLMs that are both capable and safe.

Language Modeling

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

1 code implementation · 28 Jun 2023 · Yufei Huang, Deyi Xiong

In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values.

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

2 code implementations · 9 Jun 2023 · Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptrons (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP.

Fair Patient Model: Mitigating Bias in the Patient Representation Learned from the Electronic Health Records

No code implementations · 5 Jun 2023 · Sonish Sivarajkumar, Yufei Huang, Yanshan Wang

Methods: We defined a new loss function, called the weighted loss function, in the deep representation learning model to balance the importance of different groups of patients and features.

Fairness, Representation Learning

Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework

1 code implementation · 18 May 2023 · Lirong Wu, Haitao Lin, Yufei Huang, Tianyu Fan, Stan Z. Li

Furthermore, we identified a potential information drowning problem for existing GNN-to-MLP distillation, i.e., the high-frequency knowledge of the pre-trained GNNs may be overwhelmed by the low-frequency knowledge during distillation; we have described in detail what it represents, how it arises, what impact it has, and how to deal with it.

Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy

No code implementations · 5 Feb 2023 · Yufei Huang, Lirong Wu, Haitao Lin, Jiangbin Zheng, Ge Wang, Stan Z. Li

Learning meaningful protein representations is important for a variety of biological downstream tasks such as structure-based drug design.

Diversity, Drug Design

A Survey on Protein Representation Learning: Retrospect and Prospect

1 code implementation · 31 Dec 2022 · Lirong Wu, Yufei Huang, Haitao Lin, Stan Z. Li

To pave the way for AI researchers with little bioinformatics background, we present a timely and comprehensive review of protein representation learning (PRL) formulations and existing PRL methods from the perspective of model architectures, pretext tasks, and downstream applications.

Representation Learning, Survey

Protein Language Models and Structure Prediction: Connection and Progression

1 code implementation · 30 Nov 2022 · Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, Stan Z. Li

The prediction of protein structures from sequences is an important task for function prediction, drug design, and understanding related biological processes.

Drug Design, Prediction

DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

1 code implementation · 21 Nov 2022 · Haitao Lin, Yufei Huang, Odin Zhang, Siqi Ma, Meng Liu, Xuanjing Li, Lirong Wu, Jishui Wang, Tingjun Hou, Stan Z. Li

Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one.

Drug Discovery

FPT: Improving Prompt Tuning Efficiency via Progressive Training

1 code implementation · 13 Nov 2022 · Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu

Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting prompt tuning (PT) on a small-scale partial pre-trained language model (PLM) and then progressively expands its depth and width until it reaches the full model size.

Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

1 code implementation · ACL 2022 · Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Z. Li

Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability.

Word Embeddings, Word Similarity

Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

No code implementations · 12 Sep 2022 · Yufei Huang, Mohsen Jafari, Peter Jin

In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions.

Toward Deep Learning Based Access Control

No code implementations · 28 Mar 2022 · Mohammad Nur Nobi, Ram Krishnan, Yufei Huang, Mehrnoosh Shakarami, Ravi Sandhu

A common trait of current access control approaches is the challenging need to engineer abstract and intuitive access control models.

Deep Learning

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

1 code implementation · NAACL 2021 · Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

To address this issue, we propose TR-BERT, a dynamic token reduction approach for accelerating the inference of pre-trained language models (PLMs), which can flexibly adapt the number of layers each token passes through during inference to avoid redundant computation.

Token Reduction

Modeling EEG data distribution with a Wasserstein Generative Adversarial Network to predict RSVP Events

1 code implementation · 11 Nov 2019 · Sharaj Panwar, Paul Rad, Tzyy-Ping Jung, Yufei Huang

Electroencephalography (EEG) data are difficult to obtain due to complex experimental setups and reduced comfort with prolonged wearing.

Classification, EEG

Convolutional neural network models for cancer type prediction based on gene expression

1 code implementation · 18 Jun 2019 · Milad Mostavi, Yu-Chiao Chiu, Yufei Huang, Yidong Chen

In breast cancer, for instance, our model identified well-known markers, such as GATA3 and ESR1.

Type prediction

GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

No code implementations · 21 May 2018 · Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang, Yidong Chen

We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset.

Survival Analysis

Predicting drug response of tumors from integrated genomic profiles by deep neural networks

1 code implementation · 20 May 2018 · Yu-Chiao Chiu, Hung-I Harry Chen, Tinghe Zhang, Songyao Zhang, Aparna Gorthi, Li-Ju Wang, Yufei Huang, Yidong Chen

We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction mean squared error of 1.96 (on log-scale IC50 values).
