Search Results for author: Yufei Huang

Found 38 papers, 18 papers with code

OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety

no code implementations 18 Mar 2024 Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong

In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval is in line with the development of Chinese LLMs or even able to provide cutting-edge benchmark datasets to guide the development of Chinese LLMs.

Benchmarking, Mathematical Reasoning

Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks

1 code implementation 3 Mar 2024 Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li

In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a given task pool based on their compatibility, and (2) weigh: how to weigh the selected tasks based on their importance.

Graph Representation Learning

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

no code implementations 23 Feb 2024 Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.

Memorization, Multi-Task Learning

MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

1 code implementation 22 Feb 2024 Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li

In addition, microenvironments defined in previous work are largely based on experimentally assayed physicochemical properties, for which the "vocabulary" is usually extremely small.

Computational Efficiency

Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

no code implementations 18 Feb 2024 Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan Z. Li

Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging.

Molecular Docking

PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

1 code implementation 13 Feb 2024 Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.

Drug Discovery

Understanding YTHDF2-mediated mRNA Degradation By m6A-BERT-Deg

1 code implementation 15 Jan 2024 Ting-He Zhang, Sumin Jo, Michelle Zhang, Kai Wang, Shou-Jiang Gao, Yufei Huang

N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing.

SRNI-CAR: A comprehensive dataset for analyzing the Chinese automotive market

no code implementations 19 Dec 2023 Ruixin Ding, Bowei Chen, James M. Wilson, Zhi Yan, Yufei Huang

The automotive industry plays a critical role in the global economy, and particularly important is the expanding Chinese automobile market due to its immense scale and influence.

Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning

no code implementations 14 Nov 2023 Xidong Wu, Wan-Yi Lin, Devin Willmott, Filipe Condessa, Yufei Huang, Zhenzhen Li, Madan Ravi Ganesh

Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.

Federated Learning

Evaluating Large Language Models: A Comprehensive Survey

1 code implementation 30 Oct 2023 Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong

We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs.

Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction

no code implementations 14 Oct 2023 Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan. ZQ. Li

To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures.

Graph structure learning, Property Prediction +2

Large Language Model Alignment: A Survey

no code implementations 26 Sep 2023 Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

We also envision bridging the gap between the AI alignment research community and the researchers engrossed in the capability exploration of LLMs for both capable and safe LLMs.

Language Modelling, Large Language Model

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

1 code implementation 28 Jun 2023 Yufei Huang, Deyi Xiong

In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values.

Quantifying the Knowledge in GNNs for Reliable Distillation into MLPs

1 code implementation 9 Jun 2023 Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptron (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP.

Fair Patient Model: Mitigating Bias in the Patient Representation Learned from the Electronic Health Records

no code implementations 5 Jun 2023 Sonish Sivarajkumar, Yufei Huang, Yanshan Wang

Methods: We defined a new loss function, called weighted loss function, in the deep representation learning model to balance the importance of different groups of patients and features.

Fairness, Representation Learning

Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework

1 code implementation 18 May 2023 Lirong Wu, Haitao Lin, Yufei Huang, Tianyu Fan, Stan Z. Li

Furthermore, we identified a potential information drowning problem for existing GNN-to-MLP distillation, i.e., the high-frequency knowledge of the pre-trained GNNs may be overwhelmed by the low-frequency knowledge during distillation; we have described in detail what it represents, how it arises, what impact it has, and how to deal with it.

Lightweight Contrastive Protein Structure-Sequence Transformation

no code implementations 19 Mar 2023 Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li

In this work, we introduce a novel unsupervised protein structure representation pretraining with a robust protein language model.

Masked Language Modeling, Protein Design +1

Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy

no code implementations 5 Feb 2023 Yufei Huang, Lirong Wu, Haitao Lin, Jiangbin Zheng, Ge Wang, Stan Z. Li

Learning meaningful protein representation is important for a variety of biological downstream tasks such as structure-based drug design.

A Survey on Protein Representation Learning: Retrospect and Prospect

1 code implementation 31 Dec 2022 Lirong Wu, Yufei Huang, Haitao Lin, Stan Z. Li

To pave the way for AI researchers with little bioinformatics background, we present a timely and comprehensive review of PRL formulations and existing PRL methods from the perspective of model architectures, pretext tasks, and downstream applications.

Representation Learning

Protein Language Models and Structure Prediction: Connection and Progression

1 code implementation 30 Nov 2022 Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, Stan Z. Li

The prediction of protein structures from sequences is an important task for function prediction, drug design, and related biological processes understanding.

Protein Folding, Protein Language Model +1

DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

no code implementations 21 Nov 2022 Haitao Lin, Yufei Huang, Meng Liu, Xuanjing Li, Shuiwang Ji, Stan Z. Li

Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one.

Drug Discovery

FPT: Improving Prompt Tuning Efficiency via Progressive Training

1 code implementation 13 Nov 2022 Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu

Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.

Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

1 code implementation ACL 2022 Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Z. Li

Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability.

Word Embeddings

Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data

no code implementations 12 Sep 2022 Yufei Huang, Mohsen Jafari, Peter Jin

In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions.

Toward Deep Learning Based Access Control

no code implementations 28 Mar 2022 Mohammad Nur Nobi, Ram Krishnan, Yufei Huang, Mehrnoosh Shakarami, Ravi Sandhu

A common trait of current access control approaches is the challenging need to engineer abstract and intuitive access control models.

Deep learning tackles single-cell analysis: A survey of deep learning for scRNA-seq analysis

no code implementations 25 Sep 2021 Mario Flores, Zhentao Liu, Ting-He Zhang, Md Musaddaqui Hasib, Yu-Chiao Chiu, Zhenqing Ye, Karla Paniagua, Sumin Jo, Jianqiu Zhang, Shou-Jiang Gao, Yu-Fang Jin, Yidong Chen, Yufei Huang

Here we present a processing pipeline of single-cell RNA-seq data, survey a total of 25 DL algorithms and their applicability for a specific step in the processing pipeline.

Generative Adversarial Network

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

1 code implementation NAACL 2021 Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

To address this issue, we propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT, which could flexibly adapt the layer number of each token in inference to avoid redundant calculation.

Modeling EEG data distribution with a Wasserstein Generative Adversarial Network to predict RSVP Events

1 code implementation 11 Nov 2019 Sharaj Panwar, Paul Rad, Tzyy-Ping Jung, Yufei Huang

Electroencephalography (EEG) data are difficult to obtain due to complex experimental setups and reduced comfort with prolonged wearing.

Classification, EEG +5

Convolutional neural network models for cancer type prediction based on gene expression

1 code implementation 18 Jun 2019 Milad Mostavi, Yu-Chiao Chiu, Yufei Huang, Yidong Chen

In breast cancer, for instance, our model identified well-known markers, such as GATA3 and ESR1.

Type prediction

GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

no code implementations 21 May 2018 Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang, Yidong Chen

We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset.

Survival Analysis

Predicting drug response of tumors from integrated genomic profiles by deep neural networks

1 code implementation 20 May 2018 Yu-Chiao Chiu, Hung-I Harry Chen, Tinghe Zhang, Songyao Zhang, Aparna Gorthi, Li-Ju Wang, Yufei Huang, Yidong Chen

We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction performance of mean squared error at 1.96 (log-scale IC50 values).
