no code implementations • 18 Mar 2024 • Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong
In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval is in line with the development of Chinese LLMs or even able to provide cutting-edge benchmark datasets to guide the development of Chinese LLMs.
1 code implementation • 3 Mar 2024 • Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li
In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a given task pool based on their compatibility, and (2) weigh: how to weigh the selected tasks based on their importance.
no code implementations • 1 Mar 2024 • Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
Augmentation is an effective alternative to utilize the small amount of labeled protein data.
no code implementations • 23 Feb 2024 • Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.
1 code implementation • 22 Feb 2024 • Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li
In addition, microenvironments defined in previous work are largely based on experimentally assayed physicochemical properties, for which the "vocabulary" is usually extremely small.
no code implementations • 18 Feb 2024 • Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan. Z. Li
Accurate prediction of protein-ligand binding structures, a task known as molecular docking is crucial for drug design but remains challenging.
1 code implementation • 13 Feb 2024 • Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li
Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
no code implementations • 4 Feb 2024 • Zhangyang Gao, Cheng Tan, Jue Wang, Yufei Huang, Lirong Wu, Stan Z. Li
Is there a foreign language describing protein sequences and structures simultaneously?
1 code implementation • 15 Jan 2024 • Ting-He Zhang, Sumin Jo, Michelle Zhang, Kai Wang, Shou-Jiang Gao, Yufei Huang
N6-methyladenosine (m6A) is the most abundant mRNA modification within mammalian cells, holding pivotal significance in the regulation of mRNA stability, translation, and splicing.
no code implementations • 19 Dec 2023 • Ruixin Ding, Bowei Chen, James M. Wilson, Zhi Yan, Yufei Huang
The automotive industry plays a critical role in the global economy, and particularly important is the expanding Chinese automobile market due to its immense scale and influence.
1 code implementation • 11 Dec 2023 • Jiangbin Zheng, Siyuan Li, Yufei Huang, Zhangyang Gao, Cheng Tan, Bozhen Hu, Jun Xia, Ge Wang, Stan Z. Li
Protein design involves generating protein sequences based on their corresponding protein backbones.
no code implementations • 14 Nov 2023 • Xidong Wu, Wan-Yi Lin, Devin Willmott, Filipe Condessa, Yufei Huang, Zhenzhen Li, Madan Ravi Ganesh
Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.
1 code implementation • 30 Oct 2023 • Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong
We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs.
no code implementations • 14 Oct 2023 • Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan. ZQ. Li
To study this problem, we identify a Protein 3D Graph Structure Learning Problem for Robust Protein Property Prediction (PGSL-RP3), collect benchmark datasets, and present a protein Structure embedding Alignment Optimization framework (SAO) to mitigate the problem of structure embedding bias between the predicted and experimental protein structures.
no code implementations • 26 Sep 2023 • Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong
We also envision bridging the gap between the AI alignment research community and the researchers engrossed in the capability exploration of LLMs for both capable and safe LLMs.
1 code implementation • 28 Jun 2023 • Yufei Huang, Deyi Xiong
In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values.
1 code implementation • 9 Jun 2023 • Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li
To bridge the gaps between topology-aware Graph Neural Networks (GNNs) and inference-efficient Multi-Layer Perceptron (MLPs), GLNN proposes to distill knowledge from a well-trained teacher GNN into a student MLP.
no code implementations • 5 Jun 2023 • Sonish Sivarajkumar, Yufei Huang, Yanshan Wang
Methods: We defined a new loss function, called weighted loss function, in the deep representation learning model to balance the importance of different groups of patients and features.
no code implementations • NeurIPS 2023 • Haitao Lin, Yufei Huang, Odin Zhang, Lirong Wu, Siyuan Li, ZhiYuan Chen, Stan Z. Li
In this way, however, it is hard to generate realistic fragments with complicated structures.
1 code implementation • 18 May 2023 • Lirong Wu, Haitao Lin, Yufei Huang, Tianyu Fan, Stan Z. Li
Furthermore, we identified a potential information drowning problem for existing GNN-to-MLP distillation, i. e., the high-frequency knowledge of the pre-trained GNNs may be overwhelmed by the low-frequency knowledge during distillation; we have described in detail what it represents, how it arises, what impact it has, and how to deal with it.
3 code implementations • 17 Apr 2023 • Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun
Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools.
no code implementations • 19 Mar 2023 • Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li
In this work, we introduce a novel unsupervised protein structure representation pretraining with a robust protein language model.
no code implementations • 5 Feb 2023 • Yufei Huang, Lirong Wu, Haitao Lin, Jiangbin Zheng, Ge Wang, Stan Z. Li
Learning meaningful protein representation is important for a variety of biological downstream tasks such as structure-based drug design.
1 code implementation • 31 Dec 2022 • Lirong Wu, Yufei Huang, Haitao Lin, Stan Z. Li
To pave the way for AI researchers with little bioinformatics background, we present a timely and comprehensive review of PRL formulations and existing PRL methods from the perspective of model architectures, pretext tasks, and downstream applications.
no code implementations • 9 Dec 2022 • Haitao Lin, Lirong Wu, Yongjie Xu, Yufei Huang, Siyuan Li, Guojiang Zhao, Stan Z. Li
Solving partial differential equations is difficult.
1 code implementation • 30 Nov 2022 • Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, Stan Z. Li
The prediction of protein structures from sequences is an important task for function prediction, drug design, and related biological processes understanding.
no code implementations • 21 Nov 2022 • Haitao Lin, Yufei Huang, Meng Liu, Xuanjing Li, Shuiwang Ji, Stan Z. Li
Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one.
1 code implementation • 13 Nov 2022 • Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu
Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.
1 code implementation • ACL 2022 • Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Z. Li
Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e. g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability.
no code implementations • 5 Oct 2022 • Lirong Wu, Yufei Huang, Haitao Lin, Zicheng Liu, Tianyu Fan, Stan Z. Li
Self-supervised learning on graphs has recently achieved remarkable success in graph representation learning.
no code implementations • 12 Sep 2022 • Yufei Huang, Mohsen Jafari, Peter Jin
In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions.
no code implementations • 28 Mar 2022 • Mohammad Nur Nobi, Ram Krishnan, Yufei Huang, Mehrnoosh Shakarami, Ravi Sandhu
A common trait of current access control approaches is the challenging need to engineer abstract and intuitive access control models.
no code implementations • 25 Sep 2021 • Mario Flores, Zhentao Liu, Ting-He Zhang, Md Musaddaqui Hasib, Yu-Chiao Chiu, Zhenqing Ye, Karla Paniagua, Sumin Jo, Jianqiu Zhang, Shou-Jiang Gao, Yu-Fang Jin, Yidong Chen, Yufei Huang
Here we present a processing pipeline of single-cell RNA-seq data, survey a total of 25 DL algorithms and their applicability for a specific step in the processing pipeline.
1 code implementation • NAACL 2021 • Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun
To address this issue, we propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT, which could flexibly adapt the layer number of each token in inference to avoid redundant calculation.
1 code implementation • 11 Nov 2019 • Sharaj Panwar, Paul Rad, Tzyy-Ping Jung, Yufei Huang
Electroencephalography (EEG) data are difficult to obtain due to complex experimental setups and reduced comfort with prolonged wearing.
1 code implementation • 18 Jun 2019 • Milad Mostavi, Yu-Chiao Chiu, Yufei Huang, Yidong Chen
In breast cancer, for instance, our model identified well-known markers, such as GATA3 and ESR1.
no code implementations • 21 May 2018 • Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang, Yidong Chen
We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset.
1 code implementation • 20 May 2018 • Yu-Chiao Chiu, Hung-I Harry Chen, Tinghe Zhang, Songyao Zhang, Aparna Gorthi, Li-Ju Wang, Yufei Huang, Yidong Chen
We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction performance of mean squared error at 1. 96 (log-scale IC50 values).