Search Results for author: Shuxin Zheng

Found 27 papers, 15 papers with code

Towards Generalist Prompting for Large Language Models by Mental Models

no code implementations28 Feb 2024 Haoxiang Guan, Jiyan He, Shuxin Zheng, En-Hong Chen, Weiming Zhang, Nenghai Yu

MeMo distills the cores of various prompting methods into individual mental models and lets LLMs autonomously select the most suitable mental model for the problem at hand, achieving or approaching state-of-the-art results on diverse tasks such as STEM, logical reasoning, and commonsense reasoning in zero-shot settings.

Logical Reasoning

Control Risk for Potential Misuse of Artificial Intelligence in Science

1 code implementation11 Dec 2023 Jiyan He, Weitao Feng, Yaosen Min, Jingwei Yi, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie, Weiming Zhang, Nenghai Yu, Shuxin Zheng

In this study, we aim to raise awareness of the dangers of AI misuse in science, and call for responsible AI development and use in this domain.

Inverse Design of Vitrimeric Polymers by Molecular Dynamics and Generative Modeling

no code implementations6 Dec 2023 Yiwen Zheng, Prakash Thakolkaran, Jake A. Smith, Ziheng Lu, Shuxin Zheng, Bichlien H. Nguyen, Siddhant Kumar, Aniruddh Vashisth

Vitrimers are a new class of sustainable polymers that can self-heal through rearrangement of their dynamic covalent adaptive networks.

Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

no code implementations28 Sep 2023 He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao

Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research.

Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning

no code implementations8 Jun 2023 Shuxin Zheng, Jiyan He, Chang Liu, Yu Shi, Ziheng Lu, Weitao Feng, Fusong Ju, Jiaxi Wang, Jianwei Zhu, Yaosen Min, He Zhang, Shidi Tang, Hongxia Hao, Peiran Jin, Chi Chen, Frank Noé, Haiguang Liu, Tie-Yan Liu

In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.

Invertible Rescaling Network and Its Extensions

1 code implementation9 Oct 2022 Mingqing Xiao, Shuxin Zheng, Chang Liu, Zhouchen Lin, Tie-Yan Liu

To be specific, we develop invertible models that generate valid degraded images while transforming the distribution of the lost contents into a fixed distribution of a latent variable during the forward degradation.

Colorization, Image Compression
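
A minimal numpy sketch of the core idea: an invertible coupling that splits a signal into a kept low-frequency part (the degraded image) and a latent for the lost details. The 1D Haar split and the tiny linear subnets (W_s, W_t) are illustrative stand-ins for the paper's 2D wavelet transform and learned convolutional subnetworks, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_split(x):
    """1D Haar transform: low-frequency part (the kept 'LR' signal)
    and high-frequency part (the details lost in downscaling)."""
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def forward(x, W_s, W_t):
    """Forward degradation: emit the LR part plus a latent z whose
    distribution is pushed toward a fixed N(0, 1) prior."""
    low, high = haar_split(x)
    s, t = np.tanh(low @ W_s), low @ W_t   # coupling subnets
    z = (high - t) * np.exp(-s)
    return low, z

def inverse(low, z, W_s, W_t):
    """Exact inversion: recover the high frequencies from (low, z)."""
    s, t = np.tanh(low @ W_s), low @ W_t
    high = z * np.exp(s) + t
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

n = 8
x = rng.normal(size=2 * n)
W_s, W_t = rng.normal(size=(n, n)) * 0.1, rng.normal(size=(n, n)) * 0.1
low, z = forward(x, W_s, W_t)
assert np.allclose(inverse(low, z, W_s, W_t), x)  # lossless by construction
```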

One Transformer Can Understand Both 2D & 3D Molecular Data

1 code implementation4 Oct 2022 Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He

To achieve this goal, in this work, we develop a novel Transformer-based molecular model called Transformer-M, which can take molecular data in 2D or 3D formats as input and generate meaningful semantic representations.

Graph Regression, Molecular Representation +1

Quantized Training of Gradient Boosting Decision Trees

2 code implementations20 Jul 2022 Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu

Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications.

Quantization
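
A hedged sketch of the core trick as I read it: per-sample gradients are quantized to low-bit integers with stochastic rounding, so the histogram accumulation at the heart of GBDT training runs on cheap integer arithmetic. The names and the 3-bit setting below are illustrative, not the library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_gradients(g, bits=3):
    """Stochastic rounding of gradients onto a low-bit integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = qmax / np.abs(g).max()
    scaled = g * scale
    lo = np.floor(scaled)
    q = lo + (rng.random(g.shape) < (scaled - lo))  # unbiased rounding
    return q.astype(np.int32), scale

# Integer histogram accumulation, the hot loop of GBDT training.
n, n_bins = 10_000, 64
g = rng.normal(size=n)                    # per-sample gradients
bins = rng.integers(0, n_bins, size=n)    # pre-binned feature values
q, scale = quantize_gradients(g)
hist = np.zeros(n_bins, dtype=np.int64)
np.add.at(hist, bins, q)                  # integer adds instead of float adds
approx = hist / scale                     # dequantize for split-gain evaluation
exact = np.bincount(bins, weights=g, minlength=n_bins)
print(np.abs(approx - exact).max())       # small, thanks to unbiased rounding
```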

Your Transformer May Not be as Powerful as You Expect

1 code implementation26 May 2022 Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, LiWei Wang, Di He

Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications.

Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets

3 code implementations9 Mar 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

Benchmarking, Graph Regression +1

An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets

no code implementations28 Feb 2022 Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, Tie-Yan Liu

This technical note describes the recent updates of Graphormer, including architecture design modifications and the adaptation to 3D molecular dynamics simulation.

Do Transformers Really Perform Badly for Graph Representation?

no code implementations NeurIPS 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model.

Graph Representation Learning
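
As a rough illustration of that insight, the sketch below biases each attention logit with a learnable scalar indexed by the shortest-path distance between the two nodes (the paper additionally adds a degree-based centrality encoding to node features). All shapes and names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def shortest_path_distances(adj):
    """Floyd-Warshall on an unweighted adjacency matrix."""
    n = adj.shape[0]
    D = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n, d, max_dist = 5, 8, 4
adj = np.triu(rng.integers(0, 2, size=(n, n)), 1)
adj = adj + adj.T                           # random undirected graph
spd = np.minimum(shortest_path_distances(adj), max_dist).astype(int)

Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
b = rng.normal(size=max_dist + 1)           # learnable bias per distance
logits = Q @ K.T / np.sqrt(d) + b[spd]      # structure-aware attention
out = softmax(logits) @ V
```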

Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding

no code implementations NeurIPS 2021 Shengjie Luo, Shanda Li, Tianle Cai, Di He, Dinglan Peng, Shuxin Zheng, Guolin Ke, LiWei Wang, Tie-Yan Liu

Since relative positional encoding (RPE) is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing.
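
The efficiency obstacle is that RPE adds a bias that depends only on the offset i - j, i.e. a Toeplitz matrix over all position pairs, which naively costs O(n^2) even with linear (kernelized) attention. The sketch below shows the O(n log n) workhorse under that reading: multiplying a Toeplitz matrix by a vector via FFT, with an illustrative setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# RPE bias b[i, j] depends only on i - j: store one value per offset,
# where offsets[k] holds the bias for i - j = k - (n - 1).
offsets = rng.normal(size=2 * n - 1)
T = np.array([[offsets[i - j + n - 1] for j in range(n)] for i in range(n)])
x = rng.normal(size=n)

# Toeplitz matvec as a circular convolution of length 2n - 1.
c = np.concatenate([offsets[n - 1:], offsets[: n - 1]])
x_pad = np.concatenate([x, np.zeros(n - 1)])
y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad)).real[:n]

assert np.allclose(y, T @ x)  # O(n log n) instead of O(n^2)
```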

First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

4 code implementations15 Jun 2021 Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

In this technical report, we present our solution of KDD Cup 2021 OGB Large-Scale Challenge - PCQM4M-LSC Track.

Do Transformers Really Perform Bad for Graph Representation?

4 code implementations9 Jun 2021 Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

Our key insight for utilizing Transformers on graphs is the necessity of effectively encoding the structural information of a graph into the model.

Graph Classification, Graph Property Prediction +2

How could Neural Networks understand Programs?

1 code implementation10 May 2021 Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

Inspired by this, we propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) information about environment transitions, which is indispensable for program understanding.

Revisiting Language Encoding in Learning Multilingual Representations

1 code implementation16 Feb 2021 Shengjie Luo, Kaiyuan Gao, Shuxin Zheng, Guolin Ke, Di He, LiWei Wang, Tie-Yan Liu

The language embedding can be either added to the word embedding or attached at the beginning of the sentence.

Sentence, Word Embeddings
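
A small sketch of the two variants described above, with made-up shapes; `lang_emb` is one learned vector per language:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_langs = 10, 16, 4
word_emb = rng.normal(size=(seq_len, d_model))
pos_emb = rng.normal(size=(seq_len + 1, d_model))
lang_emb = rng.normal(size=(n_langs, d_model))
lang_id = 2

# Variant 1: add the language embedding to every word embedding.
x_add = word_emb + pos_emb[:seq_len] + lang_emb[lang_id]

# Variant 2: prepend the language embedding as an extra token
# at the beginning of the sentence.
x_prefix = np.vstack([lang_emb[lang_id], word_emb]) + pos_emb
```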

Modeling Lost Information in Lossy Image Compression

no code implementations22 Jun 2020 Yaolong Wang, Mingqing Xiao, Chang Liu, Shuxin Zheng, Tie-Yan Liu

Specifically, ILC introduces an invertible encoding module that replaces the encoder-decoder structure, producing a low-dimensional informative latent representation while transforming the lost information into an auxiliary latent variable that is not further coded or stored.

Image Compression

MC-BERT: Efficient Language Pre-Training via a Meta Controller

1 code implementation10 Jun 2020 Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Li-Wei Wang, Jiang Bian, Tie-Yan Liu

Pre-trained contextual representations (e.g., BERT) have become the foundation for achieving state-of-the-art results on many NLP tasks.

Binary Classification, Cloze Test +4

Invertible Image Rescaling

10 code implementations ECCV 2020 Mingqing Xiao, Shuxin Zheng, Chang Liu, Yaolong Wang, Di He, Guolin Ke, Jiang Bian, Zhouchen Lin, Tie-Yan Liu

High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth, while post-upscaling is adopted to recover the original resolution or the details in zoomed-in images.

Image Super-Resolution

Cross-Iteration Batch Normalization

2 code implementations CVPR 2021 Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied.

Image Classification, Object Detection +1
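
A toy 1D illustration of that compensation, under simplifying assumptions (scalar weight, analytically known statistic): the batch mean computed under old weights is corrected with a first-order Taylor term so that it approximates the mean under the current weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=256)                 # a past mini-batch (held fixed here)

# Toy layer: activations a = w * x, so the batch mean is mu(w) = w * mean(x)
# and its derivative w.r.t. the weight is d mu / d w = mean(x).
w_old, w_now = 1.0, 1.3                  # weights drift between iterations
mu_old = w_old * x.mean()                # statistic computed k steps ago
dmu_dw = x.mean()

mu_compensated = mu_old + dmu_dw * (w_now - w_old)   # first-order Taylor
mu_exact = w_now * x.mean()
print(mu_compensated, mu_exact)          # exact here because mu is linear in w
```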

G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations ICLR 2019 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks, so as to better facilitate the optimization process?

Capacity Control of ReLU Neural Networks by Basis-path Norm

no code implementations19 Sep 2018 Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu

Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths, to measure the capacity of neural networks more accurately.
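
For intuition, a path of a two-layer ReLU net is an input→hidden→output chain of weights, and a path norm aggregates the products of weights along such paths. The sketch below enumerates all paths of a tiny net; the actual Basis-path Norm restricts this to a linearly independent subset of paths, which this sketch does not attempt:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 3, 4, 2
W1 = rng.normal(size=(d_in, d_hid))
W2 = rng.normal(size=(d_hid, d_out))

# Value of the path (i -> h -> o) is W1[i, h] * W2[h, o]; a path norm
# aggregates |path values| over every input-to-output path.
path_vals = np.abs(W1)[:, :, None] * np.abs(W2)[None, :, :]
path_norm_1 = path_vals.sum()            # p = 1 over all d_in*d_hid*d_out paths
print(path_vals.shape, path_norm_1)
```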

$\mathcal{G}$-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space

no code implementations11 Feb 2018 Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?

Asynchronous Stochastic Gradient Descent with Delay Compensation

no code implementations ICML 2017 Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu

We propose a novel technique to compensate for this delay, making the optimization behavior of ASGD closer to that of sequential SGD.
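
A sketch of the delay-compensated update as I understand it from the paper: the stale gradient g(w_bak) is corrected with a first-order term using a cheap diagonal approximation of the Hessian by the gradient's elementwise square (g ⊙ g). The value of lam and the toy objective are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, eta, lam = 5, 0.1, 0.04

def grad(w):                      # toy quadratic objective 0.5 * ||w||^2
    return w

w = rng.normal(size=dim)
w_bak = w.copy()                  # snapshot the worker reads
g = grad(w_bak)                   # worker computes a (soon to be stale) gradient

# ... meanwhile other workers move the global model by a few updates ...
for _ in range(3):
    w -= eta * grad(w)

# Delay-compensated ASGD: approximate g(w) ~= g(w_bak) + H (w - w_bak),
# with the Hessian's diagonal approximated by g * g.
g_dc = g + lam * g * g * (w - w_bak)
w -= eta * g_dc
```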
