no code implementations • ACL (IWSLT) 2021 • Xueqing Wu, Yingce Xia, Jinhua Zhu, Lijun Wu, Shufang Xie, Yang Fan, Tao Qin
Data augmentation, which refers to manipulating the inputs (e.g., adding random noise, masking specific parts) to enlarge the dataset, has been widely adopted in machine learning.
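As a rough illustration of the two manipulations mentioned above, here is a minimal Python sketch; the function names and parameters are illustrative, not taken from the paper.

```python
import random

import numpy as np


def add_gaussian_noise(x: np.ndarray, std: float = 0.1) -> np.ndarray:
    """Perturb continuous inputs with random Gaussian noise."""
    return x + np.random.normal(0.0, std, size=x.shape)


def mask_tokens(tokens: list[str], mask_prob: float = 0.15, mask_token: str = "<mask>") -> list[str]:
    """Randomly replace a fraction of tokens with a mask symbol."""
    return [mask_token if random.random() < mask_prob else t for t in tokens]


# Each call yields a new "view" of the same sample, effectively enlarging the dataset.
augmented = mask_tokens("the cat sat on the mat".split())
```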
no code implementations • ICLR 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Xu Tan, Tao Qin, Tie-Yan Liu
Neural machine translation, which achieves near human-level performance in some languages, strongly relies on the availability of large amounts of parallel sentences, which hinders its applicability to low-resource language pairs.
no code implementations • EMNLP 2021 • Bo Yang, Lijun Wu
Therefore, in this paper, we first extract the accompanying clinical notes from EHR and propose a method to integrate these data; we also comprehensively study different models and data leverage methods for better medical task prediction performance.
no code implementations • 2 Dec 2024 • Kaiyuan Gao, Yusong Wang, Haoxiang Guan, Zun Wang, Qizhi Pei, John E. Hopcroft, Kun He, Lijun Wu
Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic coordinates, and (2) the non-trivial task of tokenizing continuous coordinates for use in LMs, which inherently require discrete inputs.
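For the second obstacle, one generic way to turn continuous coordinates into discrete tokens is uniform binning. The sketch below only illustrates the problem; it is not the tokenization scheme proposed in the paper, and the ranges and bin count are hypothetical.

```python
import numpy as np


def coords_to_tokens(coords: np.ndarray, low: float = -10.0, high: float = 10.0, n_bins: int = 256) -> np.ndarray:
    """Quantize each Cartesian coordinate into one of n_bins discrete token ids."""
    clipped = np.clip(coords, low, high)
    bins = ((clipped - low) / (high - low) * (n_bins - 1)).round().astype(int)
    return bins  # shape (n_atoms, 3); each entry is a token id in [0, n_bins)


# Example: a 2-atom fragment becomes a short sequence of discrete ids.
token_ids = coords_to_tokens(np.array([[0.0, 1.5, -2.3], [0.7, 0.2, 3.1]]))
```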
no code implementations • 31 Oct 2024 • Liang He, Peiran Jin, Yaosen Min, Shufang Xie, Lijun Wu, Tao Qin, Xiaozhuan Liang, Kaiyuan Gao, Yuliang Jiang, Tie-Yan Liu
Proteins, essential to biological systems, perform functions intricately linked to their three-dimensional structures.
1 code implementation • 10 Oct 2024 • Tianyi Bai, Ling Yang, Zhen Hao Wong, Jiahui Peng, Xinlin Zhuang, Chi Zhang, Lijun Wu, Jiantao Qiu, Wentao Zhang, Binhang Yuan, Conghui He
Efficient data selection is crucial to accelerate the pretraining of large language models (LLMs).
no code implementations • 21 Jul 2024 • Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan
Specifically, we propose a label aggregation with pair-wise retrieval and a representation aggregation with point-wise retrieval of the nearest neighbors.
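A minimal sketch of the nearest-neighbor aggregation idea, assuming cosine-similarity retrieval over a datastore of (representation, label) pairs; it collapses the pair-wise/point-wise distinction and is purely illustrative, not the paper's exact formulation.

```python
import numpy as np


def knn_aggregate(query: np.ndarray, keys: np.ndarray, labels: np.ndarray, k: int = 8):
    """Retrieve the k nearest neighbors of `query` and aggregate their labels and representations."""
    sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8)
    topk = np.argsort(-sims)[:k]
    weights = np.exp(sims[topk]) / np.exp(sims[topk]).sum()
    label_agg = weights @ labels[topk]   # weighted average of neighbor labels
    repr_agg = weights @ keys[topk]      # weighted average of neighbor representations
    return label_agg, repr_agg
```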
no code implementations • 9 Jun 2024 • Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Rui Yan
This 3D structure token vocabulary enables the seamless combination of 1D sequence and 3D structure representations in a tokenized format, allowing 3D-MolT5 to encode molecular sequence (SELFIES), molecular structure, and text sequences within a unified architecture.
1 code implementation • 6 Jun 2024 • Zun Wang, Chang Liu, Nianlong Zou, He Zhang, Xinran Wei, Lin Huang, Lijun Wu, Bin Shao
In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians.
1 code implementation • 26 May 2024 • Hongfei Wu, Lijun Wu, Guoqing Liu, Zhirong Liu, Bin Shao, Zun Wang
In this paper, we develop SE3Set, an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning.
1 code implementation • 29 Mar 2024 • Kaiyuan Gao, Qizhi Pei, Jinhua Zhu, Kun He, Lijun Wu
Molecular docking is a pivotal process in drug discovery.
Ranked #1 on Blind Docking on PDBbind
2 code implementations • 3 Mar 2024 • Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
1 code implementation • 27 Feb 2024 • Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan
However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC).
Ranked #2 on Molecule Captioning on ChEBI-20
no code implementations • 16 Jan 2024 • Zhiyuan Li, Wenshuai Zhao, Lijun Wu, Joni Pajarinen
To enable decentralized execution, we introduce Individual-Global-Consistency to guarantee mode consistency during joint training of the centralized and decentralized policies, and prove that AgentMixer converges to an ε-approximate Correlated Equilibrium.
1 code implementation • 11 Oct 2023 • Qizhi Pei, Wei zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan
Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery.
Ranked #3 on Molecule Captioning on ChEBI-20
1 code implementation • NeurIPS 2023 • Qizhi Pei, Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Kun He, Tie-Yan Liu, Rui Yan
In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding.
Ranked #4 on Blind Docking on PDBBind
no code implementations • 7 Jun 2023 • Shufang Xie, Rui Yan, Junliang Guo, Yingce Xia, Lijun Wu, Tao Qin
Furthermore, we propose a lightweight adapter to adjust the weights when combining neural network and KNN predictions, conditioned on the hidden representation and the retrieved templates.
no code implementations • 18 May 2023 • Zequn Liu, Wei zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu
Considering that text is the most important record for scientific discovery, in this paper, we propose MolXPT, a unified language model of text and molecules pre-trained on SMILES (a sequence representation of molecules) wrapped by text.
Ranked #1 on Molecular Property Prediction on BACE
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
no code implementations • 12 Apr 2023 • Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo
Filler words like "um" or "uh" are common in spontaneous speech.
1 code implementation • 13 Mar 2023 • Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, Min Zhang
Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model.
1 code implementation • 24 Feb 2023 • Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong
RSA links query protein sequences to a set of sequences with similar structures or properties in the database and combines these sequences for downstream prediction.
1 code implementation • 2 Feb 2023 • Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu
The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other.
1 code implementation • 31 Oct 2022 • Zhaochen Su, Zecheng Tang, Xinyan Guan, Juntao Li, Lijun Wu, Min Zhang
Existing methods mainly perform continual training to mitigate such a misalignment.
no code implementations • 26 Oct 2022 • Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
Specifically, we first pre-train an antibody language model on the sequence data, then propose a one-shot approach to CDR sequence and structure generation that avoids the heavy cost and error propagation of autoregressive generation, and finally leverage the pre-trained antibody model in the antigen-specific antibody generation model with some carefully designed modules.
1 code implementation • 30 Aug 2022 • Kehan Wu, Yingce Xia, Yang Fan, Pan Deng, Haiguang Liu, Lijun Wu, Shufang Xie, Tong Wang, Tao Qin, Tie-Yan Liu
Structure-based drug design is drawing growing attention in computer-aided drug discovery.
1 code implementation • 14 Jul 2022 • Jinhua Zhu, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation.
1 code implementation • 23 Jun 2022 • Shufang Xie, Rui Yan, Peng Han, Yingce Xia, Lijun Wu, Chenjuan Guo, Bin Yang, Tao Qin
We observe that the same intermediate molecules are visited many times in the searching process, yet they are usually treated independently in previous tree-based methods (e.g., AND-OR tree search, Monte Carlo tree search); a short illustrative sketch follows this entry.
Ranked #2 on Multi-step retrosynthesis on USPTO-190
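The observation above suggests sharing work across repeated intermediates. A minimal sketch of caching single-step expansions by molecule string follows; this is purely illustrative and not the paper's search algorithm.

```python
expansion_cache: dict[str, list[list[str]]] = {}


def expand(molecule: str, one_step_model) -> list[list[str]]:
    """Return candidate reactant sets for `molecule`, reusing results for repeated intermediates."""
    if molecule not in expansion_cache:              # first visit: run the single-step model
        expansion_cache[molecule] = one_step_model(molecule)
    return expansion_cache[molecule]                 # later visits: reuse the cached expansions
```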
2 code implementations • 20 Jun 2022 • Qizhi Pei, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Haiguang Liu, Tie-Yan Liu, Rui Yan
Accurate prediction of Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery, facilitating the identification of drugs that can effectively interact with specific targets and regulate their activities.
Ranked #1 on Drug Discovery on KIBA
1 code implementation • 20 Apr 2022 • Yisheng Xiao, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, Tie-Yan Liu
While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation.
Automatic Speech Recognition (ASR) +12
no code implementations • 28 Feb 2022 • Junhan Yang, Zheng Liu, Shitao Xiao, Jianxun Lian, Lijun Wu, Defu Lian, Guangzhong Sun, Xing Xie
Instead of relying on annotation heuristics defined by humans, it leverages the sentence representation model itself and realizes the following iterative self-supervision process: on one hand, the improvement of sentence representation may contribute to the quality of data annotation; on the other hand, more effective data annotation helps to generate high-quality positive samples, which will further improve the current sentence representation model.
no code implementations • 18 Feb 2022 • Lin Huang, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu
How to discover useful implicit relations between entities and effectively utilize these relations for each entity under various circumstances is crucial.
no code implementations • 18 Feb 2022 • Lin Huang, Qiyuan Dong, Lijun Wu, Jia Zhang, Jiang Bian, Tie-Yan Liu
As a specific semantic segmentation task, aerial imagery segmentation has been widely employed in high spatial resolution (HSR) remote sensing image understanding.
1 code implementation • 3 Feb 2022 • Jinhua Zhu, Yingce Xia, Chang Liu, Lijun Wu, Shufang Xie, Yusong Wang, Tong Wang, Tao Qin, Wengang Zhou, Houqiang Li, Haiguang Liu, Tie-Yan Liu
Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology.
2 code implementations • 13 Dec 2021 • Chong Liu, Xiaoyang Liu, Rongqin Zheng, Lixin Zhang, Xiaobo Liang, Juntao Li, Lijun Wu, Min Zhang, Leyu Lin
State-of-the-art sequential recommendation models proposed very recently combine contrastive learning techniques for obtaining high-quality user representations.
no code implementations • NeurIPS 2021 • Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan
To consider the interdependent knowledge between contextual clips in the network inference, we propose a Siamese Sampling and Reasoning (SiaSamRea) approach, which consists of a siamese sampling mechanism to generate sparse and similar clips (i.e., siamese clips) from the same video, and a novel reasoning strategy for integrating the interdependent knowledge between contextual clips into the network.
1 code implementation • 29 Oct 2021 • Bo Yang, Lijun Wu
Therefore, in this paper, we first extract the accompanying clinical notes from EHR and propose a method to integrate these data; we also comprehensively study different models and data leverage methods for better medical task prediction.
no code implementations • 29 Oct 2021 • Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, Tie-Yan Liu
The key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
1 code implementation • ICLR 2022 • Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Rui Yan, Tie-Yan Liu
Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens.
no code implementations • 29 Sep 2021 • Yue Wang, Lijun Wu, Xiaobo Liang, Juntao Li, Min Zhang
Since the resurgence of deep learning, language models (LMs) have never been so popular.
no code implementations • 29 Sep 2021 • Xiaobo Liang, Runze Mao, Lijun Wu, Juntao Li, Weiqing Liu, Qing Li, Min Zhang
Consistency training is commonly performed at the data level: a data augmentation strategy (or adversarial training) is used to make the predictions from the augmented input and the original input consistent, so that the model becomes more robust and generalizes better.
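A minimal PyTorch-style sketch of the data-level consistency training described above, assuming a generic `augment` function and a classifier `model`; the loss weight is illustrative.

```python
import torch
import torch.nn.functional as F


def consistency_loss(model, x, y, augment, alpha: float = 1.0):
    """Supervised loss plus a KL term that pulls predictions on augmented and original inputs together."""
    logits_orig = model(x)
    logits_aug = model(augment(x))
    task = F.cross_entropy(logits_orig, y)
    consist = F.kl_div(F.log_softmax(logits_aug, dim=-1),
                       F.softmax(logits_orig, dim=-1),
                       reduction="batchmean")
    return task + alpha * consist
```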
no code implementations • 27 Sep 2021 • Yutai Hou, Yingce Xia, Lijun Wu, Shufang Xie, Yang Fan, Jinhua Zhu, Wanxiang Che, Tao Qin, Tie-Yan Liu
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
8 code implementations • NeurIPS 2021 • Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, Tie-Yan Liu
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
Ranked #4 on Machine Translation on WMT2014 English-French
no code implementations • 8 Jun 2021 • Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.
1 code implementation • NA 2021 • Boling Li, Yingce Xia, Shufang Xie, Lijun Wu, Tao Qin
To overcome this difficulty, we propose an anchor-based distance: we first randomly select K anchor vertices from the graph and then calculate the shortest distances from all vertices in the graph to them (a short sketch follows this entry).
Ranked #1 on Link Property Prediction on ogbl-ddi
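A minimal sketch of the anchor-based distance described above, using BFS over an unweighted adjacency list; the names and the "unreachable" default are illustrative, not the paper's implementation.

```python
import random
from collections import deque


def anchor_distances(adj: dict[int, list[int]], k: int = 4, seed: int = 0) -> dict[int, list[int]]:
    """For k random anchor vertices, compute BFS shortest distances to every vertex;
    each vertex then gets a k-dimensional anchor-distance vector."""
    random.seed(seed)
    anchors = random.sample(list(adj), k)
    dist = {v: [len(adj)] * k for v in adj}          # "unreachable" defaults to |V|
    for i, a in enumerate(anchors):
        queue, seen = deque([(a, 0)]), {a}
        while queue:
            v, d = queue.popleft()
            dist[v][i] = d
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append((u, d + 1))
    return dist
```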
no code implementations • NAACL 2021 • Zhen Wu, Lijun Wu, Qi Meng, Yingce Xia, Shufang Xie, Tao Qin, Xinyu Dai, Tie-Yan Liu
Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models.
Ranked #5 on Machine Translation on IWSLT2014 English-German
1 code implementation • ICLR 2021 • Jinhua Zhu, Lijun Wu, Yingce Xia, Shufang Xie, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Based on this observation, in this work, we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
1 code implementation • 1 Jan 2021 • Xueqing Wu, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Tao Qin, Tie-Yan Liu
For wait-k inference, we observe that wait-m training with m > k in simultaneous NMT (i.e., using more future information for training than inference) generally outperforms wait-k training.
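For context, under a standard wait-k policy the decoder may only read the first k + t source tokens when emitting target token t. The tiny helper below makes the wait-k versus wait-m (m > k) gap concrete; it is illustrative, not the paper's code.

```python
def visible_source_tokens(t: int, k: int, src_len: int) -> int:
    """Number of source tokens available when generating target position t (0-indexed) under wait-k."""
    return min(k + t, src_len)


# Training with wait-m, m > k, sees at least as much source context at every step as wait-k inference:
src_len, k, m = 20, 3, 5
assert all(visible_source_tokens(t, m, src_len) >= visible_source_tokens(t, k, src_len)
           for t in range(src_len))
```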
1 code implementation • 11 Dec 2020 • Hongshun Tang, Lijun Wu, Weiqing Liu, Jiang Bian
Stock trend forecasting has become a popular research direction that attracts widespread attention in the financial field.
1 code implementation • 15 Oct 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wengang Zhou, Tao Qin, Houqiang Li
During inference, the CNN encoder and the policy network are used to take actions, and the Transformer module is discarded.
2 code implementations • 10 Jul 2020 • Xueqing Wu, Lewen Wang, Yingce Xia, Weiqing Liu, Lijun Wu, Shufang Xie, Tao Qin, Tie-Yan Liu
In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which are different in terms of how much input information to use or which future step to predict.
no code implementations • 9 Jul 2020 • Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li
Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc.
1 code implementation • 18 Jun 2020 • Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.
Ranked #4 on Machine Translation on WMT2014 English-German (SacreBLEU metric)
7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent an arbitrary distribution of box locations; a short sketch of this distributional box representation follows this entry.
Ranked #107 on Object Detection on COCO test-dev
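A minimal sketch of representing one box offset as a learned distribution over discretized bins and decoding it as a softmax expectation, in the spirit of "use a vector to represent an arbitrary distribution of box locations"; the bin count and offset range are illustrative assumptions.

```python
import torch


def expected_offset(logits: torch.Tensor, max_offset: float = 16.0) -> torch.Tensor:
    """Decode a predicted box offset from a distribution over discretized bins via its expectation."""
    n = logits.shape[-1]
    bins = torch.linspace(0.0, max_offset, n)   # candidate offset values
    probs = torch.softmax(logits, dim=-1)       # arbitrary (learned) distribution over bins
    return (probs * bins).sum(dim=-1)           # expectation = regressed offset


offset = expected_offset(torch.randn(8))        # one offset decoded from 8 bins
```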
3 code implementations • ICLR 2020 • Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
While BERT is more commonly used for fine-tuning rather than as a contextual embedding in downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as a contextual embedding works better than using it for fine-tuning.
no code implementations • WS 2019 • Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu
We, Microsoft Research Asia, made submissions to 11 language directions in the WMT19 news translation tasks.
no code implementations • IJCNLP 2019 • Lijun Wu, Jinhua Zhu, Di He, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
1) We provide a simple approach to mine implicit bilingual sentence pairs from document pairs, which can then be used as supervised training signals.
no code implementations • IJCNLP 2019 • Lijun Wu, Yiren Wang, Yingce Xia, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
In this work, we study how to use both the source-side and target-side monolingual data for NMT, and propose an effective strategy leveraging both of them.
Ranked #1 on Machine Translation on WMT2016 English-German (SacreBLEU metric, using extra training data)
no code implementations • 25 Aug 2019 • Xu Tan, Yingce Xia, Lijun Wu, Tao Qin
In this paper, we propose an efficient method to generate a sequence in both left-to-right and right-to-left manners using a single encoder and decoder, combining the advantages of both generation directions.
1 code implementation • ACL 2019 • Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem.
Ranked #11 on Machine Translation on WMT2014 English-French
1 code implementation • ACL 2019 • Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xue-Qi Cheng, Tie-Yan Liu
While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited.
no code implementations • NeurIPS 2018 • Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it student) is defined by another machine learning model (we call it teacher).
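An illustrative sketch (not the paper's method) of one way a teacher model can define the student's loss: the teacher outputs per-example weights, so the student's effective loss function changes as the teacher updates. All names and the sigmoid weighting are assumptions.

```python
import torch
import torch.nn.functional as F


def student_loss(student, teacher, x, y):
    """Student loss whose per-example weights are produced by a teacher model."""
    logits = student(x)
    per_example = F.cross_entropy(logits, y, reduction="none")
    with torch.no_grad():
        weights = torch.sigmoid(teacher(x)).squeeze(-1)   # teacher decides how each example is weighted
    return (weights * per_example).mean()
```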
no code implementations • EMNLP 2018 • Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Many previous works have discussed the relationship between error propagation and the accuracy drop problem (i.e., the left part of the translated sentence is often better than its right part in left-to-right decoding models).
1 code implementation • EMNLP 2018 • Lijun Wu, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems.
no code implementations • NAACL 2018 • Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xue-Qi Cheng, Tie-Yan Liu
Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, speech recognition and so on.
2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou
Machine translation has made rapid advances in recent years.
Ranked #3 on Machine Translation on WMT 2017 English-Chinese
no code implementations • NeurIPS 2017 • Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, Tie-Yan Liu
In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation.
Ranked #22 on Machine Translation on WMT2014 English-French
no code implementations • 20 Apr 2017 • Lijun Wu, Yingce Xia, Li Zhao, Fei Tian, Tao Qin, Jian-Huang Lai, Tie-Yan Liu
The goal of the adversary is to differentiate the translation result generated by the NMT model from that produced by a human.