Search Results for author: Chao Pang

Found 17 papers, 10 papers with code

ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora

2 code implementations • EMNLP 2021 • Xuan Ouyang, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance.

Ranked #14 on Zero-Shot Cross-Lingual Transfer on XTREME

Sentence Translation

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

2 code implementations • 5 Jul 2021 • Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.

Few-Shot Learning, Natural Language Understanding, +2

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

3 code implementations • 23 Dec 2021 • Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen, Yuxiang Lu, Weixin Liu, Xi Wang, Yangfan Bai, Qiuliang Chen, Li Zhao, Shiyong Li, Peng Sun, Dianhai Yu, Yanjun Ma, Hao Tian, Hua Wu, Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang

A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and was used to train a model with 10 billion parameters.

Language Modelling

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

1 code implementation • 13 Dec 2022 • Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu

Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation.

Code Summarization, Language Modelling, +2

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

2 code implementations • 7 Nov 2022 • Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.

Representation Learning, Speech Synthesis, +2

Detecting Building Changes with Off-Nadir Aerial Images

1 code implementation • 26 Jan 2023 • Chao Pang, Jiang Wu, Jian Ding, Can Song, Gui-Song Xia

The tilted viewing nature of the off-nadir aerial images brings severe challenges to the building change detection (BCD) problem: the mismatch of the nearby buildings and the semantic ambiguity of the building facades.

Building change detection for remote sensing images, Change Detection

MechRetro is a chemical-mechanism-driven graph learning framework for interpretable retrosynthesis prediction and pathway planning

1 code implementation • 6 Oct 2022 • Yu Wang, Chao Pang, Yuzhe Wang, Yi Jiang, Junru Jin, Sirui Liang, Quan Zou, Leyi Wei

Leveraging artificial intelligence for automatic retrosynthesis speeds up organic pathway planning in digital laboratories.

Drug Discovery, Graph Learning, +3

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

1 code implementation • 29 Mar 2024 • Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

Generic large Vision-Language Models (VLMs) are developing rapidly but still perform poorly in the Remote Sensing (RS) domain, owing to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs.

Hallucination, Language Modelling, +2

HiCD: Change Detection in Quality-Varied Images via Hierarchical Correlation Distillation

1 code implementation • 19 Jan 2024 • Chao Pang, Xingxing Weng, Jiang Wu, Qiang Wang, Gui-Song Xia

This ensures effective knowledge transfer while maintaining the student model's training flexibility.

Change Detection, Knowledge Distillation, +1

iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations

2 code implementations • Genome Biology 2022 • Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou, Kenta Nakai, Leyi Wei

In this study, we propose iDNA-ABF, a multi-scale deep biological language learning model that enables the interpretable prediction of DNA methylations based on genomic sequences only.

Benchmarking, Text Classification

Building Change Detection for Remote Sensing Images Using a Dual Task Constrained Deep Siamese Convolutional Network Model

no code implementations • 17 Sep 2019 • Yi Liu, Chao Pang, Zongqian Zhan, Xiaomeng Zhang, Xue Yang

In recent years, building change detection methods have made great progress by introducing deep learning, but they still suffer from the problem of the extracted features not being discriminative enough, resulting in incomplete regions and irregular boundaries.

Building change detection for remote sensing images, Change Detection, +3

abcbpc at SemEval-2021 Task 7: ERNIE-based Multi-task Model for Detecting and Rating Humor and Offense

no code implementations • SEMEVAL 2021 • Chao Pang, Xiaoran Fan, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Shikun Feng, Yu Sun

This paper describes our system, which participated in Task 7 of SemEval-2021: Detecting and Rating Humor and Offense.

Ensemble Learning

Correcting Chinese Spelling Errors with Phonetic Pre-training

no code implementations • Findings (ACL) 2021 • Ruiqing Zhang, Chao Pang, Chuanqiang Zhang, Shuohuan Wang, Zhongjun He, Yu Sun, Hua Wu, Haifeng Wang

CEHR-BERT: Incorporating temporal information from structured EHR data to improve prediction tasks

no code implementations • 10 Nov 2021 • Chao Pang, Xinzhuo Jiang, Krishna S Kalluri, Matthew Spotnitz, Ruijun Chen, Adler Perotte, Karthik Natarajan

CEHR-BERT also demonstrated strong transfer learning capability, as our model trained on only 5% of data outperformed comparison models trained on the entire data set.

Disease Prediction, Transfer Learning

Multi-view deep learning based molecule design and structural optimization accelerates the SARS-CoV-2 inhibitor discovery

no code implementations • 3 Dec 2022 • Chao Pang, Yu Wang, Yi Jiang, Ruheng Wang, Ran Su, Leyi Wei

Moreover, case study results on targeted molecule generation for the SARS-CoV-2 main protease (Mpro) show that by integrating molecular docking into our model as a chemical prior, we successfully generate new small molecules with desired drug-like properties for the Mpro, potentially accelerating the de novo design of COVID-19 drugs.

Benchmarking, Representation Learning

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

no code implementations • 9 Feb 2023 • Pengfei Zhu, Chao Pang, Yekun Chai, Lei Li, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu

This paper introduces a text-to-waveform music generation model built on diffusion models.

Music Generation, Text-to-Music Generation

CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines

no code implementations • 6 Feb 2024 • Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, Noémie Elhadad, Karthik Natarajan

Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data.

Counterfactual, Counterfactual Reasoning, +1
