Search Results for author: Nan Du

Found 54 papers, 15 papers with code

Are Large Language Models Good Prompt Optimizers?

no code implementations 3 Feb 2024 Ruotian Ma, Xiaolei Wang, Xin Zhou, Jian Li, Nan Du, Tao Gui, Qi Zhang, Xuanjing Huang

Despite the success, the underlying mechanism of this approach remains unexplored, and the true effectiveness of LLMs as Prompt Optimizers requires further validation.

valid

On Diversified Preferences of Large Language Model Alignment

1 code implementation 12 Dec 2023 Dun Zeng, Yong Dai, Pengyu Cheng, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu

Our analysis reveals a correlation between the calibration performance of reward models (RMs) and the alignment performance of LLMs.

Language Modelling Large Language Model

Learning to Skip for Language Modeling

no code implementations 26 Nov 2023 Dewen Zeng, Nan Du, Tao Wang, Yuanzhong Xu, Tao Lei, Zhifeng Chen, Claire Cui

Overparameterized large-scale language models show impressive generalization performance on in-context few-shot learning.

Few-Shot Learning Language Modelling

Adversarial Preference Optimization

1 code implementation 14 Nov 2023 Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du

Human preference alignment is essential to improve the interaction quality of large language models (LLMs).

Everyone Deserves A Reward: Learning Customized Human Preferences

1 code implementation 6 Sep 2023 Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du

Besides, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set.

Imitation Learning

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

no code implementations 25 Aug 2023 Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the input sequence length.

Reading Comprehension Text Summarization

Brainformers: Trading Simplicity for Efficiency

no code implementations 29 May 2023 Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

Using this insight, we develop a complex block, named Brainformer, that consists of a diverse set of layers such as sparsely gated feed-forward layers, dense feed-forward layers, attention layers, and various forms of layer normalization and activation functions.

Lifelong Language Pretraining with Distribution-Specialized Experts

no code implementations 20 May 2023 Wuyang Chen, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui

Compared to existing lifelong learning approaches, Lifelong-MoE achieves better few-shot performance on 19 downstream NLP tasks.

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

2 code implementations NeurIPS 2023 Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.

Language Modelling

PaLM 2 Technical Report

1 code implementation 17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Code Generation Common Sense Reasoning +6

Model Based Reinforcement Learning with Non-Gaussian Environment Dynamics and its Application to Portfolio Optimization

no code implementations 23 Jan 2023 Huifang Huang, Ting Gao, Pengbo Li, Jin Guo, Peng Zhang, Nan Du

With the fast development of quantitative portfolio optimization in financial engineering, lots of AI-based algorithmic trading strategies have demonstrated promising results, among which reinforcement learning begins to manifest competitive advantages.

Algorithmic Trading Decision Making +5

ReAct: Synergizing Reasoning and Acting in Language Models

5 code implementations 6 Oct 2022 Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics.

Decision Making Fact Verification +2

Mixture-of-Experts with Expert Choice Routing

no code implementations 18 Feb 2022 Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew Dai, Zhifeng Chen, Quoc Le, James Laudon

Prior work allocates a fixed number of experts to each token using a top-k function regardless of the relative importance of different tokens.

ST-MoE: Designing Stable and Transferable Sparse Expert Models

2 code implementations 17 Feb 2022 Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, William Fedus

But advancing the state-of-the-art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning.

Common Sense Reasoning Coreference Resolution +6

Finetuned Language Models Are Zero-Shot Learners

5 code implementations ICLR 2022 Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.

Common Sense Reasoning Coreference Resolution +8

R2D2: Relational Text Decoding with Transformers

no code implementations 10 May 2021 Aryan Arbabi, Mingqiu Wang, Laurent El Shafey, Nan Du, Izhak Shafran

The other side ignores the sequential nature of the text by representing them as fixed-dimensional vectors and applying graph neural networks.

Data-to-Text Generation

Learning to Select Best Forecast Tasks for Clinical Outcome Prediction

no code implementations NeurIPS 2020 Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai

The paradigm of 'pretraining' from a set of relevant auxiliary tasks and then 'finetuning' on a target task has been successfully applied in many different domains.

Meta-Learning

Deep Physiological State Space Model for Clinical Forecasting

no code implementations 4 Dec 2019 Yuan Xue, Denny Zhou, Nan Du, Andrew Dai, Zhen Xu, Kun Zhang, Claire Cui

Clinical forecasting based on electronic medical records (EMR) can uncover the temporal correlations between patients' conditions and outcomes from sequences of longitudinal clinical measurements.

Learning to Infer Entities, Properties and their Relations from Clinical Conversations

no code implementations IJCNLP 2019 Nan Du, Mingqiu Wang, Linh Tran, Gang Li, Izhak Shafran

Recently we proposed the Span Attribute Tagging (SAT) Model (Du et al., 2019) to infer clinical entities (e.g., symptoms) and their properties (e.g., duration).

Attribute Relation Extraction

Multi-Grained Named Entity Recognition

1 code implementation ACL 2019 Congying Xia, Chenwei Zhang, Tao Yang, Yaliang Li, Nan Du, Xian Wu, Wei Fan, Fenglong Ma, Philip Yu

This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested.

Multi-Grained Named Entity Recognition named-entity-recognition +5

Extracting Symptoms and their Status from Clinical Conversations

no code implementations ACL 2019 Nan Du, Kai Chen, Anjuli Kannan, Linh Tran, Yu-Hui Chen, Izhak Shafran

This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status.

Attribute

Entity Synonym Discovery via Multipiece Bilateral Context Matching

1 code implementation 31 Dec 2018 Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu

Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation or knowledge graph canonicalization.

Entity Disambiguation

Joint Slot Filling and Intent Detection via Capsule Neural Networks

3 code implementations ACL 2019 Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu

Being able to recognize words as slots and detect the intent of an utterance has been a keen issue in natural language understanding.

Intent Detection Natural Language Understanding +1

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

2 code implementations 6 Dec 2018 Yang Deng, Yuexiang Xie, Yaliang Li, Min Yang, Nan Du, Wei Fan, Kai Lei, Ying Shen

Second, these two tasks can benefit each other: answer selection can incorporate the external knowledge from knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection.

Answer Selection Knowledge Base Question Answering +2

Statistical Robust Chinese Remainder Theorem for Multiple Numbers: Wrapped Gaussian Mixture Model

no code implementations 28 Nov 2018 Nan Du, Zhikang Wang, Hanshen Xiao

Generalized Chinese Remainder Theorem (CRT) has been shown to be a powerful approach to solve the ambiguity resolution problem.

Clustering

Learning Temporal Point Processes via Reinforcement Learning

no code implementations NeurIPS 2018 Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, Le Song

Social goods, such as healthcare, smart city, and information networks, often produce ordered event data in continuous time.

Point Processes reinforcement-learning +1

Finding Similar Medical Questions from Question Answering Websites

no code implementations 14 Oct 2018 Yaliang Li, Liuyi Yao, Nan Du, Jing Gao, Qi Li, Chuishi Meng, Chenwei Zhang, Wei Fan

Patients who have medical information demands tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users.

Question Answering Retrieval

MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data

no code implementations 27 Sep 2018 Yang Deng, Yaliang Li, Ying Shen, Nan Du, Wei Fan, Min Yang, Kai Lei

In the light of these challenges, we propose a new truth discovery method, MedTruth, for medical knowledge condition discovery, which incorporates prior source quality information into the source reliability estimation procedure, and also utilizes the knowledge triple information for trustworthy information computation.

Databases

SynonymNet: Multi-context Bilateral Matching for Entity Synonyms

no code implementations 27 Sep 2018 Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu

Being able to automatically discover synonymous entities from a large free-text corpus has transformative effects on structured knowledge discovery.

AnatomyNet: Deep Learning for Fast and Fully Automated Whole-volume Segmentation of Head and Neck Anatomy

2 code implementations 15 Aug 2018 Wentao Zhu, Yufang Huang, Liang Zeng, Xuming Chen, Yong Liu, Zhen Qian, Nan Du, Wei Fan, Xiaohui Xie

Methods: Our deep learning model, called AnatomyNet, segments OARs from head and neck CT images in an end-to-end fashion, receiving whole-volume HaN CT images as input and generating masks of all OARs of interest in one shot.

3D Medical Imaging Segmentation Anatomy

Knowledge as A Bridge: Improving Cross-domain Answer Selection with External Knowledge

no code implementations COLING 2018 Yang Deng, Ying Shen, Min Yang, Yaliang Li, Nan Du, Wei Fan, Kai Lei

In this paper, we propose Knowledge-aware Attentive Network (KAN), a transfer learning framework for cross-domain answer selection, which uses the knowledge base as a bridge to enable knowledge transfer from the source domain to the target domains.

Answer Selection Information Retrieval +2

Cooperative Denoising for Distantly Supervised Relation Extraction

no code implementations COLING 2018 Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen

Distantly supervised relation extraction greatly reduces human efforts in extracting relational facts from unstructured texts.

Denoising Information Retrieval +4

Generative Discovery of Relational Medical Entity Pairs

no code implementations ICLR 2018 Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu

Online healthcare services can provide the general public with ubiquitous access to medical knowledge and reduce the information access cost for both individuals and societies.

Bringing Semantic Structures to User Intent Detection in Online Medical Queries

no code implementations 22 Oct 2017 Chenwei Zhang, Nan Du, Wei Fan, Yaliang Li, Chun-Ta Lu, Philip S. Yu

The healthcare status and complex medical information needs of patients are expressed diversely and implicitly in their medical text queries.

Intent Detection Multi-Task Learning +1

Time-Dependent Representation for Neural Event Sequence Prediction

no code implementations ICLR 2018 Yang Li, Nan Du, Samy Bengio

Because neural sequence models such as RNNs are more amenable to handling token-like input, we propose two methods for time-dependent event representation, based on the intuition of how time is tokenized in everyday life and on previous work on embedding contextualization.

Sentence

Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks

no code implementations 8 Dec 2016 Nan Du, Yingyu Liang, Maria-Florina Balcan, Manuel Gomez-Rodriguez, Hongyuan Zha, Le Song

A typical viral marketing model identifies influential users in a social network to maximize a single product adoption assuming unlimited user attention, campaign budgets, and time.

Marketing

Variational hybridization and transformation for large inaccurate noisy-or networks

no code implementations 20 May 2016 Yusheng Xie, Nan Du, Wei Fan, Jing Zhai, Weicheng Zhu

In addition, we propose a transformation ranking algorithm that is very stable to large variances in network prior probabilities, a common issue that arises in medical applications of Bayesian networks.

Variational Inference

Time-Sensitive Recommendation From Recurrent User Activities

no code implementations NeurIPS 2015 Nan Du, Yichen Wang, Niao He, Jimeng Sun, Le Song

By making personalized suggestions, a recommender system is playing a crucial role in improving the engagement of users in modern web-services.

Point Processes Recommendation Systems

Learning Time-Varying Coverage Functions

no code implementations NeurIPS 2014 Nan Du, Yingyu Liang, Maria-Florina F. Balcan, Le Song

Coverage functions are an important class of discrete functions that capture laws of diminishing returns.

Shaping Social Activity by Incentivizing Users

no code implementations NeurIPS 2014 Mehrdad Farajtabar, Nan Du, Manuel Gomez Rodriguez, Isabel Valera, Hongyuan Zha, Le Song

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network.

Budgeted Influence Maximization for Multiple Products

no code implementations 8 Dec 2013 Nan Du, Yingyu Liang, Maria Florina Balcan, Le Song

The typical algorithmic problem in viral marketing aims to identify a set of influential users in a social network, who, when convinced to adopt a product, shall influence other users in the network and trigger a large cascade of adoptions.

Combinatorial Optimization Marketing

Scalable Influence Estimation in Continuous-Time Diffusion Networks

no code implementations NeurIPS 2013 Nan Du, Le Song, Manuel Gomez Rodriguez, Hongyuan Zha

If a piece of information is released from a media site, can it spread, in 1 month, to a million web pages?

Learning Networks of Heterogeneous Influence

no code implementations NeurIPS 2012 Nan Du, Le Song, Ming Yuan, Alex J. Smola

However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen.
