Search Results for author: Nan Ding

Found 31 papers, 12 papers with code

CausalLM is not optimal for in-context learning

1 code implementation 14 Aug 2023 Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut

Recent empirical evidence indicates that transformer-based in-context learning performs better with a prefix language model (prefixLM), in which all in-context samples can attend to each other, than with a causal language model (causalLM), whose auto-regressive attention prevents in-context samples from attending to future samples.
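The attention-mask difference the abstract describes can be sketched in a few lines of NumPy (function names and the boolean-mask convention are illustrative, not taken from the paper):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # causalLM: position i may attend only to positions <= i
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n: int, prefix_len: int) -> np.ndarray:
    # prefixLM: tokens in the prefix (the in-context samples) attend to
    # each other bidirectionally; the suffix remains auto-regressive
    mask = causal_mask(n)
    mask[:prefix_len, :prefix_len] = True
    return mask
```

With `prefix_mask(4, 2)`, the first two (in-context) positions see each other in both directions, while positions 2 and 3 stay causal.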

In-Context Learning Language Modelling

Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization

no code implementations CVPR 2023 Zifan Wang, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut

Recent research in robust optimization has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.

Adversarial Robustness

All You May Need for VQA are Image Captions

2 code implementations NAACL 2022 Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation.

Image Captioning Question Answering +3

PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks

1 code implementation 10 Mar 2022 Nan Ding, Xi Chen, Tomer Levinboim, Beer Changpinyo, Radu Soricut

With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been gaining increased attention.

Learning Theory Model Selection +2

Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning

no code implementations NeurIPS 2021 Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut

Despite recent advances in its theoretical understanding, there still remains a significant gap in the ability of existing PAC-Bayesian theories on meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.

Few-Shot Learning

Do Transformer Modifications Transfer Across Implementations and Applications?

1 code implementation EMNLP 2021 Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel

The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

3 code implementations CVPR 2021 Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut

The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.

Caption Generation Image Captioning +2

Multi-wavelength Selected Compton-thick AGNs in Chandra Deep Field-South Survey

no code implementations 6 Dec 2020 Xiaotong Guo, Qiusheng Gu, Nan Ding, Xiaoling Yu, Yongyun Chen

We also find that CT AGNs have a higher Eddington ratio than non-CT AGNs, and that both CT AGNs and non-CT AGNs show similar properties of host galaxies.

Astrophysics of Galaxies High Energy Astrophysical Phenomena

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

no code implementations EMNLP (Eval4NLP) 2020 Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut

Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements.

Text Generation

TeaForN: Teacher-Forcing with N-grams

no code implementations EMNLP 2020 Sebastian Goodman, Nan Ding, Radu Soricut

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.

Decoder Machine Translation +2

Attention that does not Explain Away

no code implementations 29 Sep 2020 Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.

Talking-Heads Attention

4 code implementations 5 Mar 2020 Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou

We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
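The mechanism in the abstract can be sketched in plain NumPy. This is a simplified illustration, not the paper's implementation: the tensor shapes, function name, and the use of square head-mixing matrices are assumptions for clarity.

```python
import numpy as np

def talking_heads_attention(q, k, v, P_logits, P_weights):
    # q, k, v: [heads, seq, dim]; P_logits, P_weights: [heads, heads]
    h, n, d = q.shape
    # standard scaled dot-product logits per head
    logits = np.einsum('hnd,hmd->hnm', q, k) / np.sqrt(d)
    # linear projection across the heads dimension, before the softmax
    logits = np.einsum('hnm,hg->gnm', logits, P_logits)
    # softmax over the key positions
    weights = np.exp(logits - logits.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    # second projection across heads, after the softmax
    weights = np.einsum('hnm,hg->gnm', weights, P_weights)
    return np.einsum('hnm,hmd->hnd', weights, v)
```

With both mixing matrices set to the identity, this reduces to ordinary multi-head attention; learned non-identity matrices let information flow between heads.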

Language Modelling Masked Language Modeling +2

LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment

1 code implementation 12 Feb 2020 Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick

To highlight the impact of our work on a real-world application, we couple LOGAN with a many-to-many long-read alignment software called BELLA, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6x.

Vocal Bursts Intensity Prediction

iqiyi Submission to ActivityNet Challenge 2019 Kinetics-700 challenge: Hierarchical Group-wise Attention

no code implementations 7 Feb 2020 Qian Liu, Dongyang Cai, Jie Liu, Nan Ding, Tao Wang

The standard non-local (NL) module is effective in aggregating frame-level features for video classification, but suffers from low parameter efficiency and high computational cost.

General Classification Video Classification

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

2 code implementations ACL 2018 Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut

We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.

Image Captioning

SHAPED: Shared-Private Encoder-Decoder for Text Style Adaptation

no code implementations NAACL 2018 Ye Zhang, Nan Ding, Radu Soricut

Supervised training of abstractive language generation models results in learning conditional probabilities over language sequences based on the supervised training signal.

Decoder Text Generation

Cold-Start Reinforcement Learning with Softmax Policy Gradient

1 code implementation NeurIPS 2017 Nan Ding, Radu Soricut

Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.

Image Captioning Policy Gradient Methods +2

Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

no code implementations 22 Dec 2016 Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut

We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.

Image Captioning Multi-Task Learning +1

Multilingual Word Embeddings using Multigraphs

no code implementations 14 Dec 2016 Radu Soricut, Nan Ding

We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text.

Machine Translation Multilingual Word Embeddings +3

Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

1 code implementation 13 Dec 2016 Radu Soricut, Nan Ding

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large-sized machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks.

Machine Reading Comprehension

Stochastic Gradient MCMC with Stale Gradients

no code implementations NeurIPS 2016 Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin

In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it.

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

no code implementations NeurIPS 2015 Changyou Chen, Nan Ding, Lawrence Carin

Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators.

Characterizing Quantum Supremacy in Near-Term Devices

2 code implementations 31 Jul 2016 Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J. Bremner, John M. Martinis, Hartmut Neven

We study the task of sampling from the output distributions of (pseudo-)random quantum circuits, a natural task for benchmarking quantum computers.

Quantum Physics

Embedding Inference for Structured Multilabel Prediction

no code implementations NeurIPS 2015 Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans

A key bottleneck in structured output prediction is the need for inference during training and testing, usually requiring some form of dynamic programming.

Totally Corrective Boosting with Cardinality Penalization

no code implementations 7 Apr 2015 Vasil S. Denchev, Nan Ding, Shin Matsushima, S. V. N. Vishwanathan, Hartmut Neven

If actual quantum optimization were to be used with this algorithm in the future, we would expect equivalent or superior results at much smaller time and energy costs during training.

Benchmarking Combinatorial Optimization

Probabilistic Label Relation Graphs with Ising Models

no code implementations ICCV 2015 Nan Ding, Jia Deng, Kevin Murphy, Hartmut Neven

In this paper, we extend the HEX model to allow for soft or probabilistic relations between labels, which is useful when there is uncertainty about the relationship between two labels (e.g., an antelope is "sort of" furry, but not to the same degree as a grizzly bear).

General Classification Relation

Bayesian Sampling Using Stochastic Gradient Thermostats

no code implementations NeurIPS 2014 Nan Ding, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, Hartmut Neven

To remedy this problem, we show that one can leverage a small number of additional variables in order to stabilize momentum fluctuations induced by the unknown noise.

Construction of non-convex polynomial loss functions for training a binary classifier with quantum annealing

no code implementations 17 Jun 2014 Ryan Babbush, Vasil Denchev, Nan Ding, Sergei Isakov, Hartmut Neven

Quantum annealing is a heuristic quantum algorithm which exploits quantum resources to minimize an objective function embedded as the energy levels of a programmable physical system.

t-divergence Based Approximate Inference

no code implementations NeurIPS 2011 Nan Ding, Yuan Qi, S. V. N. Vishwanathan

Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions.

t-logistic regression

no code implementations NeurIPS 2010 Nan Ding, S. V. N. Vishwanathan

We extend logistic regression by using t-exponential families which were introduced recently in statistical physics.
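For context, the t-exponential from statistical physics (the Tsallis deformation of the exponential, which reduces to the ordinary exponential as t approaches 1) can be sketched as follows; the function name is illustrative and this is not code from the paper:

```python
import numpy as np

def exp_t(x, t):
    # Tsallis t-exponential: [1 + (1 - t) * x]_+ ** (1 / (1 - t)),
    # which recovers the standard exp(x) in the limit t -> 1
    if t == 1.0:
        return np.exp(x)
    return np.maximum(0.0, 1.0 + (1.0 - t) * x) ** (1.0 / (1.0 - t))
```

For t > 1 this gives heavier tails than exp, which is what makes the resulting t-logistic loss more robust to outliers.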

