Search Results for author: Dan Iter

Found 19 papers, 8 papers with code

Entity Attribute Relation Extraction with Attribute-Aware Embeddings

no code implementations EMNLP (DeeLIO) 2020 Dan Iter, Xiao Yu, Fangtao Li

Entity-attribute relations are a fundamental component for building large-scale knowledge bases, which are widely employed in modern search engines.

Attribute Attribute Extraction +2

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations 22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

1 code implementation 19 Oct 2023 Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks.

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

no code implementations 19 Oct 2023 Zhihan Zhang, Shuohang Wang, Wenhao Yu, Yichong Xu, Dan Iter, Qingkai Zeng, Yang Liu, Chenguang Zhu, Meng Jiang

Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the need for task-specific fine-tuning.

In-Context Demonstration Selection with Cross Entropy Difference

1 code implementation 24 May 2023 Dan Iter, Reid Pryzant, Ruochen Xu, Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu

Our method is based on the observation that the effectiveness of in-context demonstrations negatively correlates with the perplexity of the test example by a language model that was finetuned on that demonstration.

Language Modelling Text Generation
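The selection criterion described above can be sketched in a few lines. The helper names and loss values below are hypothetical, standing in for a language model's average token loss on the test example before and after finetuning on a single candidate demonstration:

```python
def cross_entropy_difference(base_loss, finetuned_loss):
    """CED = test-example loss under a model finetuned on one demonstration,
    minus its loss under the unadapted base model. More negative is better:
    finetuning on that demonstration made the test example less perplexing."""
    return finetuned_loss - base_loss

def rank_demonstrations(base_loss, finetuned_losses):
    """Order candidate demonstrations by ascending CED (best first)."""
    scored = {d: cross_entropy_difference(base_loss, loss)
              for d, loss in finetuned_losses.items()}
    return sorted(scored, key=scored.get)

# Hypothetical average token losses on one test example:
ranked = rank_demonstrations(2.9, {"demo_a": 2.1, "demo_b": 3.4, "demo_c": 2.8})
# → ["demo_a", "demo_c", "demo_b"]
```

This is only the ranking step; obtaining the per-demonstration losses requires finetuning (or parameter-efficiently adapting) a small model per candidate.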

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

no code implementations 22 May 2023 Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng

While large models such as GPT-3 demonstrate exceptional performance in zero-shot and few-shot summarization tasks, their extensive serving and fine-tuning costs hinder their use in many applications.

LMGQS: A Large-scale Dataset for Query-focused Summarization

no code implementations 22 May 2023 Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng

We hypothesize that there is a hidden query for each summary sentence in a generic summarization annotation, and we utilize a large-scale pretrained language model to recover it.

Language Modelling Query-focused Summarization +1

Automatic Prompt Optimization with "Gradient Descent" and Beam Search

4 code implementations 4 May 2023 Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on prompts that are hand-written through onerous trial and error.
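One way to picture the "gradient descent" analogy: an LLM critiques the current prompt (a textual "gradient"), a second call edits the prompt according to that critique, and beam search keeps the highest-scoring candidates. The sketch below stubs out the LLM calls and the dev-set evaluation; all names are hypothetical:

```python
def optimize_prompt(prompt, critique, edit, score, beam_width=2, steps=3):
    """Beam search over prompts, expanding each by one critique-guided edit."""
    beam = [prompt]
    for _ in range(steps):
        candidates = list(beam)
        for p in beam:
            gradient = critique(p)                 # natural-language flaws of p
            candidates.append(edit(p, gradient))   # apply the "gradient" as an edit
        # keep only the highest-scoring prompts (the beam)
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

# Toy usage: critique is constant, each edit appends "!", score favors length.
better = optimize_prompt("Summarize:", lambda p: "too terse",
                         lambda p, g: p + "!", len)
# → "Summarize:!!!"
```

In practice `critique`, `edit`, and `score` would each be calls to an LLM or an evaluation on held-out examples, not the toy lambdas shown here.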

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

2 code implementations 29 Mar 2023 Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu

In this work, we present G-Eval, a framework that uses large language models with chain-of-thought (CoT) reasoning and a form-filling paradigm to assess the quality of NLG outputs.

Dialogue Generation nlg evaluation +1
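A key detail of G-Eval's scoring is that the final rating is a probability-weighted average over the evaluator's score tokens rather than a single sampled score, which gives finer-grained ratings. A minimal sketch, with hypothetical token probabilities in place of real model logits:

```python
def weighted_score(score_probs):
    """Expected score over the 1-5 rating tokens, normalized by total mass
    (the score tokens need not sum to 1 in a real model's distribution)."""
    total = sum(score_probs.values())
    return sum(int(s) * p for s, p in score_probs.items()) / total

# Hypothetical probabilities the evaluator assigns to each score token:
probs = {"1": 0.02, "2": 0.08, "3": 0.30, "4": 0.45, "5": 0.15}
score = weighted_score(probs)  # ≈ 3.63
```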

How Does In-Context Learning Help Prompt Tuning?

no code implementations 22 Feb 2023 Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, Mohit Iyyer

This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model, and in-context learning (ICL), in which demonstrations of the task are provided to the model in natural language without any additional training.

In-Context Learning Text Generation

Generate rather than Retrieve: Large Language Models are Strong Context Generators

2 code implementations 21 Sep 2022 Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.

Language Modelling Large Language Model +1
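The generate-then-read pipeline can be sketched in a few lines; `toy_llm` is a stand-in for a real large language model call, and the prompt wording is illustrative:

```python
def generate_then_read(question, llm, n_docs=3):
    # Step 1: prompt the model to *generate* contextual documents.
    docs = [llm(f"Generate a background document to answer: {question}")
            for _ in range(n_docs)]
    # Step 2: *read* the generated documents to produce the final answer.
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

def toy_llm(prompt):
    """Stand-in for a real LLM call, returning canned text."""
    return "Background: ..." if prompt.startswith("Generate") else "final answer"

answer = generate_then_read("Who wrote Hamlet?", toy_llm)
```

The paper's contribution is the observation that generated documents can replace retrieved ones; the sketch only shows the control flow, not the clustering-based prompt diversification used to generate varied documents.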

Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference

1 code implementation EMNLP 2021 William Held, Dan Iter, Dan Jurafsky

We model the entities/events in a reader's focus as a neighborhood within a learned latent embedding space which minimizes the distance between mentions and the centroids of their gold coreference clusters.

coreference-resolution Entity Cross-Document Coreference Resolution +2

On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation

no code implementations 15 Sep 2021 Dan Iter, David Grangier

Domain adaptation of neural networks commonly relies on three training phases: pretraining, training on selected data, and then fine-tuning.

Domain Generalization Language Modelling +2

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

1 code implementation ACL 2020 Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations.

Common Sense Reasoning Natural Language Inference +4

FrameIt: Ontology Discovery for Noisy User-Generated Text

no code implementations WS 2018 Dan Iter, Alon Halevy, Wang-Chiew Tan

A common need of NLP applications is to extract structured data from text corpora in order to perform analytics or trigger an appropriate action.

Active Learning Semantic Role Labeling

Automatic Detection of Incoherent Speech for Diagnosing Schizophrenia

no code implementations WS 2018 Dan Iter, Jong Yoon, Dan Jurafsky

Here, we present the first benchmark comparison of previously proposed coherence models for detecting symptoms of schizophrenia and evaluate their performance on a new dataset of recorded interviews between subjects and clinicians.

Sentence Sentence Embedding +2

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

no code implementations 25 Oct 2016 Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Prior work has explored learning accuracies for these sources even without ground truth labels, but they assume that a single accuracy parameter is sufficient to model the behavior of these sources over the entire training set.

Relation Extraction

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

1 code implementation 14 Jun 2016 Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré

Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs.
