Search Results for author: Jerry Wei

Found 14 papers, 6 papers with code

Best Practices and Lessons Learned on Synthetic Data for Language Models

no code implementations • 11 Apr 2024 • Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.

Long-form factuality in large language models

2 code implementations • 27 Mar 2024 • Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time.

Non-robustness of diffusion estimates on networks with measurement error

no code implementations • 8 Mar 2024 • Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth.

Marketing

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

1 code implementation • 5 Oct 2023 • Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked.

Hallucination World Knowledge

Simple synthetic data reduces sycophancy in large language models

1 code implementation • 7 Aug 2023 • Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.

Symbol tuning improves in-context learning in language models

no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning: finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").

In-Context Learning

Larger language models do in-context learning differently

no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning

Calibrating Histopathology Image Classifiers using Label Smoothing

no code implementations • 28 Jan 2022 • Jerry Wei, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Moreover, we find that using model confidence as a proxy for annotator agreement also improves calibration and accuracy, suggesting that datasets without multiple annotators can still benefit from label smoothing via our proposed confidence-aware label smoothing methods.

Classification Image Classification

A Petri Dish for Histopathology Image Analysis

no code implementations • 29 Jan 2021 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

With the rise of deep learning, there has been increased interest in using neural networks for histopathology image analysis, a field that investigates the properties of biopsy or resected specimens traditionally manually examined under a microscope by pathologists.

Binary Classification Natural Questions +1

Learn like a Pathologist: Curriculum Learning by Annotator Agreement for Histopathology Image Classification

no code implementations • 29 Sep 2020 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Mustafa Nasir-Moin, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Based on the nature of histopathology images, a range of difficulty inherently exists among examples, and, since medical datasets are often labeled by multiple annotators, annotator agreement can be used as a natural proxy for the difficulty of a given example.

General Classification Image Classification

NewB: 200,000+ Sentences for Political Bias Detection

no code implementations • 4 Jun 2020 • Jerry Wei

We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump.

Bias Detection Binary Classification +1

What Are People Asking About COVID-19? A Question Classification Dataset

2 code implementations • ACL 2020 • Jerry Wei, Chengyu Huang, Soroush Vosoughi, Jason Wei

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters.

Clustering General Classification

Difficulty Translation in Histopathology Images

1 code implementation • 27 Apr 2020 • Jerry Wei, Arief Suriawinata, Xiaoying Liu, Bing Ren, Mustafa Nasir-Moin, Naofumi Tomita, Jason Wei, Saeed Hassanpour

Our model comprises a scorer, which provides an output confidence to measure the difficulty of images, and an image translator, which learns to translate images from easy-to-classify to hard-to-classify using a training set defined by the scorer.

BIG-bench Machine Learning Translation

Generative Image Translation for Data Augmentation in Colorectal Histopathology Images

1 code implementation • 13 Oct 2019 • Jerry Wei, Arief Suriawinata, Louis Vaickus, Bing Ren, Xiaoying Liu, Jason Wei, Saeed Hassanpour

We present an image translation approach to generate augmented data for mitigating data imbalances in a dataset of histopathology images of colorectal polyps, adenomatous tumors that can lead to colorectal cancer if left untreated.

Data Augmentation Image Classification +1
