Search Results for author: Jerry Wei

Found 14 papers, 6 papers with code

Best Practices and Lessons Learned on Synthetic Data

no code implementations • 11 Apr 2024 • Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.

Long-form factuality in large language models

2 code implementations • 27 Mar 2024 • Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time.

16k

Non-robustness of diffusion estimates on networks with measurement error

no code implementations • 8 Mar 2024 • Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth.

Marketing

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

2 code implementations • 5 Oct 2023 • Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong

Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked.

Hallucination, World Knowledge

Simple synthetic data reduces sycophancy in large language models

1 code implementation • 7 Aug 2023 • Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le

Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.

Symbol tuning improves in-context learning in language models

no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le

We present symbol tuning - finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").

In-Context Learning

Larger language models do in-context learning differently

no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.

In-Context Learning

Calibrating Histopathology Image Classifiers using Label Smoothing

no code implementations • 28 Jan 2022 • Jerry Wei, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Moreover, we find that using model confidence as a proxy for annotator agreement also improves calibration and accuracy, suggesting that datasets without multiple annotators can still benefit from label smoothing via our proposed confidence-aware label smoothing methods.

Classification, Image Classification

A Petri Dish for Histopathology Image Analysis

no code implementations • 29 Jan 2021 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

With the rise of deep learning, there has been increased interest in using neural networks for histopathology image analysis, a field that investigates the properties of biopsy or resected specimens traditionally manually examined under a microscope by pathologists.

Binary Classification, Natural Questions

Learn like a Pathologist: Curriculum Learning by Annotator Agreement for Histopathology Image Classification

no code implementations • 29 Sep 2020 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Mustafa Nasir-Moin, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour

Based on the nature of histopathology images, a range of difficulty inherently exists among examples, and, since medical datasets are often labeled by multiple annotators, annotator agreement can be used as a natural proxy for the difficulty of a given example.

General Classification, Image Classification

NewB: 200,000+ Sentences for Political Bias Detection

no code implementations • 4 Jun 2020 • Jerry Wei

We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump.

Bias Detection, Binary Classification

What Are People Asking About COVID-19? A Question Classification Dataset

2 code implementations • ACL 2020 • Jerry Wei, Chengyu Huang, Soroush Vosoughi, Jason Wei

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters.

Clustering, General Classification

Difficulty Translation in Histopathology Images

1 code implementation • 27 Apr 2020 • Jerry Wei, Arief Suriawinata, Xiaoying Liu, Bing Ren, Mustafa Nasir-Moin, Naofumi Tomita, Jason Wei, Saeed Hassanpour

Our model comprises a scorer, which provides an output confidence to measure the difficulty of images, and an image translator, which learns to translate images from easy-to-classify to hard-to-classify using a training set defined by the scorer.

BIG-bench Machine Learning, Translation

Generative Image Translation for Data Augmentation in Colorectal Histopathology Images

1 code implementation • 13 Oct 2019 • Jerry Wei, Arief Suriawinata, Louis Vaickus, Bing Ren, Xiaoying Liu, Jason Wei, Saeed Hassanpour

We present an image translation approach to generate augmented data for mitigating data imbalances in a dataset of histopathology images of colorectal polyps, adenomatous tumors that can lead to colorectal cancer if left untreated.

Data Augmentation, Image Classification
