no code implementations • 11 Apr 2024 • Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs.
2 code implementations • 27 Mar 2024 • Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le
Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time.
no code implementations • 8 Mar 2024 • Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei
First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth.
2 code implementations • 5 Oct 2023 • Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc Le, Thang Luong
Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked.
1 code implementation • 7 Aug 2023 • Jerry Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le
Adding these data in a lightweight finetuning step can significantly reduce sycophantic behavior on held-out prompts.
no code implementations • 15 May 2023 • Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, Quoc V. Le
We present symbol tuning: finetuning language models on in-context input-label pairs where natural language labels (e.g., "positive/negative sentiment") are replaced with arbitrary symbols (e.g., "foo/bar").
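The label substitution can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names `symbol_tune_examples` and `build_prompt`, the `Input:`/`Label:` prompt template, and the seeded random assignment of symbols to labels are all hypothetical choices. The sketch only rewrites the exemplars; the finetuning step itself is out of scope.

```python
import random

def symbol_tune_examples(examples, label_names, symbols, seed=0):
    """Hypothetical helper: swap each natural-language label for an arbitrary symbol.

    examples: list of (input_text, label_name) pairs.
    Returns the remapped pairs and the label -> symbol mapping used.
    """
    if len(symbols) < len(label_names):
        raise ValueError("need at least one symbol per label")
    rng = random.Random(seed)          # seeded so the mapping is reproducible
    shuffled = list(symbols)
    rng.shuffle(shuffled)              # randomly assign symbols to labels
    mapping = dict(zip(label_names, shuffled))
    remapped = [(text, mapping[label]) for text, label in examples]
    return remapped, mapping

def build_prompt(exemplars, query):
    """Format in-context exemplars followed by an unlabeled query (assumed template)."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

# Usage: sentiment exemplars whose labels become foo/bar.
examples = [("great movie", "positive"), ("terrible plot", "negative")]
remapped, mapping = symbol_tune_examples(examples, ["positive", "negative"], ["foo", "bar"])
prompt = build_prompt(remapped, "loved it")
print(prompt)
```

Because the symbols carry no prior meaning, a model can only solve the task by inferring the input-label mapping from the exemplars.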
no code implementations • 7 Mar 2023 • Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.
no code implementations • 28 Jan 2022 • Jerry Wei, Lorenzo Torresani, Jason Wei, Saeed Hassanpour
Moreover, we find that using model confidence as a proxy for annotator agreement also improves calibration and accuracy, suggesting that datasets without multiple annotators can still benefit from our proposed confidence-aware label smoothing methods.
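A minimal Python sketch of what agreement-based soft labels could look like, assuming integer class ids and a list of annotator votes per example. This is an illustrative form, not necessarily the paper's exact formulation. `agreement_soft_labels` uses the vote fractions as the target distribution; `confidence_soft_labels` is a hypothetical single-annotator variant that places a model's confidence on the given label and spreads the remainder uniformly; `soft_cross_entropy` trains against either target. All function names are assumptions.

```python
import math
from collections import Counter

def agreement_soft_labels(votes, num_classes):
    """Target = fraction of annotators who chose each class (assumed form)."""
    targets = []
    for example_votes in votes:
        counts = Counter(example_votes)
        n = len(example_votes)
        targets.append([counts[c] / n for c in range(num_classes)])
    return targets

def confidence_soft_labels(hard_labels, confidences, num_classes):
    """Hypothetical variant for single-annotator data: model confidence replaces agreement."""
    targets = []
    for label, conf in zip(hard_labels, confidences):
        rest = (1.0 - conf) / (num_classes - 1)   # remaining mass spread over other classes
        row = [rest] * num_classes
        row[label] = conf
        targets.append(row)
    return targets

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft targets (log-sum-exp for stability)."""
    total = 0.0
    for z, t in zip(logits, targets):
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total -= sum(ti * (zi - log_norm) for zi, ti in zip(z, t))
    return total / len(logits)

# Usage: three annotators per image, two classes.
votes = [[0, 0, 0], [0, 1, 1]]
soft = agreement_soft_labels(votes, num_classes=2)   # unanimous -> one-hot, split -> [1/3, 2/3]
loss = soft_cross_entropy([[2.0, -1.0], [0.5, 0.5]], soft)
```

Unanimous examples keep a one-hot target, while contested ones get a softer target, which discourages overconfident predictions on ambiguous images.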
no code implementations • 29 Jan 2021 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour
With the rise of deep learning, there has been increased interest in using neural networks for histopathology image analysis, a field that studies the properties of biopsy or resected specimens, which pathologists have traditionally examined manually under a microscope.
no code implementations • 29 Sep 2020 • Jerry Wei, Arief Suriawinata, Bing Ren, Xiaoying Liu, Mikhail Lisovsky, Louis Vaickus, Charles Brown, Michael Baker, Mustafa Nasir-Moin, Naofumi Tomita, Lorenzo Torresani, Jason Wei, Saeed Hassanpour
Owing to the nature of histopathology images, examples inherently vary in difficulty, and since medical datasets are often labeled by multiple annotators, annotator agreement can serve as a natural proxy for the difficulty of a given example.
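Annotator agreement as a difficulty proxy can be sketched in a few lines of Python. As one hypothetical way to use it, the sketch below scores each example by the share of annotators who chose the majority label, then orders training data from easy (unanimous) to hard (split) and releases harder examples in stages, in the style of curriculum learning. The function names, the majority-share score, and the linear pacing schedule are assumptions, not the paper's method.

```python
import math
from collections import Counter

def agreement(votes):
    """Fraction of annotators who chose the majority label (1.0 = unanimous)."""
    return Counter(votes).most_common(1)[0][1] / len(votes)

def easy_to_hard(examples):
    """Order (example_id, votes) pairs from highest to lowest annotator agreement."""
    return sorted(examples, key=lambda ex: agreement(ex[1]), reverse=True)

def curriculum_stages(examples, num_stages):
    """Hypothetical pacing: each stage adds the next slice of harder examples."""
    ordered = easy_to_hard(examples)
    return [ordered[: math.ceil(len(ordered) * s / num_stages)]
            for s in range(1, num_stages + 1)]

# Usage: image ids with per-annotator class votes.
examples = [("img_a", [0, 1, 1]), ("img_b", [1, 1, 1]), ("img_c", [0, 1, 0, 1])]
stages = curriculum_stages(examples, num_stages=3)
```

Python's `sorted` stays stable with `reverse=True`, so examples with equal agreement keep their original relative order.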
no code implementations • 4 Jun 2020 • Jerry Wei
We present the Newspaper Bias Dataset (NewB), a text corpus of more than 200,000 sentences from eleven news sources regarding Donald Trump.
2 code implementations • ACL 2020 • Jerry Wei, Chengyu Huang, Soroush Vosoughi, Jason Wei
We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters.
1 code implementation • 27 Apr 2020 • Jerry Wei, Arief Suriawinata, Xiaoying Liu, Bing Ren, Mustafa Nasir-Moin, Naofumi Tomita, Jason Wei, Saeed Hassanpour
Our model comprises a scorer, which provides an output confidence to measure the difficulty of images, and an image translator, which learns to translate images from easy-to-classify to hard-to-classify using a training set defined by the scorer.
1 code implementation • 13 Oct 2019 • Jerry Wei, Arief Suriawinata, Louis Vaickus, Bing Ren, Xiaoying Liu, Jason Wei, Saeed Hassanpour
We present an image translation approach to generate augmented data for mitigating data imbalances in a dataset of histopathology images of colorectal polyps, adenomatous tumors that can lead to colorectal cancer if left untreated.