1 code implementation • 28 Oct 2024 • Mirac Suzgun, Tayfun Gur, Federico Bianchi, Daniel E. Ho, Thomas Icard, Dan Jurafsky, James Zou
These findings highlight significant concerns about current LMs' ability to reason about truth, belief, and knowledge while emphasizing the need for advancements in these areas before broad deployment in critical sectors.
2 code implementations • 15 Aug 2024 • Andy K. Zhang, Neil Perry, Riya Dulepet, Joey Ji, Justin W. Lin, Eliot Jones, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, Daniel E. Ho, Percy Liang
Without subtask guidance, agents leveraging Claude 3.5 Sonnet, GPT-4o, OpenAI o1-preview, and Claude 3 Opus successfully solved complete tasks that took human teams up to 11 minutes to solve.
no code implementations • 22 Jun 2024 • Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou
In particular, the adaptive nature of AI models presents unique challenges to regulators as updating a model can improve its performance but also introduce safety risks.
no code implementations • 19 Jun 2024 • Sebastian Quaade, Andrea Vallebueno, Olivia D. N. Alcabes, Kit T. Rodolfa, Daniel E. Ho
Overall, our study presents an efficient, scalable and highly adaptable method for monitoring aquaculture production from remote sensing imagery.
1 code implementation • 18 Jun 2024 • Andrea Vallebueno, Cassandra Handan-Nader, Christopher D. Manning, Daniel E. Ho
Static word embeddings are ubiquitous in computational social science applications and contribute to practical decision-making in a variety of fields including law and healthcare.
no code implementations • 30 May 2024 • Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho
While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time.
1 code implementation • 2 Apr 2024 • Joel Niklaus, Lucia Zheng, Arya D. McCarthy, Christopher Hahn, Brian M. Rosen, Peter Henderson, Daniel E. Ho, Garrett Honke, Percy Liang, Christopher Manning
In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples.
no code implementations • 27 Feb 2024 • Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang, Arvind Narayanan
To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk.
1 code implementation • 3 Feb 2024 • Kevin Wu, Eric Wu, Ally Cassasola, Angela Zhang, Kevin Wei, Teresa Nguyen, Sith Riantawan, Patricia Shi Riantawan, Daniel E. Ho, James Zou
In this paper, we ask: do the sources that LLMs generate actually support the claims that they make?
1 code implementation • 2 Jan 2024 • Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho
Second, we find that legal hallucinations are alarmingly prevalent, occurring between 58% of the time (ChatGPT 4) and 88% of the time (Llama 2) when these models are asked specific, verifiable questions about random federal court cases.
no code implementations • 2 Oct 2023 • Hadi Elzayn, Emily Black, Patrick Vossler, Nathanael Jo, Jacob Goldin, Daniel E. Ho
Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity.
no code implementations • 29 Sep 2023 • Emily Black, Rakshit Naidu, Rayid Ghani, Kit T. Rodolfa, Daniel E. Ho, Hoda Heidari
While algorithmic fairness is a thriving area of research, in practice, mitigating issues of bias often gets reduced to enforcing an arbitrarily chosen fairness metric, either by enforcing fairness constraints during optimization, by post-processing model outputs, or by manipulating the training data.
1 code implementation • NeurIPS 2023 • Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform?
2 code implementations • 15 Jun 2023 • Ronja Stern, Vishvaksenan Rasiah, Veton Matoshi, Srinanda Brügger Bose, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus
Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual legal system.
no code implementations • 3 Jun 2023 • Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho
Large, high-quality datasets are crucial for training Large Language Models (LLMs).
1 code implementation • 13 Sep 2022 • Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré
Finally, inspired by the Open Science movement, we call for the legal and computer science communities to join our efforts by contributing new tasks.
no code implementations • 24 Aug 2022 • Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho
Entropy regularization is known to improve exploration in sequential decision-making problems.
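The idea behind entropy regularization can be illustrated with a minimal sketch (the function names and the coefficient `beta` here are illustrative, not the paper's notation): a peaked policy has low entropy, a uniform policy has maximum entropy, and adding a `beta * H(pi)` bonus to the objective rewards the optimizer for staying stochastic, which encourages exploration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy(p):
    # Shannon entropy H(p) = -sum p_i log p_i (nats).
    return -np.sum(p * np.log(p + 1e-12))

def regularized_objective(p, rewards, beta=0.1):
    # Expected reward plus an entropy bonus; larger beta
    # pushes the policy toward more exploratory (uniform) behavior.
    return p @ rewards + beta * entropy(p)

# A peaked (exploitative) policy has low entropy; the uniform
# (exploratory) policy attains the maximum, log(n_actions).
peaked = softmax(np.array([5.0, 0.0, 0.0]))
uniform = softmax(np.zeros(3))
```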
1 code implementation • 18 Aug 2022 • Ben Chugg, Nicolas Rothbacher, Alex Feng, Xiaoqi Long, Daniel E. Ho
We show that this system effectively detects land application (PR AUC = 0.93), and we uncover several outlier facilities that appear to apply regularly and excessively.
1 code implementation • 1 Jul 2022 • Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
One concern with the rise of large language models lies with their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information.
no code implementations • 20 Jun 2022 • Emily Black, Hadi Elzayn, Alexandra Chouldechova, Jacob Goldin, Daniel E. Ho
First, we show how the use of more flexible machine learning (classification) methods -- as opposed to simpler models -- shifts audit burdens from high- to middle-income taxpayers.
no code implementations • 25 Apr 2022 • Peter Henderson, Ben Chugg, Brandon Anderson, Kristen Altenburger, Alex Turk, John Guyton, Jacob Goldin, Daniel E. Ho
This approach has the potential to improve audit efficacy, while maintaining policy-relevant estimates of the tax gap.
1 code implementation • 25 Oct 2021 • Ben Chugg, Daniel E. Ho
In many public health settings, there is a perceived tension between allocating resources to known vulnerable areas and learning about the overall prevalence of the problem.
2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
1 code implementation • 20 Jun 2021 • Zihan Huang, Charles Low, Mengqiu Teng, Hongyi Zhang, Daniel E. Ho, Mark S. Krass, Matthias Grabmair
Lawyers and judges spend a large amount of time researching the proper legal authority to cite while drafting decisions.
1 code implementation • 29 May 2021 • Ben Chugg, Brandon Anderson, Seiji Eicher, Sandy Lee, Daniel E. Ho
Much environmental enforcement in the United States has historically relied on either self-reported data or physical, resource-intensive, infrequent inspections.
2 code implementations • 18 Apr 2021 • Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, Daniel E. Ho
While a Transformer architecture (BERT) pretrained on a general corpus (Google Books and Wikipedia) improves performance, domain pretraining (on a corpus of approximately 3.5M decisions across all courts in the U.S., larger than BERT's pretraining corpus) with a custom legal vocabulary exhibits the most substantial performance gains with CaseHOLD (gain of 7.2% on F1, representing a 12% improvement over BERT) and consistent performance gains across two other legal tasks.
Ranked #1 on Text Classification on Overruling
1 code implementation • 17 Mar 2021 • Caleb Robinson, Anthony Ortiz, Juan M. Lavista Ferres, Brandon Anderson, Daniel E. Ho
For instance, in rural settings, the pre-construction area may look similar to the surrounding environment until the building is constructed.
no code implementations • 18 Dec 2020 • Daniel E. Ho, Alice Xiang
While there has been a flurry of research in algorithmic fairness, what is less recognized is that modern antidiscrimination law may prohibit the adoption of such techniques.