Search Results for author: Daniel E. Ho

Found 16 papers, 11 papers with code

SCALE: Scaling up the Complexity for Advanced Language Model Evaluation

2 code implementations15 Jun 2023 Vishvaksenan Rasiah, Ronja Stern, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho, Joel Niklaus

In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document to document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks).

Information Retrieval Language Modelling +2

MultiLegalPile: A 689GB Multilingual Legal Corpus

no code implementations3 Jun 2023 Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho

Large, high-quality datasets are crucial for training Large Language Models (LLMs).

LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning

1 code implementation13 Sep 2022 Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré

Finally-inspired by the Open Science movement-we make a call for the legal and computer science communities to join our efforts by contributing new tasks.

Legal Reasoning

Entropy Regularization for Population Estimation

no code implementations24 Aug 2022 Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Entropy regularization is known to improve exploration in sequential decision-making problems.

Decision Making

Detecting Environmental Violations with Satellite Imagery in Near Real Time: Land Application under the Clean Water Act

1 code implementation18 Aug 2022 Ben Chugg, Nicolas Rothbacher, Alex Feng, Xiaoqi Long, Daniel E. Ho

We show that this system effectively appears to detect land application (PR AUC = 0. 93) and we uncover several outlier facilities which appear to apply regularly and excessively.

object-detection Object Detection

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

1 code implementation1 Jul 2022 Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho

One concern with the rise of large language models lies with their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information.

Algorithmic Fairness and Vertical Equity: Income Fairness with IRS Tax Audit Models

no code implementations20 Jun 2022 Emily Black, Hadi Elzayn, Alexandra Chouldechova, Jacob Goldin, Daniel E. Ho

First, we show how the use of more flexible machine learning (classification) methods -- as opposed to simpler models -- shifts audit burdens from high to middle-income taxpayers.

Fairness regression

Reconciling Risk Allocation and Prevalence Estimation in Public Health Using Batched Bandits

1 code implementation25 Oct 2021 Ben Chugg, Daniel E. Ho

In many public health settings, there is a perceived tension between allocating resources to known vulnerable areas and learning about the overall prevalence of the problem.

On the Opportunities and Risks of Foundation Models

3 code implementations16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e. g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Enhancing Environmental Enforcement with Near Real-Time Monitoring: Likelihood-Based Detection of Structural Expansion of Intensive Livestock Farms

1 code implementation29 May 2021 Ben Chugg, Brandon Anderson, Seiji Eicher, Sandy Lee, Daniel E. Ho

Much environmental enforcement in the United States has historically relied on either self-reported data or physical, resource-intensive, infrequent inspections.

Change Point Detection

When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset

2 code implementations18 Apr 2021 Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, Daniel E. Ho

While a Transformer architecture (BERT) pretrained on a general corpus (Google Books and Wikipedia) improves performance, domain pretraining (using corpus of approximately 3. 5M decisions across all courts in the U. S. that is larger than BERT's) with a custom legal vocabulary exhibits the most substantial performance gains with CaseHOLD (gain of 7. 2% on F1, representing a 12% improvement on BERT) and consistent performance gains across two other legal tasks.

Multiple-choice Question Answering +3

Temporal Cluster Matching for Change Detection of Structures from Satellite Imagery

1 code implementation17 Mar 2021 Caleb Robinson, Anthony Ortiz, Juan M. Lavista Ferres, Brandon Anderson, Daniel E. Ho

For instance, in rural settings, the pre-construction area may look similar to the surrounding environment until the building is constructed.

Change Detection Data Augmentation +1

Affirmative Algorithms: The Legal Grounds for Fairness as Awareness

no code implementations18 Dec 2020 Daniel E. Ho, Alice Xiang

While there has been a flurry of research in algorithmic fairness, what is less recognized is that modern antidiscrimination law may prohibit the adoption of such techniques.

Causal Inference Fairness +1

Cannot find the paper you are looking for? You can Submit a new open access paper.