1 code implementation • 10 Apr 2025 • Michael J Bommarito II, Jillian Bommarito, Daniel Martin Katz
Practically all large language models have been pre-trained on data that is subject to global uncertainty related to copyright infringement and breach of contract.
1 code implementation • 21 Mar 2025 • Michael J Bommarito, Daniel Martin Katz, Jillian Bommarito
We present the KL3M tokenizers, a family of specialized tokenizers for legal, financial, and governmental text.
1 code implementation • 12 May 2023 • Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard
To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs.
no code implementations • 23 Feb 2023 • Daniel Martin Katz, Dirk Hartung, Lauritz Gerlach, Abhik Jana, Michael J. Bommarito II
To support our analysis, we construct and analyze a nearly complete corpus of more than six hundred NLP & Law related papers published over the past decade.
1 code implementation • 11 Jan 2023 • Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz
The global economy is increasingly dependent on knowledge workers to meet the needs of public and private organizations.
5 code implementations • 29 Dec 2022 • Michael Bommarito II, Daniel Martin Katz
Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice.
no code implementations • 15 Oct 2021 • Corinna Coupette, Dirk Hartung, Janis Beckedorf, Maximilian Böther, Daniel Martin Katz
Building on the computer science concept of code smells, we initiate the study of law smells, i.e., patterns in legal texts that pose threats to the comprehensibility and maintainability of the law.
1 code implementation • ACL 2022 • Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras
Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text.
Ranked #1 on Natural Language Understanding on LexGLUE
1 code implementation • 13 Jun 2018 • Michael J Bommarito II, Daniel Martin Katz, Eric M Detterman
OpenEDGAR is an open source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC).
1 code implementation • 10 Jun 2018 • Michael J Bommarito II, Daniel Martin Katz, Eric M Detterman
LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text.
2 code implementations • 11 Dec 2016 • Daniel Martin Katz, Michael J Bommarito II, Josh Blackman
Building on developments in machine learning and prior work in the science of judicial prediction, we construct a model designed to predict the behavior of the Supreme Court of the United States in a generalized, out-of-sample context.
Physics and Society