1 code implementation • 9 Apr 2024 • Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.
1 code implementation • 31 May 2023 • Jessy Lin, Nicholas Tomlin, Jacob Andreas, Jason Eisner
In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: assistants can access and process large amounts of information, while users have preferences and constraints external to the system.
2 code implementations • 24 May 2023 • Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein
In conjunction with our model, we release three new datasets of human- and AI-generated text as detection benchmarks in the domains of student essays, creative writing, and news articles.
no code implementations • 20 May 2023 • Vivek Verma, Nicholas Tomlin, Dan Klein
The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse.
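One common way to operationalize UID (a minimal illustrative sketch, not necessarily the measure used in this paper) is to compute each token's surprisal and check how evenly it is spread, e.g. via its variance across the utterance; the token probabilities below are hypothetical:

```python
import math

def surprisal(prob):
    """Surprisal (information content) of a token, in bits."""
    return -math.log2(prob)

def uid_variance(token_probs):
    """Variance of per-token surprisal across an utterance.
    Lower variance = information is spread more uniformly."""
    s = [surprisal(p) for p in token_probs]
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Hypothetical utterance whose tokens are equally predictable...
even = [0.25, 0.25, 0.25, 0.25]
# ...versus one containing a single highly surprising token.
uneven = [0.9, 0.9, 0.001, 0.9]

assert uid_variance(even) < uid_variance(uneven)
```

Under this toy measure, the perfectly even utterance has zero surprisal variance, while the utterance with one very unlikely token does not.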
no code implementations • 16 Nov 2022 • Andre He, Nicholas Tomlin, Dan Klein
We evaluate our method on the task of reconstructing Latin from a dataset of cognates across five Romance languages, achieving a notable reduction in edit distance to the target word forms compared to previous methods.
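The edit-distance metric referenced here is standardly Levenshtein distance: the minimum number of insertions, deletions, and substitutions needed to turn a predicted form into the target form. A minimal sketch (the example word pair is illustrative, not drawn from the paper's data):

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed
    row by row with O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # deletion
                curr[j - 1] + 1,             # insertion
                prev[j - 1] + (ca != cb),    # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

# e.g. a reconstructed form "octo" vs. a target form "octō"
print(edit_distance("octo", "octō"))  # → 1
```

Averaging this distance over a held-out set of target word forms gives the kind of reconstruction-quality score the excerpt describes.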
no code implementations • 15 Nov 2022 • Daniel Fried, Nicholas Tomlin, Jennifer Hu, Roma Patel, Aida Nematzadeh
People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication.
1 code implementation • ACL 2022 • Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein
We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.
1 code implementation • ACL 2022 • Nicholas Tomlin, Andre He, Dan Klein
We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability.