Search Results for author: Haau-Sing Li

Found 6 papers, 4 papers with code

Reranking Laws for Language Generation: A Communication-Theoretic Perspective

no code implementations • 11 Sep 2024 • António Farinhas, Haau-Sing Li, André F. T. Martins

In this paper, we draw a parallel between this strategy (generating several hypotheses with an LLM and selecting the best one with a reranker) and the use of redundancy to decrease the error rate in noisy communication channels.

Code Generation • Machine Translation • +2
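The reranking setup the paper studies lends itself to a compact illustration. Below is a minimal sketch of sample-and-rerank in Python, assuming hypothetical `generate` and `reranker_score` helpers; it is not the authors' implementation, only the general pattern of drawing redundant hypotheses and keeping the one a reranker prefers.

```python
# Minimal sketch of sample-and-rerank, assuming a hypothetical
# `generate` (samples one hypothesis) and `reranker_score`
# (higher = better); not the authors' implementation.
import random


def generate(prompt: str) -> str:
    # Placeholder generator: in practice this would sample from an LLM.
    return f"hypothesis-{random.randint(0, 9)} for: {prompt}"


def reranker_score(prompt: str, hypothesis: str) -> float:
    # Placeholder reranker: in practice a learned quality/utility model.
    return random.random()


def sample_and_rerank(prompt: str, n: int = 8) -> str:
    """Draw n redundant hypotheses and keep the one the reranker prefers."""
    hypotheses = [generate(prompt) for _ in range(n)]
    return max(hypotheses, key=lambda h: reranker_score(prompt, h))


print(sample_and_rerank("Translate 'obrigado' into English."))
```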

DOCE: Finding the Sweet Spot for Execution-Based Code Generation

1 code implementation • 25 Aug 2024 • Haau-Sing Li, Patrick Fernandes, Iryna Gurevych, André F. T. Martins

Recently, a diverse set of decoding and reranking procedures has been shown to be effective for LLM-based code generation.

Code Generation
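Execution-based reranking can be illustrated with a small sketch: each candidate program is run against a few trial tests and the candidate that passes the most is kept. The `solution` entry point and the trial tests below are assumptions made for the example, not the DOCE pipeline itself, and real candidate code should be sandboxed before execution.

```python
# Minimal sketch of execution-based reranking for code generation,
# assuming candidates are self-contained snippets defining `solution`
# and that a few trial test cases are available; this is an
# illustration, not the DOCE pipeline itself.
from typing import Callable, List, Tuple


def run_candidate(code: str, tests: List[Tuple[tuple, object]]) -> int:
    """Execute a candidate program and count how many trial tests it passes."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # untrusted code should be sandboxed in practice
        solution: Callable = namespace["solution"]
    except Exception:
        return 0
    passed = 0
    for args, expected in tests:
        try:
            if solution(*args) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def rerank_by_execution(candidates: List[str], tests: List[Tuple[tuple, object]]) -> str:
    """Pick the candidate that passes the most trial tests."""
    return max(candidates, key=lambda c: run_candidate(c, tests))


candidates = [
    "def solution(x): return x + 1",  # buggy
    "def solution(x): return x * 2",  # passes the trial tests below
]
trial_tests = [((2,), 4), ((5,), 10)]
print(rerank_by_execution(candidates, trial_tests))
```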

Uncertainty in Natural Language Generation: From Theory to Applications

no code implementations • 28 Jul 2023 • Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

Recent advances in powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications.

Active Learning • Text Generation

Python Code Generation by Asking Clarification Questions

1 code implementation • 19 Dec 2022 • Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych

We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.

Code Generation • Language Modeling • +1
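The clarification-question idea can be sketched as a small pipeline: detect whether a description is under-specified, ask a question if so, and only then generate code. The helpers `needs_clarification`, `ask_user`, and `generate_code` below are hypothetical placeholders standing in for the paper's trained models.

```python
# Minimal sketch of code generation with clarification questions,
# using hypothetical helpers; the paper's models and data are not
# reproduced here.
def needs_clarification(description: str) -> bool:
    # Placeholder under-specification detector (e.g., a trained classifier).
    return "sort" in description and "order" not in description


def ask_user(question: str) -> str:
    # Placeholder for collecting the user's answer; a real system would
    # surface the question in the interface.
    print("Clarification question:", question)
    return "ascending"


def generate_code(description: str) -> str:
    # Placeholder code generator conditioned on the (clarified) description.
    return f"# code generated for: {description}"


def generate_with_clarification(description: str) -> str:
    if needs_clarification(description):
        answer = ask_user("Should the result be in ascending or descending order?")
        description = f"{description} ({answer} order)"
    return generate_code(description)


print(generate_with_clarification("sort the list of names"))
```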

When Do You Need Billions of Words of Pretraining Data?

1 code implementation • ACL 2021 • Yian Zhang, Alex Warstadt, Haau-Sing Li, Samuel R. Bowman

We adopt four probing methods: classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks. With these, we draw learning curves that track the growth of these different measures of linguistic ability with respect to pretraining data volume, using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M and 1B words.
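Classifier probing, the first of the four methods, follows a simple recipe: freeze the pretrained encoder, fit a lightweight classifier on its representations, and read the classifier's accuracy as a proxy for how much of the target feature the encoder captures. The sketch below uses `roberta-base` and a toy acceptability task purely for illustration; it is not the paper's exact setup, and evaluating on the training examples is only to keep the example short.

```python
# Minimal sketch of classifier probing: freeze a pretrained encoder,
# fit a light classifier on its sentence representations, and use
# accuracy as a proxy for encoded linguistic knowledge.
# Model name and task are illustrative, not the paper's exact setup.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")
encoder.eval()

sentences = ["The cat sleeps .", "The cats sleeps .", "The dogs bark .", "The dog bark ."]
labels = [1, 0, 1, 0]  # toy acceptability labels

with torch.no_grad():
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    # Use the first-token (<s>) embedding of the frozen encoder as the sentence feature.
    features = encoder(**batch).last_hidden_state[:, 0, :].numpy()

probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe accuracy:", probe.score(features, labels))
```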

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

1 code implementation • EMNLP 2020 • Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding.

Binary Classification • Diagnostic
