Search Results for author: Juan Diego Rodriguez

Found 14 papers, 10 papers with code

ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

no code implementations • 19 May 2025 • Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett

Unlike prior chart understanding benchmarks -- where frontier models perform similarly and near saturation -- our benchmark exposes a substantial gap between model and human performance, while effectively differentiating model capabilities: although humans achieve 93% accuracy, the best-performing model Gemini-2.5-Pro attains only 63.0%, and the leading open-source LVLM Qwen2.5-VL-72B-Instruct achieves only 38.5%.

Chart Question Answering Chart Understanding +2

RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models

1 code implementation • 15 Apr 2025 • Juan Diego Rodriguez, Wenxuan Ding, Katrin Erk, Greg Durrett

Although large language models (LLMs) have become generally more capable and accurate across many tasks, some fundamental sources of unreliability remain in their behavior.

Question Answering

Parameterized Synthetic Text Generation with SimpleStories

1 code implementation • 12 Apr 2025 • Lennart Finke, Chandan Sreedhara, Thomas Dooms, Mat Allen, Emerald Zhang, Juan Diego Rodriguez, Noa Nabeshima, Thomas Marshall, Dan Braun

We present SimpleStories, a large synthetic story dataset in simple language, consisting of 2 million samples each in English and Japanese.

Diversity Language Modeling +2

Characterizing the Role of Similarity in the Property Inferences of Language Models

1 code implementation • 29 Oct 2024 • Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra

Property inheritance -- a phenomenon where novel properties are projected from higher-level categories (e.g., birds) to lower-level ones (e.g., sparrows) -- provides a unique window into how humans organize and deploy conceptual knowledge.
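The birds-to-sparrows example can be made concrete with a toy sketch of downward property projection (the taxonomy and properties below are invented for illustration; this is not the paper's experimental setup):

```python
# Toy taxonomy: each concept maps to its parent category.
taxonomy = {
    "sparrow": "bird",
    "penguin": "bird",
    "bird": "animal",
}

# Properties attached directly to categories.
properties = {
    "bird": {"has feathers"},
    "animal": {"breathes"},
}

def inherited_properties(concept):
    """Collect properties projected down from all ancestor categories."""
    props = set(properties.get(concept, set()))
    while concept in taxonomy:
        concept = taxonomy[concept]
        props |= properties.get(concept, set())
    return props
```

Under strict inheritance, `sparrow` receives both `has feathers` (from bird) and `breathes` (from animal); the paper studies how similarity between categories modulates such judgments in language models.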

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

1 code implementation • 18 Sep 2024 • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett

Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).

Math MMLU
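As a minimal illustration of what "CoT via prompting" means in practice, the standard zero-shot recipe simply appends a reasoning trigger to the question (the prompt wording below is the common convention, not necessarily this paper's exact setup):

```python
def direct_prompt(question: str) -> str:
    """Ask the model for the answer directly."""
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    """Append the standard zero-shot chain-of-thought trigger,
    nudging the model to produce intermediate reasoning steps."""
    return f"Q: {question}\nA: Let's think step by step."
```

The paper's finding is that reasoning elicited this way helps mainly on math and symbolic tasks, not uniformly across benchmarks.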

Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways

1 code implementation • 26 Oct 2023 • Venkata S Govindarajan, Juan Diego Rodriguez, Kaj Bostrom, Kyle Mahowald

We pretrained our masked language models with three ingredients: an initial pretraining with music data, training on shorter sequences before training on longer ones, and masking specific tokens to target some of the BLiMP subtasks.

Language Modeling Language Modelling +1
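The second ingredient, training on shorter sequences before longer ones, is a length-based curriculum. A minimal sketch of how such a schedule could be built (the bucketing scheme here is illustrative; the paper's actual training code may differ):

```python
def length_curriculum(sequences, stages=3):
    """Split token sequences into consecutive training stages,
    shortest sequences first."""
    ordered = sorted(sequences, key=len)
    stage_size = -(-len(ordered) // stages)  # ceiling division
    return [ordered[i:i + stage_size]
            for i in range(0, len(ordered), stage_size)]
```

Each stage would be fed to the trainer in order, so the model sees short, easier contexts before long ones.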

X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs

1 code implementation • 16 Sep 2023 • Juan Diego Rodriguez, Katrin Erk, Greg Durrett

Aligned paragraphs are sourced from Wikipedia pages in different languages, reflecting real information divergences observed in the wild.

Fact Checking Machine Translation +1

WiCE: Real-World Entailment for Claims in Wikipedia

2 code implementations • 2 Mar 2023 • Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett

Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation.

Fact Checking Natural Language Inference +3

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation Diversity
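In the spirit of the framework, an augmentation is a small transformation that perturbs input text while (ideally) preserving its label. The sketch below is an invented example for illustration, not an actual NL-Augmenter transformation or its API:

```python
import random

def swap_adjacent_words(text, seed=0):
    """Swap one randomly chosen pair of adjacent words --
    a simple, roughly meaning-preserving perturbation
    for robustness testing."""
    rng = random.Random(seed)  # seeded for reproducibility
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)
```

A robustness evaluation would compare model predictions on the original and perturbed inputs; large prediction flips under such mild edits signal brittleness.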

Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards

no code implementations • ACL (GEM) 2021 • Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite

Developing documentation guidelines and easy-to-use templates for datasets and models is a challenging task, especially given the variety of backgrounds, skills, and incentives of the people involved in the building of natural language processing (NLP) tools.

Text Generation

Leveraging WordNet Paths for Neural Hypernym Prediction

1 code implementation • COLING 2020 • Yejin Cho, Juan Diego Rodriguez, Yifan Gao, Katrin Erk

We formulate the problem of hypernym prediction as a sequence generation task, where the sequences are taxonomy paths in WordNet.

Decoder Prediction
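Framing hypernym prediction as sequence generation means the training targets are serialized taxonomy paths. A toy sketch with an invented hypernym chain standing in for WordNet (the real paths would be extracted from WordNet's hypernym hierarchy):

```python
# Invented hypernym chain standing in for WordNet.
hypernyms = {
    "sparrow": "bird",
    "bird": "vertebrate",
    "vertebrate": "animal",
    "animal": "entity",
}

def taxonomy_path(word):
    """Serialize the hypernym chain into the kind of token sequence
    a seq2seq model could be trained to generate."""
    path = [word]
    while path[-1] in hypernyms:
        path.append(hypernyms[path[-1]])
    return " -> ".join(path)
```

Given a new term, the trained model then decodes a path like this token by token, and the predicted hypernym is read off the generated sequence.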
