Search Results for author: Carolyn Jane Anderson

Found 14 papers, 8 papers with code

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

1 code implementation • 3 Feb 2025 • Carolyn Jane Anderson, Joydeep Biswas, Aleksander Boruch-Gruszecki, Federico Cassano, Molly Q Feldman, Arjun Guha, Francesca Lucchetti, Zixuan Wu

We also quantify the effectiveness of reasoning longer with R1 and Gemini Thinking to identify the point beyond which more reasoning is unlikely to improve accuracy on our benchmark.

General Knowledge

Substance Beats Style: Why Beginning Students Fail to Code with LLMs

no code implementations • 15 Oct 2024 • Francesca Lucchetti, Zixuan Wu, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

Although LLMs are increasing the productivity of professional programmers, existing work shows that beginners struggle to prompt LLMs to solve text-to-code tasks.

Code Generation

Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

no code implementations • 28 Aug 2024 • Funing Yang, Carolyn Jane Anderson

Several systems have been developed to extract information about characters to aid computational analysis of English literature.

GlyphPattern: An Abstract Pattern Recognition Task for Vision-Language Models

1 code implementation • 12 Aug 2024 • Zixuan Wu, Yoolim Kim, Carolyn Jane Anderson

GlyphPattern evaluates abstract pattern recognition in VLMs, requiring models to understand and judge natural language descriptions of visual patterns.

Natural Language Understanding

Solving and Generating NPR Sunday Puzzles with Large Language Models

1 code implementation • 21 Jun 2023 • Jingmiao Zhao, Carolyn Jane Anderson

We explore the ability of large language models to solve and generate puzzles from the NPR Sunday Puzzle game show using PUZZLEQA, a dataset comprising 15 years of on-air puzzles.

Multiple-choice • Prompt Engineering

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

no code implementations • 7 Jun 2023 • Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks.

Code Generation

StarCoder: may the source be with you!

4 code implementations • 9 May 2023 • Raymond Li, Loubna Ben allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities, and fast large-batch inference enabled by multi-query attention.
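The StarCoder summary credits its fast large-batch inference to multi-query attention, in which every attention head keeps its own query projection but all heads share a single key and value projection, shrinking the key/value cache. A minimal NumPy sketch of that idea (function name and shapes are illustrative, not from the StarCoder codebase):

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: per-head queries, one shared key/value.

    x:  (seq, d_model) input activations
    Wq: (d_model, n_heads * d_head) per-head query projections
    Wk: (d_model, d_head) single shared key projection
    Wv: (d_model, d_head) single shared value projection
    """
    seq, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq, n_heads, d_head)  # one query per head
    k = x @ Wk                                  # shared across all heads
    v = x @ Wv                                  # shared across all heads
    out = np.empty((seq, n_heads, d_head))
    for h in range(n_heads):
        scores = (q[:, h, :] @ k.T) / np.sqrt(d_head)
        scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h, :] = weights @ v
    return out.reshape(seq, n_heads * d_head)
```

Because `k` and `v` are head-independent, a decoder only caches one key/value pair per token instead of one per head, which is what makes large-batch generation cheaper.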

8k • Code Generation +1
