Search Results for author: Juan Ciro

Found 9 papers, 6 papers with code

Clinical knowledge in LLMs does not translate to human interactions

1 code implementation | 26 Apr 2025 | Andrew M. Bean, Rebecca Payne, Guy Parsons, Hannah Rose Kirk, Juan Ciro, Rafael Mosquera, Sara Hincapié Monsalve, Aruna S. Ekanayaka, Lionel Tarassenko, Luc Rocher, Adam Mahdi

Tested alone, LLMs complete the scenarios accurately, correctly identifying conditions in 94.9% of cases and disposition in 56.3% on average.

Clinical Knowledge

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

1 code implementation | 10 Apr 2025 | Alex Warstadt, Aaron Mueller, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell

These intensive resource demands limit the ability of researchers to train new models and use existing models as developmentally plausible cognitive models.

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

1 code implementation | 14 Feb 2024 | Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin Van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo

By focusing on "implicitly adversarial" prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativity is well-suited to uncover.

Red Teaming, Text-to-Image Generation

Speech Wikimedia: A 77 Language Multilingual Speech Dataset

1 code implementation | 30 Aug 2023 | Rafael Mosquera Gómez, Julián Eusse, Juan Ciro, Daniel Galvez, Ryan Hileman, Kurt Bollacker, David Kanter

The Speech Wikimedia Dataset is a publicly available compilation of audio with transcriptions extracted from Wikimedia Commons.

Machine Translation, Speech Recognition, +2 more

LSH methods for data deduplication in a Wikipedia artificial dataset

no code implementations | 10 Dec 2021 | Juan Ciro, Daniel Galvez, Tim Schlippe, David Kanter

This paper illustrates locality-sensitive hashing (LSH) models for the identification and removal of nearly redundant data in a text dataset.

The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

no code implementations | 17 Nov 2021 | Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi

The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset).

Speech Recognition
