Search Results for author: Baptiste Pannier

Found 4 papers, 1 paper with code

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

1 code implementation | 1 Jun 2023 | Guilherme Penedo, Quentin Malartic, Daniel Hesslow, Ruxandra Cojocaru, Alessandro Cappelli, Hamza Alobeidli, Baptiste Pannier, Ebtesam Almazrouei, Julien Launay

Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers.

Zero-shot Generalization
