no code implementations • 6 Jul 2022 • Benjamin Charles Germain Lee
By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.
no code implementations • 5 Dec 2021 • Benjamin Charles Germain Lee, Trevor Owens
This paper utilizes a Library of Congress dataset of 1,000 government PDFs in order to offer initial approaches for searching and analyzing these PDFs at scale.
no code implementations • 3 Sep 2021 • Benjamin Charles Germain Lee, Joshua Ortiz Baco, Sarah H. Salter, Jim Casey
This paper presents a computational method of analysis that draws from machine learning, library science, and literary studies to map the visual layouts of multi-ethnic newspapers from the late 19th and early 20th century United States.
6 code implementations • 29 Mar 2021 • Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.
2 code implementations • 4 May 2020 • Benjamin Charles Germain Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Chris Adams, Nathan Yarasavage, Deborah Thomas, Kate Zwaard, Daniel S. Weld
We report the results of running the pipeline on 16.3 million pages from the Chronicling America corpus and describe the resulting Newspaper Navigator dataset, the largest dataset of extracted visual content from historic newspapers ever produced.
1 code implementation • 9 Mar 2020 • Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
We show our method improves accuracy compared to a rigorous baseline on the image classification domains.