Search Results for author: Benjamin Charles Germain Lee

Found 6 papers, 3 papers with code

The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage

no code implementations • 6 Jul 2022 • Benjamin Charles Germain Lee

By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.

BIG-bench Machine Learning

Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs

no code implementations • 5 Dec 2021 • Benjamin Charles Germain Lee, Trevor Owens

This paper utilizes a Library of Congress dataset of 1,000 government PDFs in order to offer initial approaches for searching and analyzing these PDFs at scale.

Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals

no code implementations • 3 Sep 2021 • Benjamin Charles Germain Lee, Joshua Ortiz Baco, Sarah H. Salter, Jim Casey

This paper presents a computational method of analysis that draws from machine learning, library science, and literary studies to map the visual layouts of multi-ethnic newspapers from the late 19th and early 20th century United States.

BIG-bench Machine Learning

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

6 code implementations • 29 Mar 2021 • Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li

Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.

38,490

The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America

2 code implementations • 4 May 2020 • Benjamin Charles Germain Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Chris Adams, Nathan Yarasavage, Deborah Thomas, Kate Zwaard, Daniel S. Weld

We report the results of running the pipeline on 16.3 million pages from the Chronicling America corpus and describe the resulting Newspaper Navigator dataset, the largest dataset of extracted visual content from historic newspapers ever produced.

Optical Character Recognition (OCR)

224

LIMEADE: From AI Explanations to Advice Taking

1 code implementation • 9 Mar 2020 • Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld

We show our method improves accuracy compared to a rigorous baseline on the image classification domains.

BIG-bench Machine Learning, Image Classification
