Search Results for author: Jacob Carlson

Found 4 papers, 2 papers with code

EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge

no code implementations • 16 Oct 2023 • Tom Bryan, Jacob Carlson, Abhishek Arora, Melissa Dell

Given the diversity and sheer quantity of public domain texts, liberating them at scale requires optical character recognition (OCR) that is accurate, extremely cheap to deploy, and sample-efficient to customize to novel collections, languages, and character sets.

Image Retrieval, Language Modelling, +3

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

no code implementations • NeurIPS 2023 • Melissa Dell, Jacob Carlson, Tom Bryan, Emily Silcock, Abhishek Arora, Zejiang Shen, Luca D'Amico-Wong, Quan Le, Pablo Querubin, Leander Heldring

The resulting American Stories dataset provides high-quality data that could be used for pre-training a large language model to achieve better understanding of historical English and historical world knowledge.

Language Modelling, Large Language Model, +3

Efficient OCR for Building a Diverse Digital History

1 code implementation • 5 Apr 2023 • Jacob Carlson, Tom Bryan, Melissa Dell

Thousands of users consult digital archives daily, but the information they can access is unrepresentative of the diversity of documentary history.

Image Retrieval, Language Modelling, +3

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

6 code implementations • 29 Mar 2021 • Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li

Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.
