no code implementations • 16 Mar 2022 • Daniel King, Zejiang Shen, Nishant Subramani, Daniel S. Weld, Iz Beltagy, Doug Downey
Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
1 code implementation • 1 Jun 2021 • Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey
Experiments are conducted on a newly curated evaluation suite, S2-VLUE, that unifies existing automatically-labeled datasets and includes a new dataset of manual annotations covering diverse papers from 19 scientific disciplines.
5 code implementations • 29 Mar 2021 • Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.
1 code implementation • ACL 2021 • Mark Neumann, Zejiang Shen, Sam Skjonsberg
Adobe's Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup.
1 code implementation • 5 Oct 2020 • Zejiang Shen, Jian Zhao, Melissa Dell, YaoLiang Yu, Weining Li
Document images often have intricate layout structures, with numerous content regions (e.g. texts, figures, tables) densely arranged on each page.
3 code implementations • 18 Apr 2020 • Zejiang Shen, Kaixuan Zhang, Melissa Dell
Deep learning-based approaches for automatic document layout analysis and content extraction have the potential to unlock rich information trapped in historical documents on a large scale.
1 code implementation • 1 Jan 2020 • Youssef Alami Mejjati, Zejiang Shen, Michael Snower, Aaron Gokaslan, Oliver Wang, James Tompkin, Kwang In Kim
We present an algorithm to generate diverse foreground objects and composite them into background images using a GAN architecture.
no code implementations • NeurIPS Workshop Document_Intelligen 2019 • Kaixuan Zhang, Zejiang Shen, Jie Zhou, Melissa Dell
Recent innovations have improved layout analysis of document images, significantly advancing our ability to identify text and non-text regions.