1 code implementation • 14 Mar 2025 • Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Miquel Farré, Peter W. J. Staar
We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion.
no code implementations • 27 Jan 2025 • Nikolaos Livathinos, Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Kasper Dinkla, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar
We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion, that can parse several types of popular document formats into a unified, richly structured representation.
3 code implementations • 19 Aug 2024 • Christoph Auer, Maksym Lysak, Ahmed Nassar, Michele Dolfi, Nikolaos Livathinos, Panos Vagenas, Cesar Berrospi Ramis, Matteo Omenetti, Fabian Lindlbauer, Kasper Dinkla, Lokesh Mishra, Yusik Kim, Shubham Gupta, Rafael Teixeira de Lima, Valery Weber, Lucas Morin, Ingmar Meijer, Viktor Kuropiatnyk, Peter W. J. Staar
This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion.
1 code implementation • 1 May 2024 • Oshri Naparstek, Roi Pony, Inbar Shapira, Foad Abo Dahood, Ophir Azulai, Yevgeny Yaroker, Nadav Rubinstein, Maksym Lysak, Peter Staar, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Elad Amrani, Idan Friedman, Orit Prince, Yevgeny Burshtein, Adi Raz Goldfarb, Udi Barzelay
In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains.
no code implementations • 30 Nov 2023 • Lokesh Mishra, Cesar Berrospi, Kasper Dinkla, Diego Antognini, Francesco Fusco, Benedikt Bothur, Maksym Lysak, Nikolaos Livathinos, Ahmed Nassar, Panagiotis Vagenas, Lucas Morin, Christoph Auer, Michele Dolfi, Peter Staar
We present Deep Search DocQA.
no code implementations • 24 May 2023 • Christoph Auer, Ahmed Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter Staar
The results demonstrate substantial progress towards achieving robust and highly generalizing methods for document layout understanding.
no code implementations • 5 May 2023 • Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar
The benefits of OTSL are that it reduces the number of tokens to 5 (HTML needs 28+) and shortens the sequence length to half of HTML on average.
1 code implementation • 8 Sep 2022 • Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, PeterW. J. Staar, Rogerio Feris, Leonid Karlinsky
However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e. g. retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail part of the data distribution of the huge datasets used for FM pre-training.
Ranked #1 on
Image-to-Text Retrieval
on FETA Car-Manuals
3 code implementations • 2 Jun 2022 • Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S Nassar, Peter W J Staar
Lastly, we compare models trained on PubLayNet, DocBank and DocLayNet, showing that layout predictions of the DocLayNet-trained models are more robust and thus the preferred choice for general-purpose document-layout analysis.
1 code implementation • 1 Jun 2022 • Christoph Auer, Michele Dolfi, André Carvalho, Cesar Berrospi Ramis, Peter W. J. Staar
In this paper, we focus on the case of document conversion to illustrate the particular challenges of scaling a complex data processing pipeline with a strong reliance on machine-learning methods on cloud infrastructure.
no code implementations • 18 Feb 2021 • Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar
In this paper, we present a novel approach to document structure recovery in PDF using recurrent neural networks to process the low-level PDF data representation directly, instead of relying on a visual re-interpretation of the rendered PDF page, as has been proposed in previous literature.
no code implementations • 19 Jul 2019 • Matteo Manica, Christoph Auer, Valery Weber, Federico Zipoli, Michele Dolfi, Peter Staar, Teodoro Laino, Costas Bekas, Akihiro Fujita, Hiroki Toda, Shuichi Hirose, Yasumitsu Orii
Information extraction and data mining in biochemical literature is a daunting task that demands resource-intensive computation and appropriate means to scale knowledge ingestion.
no code implementations • 24 May 2018 • Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas
In this paper, we present a modular, cloud-based platform to ingest documents at scale.
no code implementations • 15 May 2018 • Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas
We present a platform to ingest documents at scale which is powered by Machine Learning techniques and allows the user to train custom models on document collections.