Search Results

Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model

2 code implementations NoDaLiDa 2021

In this work, we show the process of building a large-scale training set from digital and digitized collections at a national library.

Language Modeling Language Modelling +2

DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer

1 code implementation 9 May 2025

Our proposed encoder reduces the redundant information caused by multi-level features while maintaining the ability to capture fine-grained and long-range temporal information.

Action Detection Decoder +2

Fast Autofocusing using Tiny Transformer Networks for Digital Holographic Microscopy

1 code implementation 15 Mar 2022

Tiny DL models are proposed and compared, such as a tiny Vision Transformer (TViT), a tiny VGG16 (TVGG), and a tiny Swin-Transformer (TSwinT).

Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset

3 code implementations 24 Sep 2023

In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise.

3D Object Tracking Object +2

Deep Learning Framework for Measuring the Digital Strategy of Companies from Earnings Calls

1 code implementation COLING 2020

Companies today are racing to leverage the latest digital technologies, such as artificial intelligence, blockchain, and cloud computing.

Cloud Computing Clustering +2

Nougat: Neural Optical Understanding for Academic Documents

3 code implementations 25 Aug 2023

Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs.

Optical Character Recognition Optical Character Recognition (OCR)

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

8 code implementations ACL 2021

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification Document Layout Analysis +7

Supervised Multimodal Bitransformers for Classifying Images and Text

6 code implementations 6 Sep 2019

Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks.

Ranked #1 on Natural Language Inference on V-SNLI (using extra training data)

General Classification Natural Language Inference