In this work, we describe the process of building a large-scale training set from the digital and digitized collections of a national library.
Our proposed encoder reduces the redundant information caused by multi-level features while maintaining the ability to capture fine-grained and long-range temporal information.
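The encoder's details aren't given in this snippet; as a rough illustration only, a minimal PyTorch sketch of one way to fuse multi-level features with a learned gate (to suppress redundancy) while keeping long-range temporal context via self-attention might look like this. All module names, dimensions, and the gating scheme are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiLevelTemporalEncoder(nn.Module):
    """Illustrative sketch: project multi-level features to one width,
    merge them with a learned softmax gate, then apply a transformer
    layer for long-range temporal mixing."""

    def __init__(self, level_dims=(256, 512, 1024), dim=256, heads=4):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, dim) for d in level_dims)
        self.gate = nn.Linear(dim * len(level_dims), len(level_dims))
        self.attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, feats):  # feats: list of (batch, time, level_dim)
        projected = [p(f) for p, f in zip(self.proj, feats)]
        stacked = torch.stack(projected, dim=-1)            # (B, T, D, L)
        weights = self.gate(torch.cat(projected, dim=-1))   # (B, T, L)
        weights = weights.softmax(dim=-1).unsqueeze(2)      # (B, T, 1, L)
        fused = (stacked * weights).sum(-1)                 # (B, T, D)
        return self.attn(fused)  # long-range temporal context
```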
Tiny deep learning (DL) models are proposed and compared, including a tiny Vision Transformer (TViT), a tiny VGG16 (TVGG), and a tiny Swin Transformer (TSwinT).
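The exact TViT/TVGG/TSwinT configurations aren't reproduced here; as a stand-in, timm's stock "tiny" variants (and full VGG16 for contrast) give a feel for the kind of size comparison involved:

```python
import timm

# timm's stock variants are stand-ins, not the paper's models.
for name in ["vit_tiny_patch16_224", "swin_tiny_patch4_window7_224", "vgg16"]:
    model = timm.create_model(name, num_classes=10)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```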
In this paper, we extensively investigate the robustness of existing RGBD-based 6DoF pose estimation methods against varying levels of depth sensor noise.
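As a concrete illustration of "varying levels of depth sensor noise", a common approach is to inject synthetic, depth-dependent Gaussian noise plus pixel dropout into a clean depth map. The noise model and constants below are illustrative (loosely inspired by structured-light sensor behavior), not taken from the paper:

```python
import numpy as np

def add_depth_noise(depth_m, sigma0=0.002, k=0.0019, dropout=0.01, rng=None):
    """Simulate sensor noise on a metric depth map (values in meters).

    Hypothetical noise model for robustness sweeps: Gaussian noise whose
    std grows quadratically with depth, plus random pixel dropout to
    mimic missing returns."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sigma0 + k * depth_m**2           # depth-dependent std (meters)
    noisy = depth_m + rng.normal(0.0, 1.0, depth_m.shape) * sigma
    mask = rng.random(depth_m.shape) < dropout
    noisy[mask] = 0.0                         # 0 = invalid / missing depth
    return noisy

# Sweep severity by scaling k and dropout, e.g. for level in [0.5, 1, 2, 4].
```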
Digital twinning is the problem of augmenting real objects with their digital counterparts.
Companies today are racing to leverage the latest digital technologies, such as artificial intelligence, blockchain, and cloud computing.
Text recognition is a long-standing research problem in document digitization.
Ranked #1 on Handwritten Text Recognition on IAM (line-level), using extra training data.
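A minimal inference sketch for line-level handwritten text recognition with an off-the-shelf encoder-decoder checkpoint; the choice of microsoft/trocr-base-handwritten and the input file name are our assumptions, not necessarily the ranked system:

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Hypothetical checkpoint choice: a model trained on IAM-style line images.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("line.png").convert("RGB")  # a single text-line crop
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```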
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs.
Optical Character Recognition (OCR)
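To see why PDF-bound knowledge is hard to reuse, a quick sketch with pypdf shows what a text layer yields: embedded text comes out with layout lost, and scanned (image-only) pages yield nothing at all, which is where OCR comes in. The file name is hypothetical:

```python
from pypdf import PdfReader

reader = PdfReader("paper.pdf")  # hypothetical input file
for page in reader.pages:
    text = page.extract_text() or ""  # empty for image-only (scanned) pages
    print(text[:200])
```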
Pre-training of text and layout has proved effective in a variety of visually rich document understanding tasks, owing to effective model architectures and the availability of large-scale unlabeled scanned and digital-born documents.
Ranked #1 on Key Information Extraction on SROIE.
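A minimal usage sketch for a text-and-layout model, using Hugging Face's LayoutLM as a representative implementation; the checkpoint choice and the example words and boxes are ours:

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMModel

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

# Example words with bounding boxes normalized to a 0-1000 grid.
words = ["TOTAL", "$12.50"]
boxes = [[637, 305, 693, 319], [698, 305, 760, 319]]

# LayoutLM's tokenizer is plain WordPiece, so each word-level box must be
# repeated manually for every sub-token the word is split into.
token_ids, token_boxes = [], []
for word, box in zip(words, boxes):
    ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    token_ids.extend(ids)
    token_boxes.extend([box] * len(ids))

# Add [CLS]/[SEP] with the conventional dummy boxes.
input_ids = [tokenizer.cls_token_id] + token_ids + [tokenizer.sep_token_id]
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

outputs = model(
    input_ids=torch.tensor([input_ids]),
    bbox=torch.tensor([token_boxes]),
    attention_mask=torch.ones(1, len(input_ids), dtype=torch.long),
)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```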
Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks.
Ranked #1 on Natural Language Inference on V-SNLI, using extra training data.
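A minimal sketch of BERT-based classification on an NLI-style premise/hypothesis pair; the base checkpoint and 3-way label head are assumptions, and the randomly initialized head would need fine-tuning on NLI data before the probabilities mean anything:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=3 for entailment / neutral / contradiction; the classification
# head is randomly initialized and must be fine-tuned.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

inputs = tokenizer(
    "A man is playing a guitar on stage.",  # premise
    "Someone is performing music.",         # hypothesis
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # class probabilities (untrained head: ~uniform)
```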