Search Results for author: Dinei Florencio

Found 16 papers, 6 papers with code

Precision Enhancement of 3D Surfaces from Multiple Compressed Depth Maps

no code implementations • 25 Feb 2014 • Pengfei Wan, Gene Cheung, Philip A. Chou, Dinei Florencio, Cha Zhang, Oscar C. Au

In texture-plus-depth representation of a 3D scene, depth maps from different camera viewpoints are typically lossily compressed via the classical transform coding / coefficient quantization paradigm.

Quantization

Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks

2 code implementations • 9 May 2016 • Anurag Kumar, Dinei Florencio

In this paper we consider the problem of speech enhancement in real-world-like conditions, where multiple noises can simultaneously corrupt speech.

Sound

Joint Denoising / Compression of Image Contours via Shape Prior and Context Tree

no code implementations • 30 Apr 2017 • Amin Zheng, Gene Cheung, Dinei Florencio

We first prove theoretically that in general a joint denoising / compression approach can outperform a separate two-stage approach that first denoises then encodes contours lossily.

Action Recognition, Denoising (+3 more)

Foreground Detection in Camouflaged Scenes

no code implementations • 11 Jul 2017 • Shuai Li, Dinei Florencio, Yaqin Zhao, Chris Cook, Wanqing Li

This paper proposes a texture guided weighted voting (TGWV) method which can efficiently detect foreground objects in camouflaged scenes.

Deep Learning Based Speech Beamforming

no code implementations • 15 Feb 2018 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

On the other hand, deep-learning-based enhancement approaches can learn complicated speech distributions and perform efficient inference, but they cannot handle a variable number of input channels.

Speech Enhancement

A Fusion Framework for Camouflaged Moving Foreground Detection in the Wavelet Domain

no code implementations • 16 Apr 2018 • Shuai Li, Dinei Florencio, Wanqing Li, Yaqin Zhao, Chris Cook

Conventional methods cannot distinguish the foreground from the background because of the small differences between them, and thus suffer from under-detection of camouflaged foreground objects.

RePr: Improved Training of Convolutional Filters

1 code implementation • CVPR 2019 • Aaditya Prakash, James Storer, Dinei Florencio, Cha Zhang

We show that by temporarily pruning and then restoring a subset of the model's filters, and repeating this process cyclically, overlap in the learned features is reduced, producing improved generalization.

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even when pre-trained on the same downstream-task dataset, TAP boosts absolute accuracy on the TextVQA dataset by +5.4% compared with a non-TAP baseline.

Caption Generation, Language Modelling (+5 more)

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

5 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Ranked #1 on Key Information Extraction on SROIE

Document Image Classification, Document Layout Analysis (+6 more)

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Ranked #13 on Document Image Classification on RVL-CDIP

Document Image Classification, Document Understanding

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Text recognition is a long-standing research problem for document digitalization.

Ranked #3 on Handwritten Text Recognition on IAM

Handwritten Text Recognition, Language Modelling (+4 more)

Improving Structured Text Recognition with Regular Expression Biasing

no code implementations • 10 Nov 2021 • Baoguang Shi, WenFeng Cheng, Yijuan Lu, Cha Zhang, Dinei Florencio

We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing.

Understanding Long Documents with Different Position-Aware Attentions

no code implementations • 17 Aug 2022 • Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

Despite several successes in document understanding, the practical task of long-document understanding remains largely under-explored, owing to computational challenges and the difficulty of efficiently absorbing long multimodal input.

Document Understanding, Position

Diffusion-based Document Layout Generation

no code implementations • 19 Mar 2023 • Liu He, Yijuan Lu, John Corring, Dinei Florencio, Cha Zhang

Our empirical analysis shows that our diffusion-based approach performs comparably to or better than previous layout-generation methods across various document datasets.

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

no code implementations • 23 May 2023 • Li Sun, Florian Luisier, Kayhan Batmanghelich, Dinei Florencio, Cha Zhang

Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens.

Language Modelling, Natural Language Understanding

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has recently achieved state-of-the-art (SOTA) performance on visually rich document understanding tasks, demonstrating the great potential of joint learning across modalities.

Document Understanding
