Search Results for author: Umapada Pal

Found 73 papers, 28 papers with code

SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

no code implementations 30 Mar 2024 Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta

In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions.

Vector Graphics

GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

1 code implementation 17 Feb 2024 Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements.

Knowledge Distillation object-detection +1

Static and Dynamic Synthesis of Bengali and Devanagari Signatures

no code implementations 30 Jan 2024 Miguel A. Ferrer, Sukalpa Chanda, Moises Diaz, Chayan Kr. Banerjee, Anirban Majumdar, Cristina Carmona-Duarte, Parikshit Acharya, Umapada Pal

This paper aims to adapt this scheme for the generation of synthetic signatures in two Indic scripts, Bengali (Bangla), and Devanagari (Hindi).

Handwriting generation

A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

1 code implementation 6 Dec 2023 Risab Biswas, Swalpa Kumar Roy, Umapada Pal

Instead of using a simple ViT and hard splitting of images for the document image enhancement task, we employed a progressive tokenization technique to capture this local information from an image to achieve more effective results.

Binarization Image Enhancement

DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

no code implementations 6 Dec 2023 Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang

Instead of using a simple vision transformer block to extract information from the image patches, the proposed architecture uses two transformer blocks for greater coverage of the extracted feature space on a global and local scale.

Binarization

Diving into the Depths of Spotting Text in Multi-Domain Noisy Scenes

no code implementations 1 Oct 2023 Alloy Das, Sanket Biswas, Umapada Pal, Josep Lladós

When used in a real-world noisy environment, the capacity to generalize to multiple domains is essential for any autonomous scene text spotting system.

Super-Resolution Text Spotting

FAST: Font-Agnostic Scene Text Editing

no code implementations 5 Aug 2023 Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

However, most of the existing STE methods show inferior editing performance because of (1) complex image backgrounds, (2) various font styles, and (3) varying word lengths within the text.

Scene Text Editing Style Transfer +1

DySTreSS: Dynamically Scaled Temperature in Self-Supervised Contrastive Learning

no code implementations 2 Aug 2023 Siladittya Manna, Soumitri Chattopadhyay, Rakesh Dey, Saumik Bhattacharya, Umapada Pal

We propose a cosine similarity-dependent temperature scaling function to effectively optimize the distribution of the samples in the feature space.

Contrastive Learning

Scene Text Recognition with Image-Text Matching-guided Dictionary

no code implementations 8 May 2023 Jiajun Wei, Hongjian Zhan, Xiao Tu, Yue Lu, Umapada Pal

Inspired by ITC, the SITM network combines the visual features and the text features of all candidates to identify the candidate with the minimum distance in the feature space.

Image-text matching Language Modelling +2

SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

1 code implementation 8 May 2023 Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image.

Instance Segmentation Segmentation +1

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

1 code implementation 1 May 2023 Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature extraction, etc.

Document Layout Analysis object-detection +1

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

no code implementations 10 Apr 2023 Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai

In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in the video with various scenarios.

Task 2 Text Detection +2

A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing

1 code implementation 14 Mar 2023 Prasun Roy, Subhankar Ghosh, Umapada Pal

Air-writing refers to virtually writing linguistic characters through hand gestures in three-dimensional space with six degrees of freedom.

Segmentation Transfer Learning

Global Context-Aware Person Image Generation

no code implementations 28 Feb 2023 Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein

The proposed strategy enables us to synthesize semantically coherent realistic persons that can blend into an existing scene without altering the global context.

Image Generation

Effective Document Image Enhancement Using tokens-to-token Transformer Network

1 code implementation Preprint 2023 Risab Biswas, Swalpa Kumar Roy, Umapada Pal

Instead of using a simple ViT and hard splitting of images for the document image enhancement task, we employed a progressive tokenization technique to capture this local information from an image for achieving more effective results.

Binarization Image Enhancement

TIPS: Text-Induced Pose Synthesis

no code implementations 24 Jul 2022 Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person.

Descriptive Pose Transfer

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

no code implementations 21 Jul 2022 Dajian Zhong, Shujing Lyu, Palaiahnakote Shivakumara, Bing Yin, Jiajia Wu, Umapada Pal, Yue Lu

For target images (scene text images), the Semantic Generator Module generates simple semantic features that share the same feature distribution with support images (clear text images).

Image-to-Image Translation Scene Text Recognition

Scene Aware Person Image Generation through Global Contextual Conditioning

no code implementations 6 Jun 2022 Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person.

Generative Adversarial Network Image Generation

SWIS: Self-Supervised Representation Learning For Writer Independent Offline Signature Verification

no code implementations 26 Feb 2022 Siladittya Manna, Soumitri Chattopadhyay, Saumik Bhattacharya, Umapada Pal

Writer independent offline signature verification is one of the most challenging tasks in pattern recognition as there is often a scarcity of training data.

Representation Learning Self-Supervised Learning

Multi-scale Attention Guided Pose Transfer

1 code implementation 14 Feb 2022 Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

Pose transfer refers to the probabilistic image generation of a person with a previously unseen novel pose from another image of that person having a different pose.

Pose Transfer

MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

no code implementations 24 Nov 2021 Siladittya Manna, Umapada Pal, Saumik Bhattacharya

After 200 epochs of pre-training with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.2%, 58.18%, 77.49%, and 30.87% on CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.23%, 3.57%, 2.00%, and 0.33%, respectively.

Binary Classification Contrastive Learning

AGA-GAN: Attribute Guided Attention Generative Adversarial Network with U-Net for Face Hallucination

no code implementations 20 Nov 2021 Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

The performance of facial super-resolution methods relies on their ability to recover facial structures and salient features effectively.

Attribute Face Hallucination +3

Exploiting Multi-Scale Fusion, Spatial Attention and Patch Interaction Techniques for Text-Independent Writer Identification

1 code implementation 20 Nov 2021 Abhishek Srivastava, Sukalpa Chanda, Umapada Pal

Our methods are based on the hypothesis that handwritten text images have specific spatial regions that are more unique to a writer's style, that multi-scale features propagate characteristic features with respect to individual writers, and that patch-based features give more general and robust representations that help to discriminate handwriting from different writers.

PAANet: Progressive Alternating Attention for Automatic Medical Image Segmentation

no code implementations 20 Nov 2021 Abhishek Srivastava, Sukalpa Chanda, Debesh Jha, Michael A. Riegler, Pål Halvorsen, Dag Johansen, Umapada Pal

We develop progressive alternating attention dense (PAAD) blocks, which construct a guiding attention map (GAM) after every convolutional layer in the dense blocks using features from all scales.

Decision Making Image Segmentation +3

GradML: A Gradient-based Loss for Deep Metric Learning

no code implementations NeurIPS Workshop ICBINB 2021 Bhavya Vasudeva, Puneesh Deora, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

Deep metric learning (ML) uses a carefully designed loss function to learn distance metrics for improving the discriminatory ability for tasks like clustering and retrieval.

Metric Learning Retrieval

LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning

1 code implementation ICCV 2021 Bhavya Vasudeva, Puneesh Deora, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

Deep metric learning has been effectively used to learn distance metrics for different visual tasks like image retrieval, clustering, etc.

Image Retrieval Metric Learning +1

Graph-based Deep Generative Modelling for Document Layout Generation

no code implementations 9 Jul 2021 Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

One of the major prerequisites for any deep learning approach is the availability of large-scale training data.

DocSynth: A Layout Guided Approach for Controllable Document Image Synthesis

1 code implementation 6 Jul 2021 Sanket Biswas, Pau Riba, Josep Lladós, Umapada Pal

The results highlight that our model can successfully generate realistic and diverse document images with multiple objects.

Document Layout Analysis Image Generation

PLSM: A Parallelized Liquid State Machine for Unintentional Action Detection

1 code implementation 6 May 2021 Dipayan Das, Saumik Bhattacharya, Umapada Pal, Sukalpa Chanda

Reservoir Computing (RC) offers a viable option to deploy AI algorithms on low-end embedded system platforms.

Action Detection

Self-Supervised Representation Learning for Detection of ACL Tear Injury in Knee MR Videos

1 code implementation 15 Jul 2020 Siladittya Manna, Saumik Bhattacharya, Umapada Pal

In this paper, we propose a self-supervised learning approach to learn transferable features from MR video clips by enforcing the model to learn anatomical features.

Representation Learning Self-Supervised Learning

UDBNET: Unsupervised Document Binarization Network via Adversarial Game

1 code implementation 14 Jul 2020 Amandeep Kumar, Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

In this paper, we present a novel approach towards document image binarization by introducing a three-player min-max adversarial game.

Binarization

A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

no code implementations 26 May 2020 Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein

The proposed method fuses gradient magnitude and direction coherence of text pixels in a new way for detecting candidate regions.

Clustering Text Detection

Modeling Extent-of-Texture Information for Ground Terrain Recognition

1 code implementation 17 Apr 2020 Shuvozit Ghose, Pinaki Nath Chowdhury, Partha Pratim Roy, Umapada Pal

Ground Terrain Recognition is a difficult task as the context information varies significantly over the regions of a ground terrain image.

Image Classification

DELP-DAR System for License Plate Detection and Recognition

no code implementations 4 Oct 2019 Zied Selmi, Mohamed Ben Halima, Umapada Pal, M. Adel Alimi

For this, we present in this paper an automatic framework for License Plate (LP) detection and recognition from complex scenes.

License Plate Detection

Distance Metric Learned Collaborative Representation Classifier

no code implementations 3 May 2019 Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

We present a simple, effective way of achieving this by learning a generic Mahalanobis distance in a collaborative loss function in an end-to-end fashion with any standard convolutional network as the feature learner.

General Classification

PProCRC: Probabilistic Collaboration of Image Patches

no code implementations 21 Mar 2019 Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

We present a conditional probabilistic framework for collaborative representation of image patches.

Face Recognition

CoCoNet: A Collaborative Convolutional Network

no code implementations 28 Jan 2019 Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

We present an end-to-end deep network for fine-grained visual categorization called Collaborative Convolutional Network (CoCoNet).

Fine-Grained Visual Categorization Fine-Grained Visual Recognition +1

A Deep One-Shot Network for Query-based Logo Retrieval

2 code implementations 4 Nov 2018 Ayan Kumar Bhunia, Ankan Kumar Bhunia, Shuvozit Ghose, Abhirup Das, Partha Pratim Roy, Umapada Pal

Logo detection in real-world scene images is an important problem with applications in advertisement and marketing.

Marketing object-detection +4

Effects of Degradations on Deep Neural Network Architectures

2 code implementations 26 Jul 2018 Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal

Deep convolutional neural networks (CNN) have massively influenced recent advances in large-scale image classification.

General Classification Image Classification

Bag-of-Visual-Words for Signature-Based Multi-Script Document Retrieval

no code implementations 18 Jul 2018 Ranju Mandal, Partha Pratim Roy, Umapada Pal, Michael Blumenstein

Finally, three distance measures were used to match a query signature with the signature present in target documents for retrieval.

Retrieval

A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

no code implementations 19 Jun 2018 Sauradip Nag, Palaiahnakote Shivakumara, Wu Yirui, Umapada Pal, Tong Lu

For each line segment, the proposed method estimates angle and length, which gives a point in polar domain.

Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch

no code implementations 28 Apr 2018 Sounak Dey, Anjan Dutta, Suman K. Ghosh, Ernest Valveny, Josep Lladós, Umapada Pal

In this work we introduce a cross modal image retrieval system that allows both text and sketch as input modalities for the query.

Image Retrieval Retrieval

Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

no code implementations 23 Feb 2018 Ayan Kumar Bhunia, Subham Mukherjee, Aneeshan Sain, Ankan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

In this paper, we propose a novel approach to word-level Indic script identification using only character-level data in the training stage.

Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network

no code implementations 22 Jan 2018 Ayan Kumar Bhunia, Abir Bhowmick, Ankan Kumar Bhunia, Aishik Konwer, Prithaj Banerjee, Partha Pratim Roy, Umapada Pal

Our encoder module consists of a Convolutional LSTM network, which takes an offline character image as the input and encodes the feature sequence to a hidden representation.

Retrieval

Word Level Font-to-Font Image Translation using Convolutional Recurrent Generative Adversarial Networks

no code implementations 22 Jan 2018 Ankan Kumar Bhunia, Ayan Kumar Bhunia, Prithaj Banerjee, Aishik Konwer, Abir Bhowmick, Partha Pratim Roy, Umapada Pal

We employ a novel convolutional recurrent model architecture in the Generator that efficiently deals with the word images of arbitrary width.

Translation

Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network

1 code implementation 1 Jan 2018 Ankan Kumar Bhunia, Aishik Konwer, Ayan Kumar Bhunia, Abir Bhowmick, Partha P. Roy, Umapada Pal

In this paper, we propose a novel method that involves extraction of local and global features using CNN-LSTM framework and weighting them dynamically for script identification.

Cross-language Framework for Word Recognition and Spotting of Indic Scripts

no code implementations 19 Dec 2017 Ayan Kumar Bhunia, Partha Pratim Roy, Akash Mohta, Umapada Pal

This paper presents a novel cross language platform for handwritten word recognition and spotting for such low-resource scripts where training is performed with a sufficiently large dataset of an available script (considered as source script) and testing is done on other scripts (considered as target script).

Zone-based Keyword Spotting in Bangla and Devanagari Documents

no code implementations 5 Dec 2017 Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

Also, we propose a novel feature combining foreground and background information of text line images for keyword-spotting by character filler models.

Keyword Spotting Segmentation

LOOP Descriptor: Local Optimal Oriented Pattern

no code implementations 25 Oct 2017 Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

This letter introduces the LOOP binary descriptor (local optimal oriented pattern) that encodes rotation invariance into the main formulation itself.

Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding

no code implementations 18 Aug 2017 Partha Pratim Roy, Ayan Kumar Bhunia, Avirup Bhattacharyya, Umapada Pal

To evaluate the proposed system for searching keywords in natural scene images and video frames, we have considered two popular Indic scripts, Bangla (Bengali) and Devanagari, along with English.

Keyword Spotting Optical Character Recognition (OCR) +2

HMM-based Indic Handwritten Word Recognition using Zone Segmentation

no code implementations 1 Aug 2017 Partha Pratim Roy, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, Umapada Pal

To avoid character segmentation in such scripts, HMM-based sequence modeling has been used earlier in a holistic way.

Segmentation

Multi-Oriented Text Detection and Verification in Video Frames and Scene Images

no code implementations 22 Jul 2017 Aneeshan Sain, Ayan Kumar Bhunia, Partha Pratim Roy, Umapada Pal

Until now only a few methods have been proposed that look into curved text detection in video frames, wherein lies our novelty.

Clustering Curved Text Detection +3

HMM-based Writer Identification in Music Score Documents without Staff-Line Removal

no code implementations 21 Jul 2017 Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal

A novel Factor Analysis-based feature selection technique is applied to sliding window features to reduce the noise arising from staff lines, which proves effective for writer identification performance. In our framework we have also proposed a novel score line detection approach for musical sheets using HMM.

feature selection Line Detection

Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding

no code implementations 21 Jul 2017 Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal

We propose a line based date spotting approach using Hidden Markov Model (HMM) which is used to detect the date information in a given text.

Information Retrieval Retrieval

Text Recognition in Scene Image and Video Frame using Color Channel Selection

no code implementations 21 Jul 2017 Ayan Kumar Bhunia, Gautam Kumar, Partha Pratim Roy, R. Balasubramanian, Umapada Pal

In this paper, we present a novel approach based on color channel selection for text recognition from scene images and video frames.

Binarization Optical Character Recognition (OCR)

Product Graph-based Higher Order Contextual Similarities for Inexact Subgraph Matching

no code implementations 1 Feb 2017 Anjan Dutta, Josep Lladós, Horst Bunke, Umapada Pal

Many algorithms formulate graph matching as an optimization of an objective function of pairwise quantification of nodes and edges of two graphs to be matched.

Graph Matching

Local Binary Pattern for Word Spotting in Handwritten Historical Document

no code implementations 20 Apr 2016 Sounak Dey, Anguelos Nicolaou, Josep Llados, Umapada Pal

Digital libraries store images which can be highly degraded, and to index this kind of image we resort to word spotting as our information retrieval system.

Information Retrieval Retrieval
