Search Results for author: Ayush Maheshwari

Found 17 papers, 11 papers with code

LexGen: Domain-aware Multilingual Lexicon Generation

no code implementations 18 May 2024 Karthika NJ, Ayush Maheshwari, Atul Kumar Singh, Preethi Jyothi, Ganesh Ramakrishnan, Krishnakant Bhatt

To address the research gap in lexicon generation, particularly the limited attention to domain-specific settings, we propose a new model to generate dictionary words for 6 Indian languages in a multi-domain setting.

FAIR: Filtering of Automatically Induced Rules

no code implementations 23 Feb 2024 Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Kumar Hanawal, Ganesh Ramakrishnan

The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains.

Text Classification

EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images

1 code implementation 23 Nov 2023 Abhishek Singh, Venkatapathy Subramanian, Ayush Maheshwari, Pradeep Narayan, Devi Prasad Shetty, Ganesh Ramakrishnan

We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models with the availability of very few labeled data instances.

Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

1 code implementation 23 May 2023 Ayush Maheshwari, Ashim Gupta, Amrith Krishna, Atul Kumar Singh, Ganesh Ramakrishnan, G. Anil Kumar, Jitin Singla

Translation models trained on our dataset demonstrate statistically significant improvements when translating out-of-domain contemporary corpora, outperforming models trained on older classical-era poetry datasets.

Machine Translation, Translation

A Benchmark and Dataset for Post-OCR text correction in Sanskrit

1 code implementation 15 Nov 2022 Ayush Maheshwari, Nikhil Singh, Amrith Krishna, Ganesh Ramakrishnan

Keeping this in mind, we release a multi-domain dataset covering areas as diverse as astronomy, medicine and mathematics, with some of the texts as old as 18 centuries.

Astronomy, Optical Character Recognition (OCR)

DICTDIS: Dictionary Constrained Disambiguation for Improved NMT

no code implementations 13 Oct 2022 Ayush Maheshwari, Piyush Sharma, Preethi Jyothi, Ganesh Ramakrishnan

In this work, we present DICTDIS, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.

Machine Translation, NMT

UDAAN: Machine Learning based Post-Editing tool for Document Translation

1 code implementation 3 Mar 2022 Ayush Maheshwari, Ajay Ravindran, Venkatapathy Subramanian, Ganesh Ramakrishnan

UDAAN has an end-to-end Machine Translation (MT) plus post-editing pipeline wherein users can upload a document to obtain raw MT output.

BIG-bench Machine Learning, Document Translation

Adaptive Mixing of Auxiliary Losses in Supervised Learning

1 code implementation 7 Feb 2022 Durga Sivasubramanian, Ayush Maheshwari, Pradeep Shenoy, Prathosh AP, Ganesh Ramakrishnan

In several supervised learning scenarios, auxiliary losses are used in order to introduce additional information or constraints into the supervised learning objective.

Denoising, Knowledge Distillation

Error Correction in ASR using Sequence-to-Sequence Models

no code implementations 2 Feb 2022 Samrat Dutta, Shreyansh Jain, Ayush Maheshwari, Souvik Pal, Ganesh Ramakrishnan, Preethi Jyothi

Post-editing in Automatic Speech Recognition (ASR) entails automatically correcting common and systematic errors produced by the ASR system.

Automatic Speech Recognition (ASR)

SPEAR : Semi-supervised Data Programming in Python

1 code implementation 1 Aug 2021 Guttu Sai Abhishek, Harshad Ingole, Parth Laturia, Vineeth Dorna, Ayush Maheshwari, Rishabh Iyer, Ganesh Ramakrishnan

SPEAR facilitates weak supervision in the form of heuristics (or rules) and association of noisy labels to the training dataset.

Text Classification

Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification

1 code implementation EACL 2021 Soumya Chatterjee, Ayush Maheshwari, Ganesh Ramakrishnan, Saketha Nath Jagaralpudi

Such a joint learning is expected to provide a twofold advantage: i) the classifier generalizes better as it leverages the prior knowledge of existence of a hierarchy over the labels, and ii) in addition to the label co-occurrence information, the label-embedding may benefit from the manifold structure of the input datapoints, leading to embeddings that are more faithful to the label hierarchy.

General Classification, Hierarchical Multi-label Classification

Semi-Supervised Data Programming with Subset Selection

1 code implementation Findings (ACL) 2021 Ayush Maheshwari, Oishik Chatterjee, KrishnaTeja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer

The first contribution of this work is a semi-supervised data programming framework that learns a joint model, effectively using the rules/labelling functions along with semi-supervised loss functions on the feature space.

Text Classification

Tale of tails using rule augmented sequence labeling for event extraction

no code implementations 19 Aug 2019 Ayush Maheshwari, Hrishikesh Patel, Nandan Rathod, Ritesh Kumar, Ganesh Ramakrishnan, Pushpak Bhattacharyya

The problem of event extraction is a relatively difficult task for low resource languages due to the non-availability of sufficient annotated data.

Event Extraction

Entity Resolution and Location Disambiguation in the Ancient Hindu Temples Domain using Web Data

no code implementations NAACL 2018 Ayush Maheshwari, Vishwajeet Kumar, Ganesh Ramakrishnan, J. Saketha Nath

We present a system for resolving entities and disambiguating locations based on publicly available web data in the domain of ancient Hindu Temples.

Clustering, Entity Resolution
