Search Results for author: Anoop Kunchukuttan

Found 53 papers, 10 papers with code

Aksharantar: Towards building open transliteration tools for the next billion users

no code implementations 6 May 2022 Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, Vivek Seshadri, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

We introduce a new, large, diverse testset for Indic language transliteration containing 103k word pairs spanning 19 languages that enables fine-grained analysis of transliteration models.

Transliteration

IndicXNLI: Evaluating Multilingual Inference for Indian Languages

1 code implementation 19 Apr 2022 Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan

While Indic NLP has made rapid advances recently in terms of the availability of corpora and pre-trained models, benchmark datasets on standard NLU tasks are limited.

Cross-Lingual Transfer Machine Translation +1

Towards Building ASR Systems for the Next Billion Users

no code implementations 6 Nov 2021 Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Second, using this raw speech data we pretrain several variants of wav2vec style models for 40 Indian languages.

An Empirical Investigation of Multi-bridge Multilingual NMT models

no code implementations 14 Oct 2021 Anoop Kunchukuttan

In this paper, we present an extensive investigation of multi-bridge, many-to-many multilingual NMT models (MB-M2M), i.e., models trained on non-English language pairs in addition to English-centric language pairs.

Translation

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

1 code implementation 12 Apr 2021 Gowtham Ramesh, Sumanth Doddapaneni, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Mahalakshmi J, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra

We mine the parallel sentences from the web by combining many corpora, tools, and methods: (a) web-crawled monolingual corpora, (b) document OCR for extracting sentences from scanned documents, (c) multilingual representation models for aligning sentences, and (d) approximate nearest neighbor search for searching in a large collection of sentences.

Machine Translation Multilingual NLP +1

A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages

1 code implementation EACL 2021 Anoop Kunchukuttan, Siddharth Jain, Rahul Kejriwal

We take up the task of large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration to utilize orthographic similarity between Indian languages.

Translation Transliteration

Multilingual Neural Machine Translation

no code implementations COLING 2020 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

The advent of neural machine translation (NMT) has opened up exciting research in building multilingual translation systems, i.e., translation models that can handle more than one language pair.

Machine Translation Transfer Learning +1

AI4Bharat-IndicNLP Corpus: Monolingual Corpora and Word Embeddings for Indic Languages

2 code implementations 30 Apr 2020 Anoop Kunchukuttan, Divyanshu Kakwani, Satish Golla, Gokul N. C., Avik Bhattacharyya, Mitesh M. Khapra, Pratyush Kumar

We present the IndicNLP corpus, a large-scale, general-domain corpus containing 2.7 billion words for 10 Indian languages from two language families.

Word Embeddings

Learning Geometric Word Meta-Embeddings

no code implementations WS 2020 Pratik Jawanpuria, N T V Satya Dev, Anoop Kunchukuttan, Bamdev Mishra

We propose a geometric framework for learning meta-embeddings of words from different embedding sources.

Word Similarity

Utilizing Language Relatedness to improve Machine Translation: A Case Study on Languages of the Indian Subcontinent

no code implementations 19 Mar 2020 Anoop Kunchukuttan, Pushpak Bhattacharyya

To the best of our knowledge, this is the first large-scale study specifically devoted to utilizing language relatedness to improve translation between related languages.

Machine Translation Translation

A Comprehensive Survey of Multilingual Neural Machine Translation

no code implementations 4 Jan 2020 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.

Machine Translation Transfer Learning +1

Overview of the 6th Workshop on Asian Translation

no code implementations WS 2019 Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.

Translation

A Brief Survey of Multilingual Neural Machine Translation

no code implementations 14 May 2019 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.

Machine Translation Transfer Learning +1

McTorch, a manifold optimization library for deep learning

1 code implementation 3 Oct 2018 Mayank Meghwanshi, Pratik Jawanpuria, Anoop Kunchukuttan, Hiroyuki Kasai, Bamdev Mishra

In this paper, we introduce McTorch, a manifold optimization library for deep learning that extends PyTorch.

Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach

2 code implementations TACL 2019 Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, Bamdev Mishra

Our approach decouples learning the transformation from the source language to the target language into (a) learning rotations for language-specific embeddings to align them to a common space, and (b) learning a similarity metric in the common space to model similarities between the embeddings.

Bilingual Lexicon Induction Multilingual Word Embeddings +4

Comparing Recurrent and Convolutional Architectures for English-Hindi Neural Machine Translation

no code implementations WS 2017 Sandhya Singh, Ritesh Panjwani, Anoop Kunchukuttan, Pushpak Bhattacharyya

In this paper, we empirically compare two encoder-decoder neural machine translation architectures, the convolutional sequence-to-sequence model (ConvS2S) and the recurrent sequence-to-sequence model (RNNS2S), for the English-Hindi language pair as part of IIT Bombay's submission to the WAT2017 shared task.

Image Captioning Language Modelling +4

IIT Bombay's English-Indonesian submission at WAT: Integrating Neural Language Models with SMT

no code implementations WS 2016 Sandhya Singh, Anoop Kunchukuttan, Pushpak Bhattacharyya

The Neural Probabilistic Language Model (NPLM) gave relatively high BLEU scores for the Indonesian-to-English translation system, while the Neural Network Joint Model (NNJM) performed better in the English-to-Indonesian direction.

Language Modelling Machine Translation +1

Faster decoding for subword level Phrase-based SMT between related languages

no code implementations WS 2016 Anoop Kunchukuttan, Pushpak Bhattacharyya

The increase in length is also impacted by the specific choice of data format for representing the sentences as subwords.

Translation

Learning variable length units for SMT between related languages via Byte Pair Encoding

no code implementations WS 2017 Anoop Kunchukuttan, Pushpak Bhattacharyya

We explore the use of segments learnt using Byte Pair Encoding (referred to as BPE units) as basic units for statistical machine translation between related languages and compare it with orthographic syllables, which are currently the best performing basic units for this translation task.

Machine Translation Translation

Orthographic Syllable as basic unit for SMT between Related Languages

no code implementations EMNLP 2016 Anoop Kunchukuttan, Pushpak Bhattacharyya

We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts.

Translation

Shata-Anuvadak: Tackling Multiway Translation of Indian Languages

no code implementations LREC 2014 Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah, Pushpak Bhattacharyya

We present a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to both Indo-Aryan and Dravidian families.

Translation Transliteration

Experiences in Resource Generation for Machine Translation through Crowdsourcing

no code implementations LREC 2012 Anoop Kunchukuttan, Shourya Roy, Pratik Patel, Kushal Ladha, Somya Gupta, Mitesh M. Khapra, Pushpak Bhattacharyya

The logistics of collecting resources for Machine Translation (MT) has always been a cause of concern for some of the resource deprived languages of the world.

Machine Translation Translation
