Search Results for author: Avijit Thawani

Found 7 papers, 3 papers with code

Numeracy enhances the Literacy of Language Models

no code implementations EMNLP 2021 Avijit Thawani, Jay Pujara, Filip Ilievski

This paper studies the effect of using six different number encoders on the task of masked word prediction (MWP), as a proxy for evaluating literacy.

Sentence

BPE beyond Word Boundary: How NOT to use Multi Word Expressions in Neural Machine Translation

1 code implementation Insights (ACL) 2022 Dipesh Kumar, Avijit Thawani

BPE tokenization merges characters into longer tokens by finding frequently occurring contiguous patterns within the word boundary.

Machine Translation NMT
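The sketch below illustrates the standard, word-bounded BPE merge learning that the snippet above describes. It is a minimal example with some assumptions. The toy word-frequency table and the function name `learn_bpe` are hypothetical and do not come from the paper. End-of-word markers are omitted. The sketch does not reproduce the paper's multi-word-expression variant.

```python
from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Hypothetical helper: learn BPE merge rules from a word-frequency table.

    Each word is segmented independently, so a merge can never span two words
    (i.e. merges stay within the word boundary).
    """
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the merge to every word in the vocabulary.
        new_vocab: dict[tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            merged: list[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(toy, 3))  # [('e', 's'), ('es', 't'), ('l', 'o')]
```

Each word is kept as a separate symbol tuple, so frequent cross-word sequences such as multi-word expressions can never become a single token. Merging beyond the word boundary would require treating word sequences, rather than single words, as the units being segmented.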

Learn Your Tokens: Word-Pooled Tokenization for Language Modeling

1 code implementation 17 Oct 2023 Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara

Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuristic of combining characters into longer surface-level strings such as 'ing' or whole words.

Language Modelling

Estimating Numbers without Regression

no code implementations 9 Oct 2023 Avijit Thawani, Jay Pujara, Ashwin Kalyan

Despite recent successes in language models, their ability to represent numbers is insufficient.

Language Modelling regression

SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings

1 code implementation WS 2019 Avijit Thawani, Biplav Srivastava, Anil Singh

Downstream evaluation of pretrained word embeddings is expensive, more so for tasks where current state of the art models are very large architectures.

General Classification Natural Language Inference
