no code implementations • EMNLP 2021 • Avijit Thawani, Jay Pujara, Filip Ilievski
This paper studies the effect of using six different number encoders on the task of masked word prediction (MWP), as a proxy for evaluating literacy.
1 code implementation • insights (ACL) 2022 • Dipesh Kumar, Avijit Thawani
BPE tokenization merges characters into longer tokens by finding frequently occurring contiguous patterns within the word boundary.
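As a rough illustration of the merging procedure described above, here is a minimal sketch of BPE merge learning. The function name `learn_bpe` and the toy word list are hypothetical and not taken from the paper's code; real implementations also add end-of-word markers and more careful tie-breaking.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from a list of words (illustrative sketch)."""
    # Each word is stored as a tuple of symbols, so merges never cross word boundaries.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite each word, replacing occurrences of the best pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab
```

Calling `learn_bpe(["low", "low", "lower", "lowest"], 2)` on this toy input merges the characters of the shared prefix into the single token `low`.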
1 code implementation • 17 Oct 2023 • Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara
Language models typically tokenize text into subwords, using a deterministic, hand-engineered heuristic of combining characters into longer surface-level strings such as 'ing' or whole words.
no code implementations • 9 Oct 2023 • Avijit Thawani, Jay Pujara, Ashwin Kalyan
Despite recent successes in language models, their ability to represent numbers is insufficient.
no code implementations • NAACL 2021 • Avijit Thawani, Jay Pujara, Pedro A. Szekely, Filip Ilievski
NLP systems rarely give special consideration to numbers found in text.
1 code implementation • WS 2019 • Avijit Thawani, Biplav Srivastava, Anil Singh
Downstream evaluation of pretrained word embeddings is expensive, more so for tasks where current state-of-the-art models are very large architectures.
no code implementations • IJCNLP 2017 • Anil Kumar Singh, Avijit Thawani, Mayank Panchal, Anubhav Gupta, Julian McAuley
Unlike Entity Disambiguation in web search results, Opinion Disambiguation is a relatively unexplored topic.