1 code implementation • ACL 2022 • FatemehSadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick
Recent work on controlled text generation has either required attribute-based fine-tuning of the base language model (LM), or has restricted the parameterization of the attribute discriminator to be compatible with the base autoregressive LM.
no code implementations • 15 Nov 2023 • Davis Yoshida, Kartik Goyal, Kevin Gimpel
Contrastingly, we argue that degenerate modes can even occur in the absence of any modeling error, due to contamination of the training data.
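The contamination argument becomes concrete with a little arithmetic. The sketch below uses purely illustrative numbers, not figures from the paper; it only shows how a rare degenerate sequence such as the empty string can become the exact mode of the data distribution even under a perfect model.

```python
# Toy illustration: a tiny contamination rate can make the empty string the
# mode of the data distribution, even if the model fits that distribution
# perfectly. All numbers are illustrative assumptions.
p_empty = 1e-4               # assumed fraction of empty strings in the training data
n_plausible = 10 ** 9        # assumed number of distinct "good" sequences sharing the rest
p_each_good = (1 - p_empty) / n_plausible   # at most ~1e-9 per good sequence
print(p_empty > p_each_good)  # True: the empty string is the most probable single sequence
```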
no code implementations • 12 Jun 2023 • Nikolai Vogler, Kartik Goyal, Kishore PV Reddy, Elizaveta Pertseva, Samuel V. Lemley, Christopher N. Warren, Max G'Sell, Taylor Berg-Kirkpatrick
Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins.
no code implementations • 8 Mar 2022 • FatemehSadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri
The wide adoption and application of masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities: to what extent do MLMs leak information about their training data?
no code implementations • ICLR 2022 • Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
While recent work has shown that scores from models trained with the ubiquitous masked language modeling (MLM) objective effectively discriminate probable from improbable sequences, it remains an open question whether these MLMs specify a principled probability distribution over the space of possible sequences.
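For context, the scores in question are typically pseudo-log-likelihoods obtained by masking one position at a time. The sketch below shows that scoring procedure, assuming the HuggingFace transformers and torch packages and the public bert-base-uncased checkpoint; whether such scores amount to a normalized distribution over sequences is exactly the open question the paper studies.

```python
# Pseudo-log-likelihood (PLL) scoring with an MLM: mask each position in turn
# and sum the log-probability the model assigns to the original token.
# Sketch only; assumes the `transformers` and `torch` packages and the public
# bert-base-uncased checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Higher PLL = judged more probable by the MLM; note this score is not
# guaranteed to correspond to a normalized distribution over sequences.
print(pseudo_log_likelihood("The cat sat on the mat."))
```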
no code implementations • ACL 2020 • Kartik Goyal, Chris Dyer, Christopher Warren, Max G'Sell, Taylor Berg-Kirkpatrick
We show that our approach outperforms rigid interpretable clustering baselines (Ocular) and overly-flexible deep generative models (VAE) alike on the task of completely unsupervised discovery of typefaces in mixed-font documents.
no code implementations • NAACL 2019 • Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias.
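As a reminder of the distinction the paper builds on, a locally normalized model applies a softmax over next labels at every step, while a globally normalized model applies a single softmax over the scores of whole sequences. The toy sketch below, with made-up labels and scores, only illustrates that the two schemes can assign different sequence probabilities from the same scores; it is not the paper's model.

```python
# Toy contrast between local and global normalization over length-2 label
# sequences. Labels and scores are hypothetical and chosen only to show that
# the two normalization schemes yield different sequence probabilities.
import itertools
import math

LABELS = ["A", "B"]
SCORES = {("<s>", "A"): 2.0, ("<s>", "B"): 1.0,   # score(prev -> next), made up
          ("A", "A"): 0.5, ("A", "B"): 3.0,
          ("B", "A"): 1.0, ("B", "B"): 1.0}

def local_prob(seq):
    """Locally normalized: softmax over the next label at every step."""
    prob, prev = 1.0, "<s>"
    for y in seq:
        z = sum(math.exp(SCORES[(prev, l)]) for l in LABELS)
        prob *= math.exp(SCORES[(prev, y)]) / z
        prev = y
    return prob

def global_prob(seq):
    """Globally normalized: one softmax over the total score of each full sequence."""
    def total(s):
        prev, tot = "<s>", 0.0
        for y in s:
            tot += SCORES[(prev, y)]
            prev = y
        return tot
    z = sum(math.exp(total(s)) for s in itertools.product(LABELS, repeat=len(seq)))
    return math.exp(total(seq)) / z

for seq in itertools.product(LABELS, repeat=2):
    print(seq, round(local_prob(seq), 3), round(global_prob(seq), 3))
```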
no code implementations • 1 Aug 2017 • Kartik Goyal, Graham Neubig, Chris Dyer, Taylor Berg-Kirkpatrick
In experiments, we show that optimizing this new training objective yields substantially better results on two sequence tasks (Named Entity Recognition and CCG Supertagging) than both cross-entropy-trained greedy decoding and cross-entropy-trained beam decoding baselines.
no code implementations • ACL 2017 • Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models.
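A minimal version of such a relaxation replaces the argmax over the decoder's output logits with a low-temperature softmax and feeds the resulting expected embedding forward. The sketch below uses random stand-ins for the embedding table and logits and only illustrates the relaxation itself, not the full seq2seq setup from the paper.

```python
# Soft argmax: a continuous relaxation of the argmax used in greedy decoding.
# As the temperature -> 0, the softmax output approaches a one-hot vector, so
# the "soft" next-token embedding approaches the embedding of the greedy token
# while keeping the computation differentiable.
# Sketch only; the embedding matrix and logits here are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 8, 4
E = rng.normal(size=(vocab_size, embed_dim))      # hypothetical embedding table
logits = rng.normal(size=vocab_size)              # hypothetical decoder logits

def soft_argmax_embedding(logits, E, temperature=0.1):
    # Peaked softmax in place of argmax; returns the expected embedding.
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return probs @ E

hard = E[np.argmax(logits)]                       # what greedy decoding would feed forward
soft = soft_argmax_embedding(logits, E, temperature=0.05)
print(np.abs(hard - soft).max())                  # small at low temperatures
```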
1 code implementation • COLING 2016 • David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin
This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.
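To make the contrast concrete, the sketch below compares a one-hot phone encoding with a hand-written articulatory-feature encoding for three consonants. The feature values are an illustrative assumption, not the representation released with the paper; they simply show how related sounds that are equidistant under one-hot vectors become close neighbors under information-rich features.

```python
# One-hot vs. articulatory-feature encodings of three phones. Feature values
# are hand-written for illustration: (voiced, nasal, labial, stop).
import numpy as np

phones = ["p", "b", "m"]
one_hot = np.eye(len(phones))                       # every pair is equally distant
features = np.array([[0, 0, 1, 1],                  # /p/: voiceless labial stop
                     [1, 0, 1, 1],                  # /b/: voiced labial stop
                     [1, 1, 1, 0]])                 # /m/: voiced labial nasal

def dist(vectors, a, b):
    i, j = phones.index(a), phones.index(b)
    return float(np.linalg.norm(vectors[i] - vectors[j]))

# Under one-hot encodings /p/-/b/ and /p/-/m/ are equally far apart;
# under the feature encoding /p/ and /b/ are the closer pair.
print(dist(one_hot, "p", "b"), dist(one_hot, "p", "m"))
print(dist(features, "p", "b"), dist(features, "p", "m"))
```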
no code implementations • COLING 2016 • Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin
This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of "Linguistic Rapid Response" to potential emergency humanitarian relief situations.
no code implementations • LREC 2016 • Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin
In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition (capitalization) is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters.