no code implementations • EMNLP (LaTeCH-CLfL) 2021 • A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno
We explore Boccaccio’s Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.
no code implementations • 31 Jan 2024 • Rebecca M. M. Hicke, David Mimno
Coreference annotation and resolution are vital components of computational literary studies.
1 code implementation • 14 Jan 2024 • Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney
Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller community in interwar Paris.
1 code implementation • 29 Nov 2023 • Andrea W Wen-Yi, David Mimno
This ubiquitous layer of language models is often overlooked.
1 code implementation • 15 Nov 2023 • Gregory Yauney, Emily Reif, David Mimno
Large language models achieve high performance on many but not all downstream tasks.
1 code implementation • 27 Oct 2023 • Rosamond Thalken, Edward H. Stiglitz, David Mimno, Matthew Wilkens
Our findings generally sound a note of caution in the use of generative LMs on complex tasks without fine-tuning and point to the continued relevance of human annotation-intensive classification methods.
no code implementations • 27 Oct 2023 • Rebecca M. M. Hicke, David Mimno
Large language models have shown breakthrough potential in many NLP domains.
1 code implementation • 23 May 2023 • Hamed Rahimi, Jacob Louis Hoover, David Mimno, Hubert Naacke, Camelia Constantin, Bernd Amann
The recent explosion in work on neural topic modeling has been criticized for optimizing automated topic evaluation metrics at the expense of actual meaningful topic identification.
no code implementations • 22 May 2023 • Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, Daphne Ippolito
Second, we explore the effect of quality and toxicity filters, showing a trade-off between performance on standard benchmarks and risk of toxic generations.
1 code implementation • 23 Jan 2023 • LeAnn McDowall, Maria Antoniak, David Mimno
Selecting a birth control method is a complex healthcare decision.
no code implementations • 7 Oct 2022 • Siddhartha Brahma, Polina Zablotskaia, David Mimno
Transformers allow attention between all pairs of tokens, but there is reason to believe that most of these connections, and their quadratic time and memory cost, may not be necessary.
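One way to make that intuition concrete is hard top-k attention: compute the full score matrix, but let each query keep only its k highest-scoring keys before the softmax. The sketch below is a toy numpy illustration of this general idea, not the specific method studied in the paper.

```python
import numpy as np

def topk_attention(Q, K, V, k):
    """Scaled dot-product attention that keeps only each query's k
    highest-scoring keys and masks out the rest before the softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) full scores
    kth = np.sort(scores, axis=-1)[:, -k][:, None]   # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over survivors
    return w @ V, w

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
out, w = topk_attention(Q, K, V, k=3)   # each query attends to at most 3 keys
```

Masking before the softmax (rather than zeroing afterward) keeps the surviving weights a proper distribution over the retained keys.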
no code implementations • 5 Oct 2022 • Jacob Eisenstein, Daniel Andor, Bernd Bohnet, Michael Collins, David Mimno
But what sorts of rationales are useful and how can we train systems to produce them?
no code implementations • 12 Nov 2021 • Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel
Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative.
1 code implementation • EMNLP 2021 • Gregory Yauney, David Mimno
Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks.
1 code implementation • ACL 2021 • Maria Antoniak, David Mimno
A common factor in bias measurement methods is the use of hand-curated seed lexicons, but there remains little guidance for their selection.
1 code implementation • EMNLP 2020 • Gregory Yauney, Jack Hessel, David Mimno
Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations.
no code implementations • 23 Oct 2020 • Laure Thompson, David Mimno
Clustering token-level contextualized word representations produces output that shares many similarities with topic models for English text collections.
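The core operation is ordinary clustering over a matrix of token vectors; with real contextualized embeddings (e.g. BERT token outputs) in place of the random stand-in data below, each cluster's nearest tokens read off like a topic's top words. A minimal k-means sketch, not the paper's exact pipeline:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means over a matrix of token vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every vector to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic "token embedding" blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 5)),
               rng.normal(4.0, 1.0, size=(20, 5))])
labels, centers = kmeans(X, k=2)
```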
no code implementations • IJCNLP 2019 • Moontae Lee, Sungjun Cho, David Bindel, David Mimno
Despite their scalability to large datasets and their ability to model correlations between topics, spectral topic models have not been widely adopted, due to unreliable behavior on real data and a lack of practical implementations.
no code implementations • 2 Jul 2019 • Dong Nguyen, Maria Liakata, Simon DeDeo, Jacob Eisenstein, David Mimno, Rebekah Tromble, Jane Winters
Second, we hope to provide a set of best practices for working with thick social and cultural concepts.
2 code implementations • IJCNLP 2019 • Jack Hessel, Lillian Lee, David Mimno
Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present.
1 code implementation • COLING 2018 • Laure Thompson, David Mimno
Most previous work in unsupervised semantic modeling in the presence of metadata has assumed that our goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true.
1 code implementation • NAACL 2018 • Jack Hessel, David Mimno, Lillian Lee
Multimodal machine learning algorithms aim to learn visual-textual correspondences.
no code implementations • TACL 2018 • Maria Antoniak, David Mimno
Word embeddings are increasingly being used as a tool to study word associations in specific corpora.
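The standard measure of association between two embedded words is cosine similarity. A small illustration with hypothetical hand-made vectors standing in for embeddings trained on a corpus:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 4-d embeddings, not trained vectors.
emb = {
    "doctor": np.array([0.9, 0.1, 0.3, 0.0]),
    "nurse":  np.array([0.8, 0.2, 0.4, 0.1]),
    "tree":   np.array([0.0, 0.9, 0.0, 0.8]),
}
sim_related = cosine(emb["doctor"], emb["nurse"])
sim_unrelated = cosine(emb["doctor"], emb["tree"])
```

Studying associations in a specific corpus then amounts to comparing such similarities across word pairs (or seed lexicons) of interest.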
no code implementations • 19 Nov 2017 • Moontae Lee, David Bindel, David Mimno
Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions.
no code implementations • EMNLP 2014 • Moontae Lee, David Mimno
The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space.
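The geometric picture behind anchor words can be sketched with a greedy farthest-point selection: repeatedly pick the row of the (row-normalized) co-occurrence matrix farthest from the span of the rows chosen so far, Gram-Schmidt style. This is a simplified stand-in for the anchor-finding step, not the paper's exact algorithm:

```python
import numpy as np

def find_anchors(Qbar, k):
    """Greedily select k rows of Qbar that approximately
    span its convex hull (farthest-point / Gram-Schmidt sketch)."""
    residual = Qbar.astype(float).copy()
    anchors = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        i = int(np.argmax(norms))          # farthest remaining row
        anchors.append(i)
        u = residual[i] / norms[i]         # new basis direction
        residual -= np.outer(residual @ u, u)   # project it out of every row
    return anchors

# Synthetic check: rows 0-2 are simplex vertices, the rest are convex mixtures.
rng = np.random.default_rng(0)
vertices = np.eye(3)
mixtures = rng.dirichlet(np.ones(3), size=7)
Qbar = np.vstack([vertices, mixtures])
anchors = find_anchors(Qbar, 3)
```

On this toy matrix the greedy procedure recovers the three extreme rows, mirroring how anchor words are vertices of the convex hull in co-occurrence space.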
no code implementations • EMNLP 2017 • David Mimno, Laure Thompson
Despite their ubiquity, word embeddings trained with skip-gram negative sampling (SGNS) remain poorly understood.
no code implementations • EMNLP 2017 • Alexandra Schofield, Laure Thompson, David Mimno
Duplicate documents are a pervasive problem in text datasets and can have a strong effect on unsupervised models.
no code implementations • EACL 2017 • Alexandra Schofield, Måns Magnusson, David Mimno
It is often assumed that topic models benefit from the use of a manually curated stopword list.
1 code implementation • 6 Mar 2017 • Jack Hessel, Lillian Lee, David Mimno
The content of today's social media is becoming richer, increasingly mixing text, images, videos, and audio.
no code implementations • NeurIPS 2015 • Moontae Lee, David Bindel, David Mimno
Spectral inference provides fast algorithms and provable optimality for latent topic analysis.
no code implementations • NeurIPS 2016 • Moontae Lee, Seok Hyun Jin, David Mimno
Many online communities present user-contributed responses such as reviews of products and answers to questions.
no code implementations • TACL 2016 • Alexandra Schofield, David Mimno
Rule-based stemmers such as the Porter stemmer are frequently used to preprocess English corpora for topic modeling.
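Rule-based stemming maps inflected forms to a shared stem by stripping suffixes. The sketch below is a crude illustration of that preprocessing step, NOT the real Porter algorithm (which applies ordered rule phases with measure conditions):

```python
def crude_stem(word):
    """Crude suffix stripping in the spirit of rule-based stemmers.
    Longer suffixes are tried first; very short stems are left alone."""
    for suffix in ("ational", "ization", "fulness", "iveness",
                   "ing", "edly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = ["modeling", "models", "modeled", "topics"]
stems = [crude_stem(t) for t in tokens]
# "modeling", "models", and "modeled" all collapse to "model"
```

Conflating inflected variants this way is exactly the vocabulary reduction whose effect on topic models the paper examines.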
2 code implementations • 19 Dec 2012 • Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.