Search Results for author: David Mimno

Found 36 papers, 15 papers with code

‘Tecnologica cosa’: Modeling Storyteller Personalities in Boccaccio’s ‘Decameron’

no code implementations EMNLP (LaTeCH-CLfL) 2021 A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

We explore Boccaccio’s Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.

Stronger Random Baselines for In-Context Learning

2 code implementations 19 Apr 2024 Gregory Yauney, David Mimno

Evaluating the in-context learning classification performance of language models poses challenges due to small dataset sizes, extensive prompt-selection using the validation set, and intentionally difficult tasks that lead to near-random performance.

In-Context Learning

The Afterlives of Shakespeare and Company in Online Social Readership

1 code implementation 14 Jan 2024 Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney

Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller community in interwar Paris.

Data Similarity is Not Enough to Explain Language Model Performance

1 code implementation 15 Nov 2023 Gregory Yauney, Emily Reif, David Mimno

Large language models achieve high performance on many but not all downstream tasks.

Language Modelling

Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement

1 code implementation 27 Oct 2023 Rosamond Thalken, Edward H. Stiglitz, David Mimno, Matthew Wilkens

Our findings generally sound a note of caution in the use of generative LMs on complex tasks without fine-tuning and point to the continued relevance of human annotation-intensive classification methods.

Jurisprudence Legal Reasoning +1

Contextualized Topic Coherence Metrics

1 code implementation 23 May 2023 Hamed Rahimi, Jacob Louis Hoover, David Mimno, Hubert Naacke, Camelia Constantin, Bernd Amann

The recent explosion in work on neural topic modeling has been criticized for optimizing automated topic evaluation metrics at the expense of actual meaningful topic identification.

Topic Models

A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity

no code implementations 22 May 2023 Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, Barret Zoph, Denny Zhou, Jason Wei, Kevin Robinson, David Mimno, Daphne Ippolito

Second, we explore the effect of quality and toxicity filters, showing a trade-off between performance on standard benchmarks and risk of toxic generations.


Sensemaking About Contraceptive Methods Across Online Platforms

1 code implementation 23 Jan 2023 LeAnn McDowall, Maria Antoniak, David Mimno

Selecting a birth control method is a complex healthcare decision.

Breaking BERT: Evaluating and Optimizing Sparsified Attention

no code implementations 7 Oct 2022 Siddhartha Brahma, Polina Zablotskaia, David Mimno

Transformers allow attention between all pairs of tokens, but there is reason to believe that most of these connections - and their quadratic time and memory - may not be necessary.

On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

no code implementations 12 Nov 2021 Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative.

Community Detection

Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron

no code implementations 22 Sep 2021 A. Feder Cooper, Maria Antoniak, Christopher De Sa, Marilyn Migiel, David Mimno

We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian.

Comparing Text Representations: A Theory-Driven Approach

1 code implementation EMNLP 2021 Gregory Yauney, David Mimno

Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks.

Language Modelling Learning Theory +1

Bad Seeds: Evaluating Lexical Methods for Bias Measurement

1 code implementation ACL 2021 Maria Antoniak, David Mimno

A common factor in bias measurement methods is the use of hand-curated seed lexicons, but there remains little guidance for their selection.

Domain-Specific Lexical Grounding in Noisy Visual-Textual Documents

1 code implementation EMNLP 2020 Gregory Yauney, Jack Hessel, David Mimno

Images can give us insights into the contextual meanings of words, but current image-text grounding approaches require detailed annotations.

Clustering object-detection +2

Topic Modeling with Contextualized Word Representation Clusters

no code implementations 23 Oct 2020 Laure Thompson, David Mimno

Clustering token-level contextualized word representations produces output that shares many similarities with topic models for English text collections.

Clustering Topic Models +1

Practical Correlated Topic Modeling and Analysis via the Rectified Anchor Word Algorithm

no code implementations IJCNLP 2019 Moontae Lee, Sungjun Cho, David Bindel, David Mimno

Despite great scalability on large data and their ability to understand correlations between topics, spectral topic models have not been widely used due to the absence of reliability in real data and lack of practical implementations.

Topic Models

How we do things with words: Analyzing text as social and cultural data

no code implementations 2 Jul 2019 Dong Nguyen, Maria Liakata, Simon DeDeo, Jacob Eisenstein, David Mimno, Rebekah Tromble, Jane Winters

Second, we hope to provide a set of best practices for working with thick social and cultural concepts.

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

2 code implementations IJCNLP 2019 Jack Hessel, Lillian Lee, David Mimno

Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present.


Authorless Topic Models: Biasing Models Away from Known Structure

1 code implementation COLING 2018 Laure Thompson, David Mimno

Most previous work in unsupervised semantic modeling in the presence of metadata has assumed that our goal is to make latent dimensions more correlated with metadata, but in practice the exact opposite is often true.

Document Classification Topic Models +1

Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models

no code implementations 19 Nov 2017 Moontae Lee, David Bindel, David Mimno

Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions.

Topic Models

Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference

no code implementations EMNLP 2014 Moontae Lee, David Mimno

The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space.

Quantifying the Effects of Text Duplication on Semantic Models

no code implementations EMNLP 2017 Alexandra Schofield, Laure Thompson, David Mimno

Duplicate documents are a pervasive problem in text datasets and can have a strong effect on unsupervised models.

Cats and Captions vs. Creators and the Clock: Comparing Multimodal Content to Context in Predicting Relative Popularity

1 code implementation 6 Mar 2017 Jack Hessel, Lillian Lee, David Mimno

The content of today's social media is becoming more and more rich, increasingly mixing text, images, videos, and audio.

Robust Spectral Inference for Joint Stochastic Matrix Factorization

no code implementations NeurIPS 2015 Moontae Lee, David Bindel, David Mimno

Spectral inference provides fast algorithms and provable optimality for latent topic analysis.

Beyond Exchangeability: The Chinese Voting Process

no code implementations NeurIPS 2016 Moontae Lee, Seok Hyun Jin, David Mimno

Many online communities present user-contributed responses such as reviews of products and answers to questions.


A Practical Algorithm for Topic Modeling with Provable Guarantees

2 code implementations 19 Dec 2012 Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora.

Dimensionality Reduction Topic Models
