Search Results for author: Robert West

Found 94 papers, 62 papers with code

Meat-Free Day Reduces Greenhouse Gas Emissions but Poses Challenges for Customer Retention and Adherence to Dietary Guidelines

no code implementations 2 Apr 2025 Giuseppe Russo, Kristina Gligorić, Vincent Moreau, Robert West

Moreover, the increase in plant-based meals did not carry over to subsequent days, as evidenced by a 3.5% rebound in animal-based meal consumption on days immediately following treated days.

Controlling Latent Diffusion Using Latent CLIP

1 code implementation 11 Mar 2025 Jason Becker, Chris Wendler, Peter Baylies, Robert West, Christian Wressnegger

We train Latent-CLIP on 2.7B pairs of latent images and descriptive texts, and show that it matches zero-shot classification performance of similarly sized CLIP models on both the ImageNet benchmark and a LDM-generated version of it, demonstrating its effectiveness in assessing both real and generated content.

Denoising Descriptive +1

Generating Structured Outputs from Language Models: Benchmark and Studies

2 code implementations 18 Jan 2025 Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori

Constrained decoding has emerged as the dominant technology across sectors for enforcing structured outputs during generation.

Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?

no code implementations 21 Dec 2024 Ivan Zakazov, Mikolaj Boronski, Lorenzo Drudi, Robert West

The ongoing revolution in language modeling has led to various novel applications, some of which rely on the emerging social abilities of large language models (LLMs).

Diversity Language Modeling +1

Byte BPE Tokenization as an Inverse string Homomorphism

no code implementations 4 Dec 2024 Saibo Geng, Sankalp Gambhir, Chris Wendler, Robert West

Tokenization is an important preprocessing step in the training and inference of large language models (LLMs).

Controllable Context Sensitivity and the Knob Behind It

1 code implementation 11 Nov 2024 Julian Minder, Kevin Du, Niklas Stoehr, Giovanni Monea, Chris Wendler, Robert West, Ryan Cotterell

In this paper, we search for a knob which controls this sensitivity, determining whether language models answer from the context or their prior knowledge.

Question Answering Retrieval-augmented Generation

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

1 code implementation 28 Oct 2024 Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre

We investigated the possibility of using SAEs to learn interpretable features for few-step text-to-image diffusion models, such as SDXL Turbo.

Denoising

Activation Scaling for Steering and Interpreting Language Models

no code implementations 7 Oct 2024 Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein

Given the prompt "Rome is in", can we steer a language model to flip its prediction of an incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation vectors with scalars?

Language Modeling Language Modelling

Entity Insertion in Multilingual Linked Corpora: The Case of Wikipedia

1 code implementation 5 Oct 2024 Tomás Feith, Akhil Arora, Martin Gerlach, Debjit Paul, Robert West

Links are a fundamental part of information networks, turning isolated pieces of knowledge into a network of information that is much richer than the sum of its parts.

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

no code implementations 7 Aug 2024 Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, Freya Behrens, Giacomo Orsi, Giovanni Piccioli, Hadrien Sevel, Louis Coulon, Manuela Pineros-Rodriguez, Marin Bonnassies, Pierre Hellich, Puck van Gerwen, Sankalp Gambhir, Solal Pirelli, Thomas Blanchard, Timothée Callens, Toni Abi Aoun, Yannick Calvino Alonso, Yuri Cho, Alberto Chiappa, Antonio Sclocchi, Étienne Bruno, Florian Hofhammer, Gabriel Pescia, Geovani Rizk, Leello Dadi, Lucas Stoffl, Manoel Horta Ribeiro, Matthieu Bovel, Yueyang Pan, Aleksandra Radenovic, Alexandre Alahi, Alexander Mathis, Anne-Florence Bitbol, Boi Faltings, Cécile Hébert, Devis Tuia, François Maréchal, George Candea, Giuseppe Carleo, Jean-Cédric Chappelier, Nicolas Flammarion, Jean-Marie Fürbringer, Jean-Philippe Pellet, Karl Aberer, Lenka Zdeborová, Marcel Salathé, Martin Jaggi, Martin Rajman, Mathias Payer, Matthieu Wyart, Michael Gastpar, Michele Ceriotti, Ola Svensson, Olivier Lévêque, Paolo Ienne, Rachid Guerraoui, Robert West, Sanidhya Kashyap, Valerio Piazza, Viesturs Simanis, Viktor Kuncak, Volkan Cevher, Philippe Schwaller, Sacha Friedli, Patrick Jermann, Tanja Käser, Antoine Bosselut

We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses.

A Logical Fallacy-Informed Framework for Argument Generation

1 code implementation 7 Aug 2024 Luca Mouchel, Debjit Paul, Shaobo Cui, Robert West, Antoine Bosselut, Boi Faltings

Despite the remarkable performance of Large Language Models (LLMs) in natural language processing tasks, they still struggle with generating logically sound arguments, resulting in potential risks such as spreading misinformation.

Logical Fallacies Misinformation

Self-Recognition in Language Models

1 code implementation 9 Jul 2024 Tim R. Davidson, Viacheslav Surkov, Veniamin Veselovsky, Giuseppe Russo, Robert West, Caglar Gulcehre

Instead, our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin.

Multiple-choice

The AI Alignment Paradox

no code implementations 31 May 2024 Robert West, Roland Aydin

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles.

Fleet of Agents: Coordinated Problem Solving with Large Language Models

1 code implementation 7 May 2024 Lars Klein, Nearchos Potamitis, Roland Aydin, Robert West, Caglar Gulcehre, Akhil Arora

While numerous frameworks have been developed to enhance the reasoning abilities of large language models (LLMs), there is a scarcity of methods that effectively balance the trade-off between cost and quality.

Navigate

Edisum: Summarizing and Explaining Wikipedia Edits at Scale

1 code implementation 4 Apr 2024 Marija Šakota, Isaac Johnson, Guosheng Feng, Robert West

To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language model trained to produce good edit summaries given the representation of an edit diff.

Language Modeling Language Modelling

Can Language Models Recognize Convincing Arguments?

no code implementations 31 Mar 2024 Paula Rescala, Manoel Horta Ribeiro, Tiancheng Hu, Robert West

The capabilities of large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives.

Misinformation

Agentic AI: The Era of Semantic Decoding

no code implementations 21 Mar 2024 Maxime Peyrard, Martin Josifoski, Robert West

We refer to these orchestrated interactions among semantic processors, optimizing and searching in semantic space, as semantic decoding algorithms.

Emojinize: Enriching Any Text with Emoji Translations

no code implementations 6 Mar 2024 Lars Henning Klein, Roland Aydin, Robert West

Emoji have become ubiquitous in written communication, on the Web and beyond.

Cloze Test

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning

1 code implementation 21 Feb 2024 Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings

In this paper, we perform a causal mediation analysis on twelve LLMs to examine how intermediate reasoning steps generated by the LLM influence the final outcome and find that LLMs do not reliably use their intermediate reasoning steps when generating an answer.

counterfactual

Symbolic Autoencoding for Self-Supervised Sequence Learning

no code implementations 16 Feb 2024 Mohammad Hossein Amani, Nicolas Mario Baldwin, Amin Mansouri, Martin Josifoski, Maxime Peyrard, Robert West

Traditional language models, adept at next-token prediction in text sequences, often struggle with transduction tasks between distinct symbolic systems, particularly when parallel data is scarce.

Weakly-supervised Learning

Do Llamas Work in English? On the Latent Language of Multilingual Transformers

1 code implementation 16 Feb 2024 Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

Tracking intermediate embeddings through their high-dimensional space reveals three distinct phases, whereby intermediate embeddings (1) start far away from output token embeddings; (2) already allow for decoding a semantically correct next token in the middle layers, but give higher probability to its version in English than in the input language; (3) finally move into an input-language-specific region of the embedding space.

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access

1 code implementation 18 Jan 2024 Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West

This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM.

Constituency Parsing Language Modeling +2

A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia

1 code implementation 4 Dec 2023 Giovanni Monea, Maxime Peyrard, Martin Josifoski, Vishrav Chaudhary, Jason Eisner, Emre Kiciman, Hamid Palangi, Barun Patra, Robert West

We present a novel method to study grounding abilities using Fakepedia, a novel dataset of counterfactual texts constructed to clash with a model's internal parametric knowledge.

counterfactual Language Modeling +3

SoK: Memorization in General-Purpose Large Language Models

no code implementations 24 Oct 2023 Valentin Hartmann, Anshuman Suri, Vincent Bindschaedler, David Evans, Shruti Tople, Robert West

A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data.

Memorization Question Answering

Prevalence and prevention of large language model use in crowd work

no code implementations 24 Oct 2023 Veniamin Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West

We show that the use of large language models (LLMs) is prevalent among crowd workers, and that targeted mitigation strategies can significantly reduce, but not eliminate, LLM use.

Language Modeling Language Modelling +2

Stranger Danger! Cross-Community Interactions with Fringe Users Increase the Growth of Fringe Communities on Reddit

no code implementations 18 Oct 2023 Giuseppe Russo, Manoel Horta Ribeiro, Robert West

Overall, our findings suggest that curtailing fringe-interactions may reduce the growth of fringe communities on mainstream platforms.

Causal Inference

In-class Data Analysis Replications: Teaching Students while Testing Science

no code implementations 31 Aug 2023 Kristina Gligoric, Tiziano Piccardi, Jake Hofman, Robert West

Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.

Critical Evaluation of Artificial Intelligence as Digital Twin of Pathologist for Prostate Cancer Pathology

no code implementations 23 Aug 2023 Okyaz Eminaga, Mahmoud Abbas, Christian Kunder, Yuri Tolkach, Ryan Han, James D. Brooks, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Robert West, Jin Long, Richard Fan, Olaf Bettendorf

Adjusting the decision threshold for the secondary Gleason pattern from 5% to 10% improved the concordance level between pathologists and vPatho for tumor grading on prostatectomy specimens (kappa from 0.44 to 0.64).

Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling

2 code implementations 11 Aug 2023 Marija Šakota, Maxime Peyrard, Robert West

For a wide variety of tasks, inputs can be phrased as natural language prompts for an LM, from whose output the solution can then be extracted.

Language Modeling Language Modelling

Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

1 code implementation 13 Jun 2023 Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West

With the widespread adoption of LLMs, human gold-standard annotations are key to understanding the capabilities of LLMs and the validity of their results.

text-classification Text Classification

Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning

2 code implementations 23 May 2023 Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West

In this work, we demonstrate that formal grammars can describe the output space for a much wider range of tasks and argue that GCD can serve as a unified framework for structured NLP tasks in general.

Code Generation Constituency Parsing +2

REFINER: Reasoning Feedback on Intermediate Representations

1 code implementation 4 Apr 2023 Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, Boi Faltings

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting.

Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction

1 code implementation 7 Mar 2023 Martin Josifoski, Marija Sakota, Maxime Peyrard, Robert West

This work shows that useful data can be synthetically generated even for tasks that cannot be solved directly by LLMs: for problems with structured outputs, it is possible to prompt an LLM to perform the task in the reverse direction, by generating plausible input text for a target output structure.

Synthetic Data Generation

Evolutionary Dynamics in a Varying Environment: Continuous versus Discrete Noise

no code implementations 19 Dec 2022 Ami Taitelbaum, Robert West, Mauro Mobilia, Michael Assaf

Here, we study population dynamics subject to a fluctuating environment modeled by a varying carrying capacity changing continuously in time according to either binary random switches, or by being driven by a noise of continuous range.

Language Model Decoding as Likelihood-Utility Alignment

1 code implementation 13 Oct 2022 Martin Josifoski, Maxime Peyrard, Frano Rajic, Jiheng Wei, Debjit Paul, Valentin Hartmann, Barun Patra, Vishrav Chaudhary, Emre Kiciman, Boi Faltings, Robert West

Specifically, by analyzing the correlation between the likelihood and the utility of predictions across a diverse set of tasks, we provide empirical evidence supporting the proposed taxonomy and a set of principles to structure reasoning when choosing a decoding algorithm.

Language Modeling Language Modelling +2

An Ordinal Latent Variable Model of Conflict Intensity

1 code implementation 8 Oct 2022 Niklas Stoehr, Lucas Torroba Hennigen, Josef Valvoda, Robert West, Ryan Cotterell, Aaron Schein

It is based only on the action category ("what") and disregards the subject ("who") and object ("to whom") of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event's "intensity".

Event Extraction

Distribution inference risks: Identifying and mitigating sources of leakage

2 code implementations 18 Sep 2022 Valentin Hartmann, Léo Meynent, Maxime Peyrard, Dimitrios Dimitriadis, Shruti Tople, Robert West

We identify three sources of leakage: (1) memorizing specific information about the $\mathbb{E}[Y|X]$ (expected label given the feature values) of interest to the adversary, (2) wrong inductive bias of the model, and (3) finiteness of the training data.

Inductive Bias

The Glass Ceiling of Automatic Evaluation in Natural Language Generation

no code implementations 31 Aug 2022 Pierre Colombo, Maxime Peyrard, Nathan Noiry, Robert West, Pablo Piantanida

Automatic evaluation metrics capable of replacing human judgments are critical to allowing fast development of new methods.

Text Generation

ESC-Rules: Explainable, Semantically Constrained Rule Sets

1 code implementation 26 Aug 2022 Martin Glauer, Robert West, Susan Michie, Janna Hastings

We describe a novel approach to explainable prediction of a continuous variable based on learning fuzzy weighted rules.

United States Politicians' Tone Became More Negative with 2016 Primary Campaigns

1 code implementation 17 Jul 2022 Jonathan Külz, Andreas Spitz, Ahmad Abu-Akel, Stephan Günnemann, Robert West

There is a widespread belief that the tone of US political language has become more negative recently, in particular when Donald Trump entered politics.

Quote Erat Demonstrandum: A Web Interface for Exploring the Quotebank Corpus

no code implementations 7 Jul 2022 Vuk Vuković, Akhil Arora, Huan-Cheng Chang, Andreas Spitz, Robert West

The use of attributed quotes is the most direct and least filtered pathway of information propagation in news.

Strong Heuristics for Named Entity Linking

1 code implementation NAACL (ACL) 2022 Marko Čuljak, Andreas Spitz, Robert West, Akhil Arora

Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods.

Entity Linking

Efficient Entity Candidate Generation for Low-Resource Languages

1 code implementation LREC 2022 Alberto García-Durán, Akhil Arora, Robert West

We also propose a light-weight and simple solution based on the construction of indexes whose design is motivated by more complex transfer learning based neural approaches.

Cross-Lingual Entity Linking Entity Linking +1

A Critical Re-evaluation of Neural Methods for Entity Alignment

1 code implementation PVLDB 2022 Manuel Leone, Stefano Huber, Akhil Arora, Alberto García-Durán, Robert West

Our findings shed light on the potential problems resulting from an impulsive application of neural methods as a panacea for all data analytics tasks.

Entity Alignment Entity Resolution +1

On the Context-Free Ambiguity of Emoji

1 code implementation 17 Jan 2022 Justyna Czestochowska, Kristina Gligoric, Maxime Peyrard, Yann Mentha, Michal Bien, Andrea Grutter, Anita Auer, Aris Xanthos, Robert West

We find that with 30 annotations per emoji, 16 emojis (1.2%) are completely unambiguous, whereas 55 emojis (4.3%) are so ambiguous that their descriptions are indistinguishable from randomly chosen descriptions.

Homepage2Vec: Language-Agnostic Website Embedding and Classification

1 code implementation 10 Jan 2022 Sylvain Lugeon, Tiziano Piccardi, Robert West

We make publicly available the curated Curlie dataset aligned across languages, the pre-trained Homepage2Vec model, and libraries

Classification

GenIE: Generative Information Extraction

1 code implementation NAACL 2022 Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West

Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema.

Relation Extraction

Better than Average: Paired Evaluation of NLP Systems

1 code implementation ACL 2021 Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West

Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances.

Invariant Language Modeling

1 code implementation 16 Oct 2021 Maxime Peyrard, Sarvjeet Singh Ghotra, Martin Josifoski, Vidhan Agarwal, Barun Patra, Dean Carignan, Emre Kiciman, Robert West

In particular, we adapt a game-theoretic formulation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion.

Domain Generalization Language Modeling +1

Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?

1 code implementation 19 May 2021 Maxime Peyrard, Beatriz Borges, Kristina Gligorić, Robert West

We make progress in both respects by training and analyzing transformer-based humor recognition models on a recently introduced dataset consisting of minimal pairs of aligned sentences, one serious, the other humorous.

Benchmarking Sentence

Low-Rank Subspaces for Unsupervised Entity Linking

1 code implementation EMNLP 2021 Akhil Arora, Alberto García-Durán, Robert West

We propose a light-weight and scalable entity linking method, Eigenthemes, that relies solely on the availability of entity names and a referent knowledge base.

Entity Linking

Are Anti-Feminist Communities Gateways to the Far Right? Evidence from Reddit and YouTube

no code implementations 25 Feb 2021 Robin Mamié, Manoel Horta Ribeiro, Robert West

Our results suggest that there is a large overlap between the user bases of the Alt-right and of the Manosphere and that members of the Manosphere have a bigger chance to engage with far right content than carefully chosen counterparts.

Computers and Society

Volunteer contributions to Wikipedia increased during COVID-19 mobility restrictions

1 code implementation 19 Feb 2021 Thorsten Ruprechter, Manoel Horta Ribeiro, Tiago Santos, Florian Lemmerich, Markus Strohmaier, Robert West, Denis Helic

Wikipedia, the largest encyclopedia ever created, is a global initiative driven by volunteer contributions.

Computers and Society

Formation of Social Ties Influences Food Choice: A Campus-Wide Longitudinal Study

no code implementations 17 Feb 2021 Kristina Gligorić, Ryen W. White, Emre Kiciman, Eric Horvitz, Arnaud Chiolero, Robert West

To estimate causal effects from the passively observed log data, we control confounds in a matched quasi-experimental design: we identify focal users who at first do not have any regular eating partners but then start eating with a fixed partner regularly, and we match focal users into comparison pairs such that paired users are nearly identical with respect to covariates measured before acquiring the partner, where the two focal users' new eating partners diverge in the healthiness of their respective food choice.

Experimental Design Nutrition

YouNiverse: Large-Scale Channel and Video Metadata from English-Speaking YouTube

1 code implementation 18 Dec 2020 Manoel Horta Ribeiro, Robert West

YouTube plays a key role in entertaining and informing people around the globe.

Time Series Analysis Social and Information Networks Computers and Society

KLearn: Background Knowledge Inference from Summarization Data

1 code implementation Findings of the Association for Computational Linguistics 2020 Maxime Peyrard, Robert West

The goal of text summarization is to compress documents to the relevant information while excluding background information already known to the receiver.

Text Summarization

Crosslingual Topic Modeling with WikiPDA

1 code implementation 23 Sep 2020 Tiziano Piccardi, Robert West

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics.

Articles Matrix Completion

Adoption of Twitter's New Length Limit: Is 280 the New 140?

no code implementations 16 Sep 2020 Kristina Gligorić, Ashton Anderson, Robert West

The prevalence of tweets around 140 characters before the switch in a given language is strongly correlated with the prevalence of tweets around 280 characters after the switch in the same language, and very long tweets are vastly more popular on Web clients than on mobile clients.

Calibration of Google Trends Time Series

1 code implementation 27 Jul 2020 Robert West

In the offline preprocessing phase, an "anchor bank" is constructed, a set of queries spanning the full spectrum of popularity, all calibrated against a common reference query by carefully chaining together multiple Google Trends requests.

Time Series Time Series Analysis

A Ladder of Causal Distances

1 code implementation 5 May 2020 Maxime Peyrard, Robert West

Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences.

Benchmarking Causal Discovery +1

On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

1 code implementation ACL 2020 Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West, Steffen Eger

We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.

Language Modeling Language Modelling +5

Behavior Cloning in OpenAI using Case Based Reasoning

no code implementations 23 Feb 2020 Chad Peters, Babak Esfandiari, Mohamad Zalat, Robert West

Learning from Observation (LfO), also known as Behavioral Cloning, is an approach for building software agents by recording the behavior of an expert (human or artificial) and using the recorded data to generate the required behavior.

OpenAI Gym

Learning High Order Feature Interactions with Fine Control Kernels

no code implementations 9 Feb 2020 Hristo Paskov, Alex Paskov, Robert West

We provide a methodology for learning sparse statistical models that use as features all possible multiplicative interactions among an underlying atomic set of features.

Sparse Learning Vocal Bursts Intensity Prediction

WikiHist.html: English Wikipedia's Full Revision History in HTML Format

1 code implementation 28 Jan 2020 Blagoj Mitrevski, Tiziano Piccardi, Robert West

Wikipedia is written in the wikitext markup language.

Computers and Society

Quantifying Engagement with Citations on Wikipedia

1 code implementation 23 Jan 2020 Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West

Wikipedia, the free online encyclopedia that anyone can edit, is one of the most visited sites on the Web and a common source of information for many users.

Computers and Society

Robust Cross-lingual Embeddings from Parallel Sentences

2 code implementations 28 Dec 2019 Ali Sabet, Prakhar Gupta, Jean-Baptiste Cordonnier, Robert West, Martin Jaggi

Recent advances in cross-lingual word embeddings have primarily relied on mapping-based methods, which project pretrained word embeddings from different languages into a shared space through a linear transformation.

Cross-Lingual Document Classification Cross-Lingual Word Embeddings +8

RISE and DISE: Two Frameworks for Learning from Time Series with Missing Data

no code implementations 25 Sep 2019 Alberto Garcia-Duran, Robert West

The most successful prior approaches for modeling such time series are based on recurrent neural networks that learn to impute unobserved values and then treat the imputed values as observed.

Missing Values Representation Learning +3

Auditing Radicalization Pathways on YouTube

2 code implementations 22 Aug 2019 Manoel Horta Ribeiro, Raphael Ottoni, Robert West, Virgílio A. F. Almeida, Wagner Meira

Non-profits, as well as the media, have hypothesized the existence of a radicalization pipeline on YouTube, claiming that users systematically progress towards more extreme content on the platform.

Computers and Society Social and Information Networks

Privacy-Preserving Classification with Secret Vector Machines

1 code implementation 8 Jul 2019 Valentin Hartmann, Konark Modi, Josep M. Pujol, Robert West

Second, we implement SecVM's distributed framework for the Cliqz web browser and deploy it for predicting user gender in a large-scale online evaluation with thousands of clients, outperforming baselines by a large margin and thus showcasing that SecVM is suitable for production environments.

Classification Federated Learning +2

Secure Summation via Subset Sums: A New Primitive for Privacy-Preserving Distributed Machine Learning

1 code implementation 27 Jun 2019 Valentin Hartmann, Robert West

For population studies or for the training of complex machine learning models, it is often required to gather data from different actors.

BIG-bench Machine Learning Privacy Preserving

Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

1 code implementation 8 Apr 2019 Ramtin Yazdanian, Leila Zia, Jonathan Morgan, Bahodir Mansurov, Robert West

As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem.

Articles Recommendation Systems

Crosslingual Document Embedding as Reduced-Rank Ridge Regression

1 code implementation 8 Apr 2019 Martin Josifoski, Ivan S. Paskov, Hristo S. Paskov, Martin Jaggi, Robert West

Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.

Document Embedding regression +2

Hot Streaks on Social Media

1 code implementation 5 Apr 2019 Kiran Garimella, Robert West

We show that user impact tends to have certain characteristics: First, impact is clustered in time, such that the most impactful tweets of a user appear close to each other.

Social and Information Networks

Expanding the Text Classification Toolbox with Cross-Lingual Embeddings

no code implementations 23 Mar 2019 Meryem M'hamdi, Robert West, Andreea Hossmann, Michael Baeriswyl, Claudiu Musat

In particular, we test the hypothesis that embeddings with context are more effective, by multi-tasking the learning of multilingual word embeddings and text classification; we explore neural architectures for CLTC; and we move from bi- to multi-lingual word embeddings.

General Classification Intent Detection +4

Reverse-Engineering Satire, or "Paper on Computational Humor Accepted Despite Making Serious Advances"

1 code implementation 10 Jan 2019 Robert West, Eric Horvitz

Starting from the observation that satirical news headlines tend to resemble serious news headlines, we build and analyze a corpus of satirical headlines paired with nearly identical but serious headlines.

Humor Detection Sentence

Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

no code implementations 13 Dec 2018 Navid Rekabsaz, Robert West, James Henderson, Allan Hanbury

The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders).

Word Embeddings

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

1 code implementation 7 Apr 2018 Dario Pavllo, Tiziano Piccardi, Robert West

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora.

Articles

Growing Wikipedia Across Languages via Recommendation

2 code implementations 12 Apr 2016 Ellery Wulczyn, Robert West, Leila Zia, Jure Leskovec

The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests.

Social and Information Networks Digital Libraries

Exploiting Social Network Structure for Person-to-Person Sentiment Analysis

no code implementations TACL 2014 Robert West, Hristo S. Paskov, Jure Leskovec, Christopher Potts

Person-to-person evaluations are prevalent in all kinds of discourse and important for establishing reputations, building social bonds, and shaping public opinion.

Decision Making Sentiment Analysis
