Search Results for author: David Mareček

Found 20 papers, 5 papers with code

Analyzing BERT’s Knowledge of Hypernymy via Prompting

no code implementations • EMNLP (BlackboxNLP) 2021 • Michael Hanna, David Mareček

The high performance of large pretrained language models (LLMs) such as BERT on NLP tasks has prompted questions about BERT’s linguistic capabilities, and how they differ from humans’.

Hypernym Discovery
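
The paper above probes hypernymy knowledge with cloze-style prompts. A minimal sketch of that style of prompting via the Hugging Face fill-mask pipeline; the model choice and the prompt template are illustrative assumptions, not the paper's exact setup:

```python
# Cloze-style hypernymy prompting with a masked language model.
# The prompt template and the model are illustrative assumptions,
# not the exact setup used in the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Ask BERT for a hypernym of "robin" via a cloze prompt.
for pred in fill("A robin is a type of [MASK].", top_k=5):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```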

Debiasing Algorithm through Model Adaptation

1 code implementation • 29 Oct 2023 • Tomasz Limisiewicz, David Mareček, Tomáš Musil

Large language models are becoming the go-to solution for the ever-growing number of tasks.

Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

1 code implementation • 26 May 2023 • Tomasz Limisiewicz, Jiří Balhar, David Mareček

Multilingual language models have recently gained attention as a promising solution for representing multiple languages in a single model.

Language Modelling • NER +3
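
The paper above quantifies how a shared vocabulary is allocated to, and overlaps across, languages. A rough sketch of one such statistic, the Jaccard overlap between the sets of subword types two corpora activate; the tokenizer and the toy corpora are placeholders:

```python
# Jaccard overlap of the subword vocabularies activated by two languages.
# The tokenizer and the two toy corpora are illustrative placeholders.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

corpus_en = ["The cat sat on the mat.", "Language models share subwords."]
corpus_cs = ["Kočka seděla na rohožce.", "Jazykové modely sdílejí podslova."]

def activated_vocab(corpus):
    """Set of subword token ids a corpus actually uses."""
    ids = set()
    for sent in corpus:
        ids.update(tok(sent, add_special_tokens=False)["input_ids"])
    return ids

en, cs = activated_vocab(corpus_en), activated_vocab(corpus_cs)
jaccard = len(en & cs) / len(en | cs)
print(f"vocabulary overlap (Jaccard): {jaccard:.3f}")
```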

Independent Components of Word Embeddings Represent Semantic Features

no code implementations • 19 Dec 2022 • Tomáš Musil, David Mareček

Independent Component Analysis (ICA) is an algorithm originally developed for finding separate sources in a mixed signal, such as a recording of multiple people in the same room speaking at the same time.

Word Embeddings
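
A minimal sketch of the idea described above: run FastICA over an embedding matrix and inspect which words load highest on each independent component. The random matrix stands in for real word embeddings:

```python
# Running ICA over a word-embedding matrix to look for interpretable
# components. The random matrix is a placeholder for real embeddings;
# in the paper's setting each row would be a word vector.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5000, 300))  # placeholder: |V| x dim

ica = FastICA(n_components=50, random_state=0)
sources = ica.fit_transform(embeddings)        # |V| x 50 component loadings

# Words loading highest on a component would share a semantic feature;
# here we just print the top vocabulary indices for component 0.
top = np.argsort(-sources[:, 0])[:10]
print("top words for component 0 (vocab indices):", top)
```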

Introducing Orthogonal Constraint in Structural Probes

1 code implementation ACL 2021 Tomasz Limisiewicz, David Mareček

With the recent success of pre-trained models in NLP, a significant focus was put on interpreting their representations.

Memorization • Position +2
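
The paper above adds an orthogonality constraint to structural probes. A sketch of such a probe using PyTorch's orthogonal parametrization; the dimensions, data, and loss are placeholder assumptions, not the paper's training setup:

```python
# A structural probe whose transformation is kept orthogonal with
# PyTorch's orthogonal parametrization. The probe maps hidden states
# so that squared distances between words approximate syntactic tree
# distances; all data here is random placeholder material.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class OrthogonalStructuralProbe(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Orthogonality is enforced on the weight at every forward pass.
        self.map = orthogonal(nn.Linear(dim, dim, bias=False))

    def forward(self, h):                    # h: (seq_len, dim)
        t = self.map(h)                      # rotated representations
        diff = t.unsqueeze(0) - t.unsqueeze(1)
        return (diff ** 2).sum(-1)           # (seq_len, seq_len) distances

probe = OrthogonalStructuralProbe()
h = torch.randn(12, 768)                      # placeholder hidden states
gold = torch.randint(1, 6, (12, 12)).float()  # placeholder tree distances
loss = torch.abs(probe(h) - gold).mean()      # L1, as in structural probes
loss.backward()
```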

Syntax Representation in Word Embeddings and Neural Networks -- A Survey

no code implementations • 2 Oct 2020 • Tomasz Limisiewicz, David Mareček

Neural networks trained on natural language processing tasks capture syntax even though it is not provided as a supervision signal.

Language Modelling • Machine Translation +3

Measuring Memorization Effect in Word-Level Neural Networks Probing

no code implementations • 29 Jun 2020 • Rudolf Rosa, Tomáš Musil, David Mareček

In classical probing, a classifier is trained on the representations to extract the target linguistic information.

Machine Translation • Memorization +1
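
A sketch of the classical probe described above, together with one simple memorization control in the spirit of the paper: evaluating the probe on word types it never saw during training. All data here is synthetic:

```python
# Classical word-level probing with a control for memorization: the
# probe is tested on word *types* absent from its training set, so it
# cannot succeed by memorizing word identity. Data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_types, dim = 200, 64
reps = rng.standard_normal((n_types, dim))      # one vector per word type
labels = rng.integers(0, 2, n_types)            # e.g. a POS-like binary tag

seen = rng.permutation(n_types)[:150]           # word types the probe sees
unseen = np.setdiff1d(np.arange(n_types), seen) # held-out word types

probe = LogisticRegression(max_iter=1000).fit(reps[seen], labels[seen])
print("seen-type accuracy:  ", probe.score(reps[seen], labels[seen]))
print("unseen-type accuracy:", probe.score(reps[unseen], labels[unseen]))
# A large gap between the two scores signals memorization rather than
# genuinely extractable linguistic information.
```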

Universal Dependencies according to BERT: both more specific and more general

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Tomasz Limisiewicz, Rudolf Rosa, David Mareček

This work focuses on analyzing the form and extent of syntactic abstraction captured by BERT by extracting labeled dependency trees from self-attentions.

Relation
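
A deliberately crude sketch of the core idea above: read an unlabeled dependency structure off a self-attention matrix by letting each word pick its most-attended word as head. The paper's actual method aggregates attention heads and extracts proper labeled trees; the attention matrix here is random:

```python
# Crude unlabeled head selection from a self-attention matrix: each
# word takes its most-attended-to word as its syntactic head. The
# paper's method is more involved; this is only the core intuition.
import numpy as np

words = ["The", "dog", "chased", "the", "cat"]
rng = np.random.default_rng(0)
attn = rng.random((len(words), len(words)))     # placeholder attention
np.fill_diagonal(attn, 0.0)                     # a word cannot head itself
attn /= attn.sum(axis=1, keepdims=True)

heads = attn.argmax(axis=1)
for i, w in enumerate(words):
    print(f"{w} <- {words[heads[i]]}")
```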

Inducing Syntactic Trees from BERT Representations

no code implementations • 27 Jun 2019 • Rudolf Rosa, David Mareček

We use the English model of BERT and explore how a deletion of one word in a sentence changes representations of other words.

Language Modelling • Sentence
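
A sketch of the deletion experiment described above, measuring how far each remaining word's representation moves when one word is removed. For simplicity it assumes every word maps to a single subword token, which holds for this toy sentence:

```python
# Measuring how deleting one word shifts the representations of the
# others -- the perturbation the paper builds trees from. We assume
# every word is a single subword token to keep alignment trivial.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

words = ["the", "dog", "chased", "the", "cat"]

@torch.no_grad()
def states(ws):
    enc = tok(" ".join(ws), return_tensors="pt")
    return model(**enc).last_hidden_state[0, 1:-1]  # drop [CLS]/[SEP]

full = states(words)
for j in range(len(words)):
    reduced = states(words[:j] + words[j + 1:])
    kept = [i for i in range(len(words)) if i != j]
    impact = (full[kept] - reduced).norm(dim=-1)    # shift per kept word
    print(f"delete '{words[j]}':", [f"{x:.2f}" for x in impact])
```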

Derivational Morphological Relations in Word Embeddings

no code implementations • 6 Jun 2019 • Tomáš Musil, Jonáš Vidra, David Mareček

Derivation is a word-formation process that creates new words from existing ones by adding, changing, or deleting affixes.

Clustering • Word Embeddings
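
One way to make the question above concrete: if a derivational relation is encoded geometrically, the difference vectors of same-type pairs should resemble each other. A sketch with pretrained GloVe vectors; the English "-er" agent-noun pairs are stand-ins, not the paper's data:

```python
# Difference vectors between derived words and their bases; if a
# derivational relation is encoded geometrically, pairs of the same
# type should produce similar shifts. Vectors and pairs are
# illustrative stand-ins.
import numpy as np
import gensim.downloader as api

kv = api.load("glove-wiki-gigaword-100")   # small pretrained vectors

pairs = [("teach", "teacher"), ("sing", "singer"), ("bake", "baker")]
diffs = [kv[derived] - kv[base] for base, derived in pairs]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "-er" agent-noun shifts should be mutually similar.
print(f"teach->teacher vs sing->singer: {cosine(diffs[0], diffs[1]):.2f}")
print(f"teach->teacher vs bake->baker:  {cosine(diffs[0], diffs[2]):.2f}")
```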

From Balustrades to Pierre Vinken: Looking for Syntax in Transformer Self-Attentions

no code implementations • WS 2019 • David Mareček, Rudolf Rosa

We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation.

NMT • Position

Input Combination Strategies for Multi-Source Transformer Decoder

no code implementations • 12 Nov 2018 • Jindřich Libovický, Jindřich Helcl, David Mareček

In multi-source sequence-to-sequence tasks, the attention mechanism can be modeled in several ways.

Translation
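
A simplified sketch of one combination strategy such a decoder can use: parallel cross-attention, where the target attends to each source encoder separately and the resulting contexts are summed. This is a single illustrative layer, not the paper's full architecture:

```python
# "Parallel" multi-source cross-attention: the decoder attends to each
# encoder independently and sums the contexts. A single simplified
# layer, not the paper's complete Transformer decoder.
import torch
import torch.nn as nn

class ParallelMultiSourceAttention(nn.Module):
    def __init__(self, dim=512, heads=8, n_sources=2):
        super().__init__()
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(n_sources)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, tgt, sources):
        # tgt: (batch, tgt_len, dim); sources: list of (batch, src_len, dim)
        ctx = sum(attn(tgt, src, src)[0]
                  for attn, src in zip(self.attns, sources))
        return self.norm(tgt + ctx)   # residual connection, as in Transformer

layer = ParallelMultiSourceAttention()
tgt = torch.randn(2, 7, 512)
out = layer(tgt, [torch.randn(2, 11, 512), torch.randn(2, 9, 512)])
print(out.shape)  # torch.Size([2, 7, 512])
```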
