Search Results for author: Sen Wu

Found 19 papers, 11 papers with code

Metadata Shaping: A Simple Approach for Knowledge-Enhanced Language Models

1 code implementation • Findings (ACL) 2022 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré

We observe that proposed methods typically start with a base LM and data that has been annotated with entity metadata, then change the model by modifying the architecture or introducing auxiliary loss terms to better capture entity knowledge.
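Metadata shaping instead leaves the model unchanged and enriches the training examples themselves with entity metadata. A minimal sketch of that idea, with function names and tag format invented for illustration (not the paper's actual code):

```python
# Sketch of metadata shaping: append entity metadata (types,
# descriptions) to the input text, so any off-the-shelf LM can be
# fine-tuned on the shaped data with no architecture changes.
# Names and the "[ENT]" tag format are illustrative assumptions.

def shape_example(text: str, entity_metadata: dict[str, list[str]]) -> str:
    """Return the text with metadata tags for each entity it mentions."""
    tags = []
    for entity, metadata in entity_metadata.items():
        if entity in text:
            tags.extend(metadata)
    # Shaped examples are still ordinary text, so the data pipeline
    # changes but the model and training loop do not.
    return text + " [ENT] " + " [ENT] ".join(tags) if tags else text
```

For example, shaping "Babbage designed it" with metadata `{"Babbage": ["person", "mathematician"]}` yields "Babbage designed it [ENT] person [ENT] mathematician".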

Metadata Shaping: Natural Language Annotations for the Tail

1 code implementation • 16 Oct 2021 • Simran Arora, Sen Wu, Enci Liu, Christopher Ré

Since rare entities and facts are prevalent in the queries users submit to popular applications such as search and personal assistant systems, improving the ability of LMs to reliably capture knowledge over rare entities is a pressing challenge studied in significant prior work.

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

1 code implementation • Findings (EMNLP) 2021 • Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities.

Data Integration • Entity Disambiguation

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

no code implementations • 22 Oct 2020 • Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

Intuitively, the transfer effect from one task to another task depends on dataset shifts such as sample sizes and covariance matrices.

Multi-Task Learning • text-classification • +1

Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation

1 code implementation • 20 Oct 2020 • Laurel Orr, Megan Leszczynski, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré

A challenge for named entity disambiguation (NED), the task of mapping textual mentions to entities in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail entities.

Ranked #1 on Entity Disambiguation on AIDA-CoNLL (Micro-F1 metric)

Entity Disambiguation • Relation Extraction

Ivy: Instrumental Variable Synthesis for Causal Inference

no code implementations • 11 Apr 2020 • Zhaobin Kuang, Frederic Sala, Nimit Sohoni, Sen Wu, Aldo Córdova-Palomera, Jared Dunnmon, James Priest, Christopher Ré

To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner.

Causal Inference • Epidemiology • +1

Understanding the Downstream Instability of Word Embeddings

1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.

Word Embeddings

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices

2 code implementations • NeurIPS 2019 • Vincent S. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré

In real-world machine learning applications, data subsets correspond to especially critical outcomes: vulnerable cyclist detections are safety-critical in an autonomous driving task, and "question" sentences might be important to a dialogue agent's language understanding for product purposes.

Autonomous Driving • BIG-bench Machine Learning
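In the slice-based programming model, users flag such critical subsets with slicing functions (SFs), simple predicates over examples. A self-contained sketch with plain Python functions standing in for a real SF framework, and the slices themselves invented for illustration:

```python
# Sketch of slice-based learning's programming model: slicing
# functions (SFs) are predicates that flag critical data subsets,
# e.g. "question" sentences for a dialogue agent, so those slices
# can be monitored or given extra model capacity. The SFs below
# are illustrative examples, not the paper's code.

def sf_question(example: str) -> bool:
    """Slice: sentences phrased as questions."""
    return example.strip().endswith("?")

def sf_short(example: str) -> bool:
    """Slice: very short inputs, often hard for a model."""
    return len(example.split()) <= 3

def assign_slices(examples, slicing_functions):
    """Map each SF's name to the examples it flags."""
    return {
        sf.__name__: [x for x in examples if sf(x)]
        for sf in slicing_functions
    }
```

Note that slices may overlap: a three-word question lands in both slices above, which is expected in this model.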

Snorkel: Rapid Training Data Creation with Weak Supervision

2 code implementations • 28 Nov 2017 • Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré

In a user study, subject matter experts build models 2.8x faster and increase predictive performance by an average of 45.5% versus seven hours of hand labeling.

BIG-bench Machine Learning
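The weak-supervision workflow behind Snorkel has users write labeling functions (LFs) that vote on each example or abstain, then denoises the votes with a learned generative label model. In the self-contained sketch below, the LFs are invented for illustration and a simple majority vote stands in for Snorkel's label model:

```python
# Sketch of Snorkel-style weak supervision: labeling functions (LFs)
# vote SPAM/HAM or abstain on each example. Snorkel combines LF votes
# with a learned generative label model; a plain majority vote stands
# in for it here, and these LFs are illustrative, not Snorkel's API.

ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_contains_offer(text: str) -> int:
    return SPAM if "free offer" in text.lower() else ABSTAIN

def lf_has_greeting(text: str) -> int:
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

def lf_many_exclamations(text: str) -> int:
    return SPAM if text.count("!") >= 3 else ABSTAIN

def majority_label(text: str, lfs) -> int:
    """Combine LF votes by majority, ignoring abstentions."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)
```

Writing a handful of LFs like these can label a large corpus in minutes, which is the source of the speedup over hand labeling reported above.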

Robust Sparse Coding via Self-Paced Learning

no code implementations • 10 Sep 2017 • Xiaodong Feng, Zhiwei Tang, Sen Wu

Sparse coding (SC) has attracted increasing attention due to its comprehensive theoretical grounding and its excellent performance in many signal processing applications.

Data Programming: Creating Large Training Sets, Quickly

4 code implementations • NeurIPS 2016 • Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré

Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.

BIG-bench Machine Learning • Slot Filling

Incremental Knowledge Base Construction Using DeepDive

no code implementations • 3 Feb 2015 • Jaeho Shin, Sen Wu, Feiran Wang, Christopher De Sa, Ce Zhang, Christopher Ré

Populating a database with unstructured information is a long-standing problem in industry and research that encompasses problems of extraction, cleaning, and integration.

Feature Engineering for Knowledge Base Construction

no code implementations • 24 Jul 2014 • Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, Ce Zhang

Our approach to KBC is based on joint probabilistic inference and learning, but we do not see inference as either a panacea or a magic bullet: inference is a tool that allows us to be systematic in how we construct, debug, and improve the quality of such systems.

Feature Engineering
