Search Results for author: Jimmy Lin

Found 122 papers, 47 papers with code

Learning to Rank in the Age of Muppets: Effectiveness–Efficiency Tradeoffs in Multi-Stage Ranking

no code implementations EMNLP (sustainlp) 2021 Yue Zhang, ChengCheng Hu, Yuqi Liu, Hui Fang, Jimmy Lin

It is well known that rerankers built on pretrained transformer models such as BERT have dramatically improved retrieval effectiveness in many tasks.

Document Ranking Learning-To-Rank

Bag-of-Words Baselines for Semantic Code Search

no code implementations ACL (NLP4Prog) 2021 Xinyu Zhang, Ji Xin, Andrew Yates, Jimmy Lin

The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language.

Code Search Information Retrieval

In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval

no code implementations ACL (RepL4NLP) 2021 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model.

Document Ranking Knowledge Distillation

Cydex: Neural Search Infrastructure for the Scholarly Literature

no code implementations EMNLP (sdp) 2020 Shane Ding, Edwin Zhang, Jimmy Lin

Cydex is a platform that provides neural search infrastructure for domain-specific scholarly literature.

Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval

no code implementations EMNLP 2021 Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, Jimmy Lin

Recent work has shown that dense passage retrieval techniques achieve better ranking accuracy in open-domain question answering compared to sparse retrieval techniques such as BM25, but at the cost of large space and memory requirements.

Open-Domain Question Answering Passage Retrieval +1

Voice Query Auto Completion

no code implementations EMNLP 2021 Raphael Tang, Karun Kumar, Kendra Chalkley, Ji Xin, Liming Zhang, Wenyan Li, Gefei Yang, Yajie Mao, Junho Shin, Geoffrey Craig Murray, Jimmy Lin

Query auto completion (QAC) is the task of predicting a search engine user’s final query from their intermediate, incomplete query.

Speech Recognition

Unsupervised Chunking as Syntactic Structure Induction with a Knowledge-Transfer Approach

no code implementations Findings (EMNLP) 2021 Anup Anand Deshmukh, Qianqiu Zhang, Ming Li, Jimmy Lin, Lili Mou

In this paper, we address unsupervised chunking as a new task of syntactic structure induction, which is helpful for understanding the linguistic structures of human languages as well as processing low-resource languages.

Chunking Transfer Learning

Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering

1 code implementation Findings (EMNLP) 2021 Minghan Li, Ming Li, Kun Xiong, Jimmy Lin

Our method reaches state-of-the-art performance on 5 benchmark QA datasets, with up to 10% improvement in top-100 accuracy compared to a joint-training multi-task DPR on SQuAD.

Open-Domain Question Answering

A Little Bit Is Worse Than None: Ranking with Limited Training Data

no code implementations EMNLP (sustainlp) 2020 Xinyu Zhang, Andrew Yates, Jimmy Lin

Researchers have proposed simple yet effective techniques for the retrieval problem based on using BERT as a relevance classifier to rerank initial candidates from keyword search.

Passage Retrieval

Sparsifying Sparse Representations for Passage Retrieval by Top-$k$ Masking

no code implementations 17 Dec 2021 Jheng-Hong Yang, Xueguang Ma, Jimmy Lin

Sparse lexical representation learning has demonstrated much progress in improving passage retrieval effectiveness in recent models such as DeepImpact, uniCOIL, and SPLADE.

Passage Retrieval Representation Learning

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

1 code implementation 13 Dec 2021 Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

no code implementations 9 Dec 2021 Sheng-Chieh Lin, Jimmy Lin

Learned sparse and dense representations capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust.

Passage Retrieval

Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation

no code implementations 22 Oct 2021 Joel Mackenzie, Andrew Trotman, Jimmy Lin

Recent advances in retrieval models based on learned sparse representations generated by transformers have led us to, once again, consider score-at-a-time query evaluation techniques for the top-k retrieval problem.

A Proposed Conceptual Framework for a Representational Approach to Information Retrieval

no code implementations 4 Oct 2021 Jimmy Lin

This paper outlines a conceptual framework for understanding recent developments in information retrieval and natural language processing that attempts to integrate dense and sparse retrieval methods.

Information Retrieval Sentence Similarity

Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering

no code implementations 4 Oct 2021 Minghan Li, Jimmy Lin

Previous work on generalization of DPR mainly focuses on testing both encoders in tandem on out-of-distribution (OOD) question-answering (QA) tasks, which is also known as domain adaptation.

Domain Adaptation Open-Domain Question Answering +1

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

1 code implementation EMNLP (MRL) 2021 Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin

We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations.

Representation Learning

The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing

1 code implementation ACL 2021 Ji Xin, Raphael Tang, YaoLiang Yu, Jimmy Lin

To fill this void in the literature, we study in this paper selective prediction for NLP, comparing different models and confidence estimators.

Exploring Listwise Evidence Reasoning with T5 for Fact Verification

no code implementations ACL 2021 Kelvin Jiang, Ronak Pradeep, Jimmy Lin

This work explores a framework for fact verification that leverages pretrained sequence-to-sequence transformer models for sentence selection and label prediction, two key sub-tasks in fact verification.

Data Augmentation Fact Verification

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

no code implementations 28 Jun 2021 Jimmy Lin, Xueguang Ma

Recent developments in representational learning for information retrieval can be organized in a conceptual framework that establishes two pairs of contrasts: sparse vs. dense representations and unsupervised vs. learned representations.

Information Retrieval

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

no code implementations 9 May 2021 Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.

Contextualized Query Embeddings for Conversational Search

no code implementations EMNLP 2021 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations.

Conversational Search Information Retrieval +3

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

1 code implementation 14 Apr 2021 Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury

A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.

Re-Ranking

A Replication Study of Dense Passage Retriever

1 code implementation 12 Apr 2021 Xueguang Ma, Kai Sun, Ronak Pradeep, Jimmy Lin

Text retrieval using learned dense representations has recently emerged as a promising alternative to "traditional" text retrieval using sparse bag-of-words representations.

Open-Domain Question Answering

BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression

1 code implementation EACL 2021 Ji Xin, Raphael Tang, YaoLiang Yu, Jimmy Lin

The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency.

Investigating the Limitations of Transformers with Simple Arithmetic Tasks

1 code implementation 25 Feb 2021 Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values.

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

1 code implementation 19 Feb 2021 Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira

Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.

Information Retrieval

The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models

1 code implementation 14 Jan 2021 Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin

We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains.

Document Ranking

Inserting Information Bottlenecks for Attribution in Transformers

1 code implementation Findings of the Association for Computational Linguistics 2020 Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin

We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers.

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models

no code implementations COLING 2020 Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.

Natural Language Understanding Question Answering

Cross-Lingual Training of Neural Models for Document Ranking

no code implementations Findings of the Association for Computational Linguistics 2020 Peng Shi, He Bai, Jimmy Lin

We tackle the challenge of cross-lingual training of neural document ranking models for mono-lingual retrieval, specifically leveraging relevance judgments in English to improve search in non-English languages.

Document Ranking

Scientific Claim Verification with VERT5ERINI

no code implementations EACL (Louhi) 2021 Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin

This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers

1 code implementation 22 Oct 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model.

Knowledge Distillation

Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

1 code implementation 15 Oct 2020 Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Jimmy Lin, Sepp Hochreiter

Compared to naive prediction with a distinct LSTM per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy.

Pretrained Transformers for Text Ranking: BERT and Beyond

1 code implementation NAACL 2021 Jimmy Lin, Rodrigo Nogueira, Andrew Yates

There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size).

Information Retrieval

Howl: A Deployed, Open-Source Wake Word Detection System

2 code implementations EMNLP (NLPOSS) 2020 Raphael Tang, Jaejun Lee, Afsaneh Razi, Julia Cambre, Ian Bicking, Jofish Kaye, Jimmy Lin

We describe Howl, an open-source wake word detection toolkit with native support for open speech datasets, like Mozilla Common Voice and Google Speech Commands.

Keyword Spotting

Don't Change Me! User-Controllable Selective Paraphrase Generation

no code implementations EACL 2021 Mohan Zhang, Luchen Tan, Zhengkai Tu, Zihang Fu, Kun Xiong, Ming Li, Jimmy Lin

The contribution of this work is a novel data generation technique using distant supervision that allows us to start with a pretrained sequence-to-sequence model and fine-tune a paraphrase generator that exhibits this behavior, allowing user-controllable paraphrase generation.

Paraphrase Generation

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

1 code implementation EMNLP (sdp) 2020 Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin

We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset

no code implementations ACL 2020 Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin

The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen Institute for AI.

Decision Making

Generalized and Scalable Optimal Sparse Decision Trees

1 code implementation ICML 2020 Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer

Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning.

Interpretable Machine Learning

A Data Scientist's Guide to Streamflow Prediction

no code implementations 5 Jun 2020 Martin Gauch, Jimmy Lin

In recent years, the paradigms of data-driven science have become essential components of physical sciences, particularly in geophysical disciplines such as climatology.

Segatron: Segment-Aware Transformer for Language Modeling and Understanding

1 code implementation 30 Apr 2020 He Bai, Peng Shi, Jimmy Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li

To verify this, we propose a segment-aware Transformer (Segatron), by replacing the original token position encoding with a combined position encoding of paragraph, sentence, and token.

Language Modelling Representation Learning

Showing Your Work Doesn't Always Work

1 code implementation ACL 2020 Raphael Tang, Jaejun Lee, Ji Xin, Xinyu Liu, Yao-Liang Yu, Jimmy Lin

In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks.

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

3 code implementations ACL 2020 Ji Xin, Raphael Tang, Jaejun Lee, Yao-Liang Yu, Jimmy Lin

Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

1 code implementation 23 Apr 2020 Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.

Question Answering

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned

1 code implementation 10 Apr 2020 Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin

We present the Neural Covidex, a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.

Decision Making

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models

no code implementations 4 Apr 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).

Task-Oriented Dialogue Systems

TTTTTackling WinoGrande Schemas

no code implementations 18 Mar 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.

Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format

2 code implementations 18 Mar 2020 Jimmy Lin, Joel Mackenzie, Chris Kamphuis, Craig Macdonald, Antonio Mallia, Michał Siedlaczek, Andrew Trotman, Arjen de Vries

There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems.

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents

no code implementations 5 Feb 2020 Ruixue Zhang, Wei Yang, Luyun Lin, Zhengkai Tu, Yuqing Xie, Zihang Fu, Yuhao Xie, Luchen Tan, Kun Xiong, Jimmy Lin

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient.

A Prototype of Serverless Lucene

no code implementations 4 Feb 2020 Jimmy Lin

This paper describes a working prototype that adapts Lucene, the world's most popular and most widely deployed open-source search library, to operate within a serverless environment in the cloud.

Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation

no code implementations 23 Jan 2020 Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin

Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration.

Citation Recommendation Domain Adaptation +2

The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives

1 code implementation 15 Jan 2020 Nick Ruest, Jimmy Lin, Ian Milligan, Samantha Fritz

The Archives Unleashed project aims to improve scholarly access to web archives through a multi-pronged strategy involving tool creation, process modeling, and community building - all proceeding concurrently in mutually-reinforcing efforts.

The Proper Care and Feeding of CAMELS: How Limited Training Data Affects Streamflow Prediction

1 code implementation 17 Nov 2019 Martin Gauch, Juliane Mai, Jimmy Lin

Accurate streamflow prediction largely relies on historical meteorological records and streamflow measurements.

Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits

no code implementations 15 Nov 2019 Achyudh Ram, Ji Xin, Meiyappan Nagappan, Yao-Liang Yu, Rocío Cabrera Lozoya, Antonino Sabetta, Jimmy Lin

Public vulnerability databases such as CVE and NVD account for only 60% of security vulnerabilities present in open-source projects, and are known to suffer from inconsistent quality.

What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning

no code implementations 8 Nov 2019 Jaejun Lee, Raphael Tang, Jimmy Lin

We show that only a fourth of the final layers need to be fine-tuned to achieve 90% of the original quality.

Linguistic Acceptability Natural Language Inference +3

Cross-Lingual Relevance Transfer for Document Retrieval

no code implementations 8 Nov 2019 Peng Shi, Jimmy Lin

Recent work has shown the surprising ability of multi-lingual BERT to serve as a zero-shot cross-lingual transfer model for a number of language processing tasks.

Zero-Shot Cross-Lingual Transfer

Explicit Pairwise Word Interaction Modeling Improves Pretrained Transformers for English Semantic Similarity Tasks

no code implementations 7 Nov 2019 Yinan Zhang, Raphael Tang, Jimmy Lin

In this paper, we hypothesize that introducing an explicit, constrained pairwise word interaction mechanism to pretrained language models improves their effectiveness on semantic similarity tasks.

Semantic Similarity Semantic Textual Similarity

Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting

no code implementations IJCNLP 2019 Jaejun Lee, Raphael Tang, Jimmy Lin

Used for simple command recognition on devices from smart speakers to mobile phones, keyword spotting systems are everywhere.

Keyword Spotting

Applying BERT to Document Retrieval with Birch

no code implementations IJCNLP 2019 Zeynep Akkalyoncu Yilmaz, Shengjin Wang, Wei Yang, Haotian Zhang, Jimmy Lin

We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections.

Information Retrieval

What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons

no code implementations IJCNLP 2019 Ji Xin, Jimmy Lin, Yao-Liang Yu

Memory neurons of long short-term memory (LSTM) networks encode and process information in powerful yet mysterious ways.

Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval

no code implementations IJCNLP 2019 Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, Jimmy Lin

This paper applies BERT to ad hoc document retrieval on news articles, which requires addressing two challenges: relevance judgments in existing test collections are typically provided only at the document level, and documents often exceed the length that BERT was designed to handle.

Scalable Knowledge Graph Construction from Text Collections

no code implementations WS 2019 Ryan Clancy, Ihab F. Ilyas, Jimmy Lin

We present a scalable, open-source platform that "distills" a potentially large text collection into a knowledge graph.

Fact Verification graph construction

Natural Language Generation for Effective Knowledge Distillation

no code implementations WS 2019 Raphael Tang, Yao Lu, Jimmy Lin

Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word embedding-based neural networks, helping them approach or exceed the quality of other heavyweight language representation models.

Knowledge Distillation Linguistic Acceptability +4

Multi-Stage Document Ranking with BERT

2 code implementations 31 Oct 2019 Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, Jimmy Lin

The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing.

Document Ranking Language Modelling

The Performance Envelope of Inverted Indexing on Modern Hardware

no code implementations 24 Oct 2019 Jimmy Lin, Lori Paniak, Gordon Boerke

Experiments show that the largest determinants of performance are the physical characteristics of the source and target media, and that physically isolating the two yields the highest indexing throughput.

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

no code implementations 22 Oct 2019 Tommaso Teofili, Jimmy Lin

We demonstrate three approaches for adapting the open-source Lucene search library to perform approximate nearest-neighbor search on arbitrary dense vectors, using similarity search on word embeddings as a case study.

Dimensionality Reduction Word Embeddings

Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data

1 code implementation ACL 2020 Hamidreza Shahidi, Ming Li, Jimmy Lin

We consider neural table-to-text generation and neural question generation (NQG) tasks for text generation from structured and unstructured data, respectively.

Question Generation Table-to-Text Generation

Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features

no code implementations NAACL 2019 Wei Yang, Luchen Tan, Chunwei Lu, Anqi Cui, Han Li, Xi Chen, Kun Xiong, Muzi Wang, Ming Li, Jian Pei, Jimmy Lin

Consumers dissatisfied with the normal dispute resolution process provided by an e-commerce company's customer service agents have the option of escalating their complaints by filing grievances with a government authority.

The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification

no code implementations 18 Apr 2019 Jimmy Lin

Motivated by recent commentary that has questioned today's pursuit of ever-more complex models and mathematical formalisms in applied machine learning and whether meaningful empirical progress is actually being made, this paper tries to tackle the decades-old problem of pseudo-relevance feedback with "the simplest thing that can possibly work".

General Classification Text Classification

Document Expansion by Query Prediction

4 code implementations 17 Apr 2019 Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer.

Passage Re-Ranking Question Answering +1

Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering

no code implementations 14 Apr 2019 Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin

Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset.

Data Augmentation Open-Domain Question Answering +1

Simple Applications of BERT for Ad Hoc Document Retrieval

2 code implementations 26 Mar 2019 Wei Yang, Haotian Zhang, Jimmy Lin

Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval.

Ad-Hoc Information Retrieval Question Answering

Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks

no code implementations 19 Dec 2018 Raphael Tang, Gefei Yang, Hong Wei, Yajie Mao, Ferhan Ture, Jimmy Lin

Voice-enabled commercial products are ubiquitous, typically enabled by lightweight on-device keyword spotting (KWS) and full automatic speech recognition (ASR) in the cloud.

Keyword Spotting Speech Recognition +1

FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks

no code implementations NIPS Workshop CDNNRIA 2018 Raphael Tang, Ashutosh Adhikari, Jimmy Lin

There exists a plethora of techniques for inducing structured sparsity in parametric models during the optimization process, with the final goal of resource-efficient inference.

Image Classification Model Compression

Progress and Tradeoffs in Neural Language Models

no code implementations 2 Nov 2018 Raphael Tang, Jimmy Lin

In recent years, we have witnessed a dramatic shift towards techniques driven by neural networks for a variety of NLP tasks.

Language Modelling

Simple Attention-Based Representation Learning for Ranking Short Social Media Posts

no code implementations NAACL 2019 Peng Shi, Jinfeng Rao, Jimmy Lin

This paper explores the problem of ranking short social media posts with respect to user queries using neural networks.

Representation Learning

JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis

1 code implementation 30 Oct 2018 Jaejun Lee, Raphael Tang, Jimmy Lin

Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with a minimal decrease in accuracy of 4% from 94% to 90%.

Keyword Spotting Model Compression

Adaptive Pruning of Neural Language Models for Mobile Devices

no code implementations ICLR 2019 Raphael Tang, Jimmy Lin

Neural language models (NLMs) exist in an accuracy-efficiency tradeoff space where better perplexity typically comes at the cost of greater computational complexity.

Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia

1 code implementation COLING 2018 Michael Azmy, Peng Shi, Jimmy Lin, Ihab Ilyas

To address this problem, we present SimpleDBpediaQA, a new benchmark dataset for simple question answering over knowledge graphs that was created by mapping SimpleQuestions entities and predicates from Freebase to DBpedia.

Knowledge Graphs Question Answering +1

Repeatability Corner Cases in Document Ranking: The Impact of Score Ties

no code implementations 16 Jul 2018 Jimmy Lin, Peilin Yang

Due to multi-threaded indexing, which makes experimentation with large modern document collections practical, internal document ids are not assigned consistently between different index instances of the same collection, and thus score ties are broken unpredictably.

Document Ranking

Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures

no code implementations NAACL 2018 Zhucheng Tu, Mengping Li, Jimmy Lin

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon's Lambda service for feedforward evaluation and DynamoDB for storing word embeddings.

Answer Selection Sentence Classification +1

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

3 code implementations 21 May 2018 Jinfeng Rao, Wei Yang, Yuhao Zhang, Ferhan Ture, Jimmy Lin

To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models.

Information Retrieval

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

no code implementations NAACL 2018 Salman Mohammed, Peng Shi, Jimmy Lin

We examine the problem of question answering over knowledge graphs, focusing on simple questions that can be answered by the lookup of a single fact.

Entity Linking Knowledge Graphs +1

Deep Residual Learning for Small-Footprint Keyword Spotting

5 code implementations 28 Oct 2017 Raphael Tang, Jimmy Lin

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark.

Small-Footprint Keyword Spotting

Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting

4 code implementations 18 Oct 2017 Raphael Tang, Jimmy Lin

We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow.

Keyword Spotting Speech Recognition

Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams

no code implementations 25 Jul 2017 Jinfeng Rao, Hua He, Haotian Zhang, Ferhan Ture, Royal Sequiera, Salman Mohammed, Jimmy Lin

To our knowledge, we are the first to integrate lexical and temporal signals in an end-to-end neural network architecture, in which existing neural ranking models are used to generate query-document similarity vectors that feed into a bidirectional LSTM layer for temporal modeling.

Density Estimation Document Ranking

Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering

no code implementations 25 Jul 2017 Royal Sequiera, Gaurav Baruah, Zhucheng Tu, Salman Mohammed, Jinfeng Rao, Haotian Zhang, Jimmy Lin

Most work on natural language question answering today focuses on answer selection: given a candidate list of sentences, determine which contains the answer.

Answer Selection

Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars

no code implementations TACL 2015 Hua He, Jimmy Lin, Adam Lopez

We believe that GPU-based extraction of hierarchical grammars is an attractive proposition, particularly for MT applications that demand high throughput.

Machine Translation Translation

Identifying Duplicate and Contradictory Information in Wikipedia

no code implementations 4 Jun 2014 Sarah Weissman, Samet Ayhan, Joshua Bradley, Jimmy Lin

Our study identifies sentences in Wikipedia articles that are either identical or highly similar by applying techniques for near-duplicate detection of web pages.

Runtime Optimizations for Prediction with Tree-Based Models

no code implementations 11 Dec 2012 Nima Asadi, Jimmy Lin, Arjen P. de Vries

Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains.
