no code implementations • EMNLP (BlackboxNLP) 2021 • Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin
Fine-tuned pre-trained transformers achieve the state of the art in passage reranking.
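As an illustration of the reranking setup these analyses target, here is a minimal cross-encoder reranking sketch using PyTorch and Hugging Face Transformers; the checkpoint name is a public example chosen for illustration, not the paper's model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A public MS MARCO cross-encoder checkpoint, used purely for illustration.
name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

def rerank(query: str, passages: list[str]) -> list[str]:
    """Score each (query, passage) pair jointly, then sort passages by score."""
    inputs = tok([query] * len(passages), passages,
                 padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    ranked = sorted(zip(scores.tolist(), passages), reverse=True)
    return [p for _, p in ranked]
```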
no code implementations • EMNLP 2021 • Raphael Tang, Karun Kumar, Kendra Chalkley, Ji Xin, Liming Zhang, Wenyan Li, Gefei Yang, Yajie Mao, Junho Shin, Geoffrey Craig Murray, Jimmy Lin
Query auto completion (QAC) is the task of predicting a search engine user’s final query from their intermediate, incomplete query.
Automatic Speech Recognition (ASR)
1 code implementation • 30 Nov 2023 • Raphael Tang, Xinyu Zhang, Jimmy Lin, Ferhan Ture
We propose a logistic Bradley-Terry probe which predicts word pair preferences of LLMs from the words' hidden vectors.
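A minimal sketch of what such a probe can look like; the class name, dimensions, and data below are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class BradleyTerryProbe(nn.Module):
    """Logistic Bradley-Terry probe: a linear head scores each word's hidden
    vector, and P(a preferred over b) = sigmoid(score(a) - score(b))."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h_a: torch.Tensor, h_b: torch.Tensor) -> torch.Tensor:
        # h_a, h_b: (batch, hidden_dim) hidden vectors for the two words
        return torch.sigmoid(self.score(h_a) - self.score(h_b)).squeeze(-1)

probe = BradleyTerryProbe(hidden_dim=768)
h_a, h_b = torch.randn(4, 768), torch.randn(4, 768)
preference = probe(h_a, h_b)  # P(word a preferred over word b), shape (4,)
```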
1 code implementation • 11 Oct 2023 • Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin, Ferhan Ture
Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking.
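The snippet does not spell out a mitigation, but one general recipe for blunting positional bias is to rank under several shuffled input orders and aggregate the results; `rank_fn` below stands in for a hypothetical LLM ranking call and is an assumption, not the paper's API:

```python
import random
from collections import defaultdict

def rank_with_shuffles(items, rank_fn, num_shuffles=5, seed=0):
    """Query a ranker on several shuffled orderings of the input list and
    aggregate by mean rank, so no item is always (dis)advantaged by position.
    rank_fn is a hypothetical call that returns the items best-first."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(num_shuffles):
        order = list(items)
        rng.shuffle(order)
        for rank, item in enumerate(rank_fn(order)):
            totals[item] += rank
    return sorted(items, key=lambda item: totals[item])
```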
no code implementations • 19 Dec 2022 • Zhiying Jiang, Matthew Y. R. Yang, Mikhail Tsirlin, Raphael Tang, Jimmy Lin
Our method also performs particularly well in few-shot settings where labeled data are too scarce for DNNs to achieve satisfactory accuracy.
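Assuming this is the compressor-based classifier the authors later released, the core idea pairs normalized compression distance (NCD) with nearest neighbors; a minimal gzip sketch:

```python
import gzip

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance under gzip."""
    cx, cy = len(gzip.compress(x)), len(gzip.compress(y))
    cxy = len(gzip.compress(x + b" " + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test: str, train: list[tuple[str, str]]) -> str:
    """1-nearest-neighbor classification: return the label of the training
    text closest to the test text under NCD."""
    text, label = min(train, key=lambda tl: ncd(test.encode(), tl[0].encode()))
    return label
```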
no code implementations • 21 Nov 2022 • Raphael Tang, Karun Kumar, Gefei Yang, Akshat Pandey, Yajie Mao, Vladislav Belyaev, Madhuri Emmadi, Craig Murray, Ferhan Ture, Jimmy Lin
In this paper, we explore training and deploying an ASR system in the label-scarce, compute-limited setting.
Automatic Speech Recognition (ASR)
1 code implementation • 10 Oct 2022 • Raphael Tang, Linqing Liu, Akshat Pandey, Zhiying Jiang, Gefei Yang, Karun Kumar, Pontus Stenetorp, Jimmy Lin, Ferhan Ture
Large-scale diffusion neural networks represent a substantial milestone in text-to-image generation, but they remain poorly understood, lacking interpretability analyses.
no code implementations • 31 Jul 2022 • Ji Xin, Raphael Tang, Zhiying Jiang, YaoLiang Yu, Jimmy Lin
There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, and quantization.
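As one concrete instance from that list, a post-training dynamic quantization sketch in PyTorch (illustrative, not the paper's benchmark code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# Post-training dynamic quantization: weights are stored in int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```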
1 code implementation • ACL 2021 • Ji Xin, Raphael Tang, YaoLiang Yu, Jimmy Lin
To fill this void in the literature, we study selective prediction for NLP, comparing different models and confidence estimators.
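A minimal sketch of selective prediction with the simplest confidence estimator, maximum softmax probability; the threshold value is an illustrative choice:

```python
import torch

def selective_predict(logits: torch.Tensor, threshold: float = 0.9):
    """Predict only when the confidence estimator (here, maximum softmax
    probability) clears a threshold; otherwise abstain (None)."""
    probs = torch.softmax(logits, dim=-1)
    confidence, prediction = probs.max(dim=-1)
    return [int(p) if c >= threshold else None
            for p, c in zip(prediction.tolist(), confidence.tolist())]
```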
1 code implementation • EACL 2021 • Ji Xin, Raphael Tang, YaoLiang Yu, Jimmy Lin
The slow speed of BERT has motivated much research on accelerating its inference, and the early exiting idea has been proposed to make trade-offs between model quality and efficiency.
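A minimal sketch of the early-exit idea, with per-layer classifiers ("off-ramps") and an entropy threshold; all names and the threshold value are illustrative assumptions:

```python
import torch

def early_exit_forward(layers, classifiers, x, entropy_threshold=0.3):
    """Run transformer layers one at a time on a single example; after each
    layer, an attached classifier predicts from the [CLS] position, and we
    exit as soon as the prediction entropy is low enough."""
    for layer, clf in zip(layers, classifiers):
        x = layer(x)
        probs = torch.softmax(clf(x[:, 0]), dim=-1)  # x: (1, seq, hidden)
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
        if entropy.item() < entropy_threshold:
            return probs  # confident enough: exit early
    return probs  # fell through to the final layer
```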
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhiying Jiang, Raphael Tang, Ji Xin, Jimmy Lin
We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers.
2 code implementations • EMNLP (NLPOSS) 2020 • Raphael Tang, Jaejun Lee, Afsaneh Razi, Julia Cambre, Ian Bicking, Jofish Kaye, Jimmy Lin
We describe Howl, an open-source wake word detection toolkit with native support for open speech datasets, like Mozilla Common Voice and Google Speech Commands.
Ranked #4 on Keyword Spotting on Google Speech Commands
1 code implementation • EMNLP (sdp) 2020 • Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.
no code implementations • WS 2020 • Ashutosh Adhikari, Achyudh Ram, Raphael Tang, William L. Hamilton, Jimmy Lin
Fine-tuned variants of BERT are able to achieve state-of-the-art accuracy on many natural language processing tasks, although at significant computational costs.
1 code implementation • ACL 2020 • Raphael Tang, Jaejun Lee, Ji Xin, Xinyu Liu, Yao-Liang Yu, Jimmy Lin
In natural language processing, a recently popular line of work explores how to best report the experimental results of neural networks.
3 code implementations • ACL 2020 • Ji Xin, Raphael Tang, Jaejun Lee, Yao-Liang Yu, Jimmy Lin
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
1 code implementation • 23 Apr 2020 • Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered in Kaggle's COVID-19 Open Research Dataset Challenge.
no code implementations • 8 Nov 2019 • Jaejun Lee, Raphael Tang, Jimmy Lin
We show that only the final fourth of the layers need to be fine-tuned to achieve 90% of the original quality.
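A sketch of that freezing recipe with Hugging Face Transformers, assuming a 12-layer BERT; the model choice is illustrative:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
layers = model.bert.encoder.layer  # 12 transformer layers

# Freeze the first three fourths; only the last 3 layers (plus the task
# head, which stays trainable by default) are updated during fine-tuning.
for layer in layers[: len(layers) * 3 // 4]:
    for param in layer.parameters():
        param.requires_grad = False
```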
no code implementations • 7 Nov 2019 • Yinan Zhang, Raphael Tang, Jimmy Lin
In this paper, we hypothesize that introducing an explicit, constrained pairwise word interaction mechanism to pretrained language models improves their effectiveness on semantic similarity tasks.
no code implementations • IJCNLP 2019 • Linqing Liu, Wei Yang, Jinfeng Rao, Raphael Tang, Jimmy Lin
Semantic similarity modeling is central to many NLP problems such as natural language inference and question answering.
1 code implementation • IJCNLP 2019 • Jaejun Lee, Raphael Tang, Jimmy Lin
Used for simple command recognition on devices from smart speakers to mobile phones, keyword spotting systems are everywhere.
1 code implementation • WS 2019 • Raphael Tang, Yao Lu, Jimmy Lin
Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word-embedding-based neural networks, helping them approach or exceed the quality of other heavyweight language representation models.
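A minimal sketch of a logit-matching distillation objective, one common way to realize this transfer; the loss weighting is an illustrative assumption:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend hard-label cross-entropy with a logit-matching (MSE) term that
    pulls the small student toward the BERT teacher's logits."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.mse_loss(student_logits, teacher_logits)
    return alpha * hard + (1.0 - alpha) * soft
```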
1 code implementation • NAACL 2019 • Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin
Neural network models for many NLP tasks have grown increasingly complex in recent years, making training and deployment more difficult.
Ranked #2 on Document Classification on IMDb-M
3 code implementations • 17 Apr 2019 • Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin
We present, to our knowledge, the first application of BERT to document classification.
Ranked #1 on Document Classification on Yelp-14
4 code implementations • 28 Mar 2019 • Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin
In the natural language processing literature, neural networks are becoming increasingly deep and complex.
Ranked #56 on Sentiment Analysis on SST-2 Binary classification
no code implementations • 19 Dec 2018 • Raphael Tang, Gefei Yang, Hong Wei, Yajie Mao, Ferhan Ture, Jimmy Lin
Voice-enabled commercial products are ubiquitous, typically enabled by lightweight on-device keyword spotting (KWS) and full automatic speech recognition (ASR) in the cloud.
Automatic Speech Recognition (ASR)
no code implementations • NIPS Workshop CDNNRIA 2018 • Raphael Tang, Ashutosh Adhikari, Jimmy Lin
There exists a plethora of techniques for inducing structured sparsity in parametric models during the optimization process, with the final goal of resource-efficient inference.
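As one classic example of such a technique, a group-lasso penalty that drives whole rows of a weight matrix to zero (a sketch of the general idea, not the paper's method):

```python
import torch

def group_lasso_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Sum of per-row L2 norms: penalizes whole rows (e.g., output neurons)
    as groups, driving entire structures toward exactly zero."""
    return weight.norm(p=2, dim=1).sum()

# usage: loss = task_loss + lam * group_lasso_penalty(linear.weight)
```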
no code implementations • 2 Nov 2018 • Raphael Tang, Jimmy Lin
In recent years, we have witnessed a dramatic shift towards techniques driven by neural networks for a variety of NLP tasks.
1 code implementation • 30 Oct 2018 • Jaejun Lee, Raphael Tang, Jimmy Lin
Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with only a four-point drop in accuracy, from 94% to 90%.
no code implementations • ICLR 2019 • Raphael Tang, Jimmy Lin
Neural language models (NLMs) exist in an accuracy-efficiency tradeoff space where better perplexity typically comes at the cost of greater computational complexity.
4 code implementations • 28 Oct 2017 • Raphael Tang, Jimmy Lin
We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently released Google Speech Commands Dataset as our benchmark.
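A minimal sketch of a residual block with dilated convolutions of the kind such models stack over audio spectrograms; channel counts and dilation values are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Residual block of two dilated 3x3 convolutions with batch norm and an
    identity shortcut, applied to a spectrogram treated as a 2-D image."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation,
                               dilation=dilation, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)  # identity shortcut
```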
4 code implementations • 18 Oct 2017 • Raphael Tang, Jimmy Lin
We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow.