no code implementations • 25 Apr 2023 • Wang-Chiew Tan
This paper presents an opinion on the potential of using large language models to query both unstructured and structured data.
1 code implementation • 5 Apr 2021 • Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Çağatay Demiralp, Chen Chen, Wang-Chiew Tan
Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information.
Ranked #1 on Column Type Annotation on VizNet-Sato-MultiColumn
1 code implementation • Findings (EMNLP) 2021 • Hayate Iso, Xiaolan Wang, Yoshihiko Suhara, Stefanos Angelidis, Wang-Chiew Tan
We found that text autoencoders tend to generate overly generic summaries from simply averaged latent vectors due to an unexpected $L_2$-norm shrinkage in the aggregated latent vectors, which we refer to as summary vector degeneration.
Ranked #1 on Unsupervised Opinion Summarization on Amazon
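The norm shrinkage mentioned in the abstract above can be illustrated with a small numerical sketch. This is not the authors' code: the latent vectors here are random Gaussian stand-ins rather than autoencoder outputs, and the helper names (`l2_norm`, `mean_vector`, `random_latents`) are hypothetical. It only shows that the $L_2$ norm of a simple average is never larger than the average of the individual norms, and is typically much smaller when the vectors point in different directions.

```python
import math
import random


def l2_norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def random_latents(count, dim, seed=0):
    """Stand-in 'latent vectors': i.i.d. standard Gaussian samples (an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


latents = random_latents(count=8, dim=64)

# Average of the individual norms versus the norm of the averaged vector.
avg_of_norms = sum(l2_norm(v) for v in latents) / len(latents)
norm_of_avg = l2_norm(mean_vector(latents))

print(f"average of norms: {avg_of_norms:.3f}")
print(f"norm of average:  {norm_of_avg:.3f}")
```

By the triangle inequality the second number can never exceed the first; for roughly uncorrelated vectors the gap grows as more vectors are averaged.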
no code implementations • 11 Jul 2020 • Jinfeng Li, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
We embark on a systematic study to investigate the following question: Are deep models the best performing model for all semantic tagging tasks?
no code implementations • 29 May 2020 • Nofar Carmeli, Xiaolan Wang, Yoshihiko Suhara, Stefanos Angelidis, Yuliang Li, Jinfeng Li, Wang-Chiew Tan
The Web is a major resource of both factual and subjective information.
no code implementations • 13 May 2020 • Sainyam Galhotra, Behzad Golshan, Wang-Chiew Tan
At the same time, creating a labeled subset of the data can be costly and even infeasible in imbalanced settings.
1 code implementation • ACL 2020 • Yoshihiko Suhara, Xiaolan Wang, Stefanos Angelidis, Wang-Chiew Tan
The framework uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews, and trains a Transformer model to reconstruct the original reviews from these extractions.
1 code implementation • EMNLP 2020 • Johannes Bjerva, Nikita Bhutani, Behzad Golshan, Wang-Chiew Tan, Isabelle Augenstein
We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance.
no code implementations • 6 Apr 2020 • Aaron Traylor, Chen Chen, Behzad Golshan, Xiaolan Wang, Yuliang Li, Yoshihiko Suhara, Jinfeng Li, Çağatay Demiralp, Wang-Chiew Tan
In this paper, we introduce xSense, an effective system for review comprehension using domain-specific commonsense knowledge bases (xSense KBs).
1 code implementation • 1 Apr 2020 • Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan
Our experiments show that a straightforward application of language models such as BERT, DistilBERT, or RoBERTa pre-trained on large text corpora already significantly improves the matching quality and outperforms the previous state of the art (SOTA) by up to 29% in F1 score on benchmark datasets.
Ranked #2 on Entity Resolution on WDC Watches-xlarge
no code implementations • 31 Mar 2020 • Aaron Feng, Shuwei Chen, Yuliang Li, Hiroshi Matsuda, Hidekazu Tamaki, Wang-Chiew Tan
Also, we found that the existing search algorithms do not meet the search quality standard required by production systems.
1 code implementation • AKBC 2020 • Nikita Bhutani, Aaron Traylor, Chen Chen, Xiaolan Wang, Behzad Golshan, Wang-Chiew Tan
Since it can be expensive to obtain training data to learn to extract implications for each new domain of reviews, we propose an unsupervised KBC system, Sampo. Specifically, Sampo is tailored to build KBs for domains where many reviews on the same domain are available.
1 code implementation • 7 Feb 2020 • Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
A novelty of Snippext is its clever use of a two-prong approach to achieve state-of-the-art (SOTA) performance with little labeled training data through: (1) data augmentation to automatically generate more labeled training data from existing ones, and (2) a semi-supervised learning technique to leverage the massive amount of unlabeled data in addition to the (limited amount of) labeled data.
1 code implementation • 15 Jan 2020 • Xiong Zhang, Jonathan Engel, Sara Evensen, Yuliang Li, Çağatay Demiralp, Wang-Chiew Tan
They contain a wealth of information about the opinions and experiences of users, which can help better understand consumer decisions and improve user experience with products and services.
1 code implementation • 14 Nov 2019 • Dan Zhang, Yoshihiko Suhara, Jinfeng Li, Madelon Hulsebos, Çağatay Demiralp, Wang-Chiew Tan
Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search.
Ranked #2 on Column Type Annotation on VizNet-Sato-MultiColumn
no code implementations • WS 2019 • Danni Ma, Chen Chen, Behzad Golshan, Wang-Chiew Tan
Paraphrases are important linguistic resources for a wide variety of NLP applications.
1 code implementation • 15 Sep 2019 • Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan
We present Emu, a system that semantically enhances multilingual sentence embeddings.
no code implementations • 23 Jul 2019 • Sara Evensen, Yoshihiko Suhara, Alon Halevy, Vivian Li, Wang-Chiew Tan, Saran Mumick
We prototype one necessary component of such a system, the Happiness Entailment Recognition (HER) module, which takes as input a short text describing an event and a candidate suggestion, and outputs a determination about whether the suggestion is more likely to be good for this user based on the event described.
no code implementations • 4 Mar 2019 • Sara Evensen, Aaron Feng, Alon Halevy, Jinfeng Li, Vivian Li, Yuliang Li, Huining Liu, George Mihaila, John Morales, Natalie Nuno, Ekaterina Pavlovic, Wang-Chiew Tan, Xiaolan Wang
We describe Voyageur, which is an application of experiential search to the domain of travel.
no code implementations • NAACL 2019 • Nikita Bhutani, Yoshihiko Suhara, Wang-Chiew Tan, Alon Halevy, H. V. Jagadish
We describe NeurON, a system for extracting tuples from question-answer pairs.
no code implementations • 25 Feb 2019 • Yuliang Li, Aaron Xixuan Feng, Jinfeng Li, Saran Mumick, Alon Halevy, Vivian Li, Wang-Chiew Tan
In order to support experiential queries, a database system needs to model subjective data and also be able to process queries where the user can express varied subjective experiences in words chosen by the user, in addition to specifying predicates involving objective attributes.
no code implementations • WS 2018 • Dan Iter, Alon Halevy, Wang-Chiew Tan
A common need of NLP applications is to extract structured data from text corpora in order to perform analytics or trigger an appropriate action.
no code implementations • 3 May 2018 • Xiaolan Wang, Aaron Feng, Behzad Golshan, Alon Halevy, George Mihaila, Hidekazu Oiwa, Wang-Chiew Tan
KOKO is novel in that its extraction language simultaneously supports conditions on the surface of the text and on the structure of the dependency parse tree of sentences, thereby allowing for more refined extractions.
2 code implementations • LREC 2018 • Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu
The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion.