In this paper, we propose a novel framework, E2GRE (Entity and Evidence Guided Relation Extraction), that jointly extracts relations and the underlying evidence sentences by using a large pretrained language model (LM) as the input encoder.
However, existing methods cannot deal with large scenes containing hundreds of people, which pose the challenges of a large number of people, large variations in human scale, and complex spatial distribution.
This allows us to encode robust, position-independent character-level information in the internal representations of subword-based models.
The tasks of humor understanding and generation are challenging and subjective even for humans, requiring commonsense and real-world knowledge to master.
In this work, we propose a new task, context-situated pun generation, where a specific context represented by a set of keywords is provided, and the task is to first identify suitable pun words that are appropriate for the context, then generate puns based on the context keywords and the identified pun words.
Most existing OCR methods focus on alphanumeric characters due to the popularity of English and numbers, as well as their corresponding datasets.
Distilling supervision signal from a long sequence to make predictions is a challenging task in machine learning, especially when not all elements in the input sequence contribute equally to the desired output.
To this end, we propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervision, such as consistency loss, to guide the learning, which inevitably leads to inferior results in real-world scenarios with unseen poses.
Ranked #15 on 3D Human Pose Estimation on MPI-INF-3DHP
These questions often involve three time-related challenges that previous work fails to adequately address: 1) questions often do not specify exact timestamps of interest (e.g., "Obama" instead of 2000); 2) subtle lexical differences in time relations (e.g., "before" vs "after"); 3) off-the-shelf temporal KG embeddings that previous work builds on ignore the temporal order of timestamps, which is crucial for answering temporal-order related questions.
Ranked #2 on Question Answering on CronQuestions
In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.
Large pre-trained language models (PLMs) have led to great success on various commonsense question answering (QA) tasks in an end-to-end fashion.
no code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.
Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning.
Understanding the temporal relations among events in text is a critical aspect of reading comprehension, which can be evaluated in the form of temporal question answering (TQA).
Recent work (Takanobu et al., 2020) proposed system-wise evaluation of dialog systems and found that improvements to individual components (e.g., NLU, policy) in prior work may not necessarily benefit pipeline systems in system-wise evaluation.
The core of our algorithm is to introduce a novel variance reduction term to the gradient estimation when performing the task adaptation.
Conversational artificial intelligence (ConvAI) systems have attracted much academic and commercial attention recently, making significant progress on both fronts.
A crucial component of the scene-text-based reasoning required for the TextVQA and TextCaps datasets involves detecting and recognizing text present in the images using an optical character recognition (OCR) system.
In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models into hypergraphs.
Hypergraphs generalize graphs to model higher-order correlations among entities and have been successfully adopted in various research domains.
Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results.
Recent work on aspect-level sentiment classification has demonstrated the efficacy of incorporating syntactic structures such as dependency trees with graph neural networks (GNNs), but these approaches are usually vulnerable to parsing errors.
In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to solve the multi-label and multi-entity problems.
Ranked #4 on Relation Extraction on ReDocRED
Currently, at every layer, attention is computed between connected pairs of nodes and depends solely on the representation of the two nodes.
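The pairwise attention described above can be sketched as follows. This is a minimal illustration of the standard GAT attention coefficient (a shared projection `W`, an attention vector `a`, a LeakyReLU, and a softmax over each node's neighbors), not the modification this paper proposes. The function names, the dictionary-based graph layout, and the slope value of 0.2 are assumptions made for the example.

```python
import math


def leaky_relu(x, slope=0.2):
    # Assumed negative slope of 0.2; any small positive slope works for the sketch.
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(features, neighbors, W, a):
    """Return {i: {j: alpha_ij}} for every node i and each connected node j.

    The logit for edge (i, j) depends only on the projected representations
    of nodes i and j, as in the description above.
    """
    projected = [matvec(W, h) for h in features]
    attention = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            attention[i] = {}
            continue
        logits = {}
        for j in nbrs:
            pair = projected[i] + projected[j]  # concatenation [W h_i || W h_j]
            logits[j] = leaky_relu(sum(ak * pk for ak, pk in zip(a, pair)))
        # Numerically stable softmax over the neighbors of node i.
        top = max(logits.values())
        exps = {j: math.exp(v - top) for j, v in logits.items()}
        total = sum(exps.values())
        attention[i] = {j: e / total for j, e in exps.items()}
    return attention


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1, 2], 1: [], 2: [0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 1.0, 1.0]
alpha = gat_attention(features, neighbors, W, a)
```

Because each logit is computed from the two endpoint representations alone, any other structure in the graph (for example, paths or edge features) plays no role in the coefficient.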
Here, we propose to study the inductive learning setting for CKG completion, where unseen entities may be present at test time.
Document-level relation extraction is a challenging task which requires reasoning over multiple sentences in order to predict relations in a document.
Ranked #12 on Relation Extraction on DocRED
Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.
The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications.
Our core innovation is the learning of a region-phrase score function, based on which an image-sentence score function is further constructed.
Recent Transformer-based models such as Transformer-XL and BERT have achieved huge success on various natural language processing tasks.
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We validate the proposed GSN on two NLP tasks: interpretable multi-hop reading comprehension on HotpotQA and graph based fact verification on FEVER.
This paper attempts to find a solution to guarantee the effectiveness of waterline detection for inland maritime applications with a general digital camera sensor.
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE.
Ranked #14 on Link Prediction on FB15k-237
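For readers unfamiliar with the two baselines named in the link-prediction entry above, the sketch below contrasts their scoring functions: TransE treats a relation as a translation in real space, while RotatE treats it as an element-wise rotation in the complex plane. This illustrates the baselines only, not the method proposed in that paper. The function names are hypothetical, and using the sum of per-dimension moduli as the RotatE distance is one common choice assumed here.

```python
import cmath
import math


def transe_score(head, relation, tail):
    # TransE: negative L2 distance between (head + relation) and tail.
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rotate_score(head, phases, tail):
    # RotatE: each relation entry is a unit-modulus complex number exp(i * phase),
    # so multiplying by it rotates the corresponding head entry.
    # Assumed distance: sum of per-dimension moduli of (head * rotation - tail).
    return -sum(abs(h * cmath.exp(1j * p) - t) for h, p, t in zip(head, phases, tail))


# A triple that fits the model perfectly scores (approximately) zero.
perfect_translation = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
perfect_rotation = rotate_score([1 + 0j], [math.pi / 2], [1j])
poor_translation = transe_score([1.0, 0.0], [0.0, 1.0], [3.0, 1.0])
```

Higher (less negative) scores indicate more plausible triples under both models.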
In this paper, we propose a machine learning framework to obtain speech emotion representations by limiting the effect of speaker variability in the speech signals.
We test the relation module on the SQuAD 2.0 dataset using both the BiDAF and BERT models as baseline readers.
Interpretable multi-hop reading comprehension (RC) over multiple documents is a challenging problem because it demands reasoning over multiple information sources and explaining the answer prediction by providing supporting evidence.
Graph Attention Networks (GATs) are the state-of-the-art neural architecture for representation learning with graphs.
Aspect-level sentiment classification aims to identify the sentiment polarity towards a specific aspect term in a sentence.
In this paper, we aim to improve an MRC model's ability to determine whether a question has an answer in a given context (e.g., the recently proposed SQuAD 2.0 task).
Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data.
In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use GNN to learn the bag embedding, in order to explore the useful structural information among instances in bags.
We introduce a heterogeneous graph with different types of nodes and edges, which we call the Heterogeneous Document-Entity (HDE) graph.
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
This paper aims to improve the widely used deep speaker embedding x-vector model.
In this work, we extend the scene-text extraction system at Facebook, Rosetta, to efficiently handle text in various orientations.
The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing graph connectivity structure.
Ranked #23 on Link Prediction on FB15k-237
We present the DeepGlobe 2018 Satellite Image Understanding Challenge, which includes three public competitions for segmentation, detection, and classification tasks on satellite images.
This article discusses a framework to support the design and end-to-end planning of fixed millimeter-wave networks.