To learn semantic embeddings of instances and labels from raw text, we propose to pre-train Transformer-based encoders with self-supervised contrastive losses.
We also provide a theoretical analysis that justifies the use of XMC over link prediction and motivates integrating XR-Transformers, a powerful method for solving XMC problems, into the GIANT framework.
Ranked #1 on Node Property Prediction on ogbn-papers100M
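To make the contrastive pre-training step above concrete, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives; the function name, temperature value, and batching scheme are illustrative assumptions, not the GIANT implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(instance_emb, label_emb, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Row i of `instance_emb` and `label_emb` is a positive pair; every
    other row in the batch serves as an in-batch negative.
    """
    z_i = F.normalize(instance_emb, dim=-1)
    z_l = F.normalize(label_emb, dim=-1)
    logits = z_i @ z_l.t() / temperature              # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)           # positives on the diagonal
```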
Although pre-trained transformer models are effective for text representation, fine-tuning them on a large label space remains computationally expensive even with powerful GPUs.
Partition-based methods are increasingly used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions of labels or more).
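As a sketch of what partition-based indexing can look like, the following recursively clusters label feature vectors into a tree; the branching factor, leaf size, and use of k-means are assumptions for illustration, not any specific system's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_label_tree(label_feats, branch=16, max_leaf=100):
    """Recursively partition labels by k-means clustering of their features.

    Returns a nested list whose leaves are arrays of label indices. This is
    a generic sketch of partition-based indexing, not a specific system.
    """
    def recurse(ids):
        if len(ids) <= max_leaf:
            return ids
        k = min(branch, len(ids))
        assign = KMeans(n_clusters=k, n_init=4).fit_predict(label_feats[ids])
        return [recurse(ids[assign == c]) for c in range(k)]
    return recurse(np.arange(label_feats.shape[0]))
```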
1 code implementation • 23 Jun 2021 • Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov, Japinder Singh, Inderjit S. Dhillon
In this paper, we aim to improve semantic product search by using tree-based XMC models where inference time complexity is logarithmic in the number of products.
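A minimal sketch of how logarithmic-time inference falls out of a label tree is shown below: beam search keeps only a constant number of nodes per level, so the cost grows with tree depth rather than with the number of products. The `Node` structure and `score_fn` routing scorer are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical label-tree node: leaves carry a label index."""
    children: list = field(default_factory=list)
    label: int = -1

    @property
    def is_leaf(self):
        return not self.children

def beam_search(root, score_fn, x, beam=10):
    """Route input x down the tree, keeping the `beam` best nodes per level.

    Cost is O(beam * branching * depth), hence logarithmic in the total
    number of leaves. `score_fn(node, x)` is a placeholder relevance scorer.
    """
    frontier, leaves = [(0.0, root)], []
    while frontier:
        nxt = []
        for score, node in frontier:
            if node.is_leaf:
                leaves.append((score, node.label))
            else:
                nxt += [(score + score_fn(c, x), c) for c in node.children]
        frontier = sorted(nxt, key=lambda t: -t[0])[:beam]
    return sorted(leaves, key=lambda t: -t[0])
```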
In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems with very large output spaces. We apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space.
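For reference, a hedged usage sketch against the open-source PECOS package follows; the calls mirror the project's documented `XLinearModel` interface, but file names are placeholders and exact signatures may differ across versions.

```python
# Hedged usage sketch of the open-source PECOS package (pip install libpecos).
# Calls follow the project's documented XLinearModel interface; file names
# are placeholders and exact signatures may differ across versions.
from pecos.utils import smat_util
from pecos.xmc.xlinear.model import XLinearModel

X = smat_util.load_matrix("train_features.npz")   # sparse instance features
Y = smat_util.load_matrix("train_labels.npz")     # sparse instance-label matrix
model = XLinearModel.train(X, Y)                  # indexing -> matching -> ranking
model.save("pecos-model")
preds = XLinearModel.load("pecos-model").predict(X)
```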
A noise-conditional kernel (NCK) is crucial for successful inference with SVGD in high dimensions, as it adapts the kernel to the noise level of the score estimate.
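For background, here is a minimal NumPy sketch of a single SVGD update with a plain RBF kernel; the fixed bandwidth `h` marks where a noise-conditional kernel would instead adapt to the score estimate, and is not the NCK construction itself.

```python
import numpy as np

def svgd_step(X, score, step=1e-2, h=1.0):
    """One Stein variational gradient descent update with an RBF kernel.

    X: (n, d) particles; score(X) returns grad log p(X), shape (n, d).
    The fixed bandwidth h is where a noise-conditional kernel would
    instead adapt to the noise level of the score estimate.
    """
    diff = X[:, None, :] - X[None, :, :]            # x_a - x_b, shape (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / h)            # k(x_a, x_b)
    gradK = -2.0 / h * diff * K[..., None]          # grad_{x_a} k(x_a, x_b)
    phi = (K @ score(X) + gradK.sum(axis=0)) / X.shape[0]
    return X + step * phi
```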
Keywords: Multivariate Time Series, Change-point Detection, Graph Neural Networks
We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus.
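A minimal sketch of the retrieval step under a dual-encoder setup, assuming queries and documents have already been embedded (the encoders themselves are not shown):

```python
import numpy as np

def retrieve(query_emb, doc_embs, k=5):
    """Rank documents by inner-product similarity to the query and return
    the indices of the top-k. `query_emb` is (dim,), `doc_embs` is
    (num_docs, dim); both are assumed to come from separately trained
    query and document encoders.
    """
    scores = doc_embs @ query_emb
    return np.argsort(-scores)[:k]
```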
While neural sequence generation models have achieved initial success in many NLP applications, the canonical one-pass, left-to-right (i.e., autoregressive) decoding procedure does not reflect how humans revise a sentence to obtain a refined result.
However, naively applying deep transformer models to the XMC problem leads to sub-optimal performance due to the large output space and the label sparsity issue.
While learning the kernel in a data-driven way has been investigated, in this paper we explore learning the spectral distribution of the kernel via implicit generative models parametrized by deep neural networks.
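The following PyTorch sketch illustrates the idea of an implicitly parametrized spectral distribution: a small generator samples random frequencies, and the kernel is estimated from random Fourier features via Bochner's theorem. The layer sizes and generator architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImplicitSpectralKernel(nn.Module):
    """Kernel whose spectral distribution is an implicit generative model:
    a generator maps Gaussian noise to frequency samples, and the kernel
    value is a random Fourier feature estimate k(x, y) ~ E_w[cos(w^T (x - y))]
    (Bochner's theorem). Sizes here are illustrative assumptions.
    """
    def __init__(self, dim, n_freq=128, noise_dim=16):
        super().__init__()
        self.n_freq, self.noise_dim = n_freq, noise_dim
        self.sampler = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, y):
        # sample frequencies w ~ q(w) through the generator
        w = self.sampler(torch.randn(self.n_freq, self.noise_dim))
        return torch.cos((x - y) @ w.t()).mean(dim=-1)
```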
To circumvent the softmax bottleneck, SeCSeq compresses labels into sequences of semantic-aware compact codes, on which Seq2Seq models are trained.
In this paper, we propose a method to effectively encode the local and global contextual information for each target word using a three-part neural network approach.
In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints.
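A minimal NumPy sketch of low-rank coordinate descent for an SDP with diagonal constraints is given below, using the factorization X = V^T V with unit-norm columns; the rank, iteration count, and initialization are illustrative, and C is assumed symmetric.

```python
import numpy as np

def mixing_method(C, rank=8, iters=100, seed=0):
    """Low-rank coordinate descent sketch for
        min <C, X>  s.t.  X PSD,  diag(X) = 1,
    via the factorization X = V^T V with unit-norm columns v_i. Each inner
    step minimizes the objective exactly in one column.
    """
    n = C.shape[0]
    V = np.random.default_rng(seed).standard_normal((rank, n))
    V /= np.linalg.norm(V, axis=0, keepdims=True)
    for _ in range(iters):
        for i in range(n):
            g = V @ C[i] - C[i, i] * V[:, i]   # sum_{j != i} C[i, j] v_j
            nrm = np.linalg.norm(g)
            if nrm > 0:
                V[:, i] = -g / nrm             # exact minimizer on the sphere
    return V
```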
In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques as a replacement for the fixed Gaussian kernel in the original GMMN.
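For context, here is a sketch of the (biased) squared-MMD estimator that GMMN-style models minimize; in the adversarial variant described above, the fixed Gaussian kernel would be composed with a learned feature map, which is not shown here.

```python
import torch

def mmd2(x, y, bandwidth=1.0):
    """Biased estimator of squared MMD with a fixed Gaussian kernel.

    In the adversarial variant, this kernel would be composed with a
    learned feature map f (i.e., computed on f(x), f(y)); only the plain
    Gaussian case is shown here.
    """
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```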
Multivariate time series forecasting is an important machine learning problem across many domains, including prediction of solar plant energy output, electricity consumption, and traffic congestion.
Ranked #5 on Univariate Time Series Forecasting on Solar-Power