Specifically, we first leverage LLMs to construct a "question/rationale graph" by using knowledge extraction prompting given the initial question and the rationales generated in the previous steps.
We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch.
In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.
Ranked #2 on 3D Part Segmentation on ShapeNet-Part
To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner.
Transformer-based models have recently shown success in representation learning on graph-structured data beyond natural language processing and computer vision.
We also propose a simple and effective semi-supervised learning strategy with generated samples from MH-Aug. Our extensive experiments demonstrate that MH-Aug can generate a sequence of samples according to the target distribution to significantly improve the performance of GNNs.
To our surprise, we found that training schedule shows divide-and-conquer-like pattern: time segments are first diversified regardless of the target, then coupled with each target, and fine-tuned to the target again.
To address the two common problems of graph convolution, in this paper, we propose Deformable Graph Convolutional Networks (Deformable GCNs) that adaptively perform convolution in multiple latent spaces and capture short/long-range dependencies between nodes.
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #10 on Document Image Classification on RVL-CDIP
Our method is learning to learn a primary task with various auxiliary tasks to improve generalization performance.
Nonlinear differential equations are challenging to solve numerically and are important to understanding the dynamics of many physical systems.
Motivated by the SSP property and a generalized Runge-Kutta method, we propose Strong Stability Preserving networks (SSP networks) which improve robustness against adversarial attacks.
Our proposed method is learning to learn a primary task by predicting meta-paths as auxiliary tasks.