We present a new formulation for structured information extraction (SIE) from visually rich documents.
Ranked #2 on Entity Linking on FUNSD
Different from traditional recommendation, takeaway recommendation faces two main challenges: (1) Dual Interaction-Aware Preference Modeling.
However, they focus on the contrastive learning between samples, which neglect the inherent self-similar prior in physiological signals and seem to have a limited ability to cope with noisy.
In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks.
Ranked #1 on Referring Expression Segmentation on ReferIt (using extra training data)
This strategy involves determining the depth of such regions by progressing from the edges towards the interior, prioritizing accurate regions over coarse regions.
We propose a semi-supervised approach for contemporary object detectors following the teacher-student dual model framework.
Recent work introduced progressive network growing as a promising way to ease the training for large GANs, but the model design and architecture-growing strategy still remain under-explored and needs manual design for different image data.
Computer vision applications such as visual relationship detection and human object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion.
Microsatellite instability (MSI) is associated with several tumor types and its status has become increasingly vital in guiding patient treatment decisions.
Deep neural networks can model images with rich latent representations, but they cannot naturally conceptualize structures of object categories in a human-perceptible way.
The essential ingredients of our methods are confidence-calibrated classifiers, data relabeling, and the leave-one-out strategy for modeling novel classes under the hierarchical taxonomy.
Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible.
Our training objective encourages better localization on single images, incorporates text phrases in a broad range, and properly pairs image regions with text phrases into positive and negative examples.
Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction.
In addition to identifying the content within a single image, relating images and generating related images are critical tasks for image understanding.
Object detection systems based on the deep convolutional neural network (CNN) have recently made ground- breaking advances on several object detection benchmarks.
By assuming a human face as piece-wise planar surfaces, where each surface corresponds to a facial part, we develop in this paper a Constrained Part-based Alignment (CPA) algorithm for face recognition across pose and/or expression.