We propose to employ phrase expressions as another interaction input to infer the attributes of the target object.
Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com).
Generalized few-shot semantic segmentation was introduced to evaluate few-shot segmentation models not only on novel classes but also on their ability to remember base classes.
We also introduce Text Refinement Network (TexRNet), a novel text segmentation approach that adapts to the unique properties of text, e.g., non-convex boundary, diverse texture, etc., which often impose burdens on traditional segmentation models.
This paper presents a GAN for generating images of handwritten lines conditioned on arbitrary text and latent style vectors.
We demonstrate how to increase overall model capacity to achieve improved performance by introducing objectness, which is class-agnostic and so not prone to overfitting, for complementary use with class-specific features.
We propose a novel interactive architecture and a novel training scheme that are both tailored to better exploit the user workflow.
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs.
We present a method to perform automatic image recoloring based on the distribution of colors associated with objects present in an image.
The subtleties of human perception, as measured by vision scientists through the use of psychophysics, are important clues to the internal workings of visual recognition.
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.
Instead of relying on pre-defined low-level image features, our method adaptively predicts object boundaries according to image content and user interactions.
We present a new image search technique that, given a background image, returns compatible foreground objects for image compositing tasks.
Despite decades of research, offline handwriting recognition (HWR) of degraded historical documents remains a challenging problem, which if solved could greatly improve the searchability of online cultural heritage archives.
Content-aware image completion or in-painting is a fundamental tool for the correction of defects or removal of objects in images.
Deep generative models have shown success in automatically synthesizing missing image regions using surrounding context.
One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them.
Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them.
In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into a Euclidean distance map.
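As a concrete illustration of this idea, the sketch below (hypothetical function name and a simple distance convention, not necessarily the paper's exact formulation) converts a user rectangle into a per-pixel Euclidean distance map to the rectangle's outline, which can then be stacked with the image as an extra input channel:

```python
import numpy as np

def rect_distance_map(height, width, rect):
    """Euclidean distance of every pixel to the nearest point on the
    rectangle outline; rect = (x0, y0, x1, y1) in pixel coordinates.
    Illustrative sketch, not the paper's exact encoding."""
    x0, y0, x1, y1 = rect
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    # Distance components for pixels outside the rectangle.
    dx = np.maximum(np.maximum(x0 - xs, xs - x1), 0.0)
    dy = np.maximum(np.maximum(y0 - ys, ys - y1), 0.0)
    outside = np.hypot(dx, dy)
    # Inside the rectangle, the nearest boundary point lies on the
    # closest edge, so the distance is the minimum edge distance.
    inside = np.minimum(np.minimum(xs - x0, x1 - xs),
                        np.minimum(ys - y0, y1 - ys))
    return np.where((dx == 0) & (dy == 0), inside, outside)
```

Treating the map as a soft constraint (rather than hard-clipping to the rectangle) lets the model recover object parts that slightly exceed the user's box.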
We consider the problem of two-frame depth from defocus in conditions unsuitable for existing methods yet typical of everyday photography: a handheld cellphone camera, a small aperture, a non-stationary scene and sparse surface texture.
We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images.
This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.
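The depth-normal consistency that such regularization exploits can be made concrete: under an orthographic approximation, the surface normal is proportional to (-dz/dx, -dz/dy, 1). The helper below (a hypothetical name, illustrating the geometric relation rather than the paper's method) derives per-pixel normals from a depth map:

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel unit surface normals from a depth map under an
    orthographic approximation: n is proportional to (-dz/dx, -dz/dy, 1).
    Illustrative sketch of the depth-normal relation."""
    # np.gradient returns derivatives along axis 0 (rows, y) first,
    # then axis 1 (columns, x).
    dzdy, dzdx = np.gradient(depth)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

A regularizer can then penalize disagreement between the predicted normal map and the normals implied by the predicted depth.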
We study the problem of Salient Object Subitizing, i. e. predicting the existence and the number of salient objects in an image using holistic cues.
While these methods achieve better results than color-based methods, they are still limited to either using depth as an additional color channel or simply combining depth with color in a linear way.
Our system leverages a Convolutional-Neural-Network model to generate location proposals of salient objects.
Existing methods attempt to estimate a spatially varying illumination map; however, the results are error-prone, and the resulting illumination maps are too low-resolution to be used for proper spatially varying white-balance correction.
Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions.
We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network.
A limitation in color constancy research is the inability to establish ground truth colors for evaluating corrected images.
Powered by this fast MBD transform algorithm, the proposed salient object detection method runs at 80 FPS, and significantly outperforms previous methods with similar speed on four large benchmark datasets, and achieves comparable or better performance than state-of-the-art methods.
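The raster-scan approximation behind fast MBD (Minimum Barrier Distance) transforms can be sketched as follows; this is an illustrative re-implementation of the general idea (border-seeded, alternating scans, tracking each pixel's running path maximum and minimum), not the authors' exact code:

```python
import numpy as np

def fast_mbd(img, n_passes=2):
    """Raster-scan approximation of the MBD transform, seeded at the
    image border. The barrier cost of a path is max(I) - min(I) along
    it; U and L track the max/min along each pixel's best path so far."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    D = np.full((h, w), np.inf)   # approximate MBD to the border seeds
    U = img.copy()                # running path maximum
    L = img.copy()                # running path minimum
    D[0, :] = D[-1, :] = D[:, 0] = D[:, -1] = 0.0
    for it in range(n_passes):
        fwd = it % 2 == 0
        ys = range(h) if fwd else range(h - 1, -1, -1)
        xs = range(w) if fwd else range(w - 1, -1, -1)
        nbrs = ((-1, 0), (0, -1)) if fwd else ((1, 0), (0, 1))
        for y in ys:              # alternating forward/backward scans
            for x in xs:
                for dy, dx in nbrs:
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < h and 0 <= qx < w:
                        u = max(U[qy, qx], img[y, x])
                        l = min(L[qy, qx], img[y, x])
                        if u - l < D[y, x]:
                            D[y, x], U[y, x], L[y, x] = u - l, u, l
    return D
```

Pixels whose appearance differs strongly from the image border receive a high barrier distance, which is why border-seeded MBD works well as a saliency cue.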
We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding.
In this paper, we propose a novel deep neural network framework embedded with low-level features (LCNN) for salient object detection in complex images.
The transferred local shape masks constitute a patch-level segmentation solution space and we thus develop a novel cascade algorithm, PatchCut, for coarse-to-fine object segmentation.
More recent state-of-the-art methods employ learning-based techniques that produce better results, but often rely on complex features and have long evaluation and training times.
By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction.
For most natural images, some boundary superpixels serve as the background labels, and the saliency of the other superpixels is determined by ranking their similarities to the boundary labels based on an inner propagation scheme.
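This kind of graph-based ranking has a closed form, f* = (I - alpha*S)^(-1) y, where S is the symmetrically normalized affinity matrix of the superpixel graph and y marks the boundary seeds. The snippet below is an illustrative sketch with hypothetical names, not the paper's implementation:

```python
import numpy as np

def manifold_rank(W, seeds, alpha=0.99):
    """Closed-form graph ranking: f* = (I - alpha*S)^(-1) y, where
    S = D^(-1/2) W D^(-1/2). Seeds (e.g., boundary superpixels) get
    y = 1; high f* means high similarity to the seeds."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    y = np.zeros(W.shape[0])
    y[list(seeds)] = 1.0
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)
```

In boundary-prior saliency methods, superpixels ranked *least* similar to the boundary seeds are the ones taken as salient.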
Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision.
This paper presents a scalable scene parsing algorithm based on image retrieval and superpixel matching.
The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present.