no code implementations • ECCV 2020 • Henghui Ding, Scott Cohen, Brian Price, Xudong Jiang
We propose to employ phrase expressions as another interaction input to infer the attributes of the target object.
no code implementations • 3 Dec 2024 • Tianyu Wang, Jianming Zhang, Haitian Zheng, Zhihong Ding, Scott Cohen, Zhe Lin, Wei Xiong, Chi-Wing Fu, Luis Figueroa, Soo Ye Kim
MetaShadow combines the strengths of two cooperative components: Shadow Analyzer, for object-centered shadow detection and removal, and Shadow Synthesizer, for reference-based controllable shadow synthesis.
no code implementations • CVPR 2024 • Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle
Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset.
no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.
no code implementations • CVPR 2024 • Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga
Generative object compositing emerges as a promising new avenue for compositional image editing.
no code implementations • 4 Dec 2023 • Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen, Vishal M. Patel
Recovering textures under shadows has remained a challenging problem due to the difficulty of inferring shadow-free scenes from shadow images.
1 code implementation • 24 Aug 2023 • Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez
To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens.
no code implementations • CVPR 2023 • Hoang M. Le, Brian Price, Scott Cohen, Michael S. Brown
Inspired by neural implicit representations for 2D images, we propose a method that optimizes a lightweight multi-layer-perceptron (MLP) model during the gamut reduction step to predict the clipped values.
1 code implementation • CVPR 2023 • Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing.
no code implementations • CVPR 2023 • Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga
Object compositing based on 2D images is a challenging problem since it typically involves multiple processing stages such as color harmonization, geometry correction and shadow generation to generate realistic results.
no code implementations • 13 Dec 2022 • Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo
Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects.
1 code implementation • 2 Dec 2022 • Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga
Object compositing based on 2D images is a challenging problem since it typically involves multiple processing stages such as color harmonization, geometry correction and shadow generation to generate realistic results.
no code implementations • 31 Mar 2022 • Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing.
1 code implementation • 22 Mar 2022 • Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Ning Xu, Sohrab Amirghodsi, Jiebo Luo
We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level.
Ranked #3 on Image Inpainting on Places2
no code implementations • 21 Dec 2021 • Josh Myers-Dean, Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari
Generalized few-shot semantic segmentation was introduced to move beyond only evaluating few-shot segmentation models on novel classes to include testing their ability to remember base classes.
no code implementations • CVPR 2021 • Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava
In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
2 code implementations • ICCV 2021 • Hareesh Ravi, Kushal Kafle, Scott Cohen, Jonathan Brandt, Mubbasir Kapadia
Visual storytelling and story comprehension are uniquely human skills that play a central role in how we learn about and experience the world.
1 code implementation • 14 Dec 2020 • Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Jianming Zhang, Ning Xu, Jiebo Luo
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
1 code implementation • CVPR 2020 • Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji
We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs.
Ranked #4 on Referring Expression Segmentation on PhraseCut
1 code implementation • 6 Apr 2020 • Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari
We demonstrate how to increase overall model capacity to achieve improved performance, by introducing objectness, which is class-agnostic and so not prone to overfitting, for complementary use with class-specific features.
no code implementations • 25 Mar 2020 • Peng Zhou, Brian Price, Scott Cohen, Gregg Wilensky, Larry S. Davis
In this paper, we target refining the boundaries in high resolution images given low resolution masks.
3 code implementations • 17 Mar 2020 • Marco Forte, Brian Price, Scott Cohen, Ning Xu, François Pitié
We propose a novel interactive architecture and a novel training scheme that are both tailored to better exploit the user workflow.
3 code implementations • 5 Sep 2019 • Brian Davis, Bryan Morse, Scott Cohen, Brian Price, Chris Tensmeyer
Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts.
no code implementations • ICCV 2019 • Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari
Many people search for foreground objects to use when editing images.
1 code implementation • 5 Aug 2019 • Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs.
no code implementations • 7 Jun 2019 • Charles Chen, Ruiyi Zhang, Eunyee Koh, Sungchul Kim, Scott Cohen, Tong Yu, Ryan Rossi, Razvan Bunescu
In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure.
1 code implementation • Eurographics 2019 - Short Papers 2019 • Mahmoud Afifi, Brian Price, Scott Cohen, Michael S. Brown
We present a method to perform automatic image recoloring based on the distribution of colors associated with objects present in an image.
4 code implementations • ECCV 2018 • Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas Huang
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips.
Ranked #12 on Video Object Segmentation on YouTube-VOS 2018 (F-Measure (Unseen) metric)
1 code implementation • ECCV 2018 • Curtis Wigington, Chris Tensmeyer, Brian Davis, William Barrett, Brian Price, Scott Cohen
Despite decades of research, offline handwriting recognition (HWR) of degraded historical documents remains a challenging problem, which, if solved, could greatly improve the searchability of online cultural heritage archives.
Ranked #13 on Handwritten Text Recognition on IAM
no code implementations • ECCV 2018 • Hoang Le, Long Mai, Brian Price, Scott Cohen, Hailin Jin, Feng Liu
Instead of relying on pre-defined low-level image features, our method adaptively predicts object boundaries according to image content and user interactions.
no code implementations • ECCV 2018 • Yufei Wang, Zhe Lin, Xiaohui Shen, Jianming Zhang, Scott Cohen
Then, we refine and extend the embedding network to predict an attention map, using a curated dataset with bounding box annotations on 750 concepts.
no code implementations • 22 Mar 2018 • Yinan Zhao, Brian Price, Scott Cohen, Danna Gurari
Deep generative models have shown success in automatically synthesizing missing image regions using surrounding context.
1 code implementation • CVPR 2018 • Ruotian Luo, Brian Price, Scott Cohen, Gregory Shakhnarovich
One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them.
1 code implementation • CVPR 2018 • Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan
Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them.
1 code implementation • Pacific Graphics 2017 • Rang Nguyen, Brian Price, Scott Cohen, Michael S. Brown
Methods such as color transfer are effective in making an image share similar colors with a target image; however, color transfer is not suitable for modifying multiple images.
no code implementations • 2 Jul 2017 • Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang
In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into a Euclidean distance map.
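As a rough illustration of what turning a rectangle into a Euclidean distance map can look like, the sketch below computes, for every pixel, the unsigned distance to the rectangle's border. This is not taken from the paper: the function name `rectangle_distance_map`, the `(top, left, bottom, right)` box convention, and the choice of an unsigned distance to the border are all assumptions made for illustration.

```python
import numpy as np


def rectangle_distance_map(height, width, box):
    """Unsigned Euclidean distance from each pixel to the border of `box`.

    `box` is (top, left, bottom, right) in inclusive pixel coordinates.
    This convention is a hypothetical choice, not one stated in the source.
    """
    top, left, bottom, right = box
    ys, xs = np.mgrid[0:height, 0:width]

    # Per-axis gap to the rectangle's extent; zero when the pixel falls
    # within the rectangle's range on that axis.
    dy = np.maximum(np.maximum(top - ys, ys - bottom), 0)
    dx = np.maximum(np.maximum(left - xs, xs - right), 0)
    outside = np.hypot(dx, dy)

    # Inside the rectangle, the nearest border point lies along one axis.
    inside = np.minimum(np.minimum(ys - top, bottom - ys),
                        np.minimum(xs - left, right - xs))

    is_inside = (dx == 0) & (dy == 0)
    return np.where(is_inside, inside, outside).astype(np.float32)


# Example: a 6x8 image with a user-drawn box covering rows 1..4, columns 2..5.
dist = rectangle_distance_map(6, 8, box=(1, 2, 4, 5))
```

A map like this could be stacked with the RGB channels as an extra network input, so the rectangle acts as a soft hint rather than a hard crop; whether the original method uses exactly this encoding is not stated in the excerpt.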
no code implementations • CVPR 2017 • Huixuan Tang, Scott Cohen, Brian Price, Stephen Schiller, Kiriakos N. Kutulakos
We consider the problem of two-frame depth from defocus in conditions unsuitable for existing methods yet typical of everyday photography: a handheld cellphone camera, a small aperture, a non-stationary scene and sparse surface texture.
no code implementations • CVPR 2017 • Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal
We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals.
no code implementations • CVPR 2017 • Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, Garrison W. Cottrell
Furthermore, our algorithm can generate descriptions with varied length, benefiting from the separate control of the skeleton and attributes.
no code implementations • CVPR 2017 • Yu-Wei Chao, Jimei Yang, Brian Price, Scott Cohen, Jia Deng
This paper presents the first study on forecasting human dynamics from static images.
8 code implementations • CVPR 2017 • Ning Xu, Brian Price, Scott Cohen, Thomas Huang
We evaluate our algorithm on the image matting benchmark, our testing set, and a wide variety of real images.
no code implementations • NeurIPS 2016 • Peng Wang, Xiaohui Shen, Bryan Russell, Scott Cohen, Brian Price, Alan L. Yuille
This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image.
1 code implementation • 8 Jun 2016 • Paul Hongsuck Seo, Zhe Lin, Scott Cohen, Xiaohui Shen, Bohyung Han
We propose a novel attention model that can accurately attend to target objects of various scales and shapes in images.
no code implementations • CVPR 2016 • Dongliang Cheng, Abdelrahman Abdelhamed, Brian Price, Scott Cohen, Michael S. Brown
Existing methods attempt to estimate a spatially varying illumination map; however, the results are error-prone and the resulting illumination maps are too low-resolution to be used for proper spatially varying white-balance correction.
no code implementations • CVPR 2016 • Jie Feng, Brian Price, Scott Cohen, Shih-Fu Chang
While these methods achieve better results than color-based methods, they are still limited in either using depth as an additional color channel or simply combining depth with color in a linear way.
no code implementations • WS 2016 • Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal
Motivated by the application of fact-level image understanding, we present an automatic method for data collection of structured visual facts from images with captions.
3 code implementations • CVPR 2016 • Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang
We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network.
3 code implementations • CVPR 2016 • Ning Xu, Brian Price, Scott Cohen, Jimei Yang, Thomas Huang
Interactive object selection is a very important research problem and has many applications.
Ranked #11 on Interactive Segmentation on SBD
no code implementations • ICCV 2015 • Dongliang Cheng, Brian Price, Scott Cohen, Michael S. Brown
A limitation in color constancy research is the inability to establish ground truth colors for evaluating corrected images.
no code implementations • 16 Nov 2015 • Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, Ahmed Elgammal
We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding.
no code implementations • CVPR 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan L. Yuille
By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction [5].
no code implementations • CVPR 2015 • Dongliang Cheng, Brian Price, Scott Cohen, Michael S. Brown
More recent state-of-the-art methods employ learning-based techniques that produce better results, but often rely on complex features and have long evaluation and training times.
no code implementations • CVPR 2015 • Jimei Yang, Brian Price, Scott Cohen, Zhe Lin, Ming-Hsuan Yang
The transferred local shape masks constitute a patch-level segmentation solution space and we thus develop a novel cascade algorithm, PatchCut, for coarse-to-fine object segmentation.
no code implementations • ICCV 2015 • Peng Wang, Xiaohui Shen, Zhe Lin, Scott Cohen, Brian Price, Alan Yuille
Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision.
no code implementations • CVPR 2014 • Jimei Yang, Brian Price, Scott Cohen, Ming-Hsuan Yang
This paper presents a scalable scene parsing algorithm based on image retrieval and superpixel matching.
no code implementations • CVPR 2014 • Ejaz Ahmed, Scott Cohen, Brian Price
With the tag provided by the user we do a text query of an image database to gather exemplars of the object.
no code implementations • CVPR 2013 • Zhuoyuan Chen, Hailin Jin, Zhe Lin, Scott Cohen, Ying Wu
We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation.
no code implementations • CVPR 2013 • Ehsan Shahrian, Deepu Rajan, Brian Price, Scott Cohen
The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present.
no code implementations • CVPR 2013 • Jianchao Yang, Zhe Lin, Scott Cohen
Extensive experiments on benchmark and real-world images demonstrate that our algorithm can produce natural-looking results with sharp edges and preserved fine details, while the current state-of-the-art algorithms are prone to visual artifacts.