1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate existing foundation models' video understanding capabilities using a carefully designed experimental protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.
4 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui
Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
Ranked #1 on Self-Supervised Action Recognition on Kinetics-600
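The contrastive objective described in the entry above can be illustrated with a minimal InfoNCE-style sketch; this is a generic formulation assuming paired clip embeddings z1 and z2 from an arbitrary video encoder, not the paper's exact implementation.

```python
# Hypothetical InfoNCE-style contrastive loss for video clips: two augmented
# clips of the same video form a positive pair; clips from other videos in the
# batch act as negatives.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (B, D) embeddings of two augmented clips from the same videos."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are the positive pairs; off-diagonal entries are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example usage with random features standing in for encoder outputs:
loss = clip_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```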
2 code implementations • ICCV 2021 • Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar
Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.
1 code implementation • NeurIPS 2023 • Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong
In this paper, we present a novel problem, namely video timeline modeling.
32 code implementations • 2 Apr 2019 • Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr
We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet.
Ranked #2 on Image Classification on GasHisSDB
12 code implementations • ECCV 2018 • Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz
Photorealistic image stylization concerns transferring style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic.
5 code implementations • CVPR 2019 • Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame.
Ranked #5 on Video Frame Interpolation on Middlebury
1 code implementation • 22 Dec 2022 • Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova
Our approach achieves a 4x faster run-time in the case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark.
1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
Human-Object Interaction Detection Relationship Detection +2
5 code implementations • CVPR 2018 • Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz
Finally, the two input images are warped and linearly fused to form each intermediate frame.
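A minimal sketch of this warp-and-fuse step, assuming the bidirectional flows to the intermediate time t are already available; the actual method additionally predicts visibility maps and refines the flows, which are omitted here.

```python
# Simplified sketch of flow-based backward warping plus time-weighted linear
# fusion for an intermediate frame at time t.
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """img: (B, C, H, W); flow: (B, 2, H, W) giving per-pixel (dx, dy) offsets."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # sample positions
    # Normalize to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(img, grid_norm, align_corners=True)

def fuse_intermediate(img0, img1, flow_t0, flow_t1, t=0.5):
    """Linear blend of the two warped inputs, weighted by temporal distance."""
    w0 = backward_warp(img0, flow_t0)
    w1 = backward_warp(img1, flow_t1)
    return (1.0 - t) * w0 + t * w1
```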
2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang
This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.
2 code implementations • 23 Oct 2018 • Zhile Ren, Orazio Gallo, Deqing Sun, Ming-Hsuan Yang, Erik B. Sudderth, Jan Kautz
To date, top-performing optical flow estimation methods only take pairs of consecutive frames into account.
12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.
Ranked #5 on Spectral Reconstruction on ARAD-1K
8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline.
Ranked #10 on Image Denoising on DND (using extra training data)
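As a rough illustration of why AWGN falls short, a signal-dependent (shot plus read) Gaussian noise model is sketched below; the gain parameters are arbitrary placeholders, not values from the paper, and the full method further models the camera imaging pipeline.

```python
# Illustrative comparison of AWGN with a signal-dependent (heteroscedastic
# Gaussian) noise model often used to approximate raw sensor noise.
import numpy as np

def add_awgn(img, sigma=0.02):
    return img + np.random.normal(0.0, sigma, img.shape)

def add_shot_read_noise(img, shot_gain=0.012, read_sigma=0.005):
    # Variance grows linearly with intensity (shot noise) plus a constant
    # read-noise floor, so darker pixels are noisier relative to their signal.
    var = shot_gain * img + read_sigma ** 2
    return img + np.random.normal(0.0, 1.0, img.shape) * np.sqrt(var)

clean = np.clip(np.random.rand(64, 64), 0.0, 1.0)
noisy = add_shot_read_noise(clean)
```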
8 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.
Ranked #3 on Spectral Reconstruction on ARAD-1K
11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.
Ranked #1 on Grayscale Image Denoising on Urban100 sigma15
1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.
1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
12 code implementations • CVPR 2018 • Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker
In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation.
Ranked #3 on Domain Adaptation on Synscapes-to-Cityscapes
13 code implementations • ICLR 2018 • Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, Ming-Hsuan Yang
We propose a method for semi-supervised semantic segmentation using an adversarial network.
7 code implementations • ECCV 2018 • Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang
Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time.
4 code implementations • 2 May 2019 • Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang
In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images.
4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)
1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia
To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.
1 code implementation • 1 Aug 2022 • Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu
In the first stage, we obtain a robust feature extractor that can serve all images with base and novel categories.
1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang
Despite the progress of image segmentation toward accurate visual entity segmentation, meeting the diverse requirements of image editing applications for region-of-interest selections at different levels remains unsolved.
1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang
It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.
1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang
We benchmark the proposed evaluation metrics on 12 open-vocabulary methods across three segmentation tasks.
1 code implementation • 4 Dec 2023 • Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang
On the other hand, the progressive dichotomy module can efficiently decode the synthesized colormap into high-quality entity-level masks via a depth-first binary search without knowing the number of clusters.
15 code implementations • NeurIPS 2017 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
The whitening and coloring transforms reflect a direct matching of feature covariance of the content image to a given style image, which shares similar spirits with the optimization of Gram matrix based cost in neural style transfer.
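A minimal sketch of the whitening and coloring transform on flattened feature maps, assuming content and style features have already been extracted; the actual method applies this at multiple VGG encoder/decoder levels, which is not shown.

```python
# Minimal whitening-and-coloring transform (WCT) on flattened feature maps:
# whiten the content features, then impose the style feature covariance.
import torch

def wct(content_feat, style_feat, eps=1e-5):
    """content_feat, style_feat: (C, N) feature matrices (channels x positions)."""
    c_mean = content_feat.mean(dim=1, keepdim=True)
    s_mean = style_feat.mean(dim=1, keepdim=True)
    fc = content_feat - c_mean
    fs = style_feat - s_mean

    # Whitening: remove the content feature covariance.
    c_cov = fc @ fc.t() / (fc.size(1) - 1) + eps * torch.eye(fc.size(0))
    c_vals, c_vecs = torch.linalg.eigh(c_cov)
    whitened = c_vecs @ torch.diag(c_vals.clamp(min=eps).rsqrt()) @ c_vecs.t() @ fc

    # Coloring: match the style feature covariance, then restore the style mean.
    s_cov = fs @ fs.t() / (fs.size(1) - 1) + eps * torch.eye(fs.size(0))
    s_vals, s_vecs = torch.linalg.eigh(s_cov)
    colored = s_vecs @ torch.diag(s_vals.clamp(min=eps).sqrt()) @ s_vecs.t() @ whitened
    return colored + s_mean

out = wct(torch.randn(64, 1024), torch.randn(64, 900))
```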
3 code implementations • ICLR 2019 • Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei
We first evaluate the E3D-LSTM network on widely-used future video prediction datasets and achieve the state-of-the-art performance.
Ranked #1 on Video Prediction on KTH (Cond metric)
1 code implementation • 6 Nov 2023 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan
In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.
2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.
Ranked #3 on Motion Synthesis on BRACE
1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We model the HDR-to-LDR image formation pipeline as (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization.
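A toy forward model of the three-stage HDR-to-LDR pipeline listed above, using a simple gamma curve as a stand-in for the camera response function; the paper learns networks to invert each stage rather than assuming this fixed model.

```python
# Toy HDR-to-LDR formation model: clipping, a nonlinear camera response
# (gamma curve as a stand-in CRF), and 8-bit quantization.
import numpy as np

def hdr_to_ldr(hdr, exposure=1.0, gamma=2.2, bits=8):
    clipped = np.clip(hdr * exposure, 0.0, 1.0)        # (1) dynamic range clipping
    responded = clipped ** (1.0 / gamma)               # (2) nonlinear camera response
    levels = 2 ** bits - 1
    quantized = np.round(responded * levels) / levels  # (3) quantization
    return quantized

ldr = hdr_to_ldr(np.random.rand(32, 32, 3) * 4.0)      # synthetic HDR radiance
```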
2 code implementations • ICCV 2021 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in a smaller field of view.
1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
2 code implementations • CVPR 2019 • Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang
In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs.
Multimodal Unsupervised Image-To-Image Translation Translation
1 code implementation • ECCV 2018 • Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, Ming-Hsuan Yang
Our method takes the original unprocessed and per-frame processed videos as inputs to produce a temporally consistent video.
1 code implementation • 14 Aug 2018 • Xueting Li, Sifei Liu, Jan Kautz, Ming-Hsuan Yang
Recent arbitrary style transfer methods transfer second-order statistics from a reference image onto a content image via a multiplication between content image features and a transformation matrix, which is computed from features with a pre-determined algorithm.
1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.
1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.
2 code implementations • ECCV 2018 • Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks.
2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
1 code implementation • CVPR 2020 • Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang
To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme.
Ranked #9 on Image Dehazing on Haze4k
1 code implementation • ICLR 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang
Few-shot classification aims to recognize novel categories with only few labeled images in each class.
Ranked #6 on Cross-Domain Few-Shot on CUB
1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang
We present a novel framework, InfinityGAN, for arbitrary-sized image generation.
Ranked #2 on Scene Generation on OSM
2 code implementations • CVPR 2017 • Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang
In this paper, we propose an effective face completion algorithm using a deep generative model.
1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers are transforming the landscape of computer vision, especially for recognition tasks.
Ranked #12 on Object Detection on COCO 2017 val
1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers have been widely used in numerous vision problems especially for visual recognition and detection.
1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
This has been a long-standing question in computer vision.
2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
2 code implementations • CVPR 2018 • Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz
We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice.
Ranked #30 on Semantic Segmentation on ScanNet
2 code implementations • 20 Mar 2022 • Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma
In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.
Ranked #1 on 3D Object Detection on V2XSet
1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.
1 code implementation • CVPR 2019 • Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz
Parts provide a good intermediate representation of objects that is robust with respect to the camera, pose and appearance variations.
Ranked #4 on Unsupervised Keypoint Estimation on CUB
3 code implementations • 26 Jan 2021 • Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang
We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.
7 code implementations • 4 Oct 2017 • Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
However, existing methods often require a large number of network parameters and entail heavy computational loads at runtime for generating high-accuracy super-resolution results.
1 code implementation • CVPR 2021 • Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
Ranked #13 on Semantic Segmentation on MCubeS
1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
Text-to-image diffusion models have made significant advances in generating and editing high-quality images.
Ranked #3 on Semantic correspondence on SPair-71k
1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang
Segment Anything Model (SAM) is a remarkable model that can achieve generalized segmentation.
2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.
Ranked #2 on Prompt Engineering on ImageNet V2
1 code implementation • CVPR 2020 • Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang
In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.
1 code implementation • CVPR 2019 • Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang
We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style.
1 code implementation • ICCV 2017 • Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, Ming-Hsuan Yang
This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos.
Ranked #67 on Semi-Supervised Video Object Segmentation on DAVIS 2016
1 code implementation • CVPR 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions.
2 code implementations • CVPR 2018 • Nian Liu, Junwei Han, Ming-Hsuan Yang
We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively.
Ranked #7 on RGB Salient Object Detection on SOC
1 code implementation • ECCV 2018 • Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner.
2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang
Our learning process integrates two highly related tasks: tracking large image regions \emph{and} establishing fine-grained pixel-level associations between consecutive video frames.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., retaining as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.
1 code implementation • 20 Oct 2018 • Wenbo Bao, Wei-Sheng Lai, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed.
Ranked #22 on Video Frame Interpolation on Vimeo90K
1 code implementation • arXiv 2018 • Wenbo Bao, Wei-Sheng Lai, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang
In this work, we propose a motion estimation and motion compensation driven neural network for video frame interpolation.
Ranked #6 on Video Frame Interpolation on Middlebury
1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
Recent years have witnessed the rapid progress of generative adversarial networks (GANs).
Ranked #1 on Image Generation on CIFAR-100
1 code implementation • ECCV 2020 • Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang
A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.
3 code implementations • 6 Jun 2021 • Shang-Hua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, Philip Torr
In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.
Ranked #1 on Unsupervised Semantic Segmentation on ImageNet-S-300
1 code implementation • 7 Jul 2017 • Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang
Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes.
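A MOSSE-style, single-channel correlation filter sketch in the Fourier domain, with a Gaussian label map as the regression target; the actual tracker learns filters over multi-channel deep features and a scale pyramid, which are not shown here.

```python
# Single-channel correlation filter: closed-form training in the Fourier domain
# against a Gaussian-shaped label, then detection by correlating a new patch.
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, label, lam=1e-2):
    F_ = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return (G * np.conj(F_)) / (F_ * np.conj(F_) + lam)   # filter in Fourier domain

def detect(filter_hat, patch):
    Z = np.fft.fft2(patch)
    response = np.real(np.fft.ifft2(Z * filter_hat))
    return np.unravel_index(np.argmax(response), response.shape)

patch = np.random.rand(64, 64)
H_hat = train_filter(patch, gaussian_label(64, 64))
peak = detect(H_hat, patch)   # peak lands near the patch center for the training patch
```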
2 code implementations • CVPR 2017 • Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
Compositing is one of the most common operations in photo editing.
1 code implementation • CVPR 2018 • Feng Li, Cheng Tian, WangMeng Zuo, Lei Zhang, Ming-Hsuan Yang
Compared with SRDCF, STRCF with hand-crafted features provides a 5x speedup and achieves AUC gains of 5.4% and 3.6% on OTB-2015 and Temple-Color, respectively.
Ranked #9 on Visual Object Tracking on VOT2017/18
2 code implementations • 1 Dec 2021 • Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.
2 code implementations • 18 Dec 2016 • Chao Ma, Chih-Yuan Yang, Xiaokang Yang, Ming-Hsuan Yang
Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception.
Ranked #7 on Video Quality Assessment on MSU SR-QA Dataset
2 code implementations • 27 Jul 2018 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang
Single-image super-resolution is a fundamental task for vision applications to enhance the image quality with respect to spatial resolution.
1 code implementation • 13 Dec 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.
1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.
Ranked #2 on Burst Image Super-Resolution on BurstSR
3 code implementations • CVPR 2016 • Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang
We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network.
1 code implementation • 24 Oct 2019 • Han-Kai Hsu, Chun-Han Yao, Yi-Hsuan Tsai, Wei-Chih Hung, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang
This intermediate domain is constructed by translating the source images to mimic the ones in the target domain.
1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.
1 code implementation • CVPR 2022 • Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang
Existing methods for video interpolation heavily rely on deep convolutional neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field.
1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang
Given a piece of text, a video clip and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference.
1 code implementation • ECCV 2020 • Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang
People often create art by following an artistic workflow involving multiple stages that inform the overall design.
2 code implementations • NeurIPS 2018 • Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz
Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem.
1 code implementation • ECCV 2018 • Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang
In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets.
Ranked #5 on Online Multi-Object Tracking on MOT16
1 code implementation • 20 Nov 2019 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
Visual events are usually accompanied by sounds in our daily lives.
1 code implementation • 12 Oct 2017 • Dong Li, Jia-Bin Huang, Ya-Li Li, Shengjin Wang, Ming-Hsuan Yang
In classification adaptation, we transfer a pre-trained network to a multi-label classification task for recognizing the presence of a certain object in an image.
1 code implementation • 13 Jun 2019 • Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang
In contrast to existing algorithms that tackle the tasks of semantic matching and object co-segmentation in isolation, our method exploits the complementary nature of the two tasks.
1 code implementation • 12 Aug 2020 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang
Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.
1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao
Following this spirit, this paper explores plain ViT architecture for MUAD.
1 code implementation • ECCV 2018 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting the next one-frame.
1 code implementation • ECCV 2018 • Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, Ming-Hsuan Yang
Regression trackers directly learn a mapping from regularly dense samples of target objects to soft labels, which are usually generated by a Gaussian function, to estimate target positions.
1 code implementation • 11 Aug 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera.
1 code implementation • 13 May 2019 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Deng Cai, Ming-Hsuan Yang
However, current state-of-the-art face parsing methods require large amounts of labeled data on the pixel-level and such process for caricature is tedious and labor-intensive.
1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang
(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.
1 code implementation • 3 Jun 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang
In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage.
1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang
With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.
1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao
Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.
1 code implementation • ECCV 2018 • Wei-Chih Hung, Jianming Zhang, Xiaohui Shen, Zhe Lin, Joon-Young Lee, Ming-Hsuan Yang
Specifically, given a foreground image and a background image, our proposed method automatically generates a set of blending photos with scores that indicate the aesthetics quality with the proposed quality network and policy network.
1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang
Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.
1 code implementation • ICCV 2017 • Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang
We present an unsupervised representation learning approach using videos without semantic labels.
Ranked #46 on Self-Supervised Action Recognition on HMDB51
1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang
Generating a smooth sequence of intermediate results bridges the gap of two different domains, facilitating the morphing effect across domains.
1 code implementation • 2 Feb 2021 • Xiangyu Xu, Yongrui Ma, Wenxiu Sun, Ming-Hsuan Yang
In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
1 code implementation • NeurIPS 2019 • Xiangyu Xu, Li Si-Yao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang
Video interpolation is an important problem in computer vision, which helps overcome the temporal limitation of camera sensors.
1 code implementation • 26 Mar 2024 • Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.
1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang
The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).
Ranked #10 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)
1 code implementation • ICCV 2021 • Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang
Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix.
1 code implementation • 23 Aug 2022 • Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang
An alternative approach is to estimate dense vertices of a predefined template body in the image space.
1 code implementation • 12 Apr 2018 • Dongwei Ren, WangMeng Zuo, David Zhang, Lei Zhang, Ming-Hsuan Yang
For blind deconvolution, as estimation error of blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well.
2 code implementations • 15 Dec 2018 • Nian Liu, Junwei Han, Ming-Hsuan Yang
We propose three specific formulations of the PiCANet via embedding the pixel-wise contextual attention mechanism into the pooling and convolution operations with attending to global or local contexts.
2 code implementations • 30 Nov 2023 • Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap.
2 code implementations • 5 Oct 2017 • Shun Zhang, Jia-Bin Huang, Jongwoo Lim, Yihong Gong, Jinjun Wang, Narendra Ahuja, Ming-Hsuan Yang
Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up.
1 code implementation • 12 Jul 2017 • Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang
Specifically, we learn adaptive correlation filters on the outputs from each convolutional layer to encode the target appearance.
1 code implementation • 12 Dec 2022 • Zhiwei Lin, Yongtao Wang, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang
Based on the property of outdoor point clouds in autonomous driving scenarios, i.e., the point clouds of distant objects are more sparse, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection.
1 code implementation • 28 Nov 2023 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
Language has emerged as a natural interface for image editing.
1 code implementation • 10 Oct 2019 • Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang
To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.
Ranked #19 on Referring Expression Segmentation on RefCOCO testB
1 code implementation • 2 Mar 2020 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang
To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately.
1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang
With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performances in a variety of generative tasks such as image, audio, and text generation.
1 code implementation • ICCV 2023 • Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang
Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.
1 code implementation • 28 Nov 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.
Ranked #1 on Semantic correspondence on PF-PASCAL
1 code implementation • NeurIPS 2023 • Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai
Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations.
1 code implementation • 13 Dec 2023 • Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai
Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach.
1 code implementation • 2 Apr 2024 • Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
1 code implementation • 20 Nov 2023 • Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang
Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets.
1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao
We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.
1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang
In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.
1 code implementation • ICCV 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti
Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.
1 code implementation • 20 May 2016 • Xing Wei, Qingxiong Yang, Yihong Gong, Ming-Hsuan Yang, Narendra Ahuja
Quantitative and qualitative evaluation on a number of computer vision applications was conducted, demonstrating that the proposed method is the top performer.
1 code implementation • ICCV 2023 • Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai
In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.
1 code implementation • CVPR 2018 • Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang
To address this issue, we propose a novel CF-based optimization problem to jointly model the discrimination and reliability information.
1 code implementation • CVPR 2020 • Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong
Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.
Ranked #27 on Long-tail Learning on Places-LT
1 code implementation • 30 Mar 2020 • Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling
Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.
1 code implementation • ICCV 2017 • Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang
We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models.
1 code implementation • ICCV 2023 • Joungbin An, Hyolim Kang, Su Ho Han, Ming-Hsuan Yang, Seon Joo Kim
Online Action Detection (OAD) is the task of identifying actions in streaming videos without access to future frames.
Ranked #1 on Online Action Detection on TVSeries
1 code implementation • 5 Oct 2017 • Feng Li, Yingjie Yao, Peihua Li, David Zhang, WangMeng Zuo, Ming-Hsuan Yang
The aspect ratio variation frequently appears in visual tracking and has a severe influence on performance.
1 code implementation • 3 Apr 2024 • Zhongyu Xia, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang
Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation.
1 code implementation • NeurIPS 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
1 code implementation • ICCV 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.
1 code implementation • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.
1 code implementation • 14 Aug 2023 • Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang
Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild.
Ranked #2 on Blind Face Restoration on WIDER
1 code implementation • 13 Dec 2021 • Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang
Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.
1 code implementation • CVPR 2018 • Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang
Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target.
Ranked #12 on Visual Object Tracking on VOT2017/18
1 code implementation • 19 Dec 2016 • Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang
In this paper, we propose a dual network to better utilize features among layers for visual tracking.
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2.5D relationships among 512K objects from 11K images.
1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang
To address this issue, we propose a neural architecture search method named FlowNAS to automatically find the better encoder architecture for flow estimation task.
1 code implementation • CVPR 2018 • Jufeng Yang, Dongyu She, Yu-Kun Lai, Paul L. Rosin, Ming-Hsuan Yang
The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification.
1 code implementation • ECCV 2018 • Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh
We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior.
1 code implementation • 8 Jul 2020 • Xin-Yu Zhang, Taihong Xiao, HaoLin Jia, Ming-Ming Cheng, Ming-Hsuan Yang
In this work, we propose a simple yet effective meta-learning algorithm in semi-supervised learning.
1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang
For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts.
1 code implementation • CVPR 2021 • Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.
1 code implementation • 19 Dec 2022 • Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang
In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.
1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.
Ranked #3 on Weakly Supervised Action Localization on THUMOS’14
1 code implementation • 20 Apr 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.
1 code implementation • 28 Mar 2024 • Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang
Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.
1 code implementation • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang
Crowd counting is usually handled in a density map regression fashion, which is supervised via an L2 loss between the predicted density map and the ground truth.
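A minimal sketch of the standard density-map regression setup mentioned above: the ground-truth density map integrates to the person count, and training minimizes an L2 loss between predicted and ground-truth maps (the paper then improves on this baseline).

```python
# Density-map regression for crowd counting: L2 loss on the maps, with the
# count estimate obtained by summing the predicted density map.
import torch
import torch.nn.functional as F

def density_l2_loss(pred_density, gt_density):
    """pred_density, gt_density: (B, 1, H, W) density maps."""
    return F.mse_loss(pred_density, gt_density, reduction="mean")

pred = torch.rand(2, 1, 96, 96)
gt = torch.rand(2, 1, 96, 96)
loss = density_l2_loss(pred, gt)
count_estimate = pred.sum(dim=(1, 2, 3))   # predicted count = sum over the map
```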
1 code implementation • Computer Vision and Image Understanding 2017 • Mohamed A. Naiel, M. Omair Ahmad, M.N.S. Swamy, Jongwoo Lim, Ming-Hsuan Yang
For each frame, we construct an association between detections and trackers, and treat each detected image region as a key sample, for online update, if it is associated to a tracker.
Ranked #1 on Online Multi-Object Tracking on Oxford Town Center
1 code implementation • CVPR 2017 • Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution.
Ranked #40 on Image Super-Resolution on BSD100 - 4x upscaling
1 code implementation • 4 Dec 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang
To address these limitations, we propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection, which learns multi-scale temporal features.
1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.
1 code implementation • 19 Feb 2020 • Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang
Recent advances in convolutional neural networks(CNNs) usually come with the expense of excessive computational overhead and memory footprint.
1 code implementation • ECCV 2020 • Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang
Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors.
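A sketch of a local correlation (cost) volume built from inner products between the feature vectors of two frames over a small displacement window; real optical-flow implementations typically rely on optimized kernels and feature normalization not shown here.

```python
# Local correlation (cost) volume: for each pixel in frame 1, inner products
# between its feature vector and frame-2 features within a displacement window.
import torch
import torch.nn.functional as F

def cost_volume(feat1, feat2, max_disp=3):
    """feat1, feat2: (B, C, H, W); returns (B, (2*max_disp+1)**2, H, W)."""
    B, C, H, W = feat1.shape
    feat2_pad = F.pad(feat2, [max_disp] * 4)
    volumes = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat2_pad[:, :, dy:dy + H, dx:dx + W]
            volumes.append((feat1 * shifted).sum(dim=1, keepdim=True) / C)
    return torch.cat(volumes, dim=1)

cv = cost_volume(torch.randn(1, 32, 48, 64), torch.randn(1, 32, 48, 64))
```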
1 code implementation • 27 Nov 2021 • Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang
Most existing methods formulate the non-blind deconvolution problem in a maximum-a-posteriori framework and address it by manually designing various regularization terms and data terms for the latent clear images.
1 code implementation • 7 Dec 2023 • Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang
To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.
1 code implementation • 18 Jun 2013 • An Bian, Xiong Li, Yuncai Liu, Ming-Hsuan Yang
We show that: (1) PCDN is guaranteed to converge globally despite increasing parallelism; (2) PCDN converges to the specified accuracy $\epsilon$ within the limited iteration number of $T_\epsilon$, and $T_\epsilon$ decreases with increasing parallelism (bundle size $P$).
1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang
Interpretable generation process is beneficial to various image editing applications.
1 code implementation • 16 Feb 2024 • Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang
To address the observed appearance overlap between synthesized images of rare classes and tail classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
1 code implementation • 10 Dec 2023 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang
To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
no code implementations • 15 May 2018 • Jinshan Pan, Wenqi Ren, Zhe Hu, Ming-Hsuan Yang
However, existing methods are less effective as only few edges can be restored from blurry face images for kernel estimation.
no code implementations • CVPR 2018 • Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang
These problems usually involve the estimation of two components of the target signals: structures and details.
no code implementations • 4 Oct 2017 • Chenglong Li, Liang Lin, WangMeng Zuo, Jin Tang, Ming-Hsuan Yang
First, the graph is initialized by assigning binary weights of some image patches to indicate the object and background patches according to the predicted bounding box.
no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang
To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.
no code implementations • CVPR 2018 • Lerenhan Li, Jinshan Pan, Wei-Sheng Lai, Changxin Gao, Nong Sang, Ming-Hsuan Yang
We present an effective blind image deblurring method based on a data-driven discriminative prior. Our work is motivated by the fact that a good image prior should favor clear images over blurred images. In this work, we formulate the image prior as a binary classifier which can be achieved by a deep convolutional neural network (CNN). The learned prior is able to distinguish whether an input image is clear or not. Embedded into the maximum a posteriori (MAP) framework, it helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, it is difficult to optimize the deblurring method with the learned image prior as it involves a non-linear CNN. Therefore, we develop an efficient numerical approach based on the half-quadratic splitting method and gradient descent algorithm to solve the proposed model. Furthermore, the proposed model can be easily extended to non-uniform deblurring. Both qualitative and quantitative experimental results show that our method performs favorably against state-of-the-art algorithms as well as domain-specific image deblurring approaches.
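A schematic half-quadratic splitting (HQS) loop for the non-blind step of such a formulation, assuming circular convolution so the x-subproblem has a Fourier-domain closed form; a pixel-wise soft-threshold stands in for the learned CNN prior, and all parameters are illustrative rather than values from the paper.

```python
# Schematic HQS loop for non-blind deblurring: min_x 0.5*||k*x - y||^2 + lam*p(x).
# The x-step is solved in the Fourier domain; the z-step applies the prior
# (soft-threshold here as a stand-in for a learned prior).
import numpy as np

def psf2otf(kernel, shape):
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    # Center the kernel so its peak sits at the origin before the FFT.
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def hqs_deblur(y, kernel, lam=1e-3, beta=1e-2, iters=10):
    K = psf2otf(kernel, y.shape)
    Y = np.fft.fft2(y)
    z = y.copy()
    for _ in range(iters):
        # x-step: quadratic data + coupling terms, closed form in the Fourier domain.
        X = (np.conj(K) * Y + beta * np.fft.fft2(z)) / (np.abs(K) ** 2 + beta)
        x = np.real(np.fft.ifft2(X))
        # z-step: proximal operator of the prior (soft-threshold as a stand-in).
        z = np.sign(x) * np.maximum(np.abs(x) - lam / beta, 0.0)
        beta *= 2.0   # continuation: tighten the coupling over iterations
    return x

blurry = np.random.rand(64, 64)
kernel = np.ones((5, 5)) / 25.0
restored = hqs_deblur(blurry, kernel)
```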
no code implementations • CVPR 2018 • Wenqi Ren, Lin Ma, Jiawei Zhang, Jinshan Pan, Xiaochun Cao, Wei Liu, Ming-Hsuan Yang
The proposed algorithm hinges on an end-to-end trainable neural network that consists of an encoder and a decoder.
Ranked #23 on Image Dehazing on SOTS Outdoor
no code implementations • CVPR 2018 • Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang
In this paper, we present an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks (CNNs).
no code implementations • CVPR 2018 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
We show that even with a small amount of supervision, false conclusions can be corrected and the source of sound in a visual scene can be localized effectively.
no code implementations • 31 Jan 2018 • Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu, Zhixun Su, Ming-Hsuan Yang
In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience.
no code implementations • 12 Jan 2018 • Donghoon Lee, Ming-Hsuan Yang, Songhwai Oh
Single image reflection separation is an ill-posed problem since two scenes, a transmitted scene and a reflected scene, need to be inferred from a single observation.
no code implementations • 14 Dec 2017 • Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz
Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission.
no code implementations • 11 Oct 2017 • Yijun Li, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang
In contrast to existing methods that consider only the guidance image, the proposed algorithm can selectively transfer salient structures that are consistent with both guidance and target images.
no code implementations • 31 Mar 2017 • Wei-Sheng Lai, Yujia Huang, Neel Joshi, Chris Buehler, Ming-Hsuan Yang, Sing Bing Kang
We present a system for converting a fully panoramic ($360^\circ$) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience.
no code implementations • NeurIPS 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz
Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix, where all elements can be the output from a deep CNN, but (b) results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix.
no code implementations • 14 Sep 2017 • Jingchun Cheng, Sifei Liu, Yi-Hsuan Tsai, Wei-Chih Hung, Shalini De Mello, Jinwei Gu, Jan Kautz, Shengjin Wang, Ming-Hsuan Yang
In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video.
no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang
We address the problem of transferring the style of a headshot photo to face images.
no code implementations • ICCV 2017 • Wenqi Ren, Jinshan Pan, Xiaochun Cao, Ming-Hsuan Yang
We analyze the relationship between motion blur trajectory and optical flow, and present a novel pixel-wise non-linear kernel model to account for motion blur.
no code implementations • ICCV 2017 • Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker
Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets.
no code implementations • 6 Aug 2017 • Sifei Liu, Jianping Shi, Ji Liang, Ming-Hsuan Yang
Face parsing is an important problem in computer vision that finds numerous applications including recognition and editing.
no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang
Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.
no code implementations • 5 Jun 2017 • Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang
First, we exploit the discriminative constraints to capture the intra- and inter-class relationships of image embeddings.
no code implementations • CVPR 2017 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
Recent progresses on deep discriminative and generative modeling have shown promising results on texture synthesis.
no code implementations • CVPR 2017 • Jiawei Zhang, Jinshan Pan, Wei-Sheng Lai, Rynson Lau, Ming-Hsuan Yang
In this paper, we propose a fully convolutional network for iterative non-blind deconvolution. We decompose the non-blind deconvolution problem into image denoising and image deconvolution.
no code implementations • 30 Oct 2016 • Kaihua Zhang, Qingshan Liu, Ming-Hsuan Yang
In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking.