no code implementations • ECCV 2020 • Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang
While single-image object detectors can be naively applied to videos in a frame-by-frame fashion, the prediction is often temporally inconsistent.
no code implementations • ECCV 2020 • Weitao Wan, Jiansheng Chen, Ming-Hsuan Yang
We call such a new robust training strategy the adversarial training with bi-directional likelihood regularization (ATBLR) method.
no code implementations • 12 Sep 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
Recent novel view synthesis methods obtain promising results for relatively small scenes, e.g., indoor environments and scenes with a few objects, but tend to fail for unbounded outdoor scenes with a single image as input.
no code implementations • 10 Sep 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Ming-Hsuan Yang, Shuchang Zhou
To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.
no code implementations • 22 Aug 2023 • Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang
Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.
1 code implementation • 22 Aug 2023 • Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai
In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.
no code implementations • 14 Aug 2023 • Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang
Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild.
1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
1 code implementation • 21 Jul 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti
Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.
2 code implementations • 13 Jul 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.
Ranked #1 on Prompt Engineering on ImageNet V2
no code implementations • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods for tailoring a foundation model (FM) to a downstream task.
1 code implementation • 30 Jun 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.
no code implementations • 7 Jun 2023 • Chun-Han Yao, Amit Raj, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
Second, we perform diffusion-guided 3D optimization to estimate shape and texture that are high-fidelity and faithful to the input images.
no code implementations • 2 Jun 2023 • Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang
Crowd counting has recently attracted significant attention in the field of computer vision due to its wide range of applications in image understanding.
1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang
Despite the progress of image segmentation toward accurate visual entity segmentation, meeting the diverse requirements of image editing applications for region-of-interest selections at different levels remains unsolved.
no code implementations • 24 May 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
Text-to-image diffusion models have made significant advances in generating and editing high-quality images.
no code implementations • 27 Apr 2023 • Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang
In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis.
no code implementations • 15 Apr 2023 • Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang
We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models.
1 code implementation • 3 Apr 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.
1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.
no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan
Most existing video-language pre-training methods focus on instance-level alignment between video clips and captions via global contrastive learning, but neglect rich fine-grained local information, which is important for downstream tasks requiring temporal localization and semantic reasoning.
2 code implementations • 27 Mar 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.
1 code implementation • 16 Mar 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).
Human-Object Interaction Detection, Relationship Detection
no code implementations • 23 Jan 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov
Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noise.
1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao
Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.
2 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
Ranked #20 on Text-to-Image Generation on COCO
1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang
In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.
no code implementations • 22 Dec 2022 • Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova
Our approach achieves a 4x faster run-time with 10 concurrent objects compared to tracking each object independently, and outperforms existing single-object trackers on our new benchmark.
no code implementations • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.
1 code implementation • 19 Dec 2022 • Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang
In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Generation on UCF-101
no code implementations • 9 Dec 2022 • Youming Deng, Xueting Li, Sifei Liu, Ming-Hsuan Yang
To model the illumination of a scene, existing inverse rendering works either completely ignore the indirect illumination or model it by coarse approximations, leading to sub-optimal illumination, geometry, and material prediction of the scene.
1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang
Given a piece of text, a video clip, and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference.
no code implementations • 8 Dec 2022 • Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Qingming Huang, Ming-Hsuan Yang, Nicu Sebe
Crowd localization aims to predict the spatial position of humans in a crowd scenario.
2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.
no code implementations • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang
Crowd counting is usually handled in a density-map regression fashion, which is supervised via an L2 loss between the predicted density map and the ground truth.
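A minimal sketch of the density-map L2 objective this entry describes; the tensor shapes and variable names are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

def density_map_l2_loss(pred, gt):
    """L2 loss between predicted and ground-truth density maps; the crowd
    count estimate is the spatial sum of the density map."""
    return F.mse_loss(pred, gt, reduction="mean")

# Hypothetical shapes: (batch, 1, H, W) density maps.
pred = torch.rand(2, 1, 64, 64)
gt = torch.rand(2, 1, 64, 64)
loss = density_map_l2_loss(pred, gt)
counts = pred.sum(dim=(1, 2, 3))  # per-image crowd count estimates
```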
no code implementations • CVPR 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun
Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.
1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao
We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.
no code implementations • 29 Nov 2022 • Taihong Xiao, ZiRui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.
1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang
Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.
1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang
It improves mask prediction by fusing the full image with high-resolution image crops that provide finer-grained image details.
no code implementations • 27 Oct 2022 • Jie Cao, Mandi Luo, Junchi Yu, Ming-Hsuan Yang, Ran He
Then, we optimize the augmented samples by minimizing the norms of the data scores, i.e., the gradients of the log-density functions.
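A sketch of the objective this entry names, assuming a pretrained score network; `score_fn` and the optimizer settings are stand-ins, not the paper's implementation.

```python
import torch

def refine_sample(score_fn, x, steps=10, lr=0.1):
    """Move an augmented sample x to reduce the norm of its data score
    s(x) = grad_x log p(x); `score_fn` is an assumed stand-in for a
    pretrained score network mapping x to s(x)."""
    x = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = score_fn(x).flatten(1).norm(dim=1).mean()  # ||s(x)||
        loss.backward()
        opt.step()
    return x.detach()
```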
no code implementations • 23 Oct 2022 • Yunfan Liu, Qi Li, Qiyao Deng, Zhenan Sun, Ming-Hsuan Yang
Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics.
2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang
This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.
1 code implementation • 23 Aug 2022 • Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang
An alternative approach is to estimate dense vertices of a predefined template body in the image space.
1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.
1 code implementation • 1 Aug 2022 • Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu
In the first stage, we obtain a robust feature extractor, which can serve all images with base and novel categories.
no code implementations • 15 Jul 2022 • Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui
Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition.
Ranked #1 on Zero-Shot Action Recognition on HMDB51
no code implementations • 7 Jul 2022 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani
In this work, we propose a practical problem setting to estimate 3D pose and shape of animals given only a few (10-30) in-the-wild images of a particular animal species (say, horse).
1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang
To address this issue, we propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.
no code implementations • 2 Jun 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang
Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks.
1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.
1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers have been widely used in numerous vision problems, especially for visual recognition and detection.
1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang
With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performances in a variety of generative tasks such as image, audio, and text generation.
no code implementations • 4 Apr 2022 • Tiantian Wang, Nikolaos Sarafianos, Ming-Hsuan Yang, Tony Tung
We accomplish this by utilizing both the human pose, which models the body shape, and point clouds that partially cover the human as input.
no code implementations • 23 Mar 2022 • Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang
While recent face anti-spoofing methods perform well under the intra-domain setups, an effective approach needs to account for much larger appearance variations of images acquired in complex scenes with different sensors for robust performance.
3 code implementations • 20 Mar 2022 • Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma
In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.
Ranked #1 on 3D Object Detection on V2XSet
no code implementations • 26 Jan 2022 • Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bjorn Stenger, Ming-Hsuan Yang, Hongdong Li
Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image.
no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang
Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.
no code implementations • 14 Dec 2021 • Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown
The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.
1 code implementation • 13 Dec 2021 • Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang
Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.
1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu
Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.
1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia
To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.
no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui
The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.
2 code implementations • 1 Dec 2021 • Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.
1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.
1 code implementation • 27 Nov 2021 • Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang
Most existing methods formulate the non-blind deconvolution problem in a maximum-a-posteriori framework and address it by manually designing various regularization terms and data terms for the latent clear images.
no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu
We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.
1 code implementation • CVPR 2022 • Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang
Existing methods for video interpolation heavily rely on deep convolutional neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive fields.
1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang
(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.
1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang
This has been a long-standing question in computer vision.
Ranked #1 on Class-agnostic Object Detection on COCO
11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.
Ranked #1 on Grayscale Image Denoising on Urban100 sigma15
no code implementations • 18 Nov 2021 • Wei-Sheng Lai, YiChang Shih, Chia-Kai Liang, Ming-Hsuan Yang
Video blogs and selfies are popular social media formats, which are often captured by wide-angle cameras to show human subjects and an expanded background.
no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang
Video salient object detection aims to find the most visually distinctive objects in a video.
no code implementations • 14 Oct 2021 • Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang
Multi-Task Learning (MTL) aims to enhance model generalization by sharing representations between related tasks.
1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers are transforming the landscape of computer vision, especially for recognition tasks.
Ranked #11 on Object Detection on COCO 2017 val
1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang
Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.
Ranked #2 on Burst Image Super-Resolution on BurstSR
no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang
Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale, and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor-intensive and infeasible to scale.
no code implementations • 17 Aug 2021 • Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang
We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).
no code implementations • ICCV 2021 • Chun-Han Yao, Wei-Chih Hung, Varun Jampani, Ming-Hsuan Yang
Reasoning 3D shapes from 2D images is an essential yet challenging task, especially when only single-view images are at our disposal.
1 code implementation • NeurIPS 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
no code implementations • NeurIPS 2021 • An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu
We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.
no code implementations • 21 Jun 2021 • Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang
While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.
no code implementations • ICLR 2022 • Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang
Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset.
3 code implementations • 6 Jun 2021 • ShangHua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, Philip Torr
In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.
Ranked #1 on Unsupervised Semantic Segmentation on ImageNet-S-50
1 code implementation • ICCV 2021 • Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang
Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix.
1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang
We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., retaining as much as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.
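A minimal sketch of the random patch-occlusion protocol the robustness numbers above refer to; the patch size and drop ratio are illustrative, and H and W are assumed divisible by the patch size.

```python
import torch

def random_patch_occlusion(img, patch=16, drop_ratio=0.8):
    """Randomly zero out `drop_ratio` of non-overlapping patches in a batch
    of images of shape (B, C, H, W)."""
    B, C, H, W = img.shape
    gh, gw = H // patch, W // patch
    keep = torch.rand(B, gh * gw) >= drop_ratio          # True = patch kept
    mask = keep.view(B, 1, gh, gw)
    mask = mask.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return img * mask
```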
2 code implementations • ICCV 2021 • Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar
Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.
1 code implementation • CVPR 2021 • Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
Ranked #4 on Semantic Segmentation on MCubeS
1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong
To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.
1 code implementation • 20 Apr 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang
While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.
no code implementations • 16 Apr 2021 • Dingwen Zhang, Junwei Han, Gong Cheng, Ming-Hsuan Yang
As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role for developing new generation computer vision systems and has received significant attention in the past decade.
1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.
1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang
We present a novel framework, InfinityGAN, for arbitrary-sized image generation.
1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
Recent years have witnessed the rapid progress of generative adversarial networks (GANs).
Ranked #1 on Image Generation on 25% ImageNet 128x128
no code implementations • 1 Apr 2021 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang
Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.
no code implementations • 1 Apr 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang
Sound localization aims to find the source of the audio signal in the visual scene.
1 code implementation • CVPR 2021 • Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.
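A generic feature-level interpolation in the spirit of this entry; this mixup-style sketch is an assumption about the general technique, not the paper's exact formulation or content loss.

```python
import torch

def feature_mixup(f_a, f_b, alpha=0.2):
    """Blend the intermediate features of two training samples with a
    Beta-sampled mixing ratio, returning the mixed features and the ratio."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * f_a + (1.0 - lam) * f_b, lam
```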
no code implementations • 26 Mar 2021 • Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang
The key insights of our method are two-fold.
Ranked #20 on 3D Human Pose Estimation on MPI-INF-3DHP
2 code implementations • ICCV 2021 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in a smaller field of view.
7 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.
Ranked #4 on Deblurring on RSBlur
1 code implementation • 2 Feb 2021 • Xiangyu Xu, Yongrui Ma, Wenxiu Sun, Ming-Hsuan Yang
In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
3 code implementations • 26 Jan 2021 • Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang
We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.
1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, so that the image can be faithfully reconstructed from the inverted code by the generator.
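A sketch of the optimization-based inversion family this survey covers: search for a latent code whose reconstruction matches the target. The `generator` callable, `latent_dim`, and the pixel loss are assumptions, not a specific method from the survey.

```python
import torch
import torch.nn.functional as F

def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
    """Optimization-based GAN inversion sketch: find z minimizing
    ||G(z) - target||^2 with a frozen, differentiable generator assumed
    to map (1, latent_dim) -> (1, 3, H, W)."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(generator(z), target)  # pixel reconstruction loss
        loss.backward()
        opt.step()
    return z.detach()
```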
no code implementations • 4 Jan 2021 • Aditya Arora, Muhammad Haris, Syed Waqas Zamir, Munawar Hayat, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang
These contexts can be crucial for inferring several image enhancement tasks, e.g., local and global contrast, brightness, and color corrections, which require cues from both local and global spatial extents.
no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.
no code implementations • ICCV 2021 • Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang
In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.
1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.
Ranked #3 on Weakly Supervised Action Localization on THUMOS’14
no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz
This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.
1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang
An interpretable generation process is beneficial to various image editing applications.
no code implementations • 18 Nov 2020 • Weitao Wan, Jiansheng Chen, Cheng Yu, Tong Wu, Yuanyi Zhong, Ming-Hsuan Yang
In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.
1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang
Generating a smooth sequence of intermediate results bridges the gap of two different domains, facilitating the morphing effect across domains.
no code implementations • 19 Oct 2020 • Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang
Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features.
no code implementations • 10 Oct 2020 • Qifei Wang, Junjie Ke, Joshua Greaves, Grace Chu, Gabriel Bender, Luciano Sbaiz, Alec Go, Andrew Howard, Feng Yang, Ming-Hsuan Yang, Jeff Gilbert, Peyman Milanfar
This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains.
1 code implementation • ECCV 2020 • Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang
A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.
no code implementations • 18 Aug 2020 • Wei-Chih Hung, Henrik Kretzschmar, Tsung-Yi Lin, Yuning Chai, Ruichi Yu, Ming-Hsuan Yang, Dragomir Anguelov
Robust multi-object tracking (MOT) is a prerequisite for the safe deployment of self-driving cars.
1 code implementation • 12 Aug 2020 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang
Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.
1 code implementation • 11 Aug 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera.
3 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui
Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
Ranked #1 on Self-Supervised Action Recognition on Kinetics-600
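A minimal InfoNCE-style sketch of the contrastive objective this entry describes; the encoder, batch layout, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(z1, z2, temperature=0.1):
    """Embeddings of two augmented clips from the same video (z1[i], z2[i])
    are pulled together, while clips from other videos in the batch are
    pushed away. z1, z2: (B, D) clip embeddings from an assumed encoder."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature     # (B, B) cosine-similarity matrix
    labels = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```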
no code implementations • 3 Aug 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Obtaining object response maps is one important step to achieve weakly-supervised semantic segmentation using image-level labels.
1 code implementation • CVPR 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang
Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions.
1 code implementation • ECCV 2020 • Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang
Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors.
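A sketch of the inner-product cost volume construction this entry describes; the local search window and feature shapes are illustrative, not the paper's specific design.

```python
import torch
import torch.nn.functional as F

def cost_volume(f1, f2, max_disp=4):
    """For each pixel of f1, correlate its feature vector with a
    (2*max_disp+1)^2 neighborhood in f2. f1, f2: (B, C, H, W) feature maps
    from consecutive frames."""
    B, C, H, W = f1.shape
    f2_pad = F.pad(f2, (max_disp, max_disp, max_disp, max_disp))
    volumes = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2_pad[:, :, dy:dy + H, dx:dx + W]
            volumes.append((f1 * shifted).sum(dim=1, keepdim=True))  # inner product
    return torch.cat(volumes, dim=1)  # (B, (2*max_disp+1)^2, H, W)
```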
no code implementations • ECCV 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang
Image generation from scene description is a cornerstone technique for controlled generation, which is beneficial to applications such as content creation and image editing.
no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang
We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.
1 code implementation • ECCV 2020 • Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang
People often create art by following an artistic workflow involving multiple stages that inform the overall design.
1 code implementation • 8 Jul 2020 • Xin-Yu Zhang, Taihong Xiao, HaoLin Jia, Ming-Ming Cheng, Ming-Hsuan Yang
In this work, we propose a simple yet effective meta-learning algorithm for semi-supervised learning.
no code implementations • 15 May 2020 • Mohammad K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson L. Reese, Azadeh Moghtaderi, Ming-Hsuan Yang, David C. Noelle
The coarse functional distinction between these streams is between object recognition -- the "what" of the signal -- and extracting location related information -- the "where" of the signal.
no code implementations • 15 May 2020 • Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan Yang, David C. Noelle
Inspired by this structure, we have proposed an object detection framework involving the integration of a "What Network" and a "Where Network".
no code implementations • ICLR 2020 • Jongbin Ryu, Gitaek Kwon, Ming-Hsuan Yang, Jongwoo Lim
When constructing random forests, it is of prime importance to ensure high accuracy and low correlation of individual tree classifiers for good performance.
1 code implementation • CVPR 2020 • Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang
To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme.
Ranked #8 on Image Dehazing on Haze4k
1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang
With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.
1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We model the HDR-to-LDR image formation pipeline as (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization.
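A sketch of the three-stage formation model named above; a simple gamma curve stands in for the camera response function here, which is an assumption rather than the paper's learned CRF.

```python
import numpy as np

def hdr_to_ldr(hdr, gamma=2.2, bits=8):
    """Simulate (1) dynamic range clipping, (2) a non-linear camera
    response, and (3) quantization to 2**bits levels."""
    clipped = np.clip(hdr, 0.0, 1.0)           # (1) clip the dynamic range
    mapped = clipped ** (1.0 / gamma)          # (2) non-linear CRF mapping
    levels = 2 ** bits - 1
    return np.round(mapped * levels) / levels  # (3) quantize to 8-bit values
```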
1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang
We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.
no code implementations • 31 Mar 2020 • Yun-Chun Chen, Po-Hsiang Huang, Li-Yu Yu, Jia-Bin Huang, Ming-Hsuan Yang, Yen-Yu Lin
Establishing dense semantic correspondences between object instances remains a challenging problem due to background clutter, significant scale and pose differences, and large intra-class variations.
1 code implementation • 30 Mar 2020 • Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling
Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.
1 code implementation • CVPR 2020 • Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong
Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.
Ranked #27 on Long-tail Learning on Places-LT
1 code implementation • CVPR 2020 • Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang
In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.
8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
This is mainly because AWGN is not adequate for modeling real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline.
Ranked #9 on Image Denoising on DND (using extra training data)
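To illustrate the contrast this entry draws, a sketch of AWGN versus a simple signal-dependent (shot + read) noise model; the parameters are illustrative and this is not the paper's noise synthesis pipeline.

```python
import numpy as np

def awgn(img, sigma=0.05):
    """Additive white Gaussian noise: variance independent of the signal."""
    return img + np.random.normal(0.0, sigma, img.shape)

def signal_dependent_noise(img, sigma_s=0.05, sigma_c=0.01):
    """Heteroscedastic noise whose variance grows with signal intensity,
    closer to raw sensor noise than AWGN."""
    var = sigma_s * img + sigma_c ** 2
    return img + np.random.normal(0.0, 1.0, img.shape) * np.sqrt(var)
```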
12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao
With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.
Ranked #4 on Image Denoising on DND
1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz
To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.
1 code implementation • 2 Mar 2020 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang
To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately.
1 code implementation • 19 Feb 2020 • Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang
Recent advances in convolutional neural networks (CNNs) usually come at the expense of excessive computational overhead and memory footprint.
no code implementations • 19 Feb 2020 • Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang
In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches: a unary segmentation network that learns the label probabilities for each pixel, and a pairwise affinity network that learns an affinity matrix and refines the probability map generated by the unary network.
Weakly-Supervised Semantic Segmentation
1 code implementation • ICLR 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang
Few-shot classification aims to recognize novel categories with only few labeled images in each class.
no code implementations • 19 Jan 2020 • Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang
Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image.
no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang
We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.
no code implementations • CVPR 2019 • Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang
Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another (e.g., synthetic to real images).
no code implementations • 30 Dec 2019 • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng
In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.
no code implementations • 25 Dec 2019 • Yijun Li, Lu Jiang, Ming-Hsuan Yang
Image extrapolation aims at expanding the narrow field of view of a given image patch.
no code implementations • ECCV 2020 • Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
The first module predicts a graph with complete relations from a graph with user-specified relations.
no code implementations • 22 Nov 2019 • Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang
For instance, there could be a potential privacy risk of machine learning systems via the model inversion attack, whose goal is to reconstruct the input data from the latent representation of deep networks.
1 code implementation • 20 Nov 2019 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
Visual events are usually accompanied by sounds in our daily lives.
1 code implementation • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz
In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.
Ranked #3 on Motion Synthesis on BRACE
1 code implementation • NeurIPS 2019 • Xiangyu Xu, Li Si-Yao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang
Video interpolation is an important problem in computer vision, which helps overcome the temporal limitation of camera sensors.
1 code implementation • 24 Oct 2019 • Han-Kai Hsu, Chun-Han Yao, Yi-Hsuan Tsai, Wei-Chih Hung, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang
This intermediate domain is constructed by translating the source images to mimic the ones in the target domain.
1 code implementation • 10 Oct 2019 • Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang
To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.
Ranked #15 on Referring Expression Segmentation on RefCoCo val