Search Results for author: Ming-Hsuan Yang

Found 343 papers, 173 papers with code

Video Object Detection via Object-level Temporal Aggregation

no code implementations • ECCV 2020 • Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang

While single-image object detectors can be naively applied to videos in a frame-by-frame fashion, the prediction is often temporally inconsistent.

Object object-detection +2
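A minimal, hypothetical sketch of object-level temporal aggregation (not the paper's actual method): smooth one tracked object's per-frame detection confidences with a moving average so that a single flickering frame no longer flips the prediction. `aggregate_confidences` is an illustrative helper.

```python
import numpy as np

def aggregate_confidences(track_scores, window=3):
    """Smooth one object's per-frame detection confidences with a moving
    average; a minimal form of object-level temporal aggregation that
    suppresses single-frame flicker."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(track_scores, pad, mode="edge")  # repeat edge frames
    return np.convolve(padded, kernel, mode="valid")

scores = np.array([0.9, 0.1, 0.9, 0.9, 0.9])  # one flickering frame
smoothed = aggregate_confidences(scores)       # frame 1 is pulled back up
```

Real systems aggregate box coordinates and features along object tracks, not just scores, but the temporal-smoothing intuition is the same.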

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation • 28 Mar 2024 • Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

no code implementations • 29 Feb 2024 • Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected, and then apply the model to the whole dataset to select the best caption as the annotation.

Retrieval Text Retrieval +3
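The caption-selection step can be sketched as retrieval scoring: embed the video and each candidate caption, then keep the highest-similarity caption. The embeddings below are toy vectors and `select_best_caption` is a hypothetical helper, not Panda-70M code.

```python
import numpy as np

def select_best_caption(video_emb, caption_embs):
    """Return the index of the candidate caption whose embedding has the
    highest cosine similarity with the video embedding."""
    v = video_emb / np.linalg.norm(video_emb)
    c = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    return int(np.argmax(c @ v))

# Toy embeddings for one video and three candidate captions.
video = np.array([1.0, 0.0, 0.0])
captions = np.array([[0.0, 1.0, 0.0],
                     [0.9, 0.1, 0.0],
                     [0.5, 0.5, 0.5]])
best = select_best_caption(video, captions)  # caption 1 is most aligned
```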

Interactive Multi-Head Self-Attention with Linear Complexity

no code implementations • 27 Feb 2024 • Hankyul Kang, Ming-Hsuan Yang, Jongbin Ryu

In this work, we propose an effective method to decompose the attention operation into query- and key-less components.

Scene Prior Filtering for Depth Map Super-Resolution

no code implementations • 21 Feb 2024 • Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao

Specifically, we design an All-in-one Prior Propagation that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce texture interference.

Depth Map Super-Resolution

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

no code implementations • 20 Feb 2024 • Gaoxiang Cong, Yuankai Qi, Liang Li, Amin Beheshti, Zhedong Zhang, Anton Van Den Hengel, Ming-Hsuan Yang, Chenggang Yan, Qingming Huang

It contains three main components: (1) a multimodal style adaptor operating at the phoneme level to learn pronunciation style from the reference audio and generate intermediate representations informed by the facial emotion presented in the video; (2) an utterance-level style learning module, which guides both the mel-spectrogram decoding and the refining processes from the intermediate embeddings to improve the overall style expression; and (3) a phoneme-guided lip aligner to maintain lip sync.

Voice Cloning

Training Class-Imbalanced Diffusion Model Via Overlap Optimization

1 code implementation • 16 Feb 2024 • Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

To address the observed appearance overlap between synthesized images of rare classes and tail classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.

Contrastive Learning Image Generation
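A toy version of the idea, assuming a supervised contrastive objective over synthetic-image features (illustrative only, not the paper's exact loss): same-class features are treated as positives and all others as negatives, so minimizing the loss pushes different classes' feature distributions apart, i.e., reduces their overlap.

```python
import numpy as np

def class_contrastive_loss(feats, labels, tau=0.5):
    """Toy supervised contrastive loss: same-class samples are positives,
    all other samples are negatives; lower loss means the per-class
    feature distributions overlap less."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T / tau
    n = len(labels)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pos = [j for j in others if labels[j] == labels[i]]
        if not pos:
            continue
        denom = np.sum(np.exp(sim[i, others]))
        total += -np.mean([np.log(np.exp(sim[i, j]) / denom) for j in pos])
    return total / n

labels = [0, 0, 1, 1]
separated = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
collapsed = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
loss_sep = class_contrastive_loss(separated, labels)
loss_overlap = class_contrastive_loss(collapsed, labels)  # higher: classes overlap
```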

PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts.

Reflection Removal

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

1 code implementation • 31 Dec 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

VidToMe: Video Token Merging for Zero-Shot Video Editing

no code implementations • 17 Dec 2023 • Xirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang

In this work, we propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.

Video Editing Video Generation
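Cross-frame token merging can be sketched as follows: match each token in one frame to its most similar token in another frame, then average the highest-similarity pairs. `merge_tokens_across_frames` is an illustrative stand-in, not the VidToMe implementation.

```python
import numpy as np

def merge_tokens_across_frames(tokens_a, tokens_b, ratio=0.5):
    """Pair each token in frame A with its most similar token in frame B and
    average the highest-similarity pairs; returns (merged, unmerged_a)."""
    a = tokens_a / np.linalg.norm(tokens_a, axis=1, keepdims=True)
    b = tokens_b / np.linalg.norm(tokens_b, axis=1, keepdims=True)
    sim = a @ b.T
    match = sim.argmax(axis=1)            # best partner in B for each A token
    score = sim.max(axis=1)
    n_merge = int(len(tokens_a) * ratio)
    order = np.argsort(-score)            # most similar pairs merge first
    merged, kept = [], list(range(len(tokens_a)))
    for i in order[:n_merge]:
        merged.append((tokens_a[i] + tokens_b[match[i]]) / 2)
        kept.remove(i)
    return np.array(merged), tokens_a[kept]

tokens_a = np.array([[1.0, 0.0], [0.0, 1.0]])
tokens_b = np.array([[1.0, 0.0], [0.1, 1.0]])
merged, unmerged = merge_tokens_across_frames(tokens_a, tokens_b)
```

Sharing merged tokens across frames is what makes the self-attention outputs, and hence the edited frames, temporally consistent.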

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

1 code implementation • 13 Dec 2023 • Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai

Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach.

3D Object Detection object-detection

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

1 code implementation • 13 Dec 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.

Autonomous Driving

Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance

no code implementations • 12 Dec 2023 • Kuan-Chih Huang, Yi-Hsuan Tsai, Ming-Hsuan Yang

Finally, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.

3D Object Detection object-detection

Weakly Supervised Video Individual Counting

1 code implementation • 10 Dec 2023 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang

To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided.

Contrastive Learning Video Individual Counting

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

no code implementations • 9 Dec 2023 • Hao Zhang, Fang Li, Lu Qi, Ming-Hsuan Yang, Narendra Ahuja

Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, necessitating segmenting unseen classes.

Domain Adaptation Segmentation +2

Towards 4D Human Video Stylization

1 code implementation • 7 Dec 2023 • Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang

To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.

Novel View Synthesis Style Transfer +1

DreaMo: Articulated 3D Reconstruction From A Single Casual Video

no code implementations • 5 Dec 2023 • Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang

In this work, we study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete.

3D Reconstruction 3D Shape Reconstruction

Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

1 code implementation • NeurIPS 2023 • Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai

Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations.

3D Object Detection Denoising +5

Fine-grained Controllable Video Generation via Object Appearance and Context

no code implementations • 5 Dec 2023 • Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang

To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.

Text-to-Video Generation Video Generation

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection

1 code implementation • 4 Dec 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang

To address these limitations, we propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection, which learns multi-scale temporal features.

Anomaly Detection Video Anomaly Detection

Effective Adapter for Face Recognition in the Wild

no code implementations • 4 Dec 2023 • Yunhao Liu, Lu Qi, Yu-Ju Tsai, Xiangtai Li, Kelvin C. K. Chan, Ming-Hsuan Yang

The key of our adapter is to process both the unrefined and the enhanced images by two similar structures where one is fixed and the other trainable.

Face Recognition

UniGS: Unified Representation for Image Generation and Segmentation

1 code implementation • 4 Dec 2023 • Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang

On the other hand, the progressive dichotomy module can efficiently decode the synthesized colormap to high-quality entity-level masks in a depth-first binary search without knowing the cluster numbers.

Image Generation Segmentation
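The depth-first decoding idea can be illustrated with a single-channel toy colormap: recursively bisect the value range until each cluster of pixel values is tight, emitting one mask per cluster, so no cluster count is needed in advance. This is a sketch, not the UniGS module.

```python
import numpy as np

def dichotomy_decode(colormap, tol=1e-3):
    """Depth-first binary search over the colormap's value range: split until
    each cluster of pixel values is tight, emitting one mask per cluster."""
    masks = []
    def split(idx):
        vals = colormap.ravel()[idx]
        if vals.max() - vals.min() <= tol:      # tight cluster -> one entity
            mask = np.zeros(colormap.size, dtype=bool)
            mask[idx] = True
            masks.append(mask.reshape(colormap.shape))
            return
        thr = (vals.max() + vals.min()) / 2     # bisect the value range
        split(idx[vals <= thr])
        split(idx[vals > thr])
    split(np.arange(colormap.size))
    return masks

colormap = np.array([[0.0, 0.0, 0.5],
                     [0.5, 1.0, 1.0]])
masks = dichotomy_decode(colormap)  # three entity masks, no cluster count given
```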

Multi-task Image Restoration Guided By Robust DINO Features

no code implementations • 4 Dec 2023 • Xin Lin, Chao Ren, Kelvin C. K. Chan, Lu Qi, Jinshan Pan, Ming-Hsuan Yang

Multi-task image restoration has gained significant interest due to its inherent versatility and efficiency compared to its single-task counterpart.

Image Restoration

Exploiting Diffusion Prior for Generalizable Pixel-Level Semantic Prediction

1 code implementation • 30 Nov 2023 • Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang

Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf property semantic predictors to estimate due to the immitigable domain gap.

Intrinsic Image Decomposition Semantic Segmentation

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

no code implementations • 28 Nov 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Animal Pose Estimation Semantic correspondence

Pyramid Diffusion for Fine 3D Large Scene Generation

1 code implementation • 20 Nov 2023 • Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang

Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets.

Scene Generation

Rethinking Evaluation Metrics of Open-Vocabulary Segmentation

1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.

Segmentation

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation • 6 Nov 2023 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering Image Captioning +5

One-for-All: Towards Universal Domain Translation with a Single StyleGAN

no code implementations • 22 Oct 2023 • Yong Du, Jiahui Zhan, Shengfeng He, Xinzhe Li, Junyu Dong, Sheng Chen, Ming-Hsuan Yang

In this paper, we propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains under conditions of limited training data and significant visual differences.

Translation

SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image

no code implementations • ICCV 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

Recent novel view synthesis methods obtain promising results for relatively small scenes, e.g., indoor environments and scenes with a few objects, but tend to fail for unbounded outdoor scenes with a single image as input.

Novel View Synthesis

Editing 3D Scenes via Text Prompts without Retraining

no code implementations • 10 Sep 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.

3D scene Editing 3D Scene Reconstruction +2

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

1 code implementation • ICCV 2023 • Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai

In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.

3D Multi-Object Tracking 3D Object Tracking +3

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation • ICCV 2023 • Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Dual Associated Encoder for Face Restoration

1 code implementation • 14 Aug 2023 • Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang

Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild.

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

CLR: Channel-wise Lightweight Reprogramming for Continual Learning

1 code implementation • ICCV 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti

Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.

Continual Learning Image Classification

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Prompt Engineering

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

Counting Crowds in Bad Weather

no code implementations • ICCV 2023 • Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang

Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding.

Crowd Counting Image Restoration

AIMS: All-Inclusive Multi-Level Segmentation

1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

Despite the progress of image segmentation for accurate visual entity segmentation, completing the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved.

Image Segmentation Segmentation +1

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

no code implementations • 27 Apr 2023 • Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis.

Motion Synthesis

Video Generation Beyond a Single Clip

no code implementations • 15 Apr 2023 • Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang

We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models.

Video Generation

Burstormer: Burst Image Restoration and Enhancement Transformer

1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.

Denoising Image Restoration +1

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

InfiniCity: Infinite-Scale City Synthesis

no code implementations • ICCV 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an arbitrarily large, 3D-grounded environment from random noise.

Image Generation Neural Rendering

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Muse: Text-To-Image Generation via Masked Generative Transformers

4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.

Language Modelling Large Language Model +1

Self-Supervised Super-Plane for Neural 3D Reconstruction

1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang

In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.

3D Reconstruction

High Quality Entity Segmentation

no code implementations • ICCV 2023 • Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

Given the high-quality and -resolution nature of the dataset, we propose CropFormer which is designed to tackle the intractability of instance-level segmentation on high-resolution images.

Image Segmentation Segmentation +1

Beyond SOT: Tracking Multiple Generic Objects at Once

1 code implementation • 22 Dec 2022 • Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova

Our approach achieves a 4x faster run-time in case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark.

Attribute Object +1

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

1 code implementation • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.

Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection

1 code implementation • 19 Dec 2022 • Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.

3D Object Detection Knowledge Distillation +4

BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

1 code implementation • 12 Dec 2022 • Zhiwei Lin, Yongtao Wang, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

Based on the property of outdoor point clouds in autonomous driving scenarios, i.e., that the point clouds of distant objects are sparser, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection.

3D Object Detection Autonomous Driving +3

Physics-based Indirect Illumination for Inverse Rendering

no code implementations • 9 Dec 2022 • Youming Deng, Xueting Li, Sifei Liu, Ming-Hsuan Yang

We present a physics-based inverse rendering method that learns the illumination, geometry, and materials of a scene from posed multi-view RGB images.

Efficient Neural Network Inverse Rendering +1

Learning to Dub Movies via Hierarchical Prosody Models

1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip, and a reference audio, the movie dubbing task (also known as visual voice cloning, V2C) aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker's voice as reference.

Progressive Multi-resolution Loss for Crowd Counting

no code implementations • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang

Crowd counting is usually handled in a density map regression fashion, which is supervised via an L2 loss between the predicted density map and the ground truth.

Crowd Counting
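A minimal sketch of multi-resolution density-map supervision (illustrative, not the paper's exact progressive loss): accumulate the L2 error over progressively 2x average-pooled versions of the predicted and ground-truth density maps.

```python
import numpy as np

def multires_l2_loss(pred, gt, levels=3):
    """Sum the L2 (MSE) loss between predicted and ground-truth density maps
    over progressively 2x average-pooled resolutions."""
    loss = 0.0
    for _ in range(levels):
        loss += np.mean((pred - gt) ** 2)
        # 2x2 average pooling to the next (coarser) resolution
        h, w = pred.shape[0] // 2, pred.shape[1] // 2
        pred = pred[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))
        gt = gt[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))
    return loss

gt = np.arange(64.0).reshape(8, 8) / 64.0  # toy ground-truth density map
```

Coarser levels tolerate small localization errors that the full-resolution L2 term punishes harshly, which is the usual motivation for multi-resolution supervision.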

Self-supervised AutoFlow

no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.

Optical Flow Estimation

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.

Image Generation

High-Quality Entity Segmentation

1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.

Image Segmentation Segmentation +2

ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data

no code implementations • 27 Oct 2022 • Jie Cao, Mandi Luo, Junchi Yu, Ming-Hsuan Yang, Ran He

Then, we optimize the augmented samples by minimizing the norms of the data scores, i.e., the gradients of the log-density functions.

Data Augmentation Image Generation
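The score-norm minimization step can be illustrated with a closed-form Gaussian score standing in for a learned score network (`minimize_score_norm` is a hypothetical helper): gradient descent on ||score(x)||^2 drives the augmented sample toward a mode of the assumed data density.

```python
import numpy as np

def minimize_score_norm(x, mu, sigma=1.0, lr=0.1, steps=100):
    """Gradient descent on ||score(x)||^2 with a closed-form Gaussian score,
    score(x) = -(x - mu) / sigma**2, standing in for a learned score model.
    Minimizing the score norm drives x toward a mode of the density."""
    for _ in range(steps):
        grad = 2.0 * (x - mu) / sigma**4  # d/dx of ||score(x)||^2
        x = x - lr * grad
    return x

x0 = np.array([3.0, -2.0])   # an augmented sample, initially off-distribution
mu = np.array([0.0, 0.0])    # mode of the (assumed Gaussian) data density
refined = minimize_score_norm(x0, mu)
```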

GAN-based Facial Attribute Manipulation

no code implementations • 23 Oct 2022 • Yunfan Liu, Qi Li, Qiyao Deng, Zhenan Sun, Ming-Hsuan Yang

Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics.

Attribute

Diffusion Models: A Comprehensive Survey of Methods and Applications

2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Image Super-Resolution Text-to-Image Generation +1

Learning Visibility for Robust Dense Human Body Estimation

1 code implementation • 23 Aug 2022 • Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang

An alternative approach is to estimate dense vertices of a predefined template body in the image space.

3D Vision with Transformers: A Survey

1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.

Pose Estimation

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

1 code implementation • 1 Aug 2022 • Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu

In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories.

Representation Learning Self-Supervised Learning

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

no code implementations • 15 Jul 2022 • Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition.

Optical Flow Estimation Video Classification +1

LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery

no code implementations • 7 Jul 2022 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

In this work, we propose a practical problem setting to estimate 3D pose and shape of animals given only a few (10-30) in-the-wild images of a particular animal species (say, horse).

FlowNAS: Neural Architecture Search for Optical Flow Estimation

1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang

To address this issue, we propose a neural architecture search method named FlowNAS to automatically find the better encoder architecture for flow estimation task.

Image Classification Neural Architecture Search +1

Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features

no code implementations • 2 Jun 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks.

Position

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity to model long-range dependencies in sequential data, transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation Point Cloud Generation +1

Neural Rendering of Humans in Novel View and Pose from Monocular Video

no code implementations • 4 Apr 2022 • Tiantian Wang, Nikolaos Sarafianos, Ming-Hsuan Yang, Tony Tung

We accomplish this by utilizing both the human pose that models the body shape as well as point clouds that partially cover the human as input.

Neural Rendering

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing

no code implementations • 23 Mar 2022 • Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang

While recent face anti-spoofing methods perform well under the intra-domain setups, an effective approach needs to account for much larger appearance variations of images acquired in complex scenes with different sensors for robust performance.

Face Anti-Spoofing

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

2 code implementations • 20 Mar 2022 • Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.

3D Object Detection Autonomous Vehicles +1

Deep Image Deblurring: A Survey

no code implementations • 26 Jan 2022 • Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bjorn Stenger, Ming-Hsuan Yang, Hongdong Li

Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image.

Deblurring Image Deblurring

InOut: Diverse Image Outpainting via GAN Inversion

no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations14 Dec 2021 Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Image Classification Knowledge Distillation +1

An Informative Tracking Benchmark

1 code implementation13 Dec 2021 Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Visual Tracking

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation9 Dec 2021 Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations8 Dec 2021 Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

MC-Blur: A Comprehensive Benchmark for Image Deblurring

2 code implementations1 Dec 2021 Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang

Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.

Benchmarking Deblurring +1

Video Frame Interpolation Transformer

1 code implementation CVPR 2022 Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang

Existing methods for video interpolation heavily rely on deep convolutional neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and a restricted receptive field.

Video Frame Interpolation

Learning Continuous Environment Fields via Implicit Functions

no code implementations ICLR 2022 Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.

Position Trajectory Prediction

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

1 code implementation27 Nov 2021 Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Most existing methods usually formulate the non-blind deconvolution problem into a maximum-a-posteriori framework and address it by manually designing kinds of regularization terms and data terms of the latent clear images.

Image Deconvolution Image Restoration

Hierarchical Modular Network for Video Captioning

1 code implementation CVPR 2022 Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.

Representation Learning Sentence +1

Correcting Face Distortion in Wide-Angle Videos

no code implementations18 Nov 2021 Wei-Sheng Lai, YiChang Shih, Chia-Kai Liang, Ming-Hsuan Yang

Video blogs and selfies are popular social media formats, which are often captured by wide-angle cameras to show human subjects and expanded background.

Restormer: Efficient Transformer for High-Resolution Image Restoration

11 code implementations CVPR 2022 Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.

Color Image Denoising Deblurring +7

Semi-supervised Multi-task Learning for Semantics and Depth

no code implementations14 Oct 2021 Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang

Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.

Depth Estimation Multi-Task Learning +1

Burst Image Restoration and Enhancement

1 code implementation CVPR 2022 Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.

Burst Image Super-Resolution Denoising +3

Learning Contrastive Representation for Semantic Correspondence

no code implementations22 Sep 2021 Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.

Contrastive Learning Semantic correspondence

Federated Multi-Target Domain Adaptation

no code implementations17 Aug 2021 Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).

Domain Adaptation Federated Learning +3

Discovering 3D Parts from Image Collections

no code implementations ICCV 2021 Chun-Han Yao, Wei-Chih Hung, Varun Jampani, Ming-Hsuan Yang

Reasoning 3D shapes from 2D images is an essential yet challenging task, especially when only single-view images are at our disposal.

Object

End-to-end Multi-modal Video Temporal Grounding

1 code implementation NeurIPS 2021 Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.

Optical Flow Estimation Self-Supervised Learning

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations NeurIPS 2021 An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

Segmentation

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations21 Jun 2021 Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

Incremental False Negative Detection for Contrastive Learning

no code implementations ICLR 2022 Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset.

Contrastive Learning Self-Supervised Learning

Large-scale Unsupervised Semantic Segmentation

3 code implementations6 Jun 2021 ShangHua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, Philip Torr

In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.

Representation Learning Segmentation +1

Learning to Stylize Novel Views

1 code implementation ICCV 2021 Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang

Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix.

Novel View Synthesis

Intriguing Properties of Vision Transformers

1 code implementation NeurIPS 2021 Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

COMISR: Compression-Informed Video Super-Resolution

2 code implementations ICCV 2021 Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar

Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.

Video Super-Resolution

Decoupled Dynamic Filter Networks

1 code implementation CVPR 2021 Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang

Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.

Image Classification Semantic Segmentation

2.5D Visual Relationship Detection

1 code implementation26 Apr 2021 Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2.5D relationships among 512K objects from 11K images.

Benchmarking Depth Estimation +2

Understanding Synonymous Referring Expressions via Contrastive Features

1 code implementation20 Apr 2021 Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.

Object Referring Expression +3

Weakly Supervised Object Localization and Detection: A Survey

no code implementations16 Apr 2021 Dingwen Zhang, Junwei Han, Gong Cheng, Ming-Hsuan Yang

As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role for developing new generation computer vision systems and has received significant attention in the past decade.

Object Weakly-Supervised Object Localization

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

1 code implementation ICCV 2021 Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation Vision-Language Navigation

In&Out : Diverse Image Outpainting via GAN Inversion

no code implementations1 Apr 2021 Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation +1

ReMix: Towards Image-to-Image Translation with Limited Data

1 code implementation CVPR 2021 Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun

We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.

Data Augmentation Image-to-Image Translation +1

Hybrid Neural Fusion for Full-frame Video Stabilization

2 code implementations ICCV 2021 Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in a smaller field of view.

Video Stabilization

Multi-Stage Progressive Image Restoration

7 code implementations CVPR 2021 Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.

Deblurring Image Deblurring +3

Exploiting Raw Images for Real-Scene Super-Resolution

1 code implementation2 Feb 2021 Xiangyu Xu, Yongrui Ma, Wenxiu Sun, Ming-Hsuan Yang

In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.

Image Restoration Image Super-Resolution

Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising

3 code implementations26 Jan 2021 Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang

We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.

Image Denoising Video Denoising

GAN Inversion: A Survey

1 code implementation14 Jan 2021 Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang

GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.

Image Manipulation Image Restoration

Low Light Image Enhancement via Global and Local Context Modeling

no code implementations4 Jan 2021 Aditya Arora, Muhammad Haris, Syed Waqas Zamir, Munawar Hayat, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang

These contexts can be crucial for several image enhancement tasks, e.g., local and global contrast, brightness, and color corrections, which require cues from both the local and global spatial extent.

Low-Light Image Enhancement

Video Matting via Consistency-Regularized Graph Neural Networks

no code implementations ICCV 2021 Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang

In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.

Image Matting Optical Flow Estimation +1

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations NeurIPS 2020 Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction

Unsupervised Discovery of Disentangled Manifolds in GANs

1 code implementation24 Nov 2020 Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

An interpretable generation process is beneficial to various image editing applications.

Attribute

Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

1 code implementation2 Nov 2020 Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang

Generating a smooth sequence of intermediate results bridges the gap of two different domains, facilitating the morphing effect across domains.

Attribute Image-to-Image Translation +1

Unsupervised Domain Adaptation for Spatio-Temporal Action Localization

no code implementations19 Oct 2020 Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang

Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features.

object-detection Object Detection +3

Multi-path Neural Networks for On-device Multi-domain Visual Classification

no code implementations10 Oct 2020 Qifei Wang, Junjie Ke, Joshua Greaves, Grace Chu, Gabriel Bender, Luciano Sbaiz, Alec Go, Andrew Howard, Feng Yang, Ming-Hsuan Yang, Jeff Gilbert, Peyman Milanfar

This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains.

General Classification Neural Architecture Search +1

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

1 code implementation ECCV 2020 Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.

Domain Adaptation

Learning to Caricature via Semantic Shape Transform

1 code implementation12 Aug 2020 Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang

Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.

Caricature

Learning to See Through Obstructions with Layered Decomposition

1 code implementation11 Aug 2020 Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation

Spatiotemporal Contrastive Video Representation Learning

4 code implementations CVPR 2021 Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
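As a rough illustration of the contrastive objective sketched above, the following NumPy toy computes an InfoNCE-style loss in which embeddings of two clips from the same video (same row index) are positives and all other pairs are negatives. This is an illustrative sketch, not the authors' implementation; the embedding size and `temperature` value are assumptions.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss.

    z_a, z_b: (N, D) embeddings of two augmented clips per video; rows with
    the same index form the positive pair, all other rows are negatives.
    """
    # L2-normalize so the inner product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

rng = np.random.default_rng(0)
loss = info_nce(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
```

Minimizing this loss pulls the diagonal (same-video) pairs together in the embedding space while pushing apart clips from different videos.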

Contrastive Learning Data Augmentation +4

Learnable Cost Volume Using the Cayley Representation

1 code implementation ECCV 2020 Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang

Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors.
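As a rough sketch of the standard correlation-based construction mentioned above (an illustrative NumPy toy, not the paper's learnable variant; the `max_disp` search window is an assumption), each cost-volume entry is the inner product between a feature vector in one frame and a displaced feature vector in the other:

```python
import numpy as np

def cost_volume(feat1, feat2, max_disp=2):
    """Correlation cost volume between two (H, W, C) feature maps.

    For every displacement (dy, dx) within a (2*max_disp+1)^2 window, the
    matching cost is the per-pixel inner product of feature vectors.
    Returns an (H, W, (2*max_disp+1)**2) volume.
    """
    H, W, _ = feat1.shape
    pad = np.pad(feat2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = pad[dy:dy + H, dx:dx + W]
            costs.append((feat1 * shifted).sum(-1))  # inner product per pixel
    return np.stack(costs, axis=-1)

f1 = np.random.rand(8, 8, 16)
f2 = np.random.rand(8, 8, 16)
vol = cost_volume(f1, f2)  # shape (8, 8, 25)
```

The central channel (zero displacement) reduces to the plain inner product between co-located features, which is the quantity the paper's learnable cost volume generalizes.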

Optical Flow Estimation

Controllable Image Synthesis via SegVAE

no code implementations ECCV 2020 Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation Image-to-Image Translation +2

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

no code implementations ECCV 2020 Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Image generation from scene description is a cornerstone technique for the controlled generation, which is beneficial to applications such as content creation and image editing.

Image Generation Retrieval

Modeling Artistic Workflows for Image Generation and Editing

1 code implementation ECCV 2020 Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang

People often create art by following an artistic workflow involving multiple stages that inform the overall design.

Image Generation

Semi-Supervised Learning with Meta-Gradient

1 code implementation8 Jul 2020 Xin-Yu Zhang, Taihong Xiao, HaoLin Jia, Ming-Ming Cheng, Ming-Hsuan Yang

In this work, we propose a simple yet effective meta-learning algorithm in semi-supervised learning.

Meta-Learning Pseudo Label

Ventral-Dorsal Neural Networks: Object Detection via Selective Attention

no code implementations15 May 2020 Mohammad K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson L. Reese, Azadeh Moghtaderi, Ming-Hsuan Yang, David C. Noelle

The coarse functional distinction between these streams is between object recognition -- the "what" of the signal -- and the extraction of location-related information -- the "where" of the signal.

Image Classification Object +3

WW-Nets: Dual Neural Networks for Object Detection

no code implementations15 May 2020 Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan Yang, David C. Noelle

Inspired by this structure, we have proposed an object detection framework involving the integration of a "What Network" and a "Where Network".

Object object-detection +1

Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition

no code implementations ICLR 2020 Jongbin Ryu, Gitaek Kwon, Ming-Hsuan Yang, Jongwoo Lim

When constructing random forests, it is of prime importance to ensure high accuracy and low correlation of individual tree classifiers for good performance.

Domain Generalization Image Classification

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

1 code implementation CVPR 2020 Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang

To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme.

Image Dehazing

Regularizing Meta-Learning via Gradient Dropout

1 code implementation13 Apr 2020 Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.

Domain Generalization Meta-Learning

Learning to See Through Obstructions

1 code implementation CVPR 2020 Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation Reflection Removal

Deep Semantic Matching with Foreground Detection and Cycle-Consistency

no code implementations31 Mar 2020 Yun-Chun Chen, Po-Hsiang Huang, Li-Yu Yu, Jia-Bin Huang, Ming-Hsuan Yang, Yen-Yu Lin

Establishing dense semantic correspondences between object instances remains a challenging problem due to background clutter, significant scale and pose differences, and large intra-class variations.

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

1 code implementation CVPR 2020 Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation that the model perform well on all classes.

Domain Adaptation Long-tail Learning +1

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

1 code implementation CVPR 2020 Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang

In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.

Knowledge Distillation Style Transfer

CycleISP: Real Image Restoration via Improved Data Synthesis

8 code implementations CVPR 2020 Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

This is mainly because the AWGN is not adequate for modeling the real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline.

Ranked #9 on Image Denoising on DND (using extra training data)

Image Denoising Image Restoration

Learning Enriched Features for Real Image Restoration and Enhancement

12 code implementations ECCV 2020 Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.

Image Denoising Image Enhancement +2

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

1 code implementation ECCV 2020 Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.

3D Reconstruction Object +1

Gated Fusion Network for Degraded Image Super Resolution

1 code implementation2 Mar 2020 Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang

To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately.

Image Super-Resolution

Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

no code implementations19 Feb 2020 Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang

In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches, a unary segmentation network which learns the label probabilities for each pixel, and a pairwise affinity network which learns affinity matrix and refines the probability map generated from the unary network.

Segmentation Weakly supervised Semantic Segmentation +1

Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle

1 code implementation19 Feb 2020 Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang

Recent advances in convolutional neural networks (CNNs) usually come at the expense of excessive computational overhead and memory footprint.

Network Pruning

Exploiting Semantics for Face Image Deblurring

no code implementations19 Jan 2020 Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang

Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image.

Deblurring Face Recognition +1

Visual Question Answering on 360° Images

no code implementations10 Jan 2020 Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360° image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering Visual Question Answering

CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency

no code implementations CVPR 2019 Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another (e.g., synthetic to real images).

Data Augmentation Image-to-Image Translation +3

RC-DARTS: Resource Constrained Differentiable Architecture Search

no code implementations30 Dec 2019 Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng

In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.

Image Classification One-Shot Learning

Controllable and Progressive Image Extrapolation

no code implementations25 Dec 2019 Yijun Li, Lu Jiang, Ming-Hsuan Yang

Image extrapolation aims at expanding the narrow field of view of a given image patch.

Adversarial Learning of Privacy-Preserving and Task-Oriented Representations

no code implementations22 Nov 2019 Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang

For instance, there could be a potential privacy risk of machine learning systems via the model inversion attack, whose goal is to reconstruct the input data from the latent representation of deep networks.

Attribute BIG-bench Machine Learning +2

Dancing to Music

2 code implementations NeurIPS 2019 Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.

Motion Synthesis Pose Estimation

Quadratic video interpolation

1 code implementation NeurIPS 2019 Xiangyu Xu, Li Si-Yao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang

Video interpolation is an important problem in computer vision, which helps overcome the temporal limitation of camera sensors.

Referring Expression Object Segmentation with Caption-Aware Consistency

1 code implementation10 Oct 2019 Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang

To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.

Object Referring Expression +3

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations NeurIPS 2019 Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions \emph{and} establishing fine-grained pixel-level associations between consecutive video frames.

Object Tracking Self-Supervised Learning +2

Video Stitching for Linear Camera Arrays

no code implementations31 Jul 2019 Wei-Sheng Lai, Orazio Gallo, Jinwei Gu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

Despite the long history of image and video stitching research, existing academic and commercial solutions still produce strong artifacts.

Autonomous Driving Spatial Interpolation

Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation

1 code implementation13 Jun 2019 Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

In contrast to existing algorithms that tackle the tasks of semantic matching and object co-segmentation in isolation, our method exploits the complementary nature of the two tasks.

Object Segmentation +1
