Search Results for author: Ming-Hsuan Yang

Found 353 papers, 176 papers with code

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

Action Recognition Contrastive Learning +4

76,633

Paper
Code

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

76,633

Paper
Code

Spatiotemporal Contrastive Video Representation Learning

4 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Ranked #1 on Self-Supervised Action Recognition on Kinetics-600

Contrastive Learning Data Augmentation +4

76,632

Paper
Code

COMISR: Compression-Informed Video Super-Resolution

2 code implementations • ICCV 2021 • Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar

Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.

Ranked #6 on Video Super-Resolution on MSU Super-Resolution for Video Compression

Video Super-Resolution

32,943

Paper
Code

Video Timeline Modeling For News Story Understanding

1 code implementation • NeurIPS 2023 • Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong

In this paper, we present a novel problem, namely video timeline modeling.

32,932

Paper
Code

Res2Net: A New Multi-scale Backbone Architecture

32 code implementations • 2 Apr 2019 • Shang-Hua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, Philip Torr

We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e. g., CIFAR-100 and ImageNet.

Ranked #2 on Image Classification on GasHisSDB

Image Classification Instance Segmentation +4

29,908

Paper
Code

A Closed-form Solution to Photorealistic Image Stylization

12 code implementations • ECCV 2018 • Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic.

Image Stylization

11,098

Paper
Code

Depth-Aware Video Frame Interpolation

5 code implementations • CVPR 2019 • Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang

The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame.

Ranked #5 on Video Frame Interpolation on Middlebury

Optical Flow Estimation Video Frame Interpolation

8,127

Paper
Code

Beyond SOT: Tracking Multiple Generic Objects at Once

1 code implementation • 22 Dec 2022 • Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova

Our approach achieves a 4x faster run-time in case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark.

Attribute Object +1

3,097

Paper
Code

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

3,029

Paper
Code

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

5 code implementations • CVPR 2018 • Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz

Finally, the two input images are warped and linearly fused to form each intermediate frame.

Ranked #3 on Video Frame Interpolation on MSU Video Frame Interpolation

Optical Flow Estimation Video Frame Interpolation +1

2,968

Paper
Code

Diffusion Models: A Comprehensive Survey of Methods and Applications

2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Image Super-Resolution Text-to-Image Generation +1

2,694

Paper
Code

A Fusion Approach for Multi-Frame Optical Flow Estimation

2 code implementations • 23 Oct 2018 • Zhile Ren, Orazio Gallo, Deqing Sun, Ming-Hsuan Yang, Erik B. Sudderth, Jan Kautz

To date, top-performing optical flow estimation methods only take pairs of consecutive frames into account.

Optical Flow Estimation

1,590

Paper
Code

Learning Enriched Features for Real Image Restoration and Enhancement

12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.

Ranked #5 on Spectral Reconstruction on ARAD-1K

Image Denoising Image Enhancement +2

1,557

Paper
Code

CycleISP: Real Image Restoration via Improved Data Synthesis

8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline.

Ranked #10 on Image Denoising on DND (using extra training data)

Image Denoising Image Restoration

1,557

Paper
Code

Multi-Stage Progressive Image Restoration

8 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.

Ranked #3 on Spectral Reconstruction on ARAD-1K

Deblurring Decoder +4

1,557

Paper
Code

Restormer: Efficient Transformer for High-Resolution Image Restoration

11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.

Ranked #1 on Grayscale Image Denoising on Urban100 sigma15

Color Image Denoising Deblurring +7

1,557

Paper
Code

GAN Inversion: A Survey

1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang

GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.

Image Manipulation Image Restoration

1,088

Paper
Code

Learning to See Through Obstructions

1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation Reflection Removal

997

Paper
Code

MAGVIT: Masked Generative Video Transformer

1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.

Ranked #1 on Video Prediction on Something-Something V2

Multi-Task Learning Text-to-Video Generation +2

848

Paper
Code

Learning to Adapt Structured Output Space for Semantic Segmentation

12 code implementations • CVPR 2018 • Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker

In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation.

Ranked #3 on Domain Adaptation on Synscapes-to-Cityscapes

Domain Adaptation Segmentation +2

839

Paper
Code

Adversarial Learning for Semi-Supervised Semantic Segmentation

13 code implementations • ICLR 2018 • Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, Ming-Hsuan Yang

We propose a method for semi-supervised semantic segmentation using an adversarial network.

Ranked #12 on Semi-Supervised Semantic Segmentation on Pascal VOC 2012 2% labeled

Segmentation Semi-Supervised Semantic Segmentation

839

Paper
Code

Diverse Image-to-Image Translation via Disentangled Representations

7 code implementations • ECCV 2018 • Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang

Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time.

Ranked #4 on Multimodal Unsupervised Image-To-Image Translation on CelebA-HQ

Attribute Domain Adaptation +4

832

Paper
Code

DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

4 code implementations • 2 May 2019 • Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang

In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images.

Attribute Image-to-Image Translation +2

832

Paper
Code

Muse: Text-To-Image Generation via Masked Generative Transformers

4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.

Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)

Language Modelling Large Language Model +1

818

Paper
Code

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

669

Paper
Code

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

1 code implementation • 1 Aug 2022 • Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu

In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories.

Representation Learning Self-Supervised Learning

669

Paper
Code

AIMS: All-Inclusive Multi-Level Segmentation

1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

Despite the progress of image segmentation for accurate visual entity segmentation, completing the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved.

Image Segmentation Segmentation +1

669

Paper
Code

High-Quality Entity Segmentation

1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.

Image Segmentation Segmentation +2

668

Paper
Code

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.

Segmentation

668

Paper
Code

UniGS: Unified Representation for Image Generation and Segmentation

1 code implementation • 4 Dec 2023 • Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang

On the other hand, the progressive dichotomy module can efficiently decode the synthesized colormap to high-quality entity-level masks in a depth-first binary search without knowing the cluster numbers.

Image Generation Segmentation

668

Paper
Code

Universal Style Transfer via Feature Transforms

15 code implementations • NeurIPS 2017 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang

The whitening and coloring transforms reflect a direct matching of feature covariance of the content image to a given style image, which shares similar spirits with the optimization of Gram matrix based cost in neural style transfer.

Image Reconstruction Style Transfer

591

Paper
Code

Eidetic 3D LSTM: A Model for Video Prediction and Beyond

3 code implementations • ICLR 2019 • Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei

We first evaluate the E3D-LSTM network on widely-used future video prediction datasets and achieve the state-of-the-art performance.

Ranked #1 on Video Prediction on KTH (Cond metric)

Activity Recognition Video Prediction +1

590

Paper
Code

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation • 6 Nov 2023 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering Image Captioning +5

582

Paper
Code

Dancing to Music

2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.

Ranked #3 on Motion Synthesis on BRACE

Motion Synthesis Pose Estimation

522

Paper
Code

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We model the HDRto-LDR image formation pipeline as the (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization.

Ranked #4 on Inverse-Tone-Mapping on MSU HDR Video Reconstruction Benchmark

HDR Reconstruction Inverse-Tone-Mapping +2

522

Paper
Code

Hybrid Neural Fusion for Full-frame Video Stabilization

2 code implementations • ICCV 2021 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in smaller field of views.

Video Stabilization

521

Paper
Code

Fast and Accurate Online Video Object Segmentation via Tracking Parts

2 code implementations • CVPR 2018 • Jingchun Cheng, Yi-Hsuan Tsai, Wei-Chih Hung, Shengjin Wang, Ming-Hsuan Yang

基于视频的目标检测算法研究

Ranked #56 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (test-dev)

Semantic Segmentation Semi-Supervised Video Object Segmentation +2

451

Paper
Code

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

418

Paper
Code

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

2 code implementations • CVPR 2019 • Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang

In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs.

Ranked #3 on Multimodal Unsupervised Image-To-Image Translation on CelebA-HQ

Multimodal Unsupervised Image-To-Image Translation Translation

411

Paper
Code

Learning Blind Video Temporal Consistency

1 code implementation • ECCV 2018 • Wei-Sheng Lai, Jia-Bin Huang, Oliver Wang, Eli Shechtman, Ersin Yumer, Ming-Hsuan Yang

Our method takes the original unprocessed and per-frame processed videos as inputs to produce a temporally consistent video.

Colorization Image-to-Image Translation +4

405

Paper
Code

Learning Linear Transformations for Fast Arbitrary Style Transfer

1 code implementation • 14 Aug 2018 • Xueting Li, Sifei Liu, Jan Kautz, Ming-Hsuan Yang

Recent arbitrary style transfer methods transfer second order statistics from reference image onto content image via a multiplication between content image features and a transformation matrix, which is computed from features with a pre-determined algorithm.

Domain Adaptation Style Transfer

377

Paper
Code

Learning Enriched Features for Fast Image Restoration and Enhancement

1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.

Autonomous Vehicles Deblurring +4

369

Paper
Code

3D Vision with Transformers: A Survey

1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.

Pose Estimation

368

Paper
Code

Superpixel Sampling Networks

2 code implementations • ECCV 2018 • Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks.

Segmentation Superpixels

342

Paper
Code

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" which achieves state-of-the-art performance in terms of both accuracy and mobile inference speed.

326

Paper
Code

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

1 code implementation • CVPR 2020 • Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang

To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme.

Ranked #9 on Image Dehazing on Haze4k

Decoder Image Dehazing

325

Paper
Code

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

1 code implementation • ICLR 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang

Few-shot classification aims to recognize novel categories with only few labeled images in each class.

Ranked #6 on Cross-Domain Few-Shot on CUB

Classification Cross-Domain Few-Shot +2

320

Paper
Code

InfinityGAN: Towards Infinite-Pixel Image Synthesis

1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

We present a novel framework, InfinityGAN, for arbitrary-sized image generation.

Ranked #2 on Scene Generation on OSM

Image Generation Scene Generation

318

Paper
Code

Generative Face Completion

2 code implementations • CVPR 2017 • Yijun Li, Sifei Liu, Jimei Yang, Ming-Hsuan Yang

In this paper, we propose an effective face completion algorithm using a deep generative model.

Facial Inpainting Semantic Parsing

316

Paper
Code

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Transformers are transforming the landscape of computer vision, especially for recognition tasks.

Ranked #12 on Object Detection on COCO 2017 val

Decoder Image Classification +3

300

Paper
Code

An Extendable, Efficient and Effective Transformer-based Object Detector

1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Transformers have been widely used in numerous vision problems especially for visual recognition and detection.

Decoder Image Classification +5

300

Paper
Code

Class-agnostic Object Detection with Multi-modal Transformer

1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

This has been a long-standing question in computer vision.

Ranked #1 on Open World Object Detection on COCO 2017 (Outdoor, Accessories, Appliance, Truck)

Class-agnostic Object Detection Object +3

294

Paper
Code

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.

Image Segmentation Medical Image Segmentation +2

286

Paper
Code

SPLATNet: Sparse Lattice Networks for Point Cloud Processing

2 code implementations • CVPR 2018 • Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, Jan Kautz

We present a network architecture for processing point clouds that directly operates on a collection of points represented as a sparse set of samples in a high-dimensional lattice.

Ranked #30 on Semantic Segmentation on ScanNet

3D Part Segmentation 3D Semantic Segmentation

266

Paper
Code

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

2 code implementations • 20 Mar 2022 • Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.

Ranked #1 on 3D Object Detection on V2XSet

3D Object Detection Autonomous Vehicles +1

257

Paper
Code

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.

3D Reconstruction Object +1

226

Paper
Code

SCOPS: Self-Supervised Co-Part Segmentation

1 code implementation • CVPR 2019 • Wei-Chih Hung, Varun Jampani, Sifei Liu, Pavlo Molchanov, Ming-Hsuan Yang, Jan Kautz

Parts provide a good intermediate representation of objects that is robust with respect to the camera, pose and appearance variations.

Ranked #4 on Unsupervised Keypoint Estimation on CUB

Object Segmentation +4

218

Paper
Code

Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising

3 code implementations • 26 Jan 2021 • Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang

We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.

Image Denoising Video Denoising

216

Paper
Code

Fast and Accurate Image Super-Resolution with Deep Laplacian Pyramid Networks

7 code implementations • 4 Oct 2017 • Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang

However, existing methods often require a large number of network parameters and entail heavy computational loads at runtime for generating high-accuracy super-resolution results.

Image Reconstruction Image Super-Resolution

213

Paper
Code

Decoupled Dynamic Filter Networks

1 code implementation • CVPR 2021 • Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang

Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.

Ranked #13 on Semantic Segmentation on MCubeS

Image Classification Semantic Segmentation

211

Paper
Code

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

Text-to-image diffusion models have made significant advances in generating and editing high-quality images.

Ranked #3 on Semantic correspondence on SPair-71k

Representation Learning Semantic correspondence +1

209

Paper
Code

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.

Decoder Interactive Segmentation +4

191

Paper
Code

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Ranked #2 on Prompt Engineering on ImageNet V2

Prompt Engineering

187

Paper
Code

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

1 code implementation • CVPR 2020 • Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang

In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.

Decoder Knowledge Distillation +1

185

Paper
Code

Im2Pencil: Controllable Pencil Illustration from Photographs

1 code implementation • CVPR 2019 • Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style.

Translation

181

Paper
Code

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow

1 code implementation • ICCV 2017 • Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, Ming-Hsuan Yang

This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos.

Ranked #67 on Semi-Supervised Video Object Segmentation on DAVIS 2016

Image Segmentation Object +7

178

Paper
Code

Weakly-Supervised Semantic Segmentation via Sub-category Exploration

1 code implementation • CVPR 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions.

Ranked #66 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 val

Clustering Object +2

178

Paper
Code

PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection

2 code implementations • CVPR 2018 • Nian Liu, Junwei Han, Ming-Hsuan Yang

We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively.

Ranked #7 on RGB Salient Object Detection on SOC

RGB Salient Object Detection Saliency Detection

177

Paper
Code

Switchable Temporal Propagation Network

1 code implementation • ECCV 2018 • Sifei Liu, Guangyu Zhong, Shalini De Mello, Jinwei Gu, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

Our approach is based on a temporal propagation network (TPN), which models the transition-related affinity between a pair of frames in a purely data-driven manner.

Video Compression

176

Paper
Code

Joint-task Self-supervised Learning for Temporal Correspondence

2 code implementations • NeurIPS 2019 • Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang

Our learning process integrates two highly related tasks: tracking large image regions \emph{and} establishing fine-grained pixel-level associations between consecutive video frames.

Ranked #73 on Semi-Supervised Video Object Segmentation on DAVIS 2017 (val)

Object Tracking Self-Supervised Learning +2

176

Paper
Code

Intriguing Properties of Vision Transformers

1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e. g., retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

173

Paper
Code

MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement

1 code implementation • 20 Oct 2018 • Wenbo Bao, Wei-Sheng Lai, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang

Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed.

Ranked #22 on Video Frame Interpolation on Vimeo90K

Computational Efficiency Denoising +6

166

Paper
Code

MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Frame Interpolation and Enhancement

1 code implementation • arXiv 2018 • Wenbo Bao, Wei-Sheng Lai, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang

In this work, we propose a motion estimation and motion compensation driven neural network for video frame interpolation.

Ranked #6 on Video Frame Interpolation on Middlebury

Computational Efficiency Denoising +6

166

Paper
Code

Regularizing Generative Adversarial Networks under Limited Data

1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang

Recent years have witnessed the rapid progress of generative adversarial networks (GANs).

Ranked #1 on Image Generation on CIFAR-100

Data Augmentation Image Generation

163

Paper
Code

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

1 code implementation • ECCV 2020 • Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.

Domain Adaptation

159

Paper
Code

Large-scale Unsupervised Semantic Segmentation

3 code implementations • 6 Jun 2021 • ShangHua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, Philip Torr

In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.

Ranked #1 on Unsupervised Semantic Segmentation on ImageNet-S-300

Representation Learning Segmentation +1

156

Paper
Code

Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking

1 code implementation • 7 Jul 2017 • Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang

Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes.

Object Tracking Position

155

Paper
Code

Deep Image Harmonization

2 code implementations • CVPR 2017 • Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

Compositing is one of the most common operations in photo editing.

Image Harmonization

149

Paper
Code

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

1 code implementation • CVPR 2018 • Feng Li, Cheng Tian, WangMeng Zuo, Lei Zhang, Ming-Hsuan Yang

Compared with SRDCF, STRCF with hand-crafted features provides a 5 times speedup and achieves a gain of 5. 4% and 3. 6% AUC score on OTB-2015 and Temple-Color, respectively.

Ranked #9 on Visual Object Tracking on VOT2017/18

Visual Object Tracking Visual Tracking

147

Paper
Code

MC-Blur: A Comprehensive Benchmark for Image Deblurring

2 code implementations • 1 Dec 2021 • Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang

Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.

Benchmarking Deblurring +1

145

Paper
Code

Learning a No-Reference Quality Metric for Single-Image Super-Resolution

2 code implementations • 18 Dec 2016 • Chao Ma, Chih-Yuan Yang, Xiaokang Yang, Ming-Hsuan Yang

Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception.

Ranked #7 on Video Quality Assessment on MSU SR-QA Dataset

Image Super-Resolution Video Quality Assessment

141

Paper
Code

Gated Fusion Network for Joint Image Deblurring and Super-Resolution

2 code implementations • 27 Jul 2018 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang

Single-image super-resolution is a fundamental task for vision applications to enhance the image quality with respect to spatial resolution.

Computational Efficiency Deblurring +2

139

Paper
Code

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

1 code implementation • 13 Dec 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.

Autonomous Driving

128

Paper
Code

Burst Image Restoration and Enhancement

1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.

Ranked #2 on Burst Image Super-Resolution on BurstSR

Burst Image Super-Resolution Denoising +3

127

Paper
Code

Object Contour Detection with a Fully Convolutional Encoder-Decoder Network

3 code implementations • CVPR 2016 • Jimei Yang, Brian Price, Scott Cohen, Honglak Lee, Ming-Hsuan Yang

We develop a deep learning algorithm for contour detection with a fully convolutional encoder-decoder network.

Contour Detection Decoder +2

120

Paper
Code

Progressive Domain Adaptation for Object Detection

1 code implementation • 24 Oct 2019 • Han-Kai Hsu, Chun-Han Yao, Yi-Hsuan Tsai, Wei-Chih Hung, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

This intermediate domain is constructed by translating the source images to mimic the ones in the target domain.

Ranked #4 on Image-to-Image Translation on Cityscapes-to-Foggy Cityscapes

Domain Adaptation Object +2

102

Paper
Code

Burstormer: Burst Image Restoration and Enhancement Transformer

1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.

Denoising Image Restoration +1

Paper
Code

Video Frame Interpolation Transformer

1 code implementation • CVPR 2022 • Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang

Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field.

Video Frame Interpolation

Paper
Code

Learning to Dub Movies via Hierarchical Prosody Models

1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice clone V2C) task aims to generate speeches that match the speaker's emotion presented in the video using the desired speaker voice as reference.

Paper
Code

Modeling Artistic Workflows for Image Generation and Editing

1 code implementation • ECCV 2020 • Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang

People often create art by following an artistic workflow involving multiple stages that inform the overall design.

Image Generation

Paper
Code

Context-Aware Synthesis and Placement of Object Instances

2 code implementations • NeurIPS 2018 • Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem.

Object Scene Parsing

Paper
Code

Online Multi-Object Tracking with Dual Matching Attention Networks

1 code implementation • ECCV 2018 • Ji Zhu, Hua Yang, Nian Liu, Minyoung Kim, Wenjun Zhang, Ming-Hsuan Yang

In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets.

Ranked #5 on Online Multi-Object Tracking on MOT16

Multi-Object Tracking Object +1

Paper
Code

Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications

1 code implementation • 20 Nov 2019 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

Visual events are usually accompanied by sounds in our daily lives.

Paper
Code

Progressive Representation Adaptation for Weakly Supervised Object Localization

1 code implementation • 12 Oct 2017 • Dong Li, Jia-Bin Huang, Ya-Li Li, Shengjin Wang, Ming-Hsuan Yang

In classification adaptation, we transfer a pre-trained network to a multi-label classification task for recognizing the presence of a certain object in an image.

Classification General Classification +4

Paper
Code

Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation

1 code implementation • 13 Jun 2019 • Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

In contrast to existing algorithms that tackle the tasks of semantic matching and object co-segmentation in isolation, our method exploits the complementary nature of the two tasks.

Object Segmentation +1

Paper
Code

Learning to Caricature via Semantic Shape Transform

1 code implementation • 12 Aug 2020 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang

Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.

Caricature

Paper
Code

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong liu, Xiangtai Li, Ming-Hsuan Yang, DaCheng Tao

Following this spirit, this paper explores plain ViT architecture for MUAD.

Unsupervised Anomaly Detection

Paper
Code

Flow-Grounded Spatial-Temporal Video Prediction from Still Images

1 code implementation • ECCV 2018 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang

Existing video prediction methods mainly rely on observing multiple historical frames or focus on predicting the next one-frame.

Video Prediction

Paper
Code

Deep Regression Tracking with Shrinkage Loss

1 code implementation • ECCV 2018 • Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, Ming-Hsuan Yang

Regression trackers directly learn a mapping from regularly dense samples of target objects to soft labels, which are usually generated by a Gaussian function, to estimate target positions.

regression

Paper
Code

Learning to See Through Obstructions with Layered Decomposition

1 code implementation • 11 Aug 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation

Paper
Code

Weakly-supervised Caricature Face Parsing through Domain Adaptation

1 code implementation • 13 May 2019 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Deng Cai, Ming-Hsuan Yang

However, current state-of-the-art face parsing methods require large amounts of labeled data on the pixel-level and such process for caricature is tedious and labor-intensive.

Attribute Caricature +3

Paper
Code

Hierarchical Modular Network for Video Captioning

1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.

Representation Learning Sentence +1

Paper
Code

An Adaptive Random Path Selection Approach for Incremental Learning

1 code implementation • 3 Jun 2019 • Jathushan Rajasegaran, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang

In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage.

Ranked #7 on Incremental Learning on CIFAR-100-B0(5steps of 20 classes)

Incremental Learning Knowledge Distillation +1

Paper
Code

Regularizing Meta-Learning via Gradient Dropout

1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.

Domain Generalization Meta-Learning

Paper
Code

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, DaCheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Paper
Code

Learning to Blend Photos

1 code implementation • ECCV 2018 • Wei-Chih Hung, Jianming Zhang, Xiaohui Shen, Zhe Lin, Joon-Young Lee, Ming-Hsuan Yang

Specifically, given a foreground image and a background image, our proposed method automatically generates a set of blending photos with scores that indicate the aesthetics quality with the proposed quality network and policy network.

Paper
Code

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.

Image Generation

Paper
Code

Unsupervised Representation Learning by Sorting Sequences

1 code implementation • ICCV 2017 • Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang

We present an unsupervised representation learning approach using videos without semantic labels.

Ranked #46 on Self-Supervised Action Recognition on HMDB51

Image Classification object-detection +4

Paper
Code

Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang

Generating a smooth sequence of intermediate results bridges the gap of two different domains, facilitating the morphing effect across domains.

Attribute Image-to-Image Translation +1

Paper
Code

Exploiting Raw Images for Real-Scene Super-Resolution

1 code implementation • 2 Feb 2021 • Xiangyu Xu, Yongrui Ma, Wenxiu Sun, Ming-Hsuan Yang

In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.

Image Restoration Image Super-Resolution

Paper
Code

Quadratic video interpolation

1 code implementation • NeurIPS 2019 • Xiangyu Xu, Li Si-Yao, Wenxiu Sun, Qian Yin, Ming-Hsuan Yang

Video interpolation is an important problem in computer vision, which helps overcome the temporal limitation of camera sensors.

Paper
Code

Efficient Video Object Segmentation via Modulated Cross-Attention Memory

1 code implementation • 26 Mar 2024 • Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.

Object Segmentation +3

Paper
Code

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

1 code implementation • CVPR 2018 • Jiawei Zhang, Jinshan Pan, Jimmy Ren, Yibing Song, Linchao Bao, Rynson W. H. Lau, Ming-Hsuan Yang

The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN).

Ranked #10 on Deblurring on RealBlur-R (trained on GoPro) (SSIM (sRGB) metric)

Deblurring

Paper
Code

Learning to Stylize Novel Views

1 code implementation • ICCV 2021 • Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang

Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix.

Novel View Synthesis

Paper
Code

Learning Visibility for Robust Dense Human Body Estimation

1 code implementation • 23 Aug 2022 • Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang

An alternative approach is to estimate dense vertices of a predefined template body in the image space.

Paper
Code

Simultaneous Fidelity and Regularization Learning for Image Restoration

1 code implementation • 12 Apr 2018 • Dongwei Ren, WangMeng Zuo, David Zhang, Lei Zhang, Ming-Hsuan Yang

For blind deconvolution, as estimation error of blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well.

Denoising Image Deconvolution +1

Paper
Code

PiCANet: Pixel-wise Contextual Attention Learning for Accurate Saliency Detection

2 code implementations • 15 Dec 2018 • Nian Liu, Junwei Han, Ming-Hsuan Yang

We propose three specific formulations of the PiCANet via embedding the pixel-wise contextual attention mechanism into the pooling and convolution operations with attending to global or local contexts.

object-detection RGB Salient Object Detection +3

Paper
Code

Exploiting Diffusion Prior for Generalizable Dense Prediction

2 code implementations • 30 Nov 2023 • Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang

Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap.

Intrinsic Image Decomposition Semantic Segmentation

Paper
Code

Tracking Persons-of-Interest via Unsupervised Representation Adaptation

2 code implementations • 5 Oct 2017 • Shun Zhang, Jia-Bin Huang, Jongwoo Lim, Yihong Gong, Jinjun Wang, Narendra Ahuja, Ming-Hsuan Yang

Multi-face tracking in unconstrained videos is a challenging problem as faces of one person often appear drastically different in multiple shots due to significant variations in scale, pose, expression, illumination, and make-up.

Clustering

Paper
Code

Robust Visual Tracking via Hierarchical Convolutional Features

1 code implementation • 12 Jul 2017 • Chao Ma, Jia-Bin Huang, Xiaokang Yang, Ming-Hsuan Yang

Specifically, we learn adaptive correlation filters on the outputs from each convolutional layer to encode the target appearance.

Object Recognition Visual Tracking

Paper
Code

BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

1 code implementation • 12 Dec 2022 • Zhiwei Lin, Yongtao Wang, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

Based on the property of outdoor point clouds in autonomous driving scenarios, i. e., the point clouds of distant objects are more sparse, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection.

3D Object Detection Autonomous Driving +3

Paper
Code

Text-Driven Image Editing via Learnable Regions

1 code implementation • 28 Nov 2023 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang

Language has emerged as a natural interface for image editing.

Image Generation

Paper
Code

Referring Expression Object Segmentation with Caption-Aware Consistency

1 code implementation • 10 Oct 2019 • Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin, Ming-Hsuan Yang

To this end, we propose an end-to-end trainable comprehension network that consists of the language and visual encoders to extract feature representations from both domains.

Ranked #19 on Referring Expression Segmentation on RefCOCO testB

Caption Generation Object +4

Paper
Code

Gated Fusion Network for Degraded Image Super Resolution

1 code implementation • 2 Mar 2020 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang

To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately.

Image Super-Resolution

Paper
Code

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performances in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation Point Cloud Generation +1

Paper
Code

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation • ICCV 2023 • Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Paper
Code

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

1 code implementation • 28 Nov 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Ranked #1 on Semantic correspondence on PF-PASCAL

Animal Pose Estimation Semantic correspondence

Paper
Code

Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

1 code implementation • NeurIPS 2023 • Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai

Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations.

3D Object Detection Denoising +5

Paper
Code

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

1 code implementation • 13 Dec 2023 • Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai

Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach.

3D Object Detection object-detection

Paper
Code

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration

1 code implementation • 2 Apr 2024 • Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.

Decoder Image Denoising +2

Paper
Code

Pyramid Diffusion for Fine 3D Large Scene Generation

1 code implementation • 20 Nov 2023 • Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang

Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets.

Scene Generation

Paper
Code

Improving Zero-shot Generalization and Robustness of Multi-modal Models

1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao

We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.

Image Classification Zero-shot Generalization

Paper
Code

Self-Supervised Super-Plane for Neural 3D Reconstruction

1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang

In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.

3D Reconstruction

Paper
Code

CLR: Channel-wise Lightweight Reprogramming for Continual Learning

1 code implementation • ICCV 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti

Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.

Continual Learning Image Classification

Paper
Code

Superpixel Hierarchy

1 code implementation • 20 May 2016 • Xing Wei, Qingxiong Yang, Yihong Gong, Ming-Hsuan Yang, Narendra Ahuja

Quantitative and qualitative evaluation on a number of computer vision applications was conducted, demonstrating that the proposed method is the top performer.

Image Segmentation Segmentation +2

Paper
Code

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

1 code implementation • ICCV 2023 • Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai

In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.

3D Multi-Object Tracking 3D Object Tracking +3

Paper
Code

Correlation Tracking via Joint Discrimination and Reliability Learning

1 code implementation • CVPR 2018 • Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang

To address this issue, we propose a novel CF-based optimization problem to jointly model the discrimination and reliability information.

Visual Tracking

Paper
Code

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

1 code implementation • CVPR 2020 • Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.

Ranked #27 on Long-tail Learning on Places-LT

Domain Adaptation Long-tail Learning +1

Paper
Code

TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

1 code implementation • 30 Mar 2020 • Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling

Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.

Image Segmentation Semantic Segmentation +2

Paper
Code

Scene Parsing with Global Context Embedding

1 code implementation • ICCV 2017 • Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

We present a scene parsing method that utilizes global context information based on both the parametric and non- parametric models.

Scene Parsing

Paper
Code

MiniROAD: Minimal RNN Framework for Online Action Detection

1 code implementation • ICCV 2023 • Joungbin An, Hyolim Kang, Su Ho Han, Ming-Hsuan Yang, Seon Joo Kim

Online Action Detection (OAD) is the task of identifying actions in streaming videos without access to future frames.

Ranked #1 on Online Action Detection on TVSeries

Online Action Detection

Paper
Code

Integrating Boundary and Center Correlation Filters for Visual Tracking with Aspect Ratio Variation

1 code implementation • 5 Oct 2017 • Feng Li, Yingjie Yao, Peihua Li, David Zhang, WangMeng Zuo, Ming-Hsuan Yang

The aspect ratio variation frequently appears in visual tracking and has a severe influence on performance.

Visual Tracking

Paper
Code

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

1 code implementation • 3 Apr 2024 • Zhongyu Xia, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation.

3D Object Detection Autonomous Driving +2

Paper
Code

End-to-end Multi-modal Video Temporal Grounding

1 code implementation • NeurIPS 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.

Optical Flow Estimation Self-Supervised Learning

Paper
Code

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

1 code implementation • ICCV 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.

Computational Efficiency Image Generation

Paper
Code

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation Vision-Language Navigation

Paper
Code

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

1 code implementation • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.

Paper
Code

Dual Associated Encoder for Face Restoration

1 code implementation • 14 Aug 2023 • Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang

Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild.

Ranked #2 on Blind Face Restoration on WIDER

Blind Face Restoration

Paper
Code

An Informative Tracking Benchmark

1 code implementation • 13 Dec 2021 • Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Visual Tracking

Paper
Code

Learning Spatial-Aware Regressions for Visual Tracking

1 code implementation • CVPR 2018 • Chong Sun, Dong Wang, Huchuan Lu, Ming-Hsuan Yang

Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target.

Ranked #12 on Visual Object Tracking on VOT2017/18

regression Visual Object Tracking +1

Paper
Code

Dual Deep Network for Visual Tracking

1 code implementation • 19 Dec 2016 • Zhizhen Chi, Hongyang Li, Huchuan Lu, Ming-Hsuan Yang

In this paper, we propose a dual network to better utilize features among layers for visual tracking.

Visual Tracking

Paper
Code

2.5D Visual Relationship Detection

1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220k human-annotated 2. 5D relationships among 512K objects from 11K images.

Benchmarking Depth Estimation +2

Paper
Code

FlowNAS: Neural Architecture Search for Optical Flow Estimation

1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang

To address this issue, we propose a neural architecture search method named FlowNAS to automatically find the better encoder architecture for flow estimation task.

Image Classification Neural Architecture Search +1

Paper
Code

Weakly Supervised Coupled Networks for Visual Sentiment Analysis

1 code implementation • CVPR 2018 • Jufeng Yang, Dongyu She, Yu-Kun Lai, Paul L. Rosin, Ming-Hsuan Yang

The second branch utilizes both the holistic and localized information by coupling the sentiment map with deep features for robust classification.

General Classification Robust classification +1

Paper
Code

Unsupervised Holistic Image Generation from Key Local Patches

1 code implementation • ECCV 2018 • Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh

We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior.

Decoder Image Generation

Paper
Code

Semi-Supervised Learning with Meta-Gradient

1 code implementation • 8 Jul 2020 • Xin-Yu Zhang, Taihong Xiao, HaoLin Jia, Ming-Ming Cheng, Ming-Hsuan Yang

In this work, we propose a simple yet effective meta-learning algorithm in semi-supervised learning.

Meta-Learning Pseudo Label

Paper
Code

PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts.

Reflection Removal

Paper
Code

ReMix: Towards Image-to-Image Translation with Limited Data

1 code implementation • CVPR 2021 • Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun

We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.

Data Augmentation Image-to-Image Translation +1

Paper
Code

Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection

1 code implementation • 19 Dec 2022 • Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.

3D Object Detection Knowledge Distillation +4

Paper
Code

D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.

Ranked #3 on Weakly Supervised Action Localization on THUMOS’14

Denoising Weakly Supervised Action Localization +2

Paper
Code

Understanding Synonymous Referring Expressions via Contrastive Features

1 code implementation • 20 Apr 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.

Object Referring Expression +3

Paper
Code

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation • 28 Mar 2024 • Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Paper
Code

Progressive Multi-resolution Loss for Crowd Counting

1 code implementation • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang

Crowd counting is usually handled in a density map regression fashion, which is supervised via a L2 loss between the predicted density map and ground truth.

Crowd Counting

Paper
Code

Online multi-object tracking via robust collaborative model and sample selection

1 code implementation • Computer Vision and Image Understanding 2017 • Mohamed A. Naiel, M. Omair Ahmad, M.N.S. Swamy, Jongwoo Lim, Ming-Hsuan Yang

For each frame, we construct an association between detections and trackers, and treat each detected image region as a key sample, for online update, if it is associated to a tracker.

Ranked #1 on Online Multi-Object Tracking on Oxford Town Center

Multi-Object Tracking Object +3

Paper
Code

Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution

1 code implementation • CVPR 2017 • Wei-Sheng Lai, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang

Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution.

Ranked #40 on Image Super-Resolution on BSD100 - 4x upscaling

Image Super-Resolution

Paper
Code

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection

1 code implementation • 4 Dec 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang

To address these limitations, we propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection, which learns multi-scale temporal features.

Anomaly Detection Video Anomaly Detection

Paper
Code

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.

Paper
Code

Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle

1 code implementation • 19 Feb 2020 • Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang

Recent advances in convolutional neural networks(CNNs) usually come with the expense of excessive computational overhead and memory footprint.

Network Pruning

Paper
Code

Learnable Cost Volume Using the Cayley Representation

1 code implementation • ECCV 2020 • Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang

Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors.

Optical Flow Estimation

Paper
Code

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

1 code implementation • 27 Nov 2021 • Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Most existing methods usually formulate the non-blind deconvolution problem into a maximum-a-posteriori framework and address it by manually designing kinds of regularization terms and data terms of the latent clear images.

Image Deconvolution Image Restoration

Paper
Code

Towards 4D Human Video Stylization

1 code implementation • 7 Dec 2023 • Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang

To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.

Novel View Synthesis Style Transfer +1

Paper
Code

Parallel Coordinate Descent Newton Method for Efficient $\ell_1$-Regularized Minimization

1 code implementation • 18 Jun 2013 • An Bian, Xiong Li, Yuncai Liu, Ming-Hsuan Yang

We show that: (1) PCDN is guaranteed to converge globally despite increasing parallelism; (2) PCDN converges to the specified accuracy $\epsilon$ within the limited iteration number of $T_\epsilon$, and $T_\epsilon$ decreases with increasing parallelism (bundle size $P$).

Paper
Code

Unsupervised Discovery of Disentangled Manifolds in GANs

1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Interpretable generation process is beneficial to various image editing applications.

Attribute

Paper
Code

Training Class-Imbalanced Diffusion Model Via Overlap Optimization

1 code implementation • 16 Feb 2024 • Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

To address the observed appearance overlap between synthesized images of rare classes and tail classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.

Contrastive Learning Image Generation

Paper
Code

Weakly Supervised Video Individual CountingWeakly Supervised Video Individual Counting

1 code implementation • 10 Dec 2023 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang

% To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided.

Contrastive Learning Video Individual Counting

Paper
Code

Learning to Deblur Images with Exemplars

no code implementations • 15 May 2018 • Jinshan Pan, Wenqi Ren, Zhe Hu, Ming-Hsuan Yang

However, existing methods are less effective as only few edges can be restored from blurry face images for kernel estimation.

Deblurring Image Deblurring

Paper
Add Code

Learning Dual Convolutional Neural Networks for Low-Level Vision

no code implementations • CVPR 2018 • Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang

These problems usually involve the estimation of two components of the target signals: structures and details.

Rain Removal Super-Resolution

Paper
Add Code

Visual Tracking via Dynamic Graph Learning

no code implementations • 4 Oct 2017 • Chenglong Li, Liang Lin, WangMeng Zuo, Jin Tang, Ming-Hsuan Yang

First, the graph is initialized by assigning binary weights of some image patches to indicate the object and background patches according to the predicted bounding box.

Graph Learning Object +2

Paper
Add Code

VITAL: VIsual Tracking via Adversarial Learning

no code implementations • CVPR 2018 • Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, WangMeng Zuo, Chunhua Shen, Rynson Lau, Ming-Hsuan Yang

To augment positive samples, we use a generative network to randomly generate masks, which are applied to adaptively dropout input features to capture a variety of appearance changes.

General Classification Visual Tracking

Paper
Add Code

Learning a Discriminative Prior for Blind Image Deblurring

no code implementations • CVPR 2018 • Lerenhan Li, Jinshan Pan, Wei-Sheng Lai, Changxin Gao, Nong Sang, Ming-Hsuan Yang

We present an effective blind image deblurring method based on a data-driven discriminative prior. Our work is motivated by the fact that a good image prior should favor clear images over blurred images. In this work, we formulate the image prior as a binary classifier which can be achieved by a deep convolutional neural network (CNN). The learned prior is able to distinguish whether an input image is clear or not. Embedded into the maximum a posterior (MAP) framework, it helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, it is difficult to optimize the deblurring method with the learned image prior as it involves a non-linear CNN. Therefore, we develop an efficient numerical approach based on the half-quadratic splitting method and gradient decent algorithm to solve the proposed model. Furthermore, the proposed model can be easily extended to non-uniform deblurring. Both qualitative and quantitative experimental results show that our method performs favorably against state-of-the-art algorithms as well as domain-specific image deblurring approaches.

Blind Image Deblurring Image Deblurring

Paper
Add Code

Gated Fusion Network for Single Image Dehazing

no code implementations • CVPR 2018 • Wenqi Ren, Lin Ma, Jiawei Zhang, Jinshan Pan, Xiaochun Cao, Wei Liu, Ming-Hsuan Yang

The proposed algorithm hinges on an end-to-end trainable neural network that consists of an encoder and a decoder.

Ranked #23 on Image Dehazing on SOTS Outdoor

Decoder Image Dehazing +1

Paper
Add Code

Deep Semantic Face Deblurring

no code implementations • CVPR 2018 • Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang

In this paper, we present an effective and efficient face deblurring algorithm by exploiting semantic cues via deep convolutional neural networks (CNNs).

Deblurring Face Recognition

Paper
Add Code

Learning to Localize Sound Source in Visual Scenes

no code implementations • CVPR 2018 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

We show that even with a few supervision, false conclusion is able to be corrected and the source of sound in a visual scene can be localized effectively.

Paper
Add Code

Learning Video-Story Composition via Recurrent Neural Network

no code implementations • 31 Jan 2018 • Guangyu Zhong, Yi-Hsuan Tsai, Sifei Liu, Zhixun Su, Ming-Hsuan Yang

In this paper, we propose a learning-based method to compose a video-story from a group of video clips that describe an activity or experience.

Paper
Add Code

Generative Single Image Reflection Separation

no code implementations • 12 Jan 2018 • Donghoon Lee, Ming-Hsuan Yang, Songhwai Oh

Single image reflection separation is an ill-posed problem since two scenes, a transmitted scene and a reflected scene, need to be inferred from a single observation.

Paper
Add Code

Learning Binary Residual Representations for Domain-specific Video Streaming

no code implementations • 14 Dec 2017 • Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission.

Video Compression

Paper
Add Code

Joint Image Filtering with Deep Convolutional Networks

no code implementations • 11 Oct 2017 • Yijun Li, Jia-Bin Huang, Narendra Ahuja, Ming-Hsuan Yang

In contrast to existing methods that consider only the guidance image, the proposed algorithm can selectively transfer salient structures that are consistent with both guidance and target images.

Paper
Add Code

Semantic-driven Generation of Hyperlapse from $360^\circ$ Video

no code implementations • 31 Mar 2017 • Wei-Sheng Lai, Yujia Huang, Neel Joshi, Chris Buehler, Ming-Hsuan Yang, Sing Bing Kang

We present a system for converting a fully panoramic ($360^\circ$) video into a normal field-of-view (NFOV) hyperlapse for an optimal viewing experience.

Video Stabilization

Paper
Add Code

Learning Affinity via Spatial Propagation Networks

no code implementations • NeurIPS 2017 • Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, Jan Kautz

Specifically, we develop a three-way connection for the linear propagation model, which (a) formulates a sparse transformation matrix, where all elements can be the output from a deep CNN, but (b) results in a dense affinity matrix that effectively models any task-specific pairwise similarity matrix.

Colorization Face Parsing +4

Paper
Add Code

Learning to Segment Instances in Videos with Spatial Propagation Network

no code implementations • 14 Sep 2017 • Jingchun Cheng, Sifei Liu, Yi-Hsuan Tsai, Wei-Chih Hung, Shalini De Mello, Jinwei Gu, Jan Kautz, Shengjin Wang, Ming-Hsuan Yang

In addition, we apply a filter on the refined score map that aims to recognize the best connected region using spatial and temporal consistencies in the video.

Object Segmentation +1

Paper
Add Code

Stylizing Face Images via Multiple Exemplars

no code implementations • 28 Aug 2017 • Yibing Song, Linchao Bao, Shengfeng He, Qingxiong Yang, Ming-Hsuan Yang

We address the problem of transferring the style of a headshot photo to face images.

Paper
Add Code

Video Deblurring via Semantic Segmentation and Pixel-Wise Non-Linear Kernel

no code implementations • ICCV 2017 • Wenqi Ren, Jinshan Pan, Xiaochun Cao, Ming-Hsuan Yang

We analyze the relationship between motion blur trajectory and optical flow, and present a novel pixel-wise non-linear kernel model to account for motion blur.

Deblurring Optical Flow Estimation +1

Paper
Add Code

Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos

no code implementations • ICCV 2017 • Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker

Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets.

Data Augmentation Face Recognition +1

Paper
Add Code

Face Parsing via Recurrent Propagation

no code implementations • 6 Aug 2017 • Sifei Liu, Jianping Shi, Ji Liang, Ming-Hsuan Yang

Face parsing is an important problem in computer vision that finds numerous applications including recognition and editing.

Face Parsing

Paper
Add Code

CREST: Convolutional Residual Learning for Visual Tracking

no code implementations • ICCV 2017 • Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, Ming-Hsuan Yang

Our method integrates feature extraction, response map generation as well as model update into the neural networks for an end-to-end training.

Visual Tracking

Paper
Add Code

Learning Structured Semantic Embeddings for Visual Recognition

no code implementations • 5 Jun 2017 • Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang

First, we exploit the discriminative constraints to capture the intra- and inter-class relationships of image embeddings.

General Classification Multi-Label Classification +2

Paper
Add Code

Diversified Texture Synthesis with Feed-forward Networks

no code implementations • CVPR 2017 • Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang

Recent progresses on deep discriminative and generative modeling have shown promising results on texture synthesis.

Texture Synthesis

Paper
Add Code

Learning Fully Convolutional Networks for Iterative Non-blind Deconvolution

no code implementations • CVPR 2017 • Jiawei Zhang, Jinshan Pan, Wei-Sheng Lai, Rynson Lau, Ming-Hsuan Yang

In this paper, we propose a fully convolutional networks for iterative non-blind deconvolution We decompose the non-blind deconvolution problem into image denoising and image deconvolution.

Image Deconvolution Image Denoising

Paper
Add Code

Visual Tracking via Boolean Map Representations

no code implementations • 30 Oct 2016 • Kaihua Zhang, Qingshan Liu, Ming-Hsuan Yang

In this paper, we present a simple yet effective Boolean map based representation that exploits connectivity cues for visual tracking.

Visual Tracking

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.