Search Results for author: Ming-Hsuan Yang

Found 353 papers, 176 papers with code

Video Object Detection via Object-level Temporal Aggregation

no code implementations • ECCV 2020 • Chun-Han Yao, Chen Fang, Xiaohui Shen, Yangyue Wan, Ming-Hsuan Yang

While single-image object detectors can be naively applied to videos in a frame-by-frame fashion, the prediction is often temporally inconsistent.

Object object-detection +2

Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification

no code implementations • ECCV 2020 • Weitao Wan, Jiansheng Chen, Ming-Hsuan Yang

We call such a new robust training strategy the adversarial training with bi-directional likelihood regularization (ATBLR) method.

Classification General Classification +1

Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

no code implementations • 2 May 2024 • Kelvin C. K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang

In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt.

Image Generation

Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring

no code implementations • 19 Apr 2024 • Chengxu Liu, Xuan Wang, Xiangyu Xu, Ruhao Tian, Shuai Li, Xueming Qian, Ming-Hsuan Yang

In particular, we use a motion estimation network to capture motion information from neighborhoods, thereby adaptively estimating spatially-variant motion flow, mask, kernels, weights, and offsets to obtain the MISC Filter.

Deblurring Motion Estimation

AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters

no code implementations • 17 Apr 2024 • Hao-Wei Chen, Yu-Syuan Xu, Kelvin C. K. Chan, Hsien-Kai Kuo, Chun-Yi Lee, Ming-Hsuan Yang

Towards this goal, we propose AdaIR, a novel framework that enables low storage cost and efficient training without sacrificing performance.

Image Restoration

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

no code implementations • 15 Apr 2024 • Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng

These two problems are further reinforced with the use of pixel-distance losses.

3D Reconstruction

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

no code implementations • 15 Apr 2024 • Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang

A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions.

Room Layout Estimation

Gaga: Group Any Gaussians via 3D-aware Memory Bank

no code implementations • 11 Apr 2024 • Weijie Lyu, Xueting Li, Abhijit Kundu, Yi-Hsuan Tsai, Ming-Hsuan Yang

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models.

Scene Segmentation Scene Understanding +3

Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

no code implementations • 9 Apr 2024 • Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

By elaborate adjustment of the tensor shapes and dimensions for the dot product, we split the typical self-attention of quadratic complexity into four operations of linear complexity.

Deblurring Image Deblurring

Spatial-Temporal Multi-level Association for Video Object Segmentation

no code implementations • 9 Apr 2024 • Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

In addition, we propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation.

Object Segmentation +3

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

1 code implementation • 3 Apr 2024 • Zhongyu Xia, Zhiwei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation.

3D Object Detection Autonomous Driving +2

Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration

1 code implementation • 2 Apr 2024 • Akshay Dudhane, Omkar Thawakar, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.

Decoder Image Denoising +2

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation • 28 Mar 2024 • Yuqing Huang, Xin Li, Zikun Zhou, Yaowei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Efficient Video Object Segmentation via Modulated Cross-Attention Memory

1 code implementation • 26 Mar 2024 • Abdelrahman Shaker, Syed Talal Wasim, Martin Danelljan, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.

Object Segmentation +3

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

no code implementations • 29 Feb 2024 • Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation.

Retrieval Text Retrieval +3

Interactive Multi-Head Self-Attention with Linear Complexity

no code implementations • 27 Feb 2024 • Hankyul Kang, Ming-Hsuan Yang, Jongbin Ryu

In this work, we propose an effective method to decompose the attention operation into query- and key-less components.

Scene Prior Filtering for Depth Map Super-Resolution

no code implementations • 21 Feb 2024 • Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao

Specifically, we design an All-in-one Prior Propagation that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce the texture interference.

Depth Map Super-Resolution

VideoPrism: A Foundational Visual Encoder for Video Understanding

no code implementations • 20 Feb 2024 • Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

Question Answering Video Question Answering +1

StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing

no code implementations • 20 Feb 2024 • Gaoxiang Cong, Yuankai Qi, Liang Li, Amin Beheshti, Zhedong Zhang, Anton Van Den Hengel, Ming-Hsuan Yang, Chenggang Yan, Qingming Huang

It contains three main components: (1) a multimodal style adaptor operating at the phoneme level to learn pronunciation style from the reference audio and generate intermediate representations informed by the facial emotion presented in the video; (2) an utterance-level style learning module, which guides both the mel-spectrogram decoding and the refining processes from the intermediate embeddings to improve the overall style expression; and (3) a phoneme-guided lip aligner to maintain lip sync.

Voice Cloning

Training Class-Imbalanced Diffusion Model Via Overlap Optimization

1 code implementation • 16 Feb 2024 • Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

To address the observed appearance overlap between synthesized images of rare classes and tail classes, we propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.

Contrastive Learning Image Generation

GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

no code implementations • 11 Feb 2024 • Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.

3D Generation Scene Generation +1

Generalizable Entity Grounding via Assistance of Large Language Model

no code implementations • 4 Feb 2024 • Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang

In this work, we propose a novel approach to densely ground visual entities from a long caption.

Language Modelling Large Language Model +4

PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts.

Reflection Removal

RAP-SAM: Towards Real-Time All-Purpose Segment Anything

1 code implementation • 18 Jan 2024 • Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang

Segment Anything Model (SAM) is one remarkable model that can achieve generalized segmentation.

Decoder Interactive Segmentation +4

Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding

no code implementations • 31 Dec 2023 • Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Our contributions include a novel spatio-temporal video grounding model, surpassing state-of-the-art results in closed-set evaluations on multiple datasets and demonstrating superior performance in open-vocabulary scenarios.

Spatio-Temporal Video Grounding Video Grounding +1

VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #3 on Text-to-Video Generation on MSR-VTT

Decoder Language Modelling +3

VidToMe: Video Token Merging for Zero-Shot Video Editing

no code implementations • 17 Dec 2023 • Xirui Li, Chao Ma, Xiaokang Yang, Ming-Hsuan Yang

In this work, we propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.

Video Editing Video Generation

DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

1 code implementation • 13 Dec 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes.

Autonomous Driving

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

1 code implementation • 13 Dec 2023 • Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai

Recent temporal LiDAR-based 3D object detectors achieve promising performance based on the two-stage proposal-based approach.

3D Object Detection object-detection

Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance

no code implementations • 12 Dec 2023 • Kuan-Chih Huang, Yi-Hsuan Tsai, Ming-Hsuan Yang

Finally, the training-level constraint is utilized by producing accurate and consistent 3D pseudo-labels that align with the visual data.

3D Object Detection object-detection

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection

1 code implementation • 12 Dec 2023 • Jiangning Zhang, Xuhai Chen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li, Ming-Hsuan Yang, Dacheng Tao

Following this spirit, this paper explores plain ViT architecture for MUAD.

Unsupervised Anomaly Detection

Weakly Supervised Video Individual Counting

1 code implementation • 10 Dec 2023 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang

To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided.

Contrastive Learning Video Individual Counting

CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen

no code implementations • 9 Dec 2023 • Hao Zhang, Fang Li, Lu Qi, Ming-Hsuan Yang, Narendra Ahuja

Addressing Out-Of-Distribution (OOD) Segmentation and Zero-Shot Semantic Segmentation (ZS3) is challenging, necessitating segmenting unseen classes.

Domain Adaptation Segmentation +2

Towards 4D Human Video Stylization

1 code implementation • 7 Dec 2023 • Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang

To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.

Novel View Synthesis Style Transfer +1

Fine-grained Controllable Video Generation via Object Appearance and Context

no code implementations • 5 Dec 2023 • Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang

To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.

Text-to-Video Generation Video Generation

Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection

1 code implementation • NeurIPS 2023 • Cheng-Ju Ho, Chen-Hsuan Tai, Yen-Yu Lin, Ming-Hsuan Yang, Yi-Hsuan Tsai

Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations.

3D Object Detection Denoising +5

DreaMo: Articulated 3D Reconstruction From A Single Casual Video

no code implementations • 5 Dec 2023 • Tao Tu, Ming-Feng Li, Chieh Hubert Lin, Yen-Chi Cheng, Min Sun, Ming-Hsuan Yang

In this work, we study articulated 3D shape reconstruction from a single and casually captured internet video, where the subject's view coverage is incomplete.

3D Reconstruction 3D Shape Reconstruction

UniGS: Unified Representation for Image Generation and Segmentation

1 code implementation • 4 Dec 2023 • Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang

On the other hand, the progressive dichotomy module can efficiently decode the synthesized colormap to high-quality entity-level masks in a depth-first binary search without knowing the cluster numbers.

Image Generation Segmentation

Effective Adapter for Face Recognition in the Wild

no code implementations • 4 Dec 2023 • Yunhao Liu, Yu-Ju Tsai, Kelvin C. K. Chan, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

Traditional heuristic approaches, which train models either directly on these degraded images or on their enhanced counterparts produced by face restoration techniques, have proven ineffective, primarily due to the degradation of facial features and the discrepancy in image domains.

Face Recognition

Multi-task Image Restoration Guided By Robust DINO Features

no code implementations • 4 Dec 2023 • Xin Lin, Chao Ren, Kelvin C. K. Chan, Lu Qi, Jinshan Pan, Ming-Hsuan Yang

Multi-task image restoration has gained significant interest due to its inherent versatility and efficiency compared to its single-task counterpart.

Image Restoration

Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection

1 code implementation • 4 Dec 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang

To address these limitations, we propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection, which learns multi-scale temporal features.

Anomaly Detection Video Anomaly Detection

Exploiting Diffusion Prior for Generalizable Dense Prediction

2 code implementations • 30 Nov 2023 • Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang

Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap.

Intrinsic Image Decomposition Semantic Segmentation

Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

1 code implementation • 28 Nov 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing.

Ranked #1 on Semantic correspondence on PF-PASCAL

Animal Pose Estimation Semantic correspondence

Text-Driven Image Editing via Learnable Regions

1 code implementation • 28 Nov 2023 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang

Language has emerged as a natural interface for image editing.

Image Generation

Pyramid Diffusion for Fine 3D Large Scene Generation

1 code implementation • 20 Nov 2023 • Yuheng Liu, Xinke Li, Xueting Li, Lu Qi, Chongshou Li, Ming-Hsuan Yang

Directly transferring the 2D techniques to 3D scene generation is challenging due to significant resolution reduction and the scarcity of comprehensive real-world 3D scene datasets.

Scene Generation

Rethinking Evaluation Metrics of Open-Vocabulary Segmentation

1 code implementation • 6 Nov 2023 • Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.

Segmentation

GLaMM: Pixel Grounding Large Multimodal Model

1 code implementation • 6 Nov 2023 • Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks.

Conversational Question Answering Image Captioning +5

One-for-All: Towards Universal Domain Translation with a Single StyleGAN

no code implementations • 22 Oct 2023 • Yong Du, Jiahui Zhan, Shengfeng He, Xinzhe Li, Junyu Dong, Sheng Chen, Ming-Hsuan Yang

In this paper, we propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains under conditions of limited training data and significant visual differences.

Translation

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition Image Generation +4

Video Timeline Modeling For News Story Understanding

1 code implementation • NeurIPS 2023 • Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong

In this paper, we present a novel problem, namely video timeline modeling.

SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image

no code implementations • ICCV 2023 • Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

Recent novel view synthesis methods obtain promising results for relatively small scenes, e.g., indoor environments and scenes with a few objects, but tend to fail for unbounded outdoor scenes with a single image as input.

Novel View Synthesis

Editing 3D Scenes via Text Prompts without Retraining

no code implementations • 10 Sep 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.

3D scene Editing 3D Scene Reconstruction +2

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation • ICCV 2023 • Xin Li, Yuqing Huang, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Delving into Motion-Aware Matching for Monocular 3D Object Tracking

1 code implementation • ICCV 2023 • Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai

In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches.

3D Multi-Object Tracking 3D Object Tracking +3

Dual Associated Encoder for Face Restoration

1 code implementation • 14 Aug 2023 • Yu-Ju Tsai, Yu-Lun Liu, Lu Qi, Kelvin C. K. Chan, Ming-Hsuan Yang

Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild.

Ranked #2 on Blind Face Restoration on WIDER

Blind Face Restoration

Foundational Models Defining a New Era in Vision: A Survey and Outlook

1 code implementation • 25 Jul 2023 • Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.

Benchmarking

CLR: Channel-wise Lightweight Reprogramming for Continual Learning

1 code implementation • ICCV 2023 • Yunhao Ge, Yuecheng Li, Shuo Ni, Jiaping Zhao, Ming-Hsuan Yang, Laurent Itti

Reprogramming parameters are task-specific and exclusive to each task, which makes our method immune to catastrophic forgetting.

Continual Learning Image Classification

Self-regulating Prompts: Foundational Model Adaptation without Forgetting

2 code implementations • ICCV 2023 • Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity.

Ranked #2 on Prompt Engineering on ImageNet V2

Prompt Engineering

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

Counting Crowds in Bad Weather

no code implementations • ICCV 2023 • Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang

Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding.

Crowd Counting Image Restoration

AIMS: All-Inclusive Multi-Level Segmentation

1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

Despite the progress of image segmentation for accurate visual entity segmentation, meeting the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved.

Image Segmentation Segmentation +1

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

1 code implementation • NeurIPS 2023 • Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

Text-to-image diffusion models have made significant advances in generating and editing high-quality images.

Ranked #3 on Semantic correspondence on SPair-71k

Representation Learning Semantic correspondence +1

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

no code implementations • 27 Apr 2023 • Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis.

Motion Synthesis

Video Generation Beyond a Single Clip

no code implementations • 15 Apr 2023 • Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang

We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models.

Video Generation

Generative Multiplane Neural Radiance for 3D-Aware Image Generation

1 code implementation • ICCV 2023 • Amandeep Kumar, Ankan Kumar Bhunia, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

We present a method to efficiently generate 3D-aware high-resolution images that are view-consistent across multiple target views.

Computational Efficiency Image Generation

Burstormer: Burst Image Restoration and Enhancement Transformer

1 code implementation • CVPR 2023 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Unlike existing methods, the proposed alignment module not only aligns burst features but also exchanges feature information and maintains focused communication with the reference frame through the proposed reference-based feature enrichment mechanism, which facilitates handling complex motions.

Denoising Image Restoration +1

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

no code implementations • 28 Mar 2023 • Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning.

Action Recognition Contrastive Learning +7

SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

2 code implementations • ICCV 2023 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Using our proposed efficient additive attention, we build a series of models called "SwiftFormer" that achieve state-of-the-art performance in terms of both accuracy and mobile inference speed.

Unified Visual Relationship Detection with Vision and Language Models

1 code implementation • ICCV 2023 • Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

To address this challenge, we propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models (VLMs).

Human-Object Interaction Detection Relationship Detection +2

InfiniCity: Infinite-Scale City Synthesis

no code implementations • ICCV 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noise.

Image Generation Neural Rendering

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

1 code implementation • 3 Jan 2023 • Xiangtai Li, Shilin Xu, Yibo Yang, Haobo Yuan, Guangliang Cheng, Yunhai Tong, Zhouchen Lin, Ming-Hsuan Yang, Dacheng Tao

Third, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme to boost part segmentation qualities further.

Panoptic Segmentation Segmentation

Muse: Text-To-Image Generation via Masked Generative Transformers

4 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.

Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)

Language Modelling Large Language Model +1

817

Paper
Code

High Quality Entity Segmentation

no code implementations • ICCV 2023 • Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

Given the high-quality, high-resolution nature of the dataset, we propose CropFormer, which is designed to tackle the intractability of instance-level segmentation on high-resolution images.

Image Segmentation Segmentation +1

Paper
Add Code

MiniROAD: Minimal RNN Framework for Online Action Detection

1 code implementation • ICCV 2023 • Joungbin An, Hyolim Kang, Su Ho Han, Ming-Hsuan Yang, Seon Joo Kim

Online Action Detection (OAD) is the task of identifying actions in streaming videos without access to future frames.

Ranked #1 on Online Action Detection on TVSeries

Online Action Detection

Paper
Code

Self-Supervised Super-Plane for Neural 3D Reconstruction

1 code implementation • CVPR 2023 • Botao Ye, Sifei Liu, Xueting Li, Ming-Hsuan Yang

In this work, we introduce a self-supervised super-plane constraint by exploring the free geometry cues from the predicted surface, which can further regularize the reconstruction of plane regions without any other ground truth annotations.

3D Reconstruction

Paper
Code

Beyond SOT: Tracking Multiple Generic Objects at Once

1 code implementation • 22 Dec 2022 • Christoph Mayer, Martin Danelljan, Ming-Hsuan Yang, Vittorio Ferrari, Luc van Gool, Alina Kuznetsova

Our approach achieves a 4x faster run-time in the case of 10 concurrent objects compared to tracking each object independently and outperforms existing single object trackers on our new benchmark.

Attribute Object +1

3,092

Paper
Code

Hi-LASSIE: High-Fidelity Articulated Shape and Skeleton Discovery from Sparse Image Ensemble

1 code implementation • CVPR 2023 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

Automatically estimating 3D skeleton, shape, camera viewpoints, and part articulation from sparse in-the-wild image ensembles is a severely under-constrained and challenging problem.

Paper
Code

Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection

1 code implementation • 19 Dec 2022 • Cheng-Ju Ho, Chen-Hsuan Tai, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection.

3D Object Detection Knowledge Distillation +4

Paper
Code

BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios

1 code implementation • 12 Dec 2022 • Zhiwei Lin, Yongtao Wang, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang

Based on the property of outdoor point clouds in autonomous driving scenarios, i.e., the point clouds of distant objects are sparser, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection.

3D Object Detection Autonomous Driving +3

Paper
Code

MAGVIT: Masked Generative Video Transformer

1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.

Ranked #1 on Video Prediction on Something-Something V2

Multi-Task Learning Text-to-Video Generation +2

847

Paper
Code

Physics-based Indirect Illumination for Inverse Rendering

no code implementations • 9 Dec 2022 • Youming Deng, Xueting Li, Sifei Liu, Ming-Hsuan Yang

We present a physics-based inverse rendering method that learns the illumination, geometry, and materials of a scene from posed multi-view RGB images.

Efficient Neural Network Inverse Rendering +1

Paper
Add Code

Learning to Dub Movies via Hierarchical Prosody Models

1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip, and a reference audio, the movie dubbing (also known as visual voice clone, V2C) task aims to generate speech that matches the speaker's emotion presented in the video using the desired speaker voice as reference.

Paper
Code

Progressive Multi-resolution Loss for Crowd Counting

1 code implementation • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang

Crowd counting is usually handled in a density map regression fashion, which is supervised via an L2 loss between the predicted density map and ground truth.

Crowd Counting

Paper
Code

Consistency-Aware Anchor Pyramid Network for Crowd Localization

no code implementations • 8 Dec 2022 • Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Qingming Huang, Ming-Hsuan Yang, Nicu Sebe

Crowd localization aims to predict the spatial position of humans in a crowd scenario.

Position

Paper
Add Code

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection

no code implementations • CVPR 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang

Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.

Anomaly Detection Pseudo Label +1

Paper
Add Code

UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

2 code implementations • 8 Dec 2022 • Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks.

Image Segmentation Medical Image Segmentation +2

283

Paper
Code

Self-supervised AutoFlow

no code implementations • CVPR 2023 • Hsin-Ping Huang, Charles Herrmann, Junhwa Hur, Erika Lu, Kyle Sargent, Austin Stone, Ming-Hsuan Yang, Deqing Sun

Recently, AutoFlow has shown promising results on learning a training set for optical flow, but requires ground truth labels in the target domain to compute its search metric.

Optical Flow Estimation

Paper
Add Code

Improving Zero-shot Generalization and Robustness of Multi-modal Models

1 code implementation • CVPR 2023 • Yunhao Ge, Jie Ren, Andrew Gallagher, Yuxiao Wang, Ming-Hsuan Yang, Hartwig Adam, Laurent Itti, Balaji Lakshminarayanan, Jiaping Zhao

We also show that our method improves across ImageNet shifted datasets, four other datasets, and other model architectures such as LiT.

Image Classification Zero-shot Generalization

Paper
Code

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

no code implementations • 29 Nov 2022 • Taihong Xiao, ZiRui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang

Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.

Classification Few-Shot Image Classification

Paper
Add Code

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

1 code implementation • 21 Nov 2022 • Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images.

Image Generation

Paper
Code

High-Quality Entity Segmentation

1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.

Image Segmentation Segmentation +2

668

Paper
Code

ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data

no code implementations • 27 Oct 2022 • Jie Cao, Mandi Luo, Junchi Yu, Ming-Hsuan Yang, Ran He

Then, we optimize the augmented samples by minimizing the norms of the data scores, i.e., the gradients of the log-density functions.

Data Augmentation Image Generation

Paper
Add Code

GAN-based Facial Attribute Manipulation

no code implementations • 23 Oct 2022 • Yunfan Liu, Qi Li, Qiyao Deng, Zhenan Sun, Ming-Hsuan Yang

Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics.

Attribute

Paper
Add Code

Diffusion Models: A Comprehensive Survey of Methods and Applications

2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Image Super-Resolution Text-to-Image Generation +1

2,679

Paper
Code

Learning Visibility for Robust Dense Human Body Estimation

1 code implementation • 23 Aug 2022 • Chun-Han Yao, Jimei Yang, Duygu Ceylan, Yi Zhou, Yang Zhou, Ming-Hsuan Yang

An alternative approach is to estimate dense vertices of a predefined template body in the image space.

Paper
Code

3D Vision with Transformers: A Survey

1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang

The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.

Pose Estimation

367

Paper
Code

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

1 code implementation • 1 Aug 2022 • Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu

In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories.

Representation Learning Self-Supervised Learning

668

Paper
Code

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

no code implementations • 15 Jul 2022 • Rui Qian, Yeqing Li, Zheng Xu, Ming-Hsuan Yang, Serge Belongie, Yin Cui

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition.

Ranked #1 on Zero-Shot Action Recognition on HMDB51

Optical Flow Estimation Video Classification +1

Paper
Add Code

LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery

no code implementations • 7 Jul 2022 • Chun-Han Yao, Wei-Chih Hung, Yuanzhen Li, Michael Rubinstein, Ming-Hsuan Yang, Varun Jampani

In this work, we propose a practical problem setting to estimate 3D pose and shape of animals given only a few (10-30) in-the-wild images of a particular animal species (say, horse).

Paper
Add Code

FlowNAS: Neural Architecture Search for Optical Flow Estimation

1 code implementation • 4 Jul 2022 • Zhiwei Lin, TingTing Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang

To address this issue, we propose a neural architecture search method named FlowNAS to automatically find a better encoder architecture for the flow estimation task.

Image Classification Neural Architecture Search +1

Paper
Code

Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features

no code implementations • 2 Jun 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks.

Position

Paper
Add Code

Learning Enriched Features for Fast Image Restoration and Enhancement

1 code implementation • 19 Apr 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

In the former case, spatial details are preserved but the contextual information cannot be precisely encoded.

Autonomous Vehicles Deblurring +4

369

Paper
Code

An Extendable, Efficient and Effective Transformer-based Object Detector

1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Transformers have been widely used in numerous vision problems especially for visual recognition and detection.

Decoder Image Classification +5

299

Paper
Code

Autoregressive 3D Shape Generation via Canonical Mapping

1 code implementation • 5 Apr 2022 • An-Chieh Cheng, Xueting Li, Sifei Liu, Min Sun, Ming-Hsuan Yang

With the capacity of modeling long-range dependencies in sequential data, transformers have shown remarkable performance in a variety of generative tasks such as image, audio, and text generation.

3D Shape Generation Point Cloud Generation +1

Paper
Code

Neural Rendering of Humans in Novel View and Pose from Monocular Video

no code implementations • 4 Apr 2022 • Tiantian Wang, Nikolaos Sarafianos, Ming-Hsuan Yang, Tony Tung

We accomplish this by utilizing both the human pose that models the body shape as well as point clouds that partially cover the human as input.

Neural Rendering

Paper
Add Code

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing

no code implementations • 23 Mar 2022 • Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang

While recent face anti-spoofing methods perform well under the intra-domain setups, an effective approach needs to account for much larger appearance variations of images acquired in complex scenes with different sensors for robust performance.

Face Anti-Spoofing

Paper
Add Code

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

2 code implementations • 20 Mar 2022 • Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.

Ranked #1 on 3D Object Detection on V2XSet

3D Object Detection Autonomous Vehicles +1

254

Paper
Code

Deep Image Deblurring: A Survey

no code implementations • 26 Jan 2022 • Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bjorn Stenger, Ming-Hsuan Yang, Hongdong Li

Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image.

Deblurring Image Deblurring

Paper
Add Code

InOut: Diverse Image Outpainting via GAN Inversion

no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation

Paper
Add Code

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

no code implementations • 14 Dec 2021 • Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

The experiments show that the resultant unified foundation transformer works surprisingly well on both the vision-only and text-only tasks, and the proposed knowledge distillation and gradient masking strategy can effectively lift the performance to approach the level of separately-trained models.

Image Classification Knowledge Distillation +1

Paper
Add Code

An Informative Tracking Benchmark

1 code implementation • 13 Dec 2021 • Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Visual Tracking

Paper
Code

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

668

Paper
Code

Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision

1 code implementation • CVPR 2022 • Liangzhe Yuan, Rui Qian, Yin Cui, Boqing Gong, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views.

Action Recognition Contrastive Learning +4

76,616

Paper
Code

Exploring Temporal Granularity in Self-Supervised Video Representation Learning

no code implementations • 8 Dec 2021 • Rui Qian, Yeqing Li, Liangzhe Yuan, Boqing Gong, Ting Liu, Matthew Brown, Serge Belongie, Ming-Hsuan Yang, Hartwig Adam, Yin Cui

The training objective consists of two parts: a fine-grained temporal learning objective to maximize the similarity between corresponding temporal embeddings in the short clip and the long clip, and a persistent temporal learning objective to pull together global embeddings of the two clips.

Representation Learning Self-Supervised Learning

Paper
Add Code

MC-Blur: A Comprehensive Benchmark for Image Deblurring

2 code implementations • 1 Dec 2021 • Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang

Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.

Benchmarking Deblurring +1

144

Paper
Code

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.

Paper
Code

Learning Continuous Environment Fields via Implicit Functions

no code implementations • ICLR 2022 • Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

We propose a novel scene representation that encodes reaching distance -- the distance between any position in the scene to a goal along a feasible trajectory.

Position Trajectory Prediction

Paper
Add Code

Video Frame Interpolation Transformer

1 code implementation • CVPR 2022 • Zhihao Shi, Xiangyu Xu, Xiaohong Liu, Jun Chen, Ming-Hsuan Yang

Existing methods for video interpolation heavily rely on deep convolution neural networks, and thus suffer from their intrinsic limitations, such as content-agnostic kernel weights and restricted receptive field.

Video Frame Interpolation

Paper
Code

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

1 code implementation • 27 Nov 2021 • Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Most existing methods formulate the non-blind deconvolution problem in a maximum-a-posteriori framework and address it by manually designing various regularization terms and data terms of the latent clear images.

Image Deconvolution Image Restoration

Paper
Code

Hierarchical Modular Network for Video Captioning

1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.

Representation Learning Sentence +1

Paper
Code

Class-agnostic Object Detection with Multi-modal Transformer

1 code implementation • 22 Nov 2021 • Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

This has been a long-standing question in computer vision.

Ranked #1 on Open World Object Detection on COCO 2017 (Outdoor, Accessories, Appliance, Truck)

Class-agnostic Object Detection Object +3

294

Paper
Code

Restormer: Efficient Transformer for High-Resolution Image Restoration

11 code implementations • CVPR 2022 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks.

Ranked #1 on Grayscale Image Denoising on Urban100 sigma15

Color Image Denoising Deblurring +7

1,546

Paper
Code

Correcting Face Distortion in Wide-Angle Videos

no code implementations • 18 Nov 2021 • Wei-Sheng Lai, YiChang Shih, Chia-Kai Liang, Ming-Hsuan Yang

Video blogs and selfies are popular social media formats, which are often captured by wide-angle cameras to show human subjects and expanded background.

Paper
Add Code

Video Salient Object Detection via Contrastive Features and Attention Modules

no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang

Video salient object detection aims to find the most visually distinctive objects in a video.

Contrastive Learning Object +7

Paper
Add Code

Semi-supervised Multi-task Learning for Semantics and Depth

no code implementations • 14 Oct 2021 • Yufeng Wang, Yi-Hsuan Tsai, Wei-Chih Hung, Wenrui Ding, Shuo Liu, Ming-Hsuan Yang

Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.

Depth Estimation Multi-Task Learning +1

Paper
Add Code

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Transformers are transforming the landscape of computer vision, especially for recognition tasks.

Ranked #12 on Object Detection on COCO 2017 val

Decoder Image Classification +3

299

Paper
Code

Burst Image Restoration and Enhancement

1 code implementation • CVPR 2022 • Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming-Hsuan Yang

Our central idea is to create a set of pseudo-burst features that combine complementary information from all the input burst frames to seamlessly exchange information.

Ranked #2 on Burst Image Super-Resolution on BurstSR

Burst Image Super-Resolution Denoising +3

127

Paper
Code

Learning Contrastive Representation for Semantic Correspondence

no code implementations • 22 Sep 2021 • Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz, Ming-Hsuan Yang

Dense correspondence across semantically related images has been extensively studied, but still faces two challenges: 1) large variations in appearance, scale and pose exist even for objects from the same category, and 2) labeling pixel-level dense correspondences is labor intensive and infeasible to scale.

Contrastive Learning Semantic correspondence

Paper
Add Code

Federated Multi-Target Domain Adaptation

no code implementations • 17 Aug 2021 • Chun-Han Yao, Boqing Gong, Yin Cui, Hang Qi, Yukun Zhu, Ming-Hsuan Yang

We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data).

Domain Adaptation Federated Learning +3

Paper
Add Code

Discovering 3D Parts from Image Collections

no code implementations • ICCV 2021 • Chun-Han Yao, Wei-Chih Hung, Varun Jampani, Ming-Hsuan Yang

Reasoning about 3D shapes from 2D images is an essential yet challenging task, especially when only single-view images are at our disposal.

Object

Paper
Add Code

End-to-end Multi-modal Video Temporal Grounding

1 code implementation • NeurIPS 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.

Optical Flow Estimation Self-Supervised Learning

Paper
Code

Learning 3D Dense Correspondence via Canonical Point Autoencoder

no code implementations • NeurIPS 2021 • An-Chieh Cheng, Xueting Li, Min Sun, Ming-Hsuan Yang, Sifei Liu

We propose a canonical point autoencoder (CPAE) that predicts dense correspondences between 3D shapes of the same category.

Segmentation

Paper
Add Code

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations • 21 Jun 2021 • Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

Paper
Add Code

Incremental False Negative Detection for Contrastive Learning

no code implementations • ICLR 2022 • Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset.

Contrastive Learning Self-Supervised Learning

Paper
Add Code

Large-scale Unsupervised Semantic Segmentation

3 code implementations • 6 Jun 2021 • ShangHua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, Philip Torr

In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.

Ranked #1 on Unsupervised Semantic Segmentation on ImageNet-S-300

Representation Learning Segmentation +1

156

Paper
Code

Learning to Stylize Novel Views

1 code implementation • ICCV 2021 • Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang

Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix.

Novel View Synthesis

Paper
Code

Intriguing Properties of Vision Transformers

1 code implementation • NeurIPS 2021 • Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang

We show and analyze the following intriguing properties of ViT: (a) Transformers are highly robust to severe occlusions, perturbations and domain shifts, e.g., they retain as high as 60% top-1 accuracy on ImageNet even after randomly occluding 80% of the image content.

Few-Shot Learning Semantic Segmentation

173

Paper
Code

COMISR: Compression-Informed Video Super-Resolution

2 code implementations • ICCV 2021 • Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar

Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.

Ranked #6 on Video Super-Resolution on MSU Super-Resolution for Video Compression

Video Super-Resolution

32,882

Paper
Code

Decoupled Dynamic Filter Networks

1 code implementation • CVPR 2021 • Jingkai Zhou, Varun Jampani, Zhixiong Pi, Qiong Liu, Ming-Hsuan Yang

Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.

Ranked #13 on Semantic Segmentation on MCubeS

Image Classification Semantic Segmentation

210

Paper
Code

2.5D Visual Relationship Detection

1 code implementation • 26 Apr 2021 • Yu-Chuan Su, Soravit Changpinyo, Xiangning Chen, Sathish Thoppay, Cho-Jui Hsieh, Lior Shapira, Radu Soricut, Hartwig Adam, Matthew Brown, Ming-Hsuan Yang, Boqing Gong

To enable progress on this task, we create a new dataset consisting of 220K human-annotated 2.5D relationships among 512K objects from 11K images.

Benchmarking Depth Estimation +2

Paper
Code

Understanding Synonymous Referring Expressions via Contrastive Features

1 code implementation • 20 Apr 2021 • Yi-Wen Chen, Yi-Hsuan Tsai, Ming-Hsuan Yang

While prior work usually treats each sentence and attends it to an object separately, we focus on learning a referring expression comprehension model that considers the property in synonymous sentences.

Object Referring Expression +3

Paper
Code

Weakly Supervised Object Localization and Detection: A Survey

no code implementations • 16 Apr 2021 • Dingwen Zhang, Junwei Han, Gong Cheng, Ming-Hsuan Yang

As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role in developing new-generation computer vision systems and has received significant attention in the past decade.

Object Weakly-Supervised Object Localization

Paper
Add Code

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation Vision-Language Navigation

Paper
Code

InfinityGAN: Towards Infinite-Pixel Image Synthesis

1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

We present a novel framework, InfinityGAN, for arbitrary-sized image generation.

Ranked #2 on Scene Generation on OSM

Image Generation Scene Generation

318

Paper
Code

Regularizing Generative Adversarial Networks under Limited Data

1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang

Recent years have witnessed the rapid progress of generative adversarial networks (GANs).

Ranked #1 on Image Generation on CIFAR-100

Data Augmentation Image Generation

163

Paper
Code

Unsupervised Sound Localization via Iterative Contrastive Learning

no code implementations • 1 Apr 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

Sound localization aims to find the source of the audio signal in the visual scene.

Contrastive Learning

Paper
Add Code

In&Out: Diverse Image Outpainting via GAN Inversion

no code implementations • 1 Apr 2021 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Image Outpainting Image-to-Image Translation +1

Paper
Add Code

ReMix: Towards Image-to-Image Translation with Limited Data

1 code implementation • CVPR 2021 • Jie Cao, Luanxuan Hou, Ming-Hsuan Yang, Ran He, Zhenan Sun

We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.

Data Augmentation Image-to-Image Translation +1

Paper
Code

Self-Attentive 3D Human Pose and Shape Estimation from Videos

no code implementations • 26 Mar 2021 • Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang

The key insights of our method are two-fold.

Ranked #53 on 3D Human Pose Estimation on MPI-INF-3DHP

3D human pose and shape estimation

Paper
Add Code

Hybrid Neural Fusion for Full-frame Video Stabilization

2 code implementations • ICCV 2021 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Existing video stabilization methods often generate visible distortion or require aggressive cropping of frame boundaries, resulting in a smaller field of view.

Video Stabilization

520

Paper
Code

Multi-Stage Progressive Image Restoration

8 code implementations • CVPR 2021 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.

Ranked #3 on Spectral Reconstruction on ARAD-1K

Deblurring Decoder +4

1,546

Paper
Code

Exploiting Raw Images for Real-Scene Super-Resolution

1 code implementation • 2 Feb 2021 • Xiangyu Xu, Yongrui Ma, Wenxiu Sun, Ming-Hsuan Yang

In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.

Image Restoration Image Super-Resolution

Paper
Code

Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising

3 code implementations • 26 Jan 2021 • Xiangyu Xu, Muchen Li, Wenxiu Sun, Ming-Hsuan Yang

We present a spatial pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.

Image Denoising Video Denoising

216

Paper
Code

GAN Inversion: A Survey

1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang

GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.

Image Manipulation Image Restoration

1,087

Paper
Code

Low Light Image Enhancement via Global and Local Context Modeling

no code implementations • 4 Jan 2021 • Aditya Arora, Muhammad Haris, Syed Waqas Zamir, Munawar Hayat, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang

These contexts can be crucial towards inferring several image enhancement tasks, e.g., local and global contrast, brightness and color corrections, which require cues from both local and global spatial extent.

Low-Light Image Enhancement

Paper
Add Code

Benchmarking Ultra-High-Definition Image Super-Resolution

no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang

Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.

4k 8k +3

Paper
Add Code

Video Matting via Consistency-Regularized Graph Neural Networks

no code implementations • ICCV 2021 • Tiantian Wang, Sifei Liu, Yapeng Tian, Kai Li, Ming-Hsuan Yang

In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.

Image Matting Optical Flow Estimation +1

Paper
Add Code

D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations

1 code implementation • ICCV 2021 • Sanath Narayan, Hisham Cholakkal, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.

Ranked #3 on Weakly Supervised Action Localization on THUMOS’14

Denoising Weakly Supervised Action Localization +2

Paper
Code

Online Adaptation for Consistent Mesh Reconstruction in the Wild

no code implementations • NeurIPS 2020 • Xueting Li, Sifei Liu, Shalini De Mello, Kihwan Kim, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz

This paper presents an algorithm to reconstruct temporally consistent 3D meshes of deformable object instances from videos in the wild.

3D Reconstruction

Paper
Add Code

Unsupervised Discovery of Disentangled Manifolds in GANs

1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

An interpretable generation process is beneficial to various image editing applications.

Attribute

Paper
Code

Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification

no code implementations • 18 Nov 2020 • Weitao Wan, Jiansheng Chen, Cheng Yu, Tong Wu, Yuanyi Zhong, Ming-Hsuan Yang

In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification.

Classification General Classification +1

Paper
Add Code

Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang

Generating a smooth sequence of intermediate results bridges the gap of two different domains, facilitating the morphing effect across domains.

Attribute Image-to-Image Translation +1

Paper
Code

Unsupervised Domain Adaptation for Spatio-Temporal Action Localization

no code implementations • 19 Oct 2020 • Nakul Agarwal, Yi-Ting Chen, Behzad Dariush, Ming-Hsuan Yang

Spatio-temporal action localization is an important problem in computer vision that involves detecting where and when activities occur, and therefore requires modeling of both spatial and temporal features.

object-detection Object Detection +3

Paper
Add Code

Multi-path Neural Networks for On-device Multi-domain Visual Classification

no code implementations • 10 Oct 2020 • Qifei Wang, Junjie Ke, Joshua Greaves, Grace Chu, Gabriel Bender, Luciano Sbaiz, Alec Go, Andrew Howard, Feng Yang, Ming-Hsuan Yang, Jeff Gilbert, Peyman Milanfar

This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains.

General Classification Neural Architecture Search +1

Paper
Add Code

Every Pixel Matters: Center-aware Feature Alignment for Domain Adaptive Object Detector

1 code implementation • ECCV 2020 • Cheng-Chun Hsu, Yi-Hsuan Tsai, Yen-Yu Lin, Ming-Hsuan Yang

A domain adaptive object detector aims to adapt itself to unseen domains that may contain variations of object appearance, viewpoints or backgrounds.

Domain Adaptation

160

Paper
Code

SoDA: Multi-Object Tracking with Soft Data Association

no code implementations • 18 Aug 2020 • Wei-Chih Hung, Henrik Kretzschmar, Tsung-Yi Lin, Yuning Chai, Ruichi Yu, Ming-Hsuan Yang, Dragomir Anguelov

Robust multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.

Autonomous Driving Multi-Object Tracking +2

Paper
Add Code

Learning to Caricature via Semantic Shape Transform

1 code implementation • 12 Aug 2020 • Wenqing Chu, Wei-Chih Hung, Yi-Hsuan Tsai, Yu-Ting Chang, Yijun Li, Deng Cai, Ming-Hsuan Yang

Caricature is an artistic drawing created to abstract or exaggerate facial features of a person.

Caricature

Paper
Code

Learning to See Through Obstructions with Layered Decomposition

1 code implementation • 11 Aug 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions, or adherent raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation

Paper
Code

Spatiotemporal Contrastive Video Representation Learning

4 code implementations • CVPR 2021 • Rui Qian, Tianjian Meng, Boqing Gong, Ming-Hsuan Yang, Huisheng Wang, Serge Belongie, Yin Cui

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.

Ranked #1 on Self-Supervised Action Recognition on Kinetics-600

Contrastive Learning Data Augmentation +4

76,618

Paper
Code

Weakly-Supervised Semantic Segmentation via Sub-category Exploration

1 code implementation • CVPR 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions.

Ranked #66 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 val

Clustering Object +2

178

Paper
Code

Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization

no code implementations • 3 Aug 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Obtaining object response maps is one important step to achieve weakly-supervised semantic segmentation using image-level labels.

Classification Data Augmentation +4

Paper
Add Code

Learnable Cost Volume Using the Cayley Representation

1 code implementation • ECCV 2020 • Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang

Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors.

Optical Flow Estimation

Paper
Code

Controllable Image Synthesis via SegVAE

no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation Image-to-Image Translation +2

Paper
Add Code

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

no code implementations • ECCV 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Image generation from scene description is a cornerstone technique for controlled generation, which is beneficial to applications such as content creation and image editing.

Image Generation Retrieval

Paper
Add Code

Modeling Artistic Workflows for Image Generation and Editing

1 code implementation • ECCV 2020 • Hung-Yu Tseng, Matthew Fisher, Jingwan Lu, Yijun Li, Vladimir Kim, Ming-Hsuan Yang

People often create art by following an artistic workflow involving multiple stages that inform the overall design.

Image Generation

Paper
Code

Semi-Supervised Learning with Meta-Gradient

1 code implementation • 8 Jul 2020 • Xin-Yu Zhang, Taihong Xiao, HaoLin Jia, Ming-Ming Cheng, Ming-Hsuan Yang

In this work, we propose a simple yet effective meta-learning algorithm in semi-supervised learning.

Meta-Learning Pseudo Label

Paper
Code

WW-Nets: Dual Neural Networks for Object Detection

no code implementations • 15 May 2020 • Mohammad K. Ebrahimpour, J. Ben Falandays, Samuel Spevack, Ming-Hsuan Yang, David C. Noelle

Inspired by this structure, we have proposed an object detection framework involving the integration of a "What Network" and a "Where Network".

Object object-detection +1

Paper
Add Code

Ventral-Dorsal Neural Networks: Object Detection via Selective Attention

no code implementations • 15 May 2020 • Mohammad K. Ebrahimpour, Jiayun Li, Yen-Yun Yu, Jackson L. Reese, Azadeh Moghtaderi, Ming-Hsuan Yang, David C. Noelle

The coarse functional distinction between these streams is between object recognition -- the "what" of the signal -- and extracting location related information -- the "where" of the signal.

Image Classification Object +3

Paper
Add Code

Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition

no code implementations • ICLR 2020 • Jongbin Ryu, Gitaek Kwon, Ming-Hsuan Yang, Jongwoo Lim

When constructing random forests, it is of prime importance to ensure high accuracy and low correlation of individual tree classifiers for good performance.

Domain Generalization Image Classification

Paper
Add Code

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion

1 code implementation • CVPR 2020 • Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang

To address the issue of preserving spatial information in the U-Net architecture, we design a dense feature fusion module using the back-projection feedback scheme.

Ranked #9 on Image Dehazing on Haze4k

Decoder Image Dehazing

324

Paper
Code

Regularizing Meta-Learning via Gradient Dropout

1 code implementation • 13 Apr 2020 • Hung-Yu Tseng, Yi-Wen Chen, Yi-Hsuan Tsai, Sifei Liu, Yen-Yu Lin, Ming-Hsuan Yang

With the growing attention on learning-to-learn new tasks using only a few examples, meta-learning has been widely used in numerous problems such as few-shot classification, reinforcement learning, and domain generalization.

Domain Generalization Meta-Learning

Paper
Code

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline

1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We model the HDR-to-LDR image formation pipeline as the (1) dynamic range clipping, (2) non-linear mapping from a camera response function, and (3) quantization.

Ranked #4 on Inverse-Tone-Mapping on MSU HDR Video Reconstruction Benchmark

HDR Reconstruction Inverse-Tone-Mapping +2

520

Paper
Code

Learning to See Through Obstructions

1 code implementation • CVPR 2020 • Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

We present a learning-based approach for removing unwanted obstructions, such as window reflections, fence occlusions or raindrops, from a short sequence of images captured by a moving camera.

Optical Flow Estimation Reflection Removal

997

Paper
Code

Deep Semantic Matching with Foreground Detection and Cycle-Consistency

no code implementations • 31 Mar 2020 • Yun-Chun Chen, Po-Hsiang Huang, Li-Yu Yu, Jia-Bin Huang, Ming-Hsuan Yang, Yen-Yu Lin

Establishing dense semantic correspondences between object instances remains a challenging problem due to background clutter, significant scale and pose differences, and large intra-class variations.

Paper
Add Code

TapLab: A Fast Framework for Semantic Video Segmentation Tapping into Compressed-Domain Knowledge

1 code implementation • 30 Mar 2020 • Junyi Feng, Songyuan Li, Xi Li, Fei Wu, Qi Tian, Ming-Hsuan Yang, Haibin Ling

Real-time semantic video segmentation is a challenging task due to the strict requirements of inference speed.

Image Segmentation Semantic Segmentation +2

Paper
Code

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective

1 code implementation • CVPR 2020 • Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Object frequency in the real world often follows a power law, leading to a mismatch between datasets with long-tailed class distributions seen by a machine learning model and our expectation of the model to perform well on all classes.

Ranked #27 on Long-tail Learning on Places-LT

Domain Adaptation Long-tail Learning +1

Paper
Code

Collaborative Distillation for Ultra-Resolution Universal Style Transfer

1 code implementation • CVPR 2020 • Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang

In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the convolutional filters.

Decoder Knowledge Distillation +1

185

Paper
Code

CycleISP: Real Image Restoration via Improved Data Synthesis

8 code implementations • CVPR 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline.

Ranked #10 on Image Denoising on DND (using extra training data)

Image Denoising Image Restoration

1,546

Paper
Code

Learning Enriched Features for Real Image Restoration and Enhancement

12 code implementations • ECCV 2020 • Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography, medical imaging, and remote sensing.

Ranked #5 on Spectral Reconstruction on ARAD-1K

Image Denoising Image Enhancement +2

1,546

Paper
Code

Self-supervised Single-view 3D Reconstruction via Semantic Consistency

1 code implementation • ECCV 2020 • Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Ming-Hsuan Yang, Jan Kautz

To the best of our knowledge, we are the first to try and solve the single-view reconstruction problem without a category-specific template mesh or semantic keypoints.

3D Reconstruction Object +1

226

Paper
Code

Gated Fusion Network for Degraded Image Super Resolution

1 code implementation • 2 Mar 2020 • Xinyi Zhang, Hang Dong, Zhe Hu, Wei-Sheng Lai, Fei Wang, Ming-Hsuan Yang

To address this problem, we propose a dual-branch convolutional neural network to extract base features and recovered features separately.

Image Super-Resolution

Paper
Code

Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle

1 code implementation • 19 Feb 2020 • Xin-Yu Zhang, Kai Zhao, Taihong Xiao, Ming-Ming Cheng, Ming-Hsuan Yang

Recent advances in convolutional neural networks (CNNs) usually come at the expense of excessive computational overhead and memory footprint.

Network Pruning

Paper
Code

Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

no code implementations • 19 Feb 2020 • Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang

In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches: a unary segmentation network, which learns the label probabilities for each pixel, and a pairwise affinity network, which learns the affinity matrix and refines the probability map generated from the unary network.

Segmentation Weakly supervised Semantic Segmentation +1

Paper
Add Code

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

1 code implementation • ICLR 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang

Few-shot classification aims to recognize novel categories with only a few labeled images in each class.

Ranked #6 on Cross-Domain Few-Shot on CUB

Classification Cross-Domain Few-Shot +2

319

Paper
Code

Exploiting Semantics for Face Image Deblurring

no code implementations • 19 Jan 2020 • Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang

Specifically, we first use a coarse deblurring network to reduce the motion blur on the input face image.

Deblurring Face Recognition +1

Paper
Add Code

Visual Question Answering on 360° Images

no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering Visual Question Answering

Paper
Add Code

CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency

no code implementations • CVPR 2019 • Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another (e.g., synthetic to real images).

Data Augmentation Image-to-Image Translation +3

Paper
Add Code

RC-DARTS: Resource Constrained Differentiable Architecture Search

no code implementations • 30 Dec 2019 • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng

In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.

Image Classification One-Shot Learning

Paper
Add Code

Controllable and Progressive Image Extrapolation

no code implementations • 25 Dec 2019 • Yijun Li, Lu Jiang, Ming-Hsuan Yang

Image extrapolation aims at expanding the narrow field of view of a given image patch.

Paper
Add Code
