Search Results for author: Haoyu Ma

Found 36 papers, 15 papers with code

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

no code implementations 19 Dec 2023 Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, YuChao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control.

Video Editing

Instance Tracking in 3D Scenes from Egocentric Videos

1 code implementation 7 Dec 2023 Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes

We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.

Human-Object Interaction Detection Object Tracking

CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

1 code implementation 11 Nov 2023 Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie

Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR.

Neural Rendering

HarmonyDream: Task Harmonization Inside World Models

no code implementations 30 Sep 2023 Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.

Atari Games 100k Model-based Reinforcement Learning +1

Light Field Diffusion for Single-View Novel View Synthesis

no code implementations 20 Sep 2023 Yifeng Xiong, Haoyu Ma, Shanlin Sun, Kun Han, Hao Tang, Xiaohui Xie

Starting from the camera pose matrices, LFD transforms them into light field encoding, with the same shape as the reference image, to describe the direction of each ray.

Denoising Novel View Synthesis +1

Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction

no code implementations 23 Jul 2023 Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie

We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction.

Surface Reconstruction

Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

1 code implementation NeurIPS 2023 Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long

To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes.

Autonomous Driving Model-based Reinforcement Learning +3

OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

no code implementations 26 Apr 2023 Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images.

Image Super-Resolution Position

Localized Region Contrast for Enhancing Self-Supervised Learning in Medical Image Segmentation

no code implementations 6 Apr 2023 Xiangyi Yan, Junayed Naushad, Chenyu You, Hao Tang, Shanlin Sun, Kun Han, Haoyu Ma, James Duncan, Xiaohui Xie

In this paper, we propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.

Contrastive Learning Image Segmentation +5

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

no code implementations 3 Jan 2023 Haoyu Ma, Xiangru Lin, Yizhou Yu

This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation.

Segmentation Semantic Segmentation +1

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation 19 Nov 2022 Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.

Identity-Aware Hand Mesh Estimation and Personalization from RGB Images

1 code implementation 22 Sep 2022 Deying Kong, Linguang Zhang, Liangjian Chen, Haoyu Ma, Xiangyi Yan, Shanlin Sun, Xingwei Liu, Kun Han, Xiaohui Xie

In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject.

PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation

1 code implementation 16 Sep 2022 Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie

In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which can locate a rough human mask and perform self-attention only within selected tokens.

Ranked #17 on 3D Human Pose Estimation on Human3.6M (using extra training data)

2D Human Pose Estimation 3D Human Pose Estimation

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

no code implementations 10 Aug 2022 Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.


Training Your Sparse Neural Network Better with Any Mask

1 code implementation 26 Jun 2022 Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity.

Diffeomorphic Image Registration with Neural Velocity Field

no code implementations 25 Feb 2022 Kun Han, Shanlin Sun, Xiangyi Yan, Chenyu You, Hao Tang, Junayed Naushad, Haoyu Ma, Deying Kong, Xiaohui Xie

Here we propose a new optimization-based method named DNVF (Diffeomorphic Image Registration with Neural Velocity Field), which utilizes a deep neural network to model the space of admissible transformations.

Image Registration

Sparsity Winning Twice: Better Robust Generalization from More Efficient Training

1 code implementation ICLR 2022 Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu Ma, Zehao Wang, Zhangyang Wang

We introduce two alternatives for sparse adversarial training: (i) static sparsity, by leveraging recent results from the lottery ticket hypothesis to identify critical sparse subnetworks arising from the early training; (ii) dynamic sparsity, by allowing the sparse subnetwork to adaptively adjust its connectivity pattern (while sticking to the same sparsity ratio) throughout training.

VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

no code implementations 17 Jan 2022 Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang

To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework to guide the quantization strategy on the software side and the accelerator implementations on the hardware side given the target frame rate.


Over-the-Air Aggregation for Federated Learning: Waveform Superposition and Prototype Validation

no code implementations 27 Oct 2021 Huayan Guo, Yifan Zhu, Haoyu Ma, Vincent K. N. Lau, Kaibin Huang, Xiaofan Li, Huabin Nong, Mingyu Zhou

In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL).

Federated Learning

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

no code implementations 20 Oct 2021 Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie

One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance.

Image Segmentation Medical Image Segmentation +3

TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

1 code implementation 18 Oct 2021 Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie

The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.

Ranked #20 on 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Human Pose Estimation 3D Pose Estimation

Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation

no code implementations 29 Sep 2021 Haoyu Ma, Yifan Huang, Tianlong Chen, Hao Tang, Chenyu You, Zhangyang Wang, Xiaohui Xie

However, it is unclear why the distorted distribution of the logits is catastrophic to the student model.

Knowledge Distillation

SGE net: Video object detection with squeezed GRU and information entropy map

no code implementations 14 Jun 2021 Rui Su, Wenjing Huang, Haoyu Ma, Xiaowei Song, Jinglu Hu

Compared with object detection of static images, video object detection is more challenging due to the motion of objects, while providing rich temporal information.

Object object-detection +1

Undistillable: Making A Nasty Teacher That CANNOT teach students

1 code implementation ICLR 2021 Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models.

Knowledge Distillation

Spending Your Winning Lottery Better After Drawing It

1 code implementation 8 Jan 2021 Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

In this paper, we demonstrate that it is unnecessary for sparse retraining to strictly inherit those properties from the dense network.

Knowledge Distillation

SIA-GCN: A Spatial Information Aware Graph Neural Network with 2D Convolutions for Hand Pose Estimation

no code implementations 25 Sep 2020 Deying Kong, Haoyu Ma, Xiaohui Xie

In this paper, we extend GNNs along two directions: a) allowing features at each node to be represented by 2D spatial confidence maps instead of 1D vectors; and b) proposing an efficient operation to integrate information from neighboring nodes through 2D convolutions with different learnable kernels at each edge.

Hand Pose Estimation

Real-MFF: A Large Realistic Multi-focus Image Dataset with Ground Truth

no code implementations 28 Mar 2020 Juncheng Zhang, Qingmin Liao, Shaojun Liu, Haoyu Ma, Wenming Yang, Jing-Hao Xue

In this letter, we introduce a large and realistic multi-focus dataset called Real-MFF, which contains 710 pairs of source images with corresponding ground truth images.

Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation

no code implementations 5 Feb 2020 Deying Kong, Haoyu Ma, Yifei Chen, Xiaohui Xie

In this paper, we propose a new architecture named Rotation-invariant Mixed Graphical Model Network (R-MGMN) to solve the problem of 2D hand pose estimation from a monocular RGB image.

Hand Pose Estimation

Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation

1 code implementation 24 Jan 2020 Yifei Chen, Haoyu Ma, Deying Kong, Xiangyi Yan, Jianbao Wu, Wei Fan, Xiaohui Xie

We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly.

Hand Pose Estimation

An α-Matte Boundary Defocus Model Based Cascaded Network for Multi-focus Image Fusion

2 code implementations 29 Oct 2019 Haoyu Ma, Qingmin Liao, Juncheng Zhang, Shaojun Liu, Jing-Hao Xue

Based on this α-matte defocus model and the generated data, a cascaded boundary aware convolutional network termed MMF-Net is proposed and trained, aiming to achieve clearer fusion results around the FDB.

Adaptive Graphical Model Network for 2D Handpose Estimation

1 code implementation 18 Sep 2019 Deying Kong, Yifei Chen, Haoyu Ma, Xiangyi Yan, Xiaohui Xie

In this paper, we propose a new architecture called Adaptive Graphical Model Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular RGB image.

Hand Pose Estimation

Boundary Aware Multi-Focus Image Fusion Using Deep Neural Network

no code implementations 30 Mar 2019 Haoyu Ma, Juncheng Zhang, Shaojun Liu, Qingmin Liao

Since it is usually difficult to capture an all-in-focus image of a 3D scene directly, various multi-focus image fusion methods are employed to generate it from several images focusing at different depths.
