Search Results for author: Huimin Ma

Found 42 papers, 13 papers with code

A2RNet: Adversarial Attack Resilient Network for Robust Infrared and Visible Image Fusion

1 code implementation 13 Dec 2024 Jiawei Li, Hongwei Yu, Jiansheng Chen, Xinlong Ding, Jinlong Wang, JinYuan Liu, Bochao Zou, Huimin Ma

Infrared and visible image fusion (IVIF) is a crucial technique for enhancing visual performance by integrating unique information from different modalities into one fused image.

Adversarial Attack Infrared And Visible Image Fusion

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation

no code implementations 25 Nov 2024 Linqing Zhong, Chen Gao, Zihan Ding, Yue Liao, Huimin Ma, Shifeng Zhang, Xu Zhou, Si Liu

Such a goal-oriented exploration heavily relies on the ability to perceive, understand, and reason based on the spatial information of the environment.

Spatial Reasoning

CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs

no code implementations 2 Oct 2024 Kangsheng Wang, Xiao Zhang, Hao Liu, Songde Han, Huimin Ma, Tianyu Hu

Large language models (LLMs) have demonstrated limitations in handling combinatorial optimization problems involving long-range reasoning, partially due to causal hallucinations and huge search space.

Combinatorial Optimization

Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training

no code implementations 25 Sep 2024 Kun Song, Zhiquan Tan, Bochao Zou, Jiansheng Chen, Huimin Ma, Weiran Huang

Experiments show that matrix entropy cannot solely describe the interaction of the information content of data representation and classification head weights but it can effectively reflect the similarity and clustering behavior of the data.

Classification cross-modal alignment

CSCE: Boosting LLM Reasoning by Simultaneous Enhancing of Causal Significance and Consistency

no code implementations 20 Sep 2024 Kangsheng Wang, Xiao Zhang, Juntao Lyu, Tianyu Hu, Huimin Ma

Chain-based reasoning methods like chain of thought (CoT) play a rising role in solving reasoning tasks for large language models (LLMs).

Synergistic Spotting and Recognition of Micro-Expression via Temporal State Transition

1 code implementation 15 Sep 2024 Bochao Zou, Zizheng Guo, Wenfeng Qin, Xin Li, Kangsheng Wang, Huimin Ma

The analysis of micro-expressions generally involves two main tasks: spotting micro-expression intervals in long videos and recognizing the emotions associated with these intervals.

Classification

Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy

1 code implementation 22 Aug 2024 Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, Yifan Yang

In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box.

Attribute Camouflaged Object Segmentation +1

Unveiling the Dynamics of Information Interplay in Supervised Learning

no code implementations 6 Jun 2024 Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process.

Linear Mode Connectivity

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

no code implementations 24 May 2024 Jingyuan Zhu, Shiyu Li, Yuxuan Liu, Ping Huang, Jiulong Shan, Huimin Ma, Jian Yuan

Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground objects and entire images to fit target distributions.

Object object-detection +1

RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

1 code implementation 9 Apr 2024 Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing.

Mamba

Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance

no code implementations 25 Mar 2024 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

This paper presents a general approach for text-to-image diffusion models to address the mutual interference between different subjects and their attachments in complex scenes, pursuing better text-image consistency.

object-detection Object Detection +1

RhythmFormer: Extracting Patterned rPPG Signals based on Periodic Sparse Attention

1 code implementation 20 Feb 2024 Bochao Zou, Zizheng Guo, Jiansheng Chen, Junbao Zhuo, Weiran Huang, Huimin Ma

Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals based on facial videos, holding high potential in various applications.

DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data

1 code implementation 25 Jun 2023 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Typical diffusion models and modern large-scale conditional generative models like text-to-image generative models are vulnerable to overfitting when fine-tuned on extremely limited data.

Denoising Diversity +1

Few-shot 3D Shape Generation

no code implementations 19 May 2023 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Our approach only needs the silhouettes of few-shot target samples as training data to learn target geometry distributions and achieve generated shapes with diverse topology and textures.

3D Shape Generation Diversity +2

Enhancing Short-Term Wind Speed Forecasting using Graph Attention and Frequency-Enhanced Mechanisms

no code implementations 19 May 2023 Hao Liu, Huimin Ma, Tianyu Hu

In this paper, a Graph-attentive Frequency-enhanced Spatial-Temporal Wind Speed Forecasting model based on graph attention and frequency-enhanced mechanisms, i.e., GFST-WSF, is proposed to improve the accuracy of short-term wind speed forecasting.

Graph Attention

MotionVideoGAN: A Novel Video Generator Based on the Motion Space Learned from Image Pairs

1 code implementation 6 Mar 2023 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

We present MotionVideoGAN, a novel video generator synthesizing videos based on the motion space learned by pre-trained image pair generators.

Motion Generation Unconditional Video Generation

Gestalt-Guided Image Understanding for Few-Shot Learning

1 code implementation 8 Feb 2023 Kun Song, Yuchen Wu, Jiansheng Chen, Tianyu Hu, Huimin Ma

Due to the scarcity of available data, deep learning does not perform well on few-shot learning tasks.

Few-Shot Learning

Few-shot Image Generation with Diffusion Models

1 code implementation 7 Nov 2022 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

Then we fine-tune DDPMs pre-trained on large source domains to solve the overfitting problem when training data is limited.

Denoising Diversity +2

Few-shot Image Generation via Masked Discrimination

1 code implementation 27 Oct 2022 Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan

It strengthens global image discrimination and guides adapted GANs to preserve more information learned from source domains for higher image quality.

Diversity Image Generation

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

1 code implementation NeurIPS 2021 Jiayu Chen, Yuanxin Zhang, Yuanfan Xu, Huimin Ma, Huazhong Yang, Jiaming Song, Yu Wang, Yi Wu

We motivate our paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current task distribution, and curriculum update to a new task distribution.

Multi-agent Reinforcement Learning

Video Frame Interpolation via Structure-Motion based Iterative Fusion

no code implementations 11 May 2021 Xi Li, Meng Cao, Yingying Tang, Scott Johnston, Zhendong Hong, Huimin Ma, Jiulong Shan

Inspired by the observation that audiences have different visual preferences on foreground and background objects, we for the first time propose to use saliency masks in the evaluation processes of the task of video frame interpolation.

Optical Flow Estimation Video Frame Interpolation

Defending Against Universal Adversarial Patches by Clipping Feature Norms

no code implementations ICCV 2021 Cheng Yu, Jiansheng Chen, Youze Xue, Yuyang Liu, Weitao Wan, Jiayu Bao, Huimin Ma

Physical-world adversarial attacks based on universal adversarial patches have been proved to be able to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems based on CNNs.

Unsupervised segmentation via semantic-apparent feature fusion

no code implementations 21 May 2020 Xi Li, Huimin Ma, Hongbing Ma, Yidong Wang

In order to solve this problem, the research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).

Foreground Segmentation Segmentation

Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning

no code implementations 19 Feb 2020 Xiang Wang, Sifei Liu, Huimin Ma, Ming-Hsuan Yang

In this paper, we propose an iterative algorithm to learn such pairwise relations, which consists of two branches, a unary segmentation network which learns the label probabilities for each pixel, and a pairwise affinity network which learns affinity matrix and refines the probability map generated from the unary network.

Segmentation Weakly supervised Semantic Segmentation +1

Semantic Head Enhanced Pedestrian Detection in a Crowd

no code implementations 27 Nov 2019 Ruiqi Lu, Huimin Ma

Pedestrian detection in the crowd is a challenging task because of intra-class occlusion.

Head Detection Pedestrian Detection

Occluded Pedestrian Detection with Visible IoU and Box Sign Predictor

no code implementations 26 Nov 2019 Ruiqi Lu, Huimin Ma

Training a robust classifier and an accurate box regressor are difficult for occluded pedestrian detection.

Pedestrian Detection

WSOD with PSNet and Box Regression

no code implementations 26 Nov 2019 Sheng Yi, Xi Li, Huimin Ma

To solve this problem, we added the box regression module to the weakly supervised object detection network and proposed a proposal scoring network (PSNet) to supervise it.

Object object-detection +3

Pretrain Soft Q-Learning with Imperfect Demonstrations

no code implementations 9 May 2019 Xiaoqin Zhang, Yunfei Li, Huimin Ma, Xiong Luo

Pretraining reinforcement learning methods with demonstrations has been an important concept in the study of reinforcement learning since a large amount of computing power is spent on online simulations with existing reinforcement learning algorithms.

Q-Learning reinforcement-learning +2

Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features

no code implementations CVPR 2018 Xiang Wang, ShaoDi You, Xi Li, Huimin Ma

Then in the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks.

General Classification Object +3

Driving maneuvers prediction based on cognition-driven and data-driven method

no code implementations 8 May 2018 Dong Zhou, Huimin Ma, Yuhan Dong

To overcome this challenge, we propose a novel method that combines both the cognition-driven model and the data-driven model.

Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

no code implementations 31 Jan 2018 Xiaoqin Zhang, Huimin Ma

We apply our method to two of the typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate with experiments that our method not only outperforms the RL algorithms without pretraining process, but also is more simulation efficient.

Deep Reinforcement Learning reinforcement-learning +1

Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection

no code implementations 29 Aug 2016 Xiang Wang, Huimin Ma, Xiaozhi Chen, ShaoDi You

In this paper, we propose a novel edge preserving and multi-scale contextual neural network for salient object detection.

Object object-detection +3

Improving Object Proposals With Multi-Thresholding Straddling Expansion

no code implementations CVPR 2015 Xiaozhi Chen, Huimin Ma, Xiang Wang, Zhichen Zhao

Based on the characteristics of superpixel tightness distribution, we propose an effective method, namely multi-thresholding straddling expansion (MTSE) to reduce localization bias via fast diversification.

Object object-detection +1
