Search Results for author: Ming Yang

Found 129 papers, 39 papers with code

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

no code implementations 27 May 2025 Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, MingYu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen

However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities.

Autonomous Driving Decision Making +2

Weather-Magician: Reconstruction and Rendering Framework for 4D Weather Synthesis In Real Time

no code implementations 26 May 2025 Chen Sang, Yeqiang Qian, Jiale Zhang, Chunxiang Wang, Ming Yang

For tasks such as urban digital twins, VR/AR/game scene design, or creating synthetic films, the traditional industrial approach often involves manually modeling scenes and using various rendering engines to complete the rendering process.

Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights

1 code implementation 26 May 2025 Shi-Yu Tian, Zhi Zhou, Wei Dong, Ming Yang, Kun-Yang Yu, Zi-Jian Cheng, Lan-Zhe Guo, Yu-Feng Li

Reasoning with tabular data holds increasing importance in modern applications, yet comprehensive evaluation methodologies for reasoning-intensive Table Question Answering (QA) tasks remain nascent.

Benchmarking Question Answering

GMatch: Geometry-Constrained Feature Matching for RGB-D Object Pose Estimation

no code implementations 22 May 2025 Ming Yang, Haoran Li

We present GMatch, a learning-free feature matcher designed for robust 6DoF object pose estimation, addressing common local ambiguities in sparse feature matching.

Pose Estimation

JAEGER: Dual-Level Humanoid Whole-Body Controller

no code implementations 10 May 2025 Ziluo Ding, Haobin Jiang, Yuxuan Wang, Zhenguo Sun, Yu Zhang, Xiaojie Niu, Ming Yang, Weishuai Zeng, Xinrun Xu, Zongqing Lu

This paper presents JAEGER, a dual-level whole-body controller for humanoid robots that addresses the challenges of training a more robust and versatile policy.

From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval

no code implementations 25 Apr 2025 Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

In the second stage, we optimize the text encoder using a small amount of synthetic triplet data, enabling it to effectively extract compositional semantics by combining pseudo-word tokens with modification text for accurate target image retrieval.

Image Retrieval Retrieval +1

Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints

no code implementations 21 Apr 2025 Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang

Constrained optimization with multiple functional inequality constraints has significant applications in machine learning.

Continual Learning Fairness

Knowledge Rectification for Camouflaged Object Detection: Unlocking Insights from Low-Quality Data

no code implementations 28 Mar 2025 Juwei Guan, Xiaolin Fang, Donghyun Kim, Haotian Gong, Tongxin Zhu, Zhen Ling, Ming Yang

Low-quality data often suffer from insufficient image details, introducing an extra implicit aspect of camouflage that complicates camouflaged object detection (COD).

Diversity object-detection +2

Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey

no code implementations 17 Mar 2025 Liewen Liao, Weihao Yan, Ming Yang, Songan Zhang

Learning-based 3D reconstruction has emerged as a transformative technique in autonomous driving, enabling precise modeling of both dynamic and static environments through advanced neural representations.

3D Reconstruction Autonomous Driving +2

SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

1 code implementation CVPR 2025 Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen

While MLLMs have demonstrated adequate image understanding capabilities, they still struggle with pixel-level comprehension, limiting their practical applications.

Decision Making Interactive Segmentation +4

Versatile Multimodal Controls for Expressive Talking Human Animation

no code implementations 10 Mar 2025 Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang

AI-generated content faces similar requirements, where users not only need automatic generation of lip synchronization and basic gestures from audio input but also desire semantically accurate and expressive body movement that can be "directly guided" through text descriptions.

Human Animation

BMIP: Bi-directional Modality Interaction Prompt Learning for VLM

no code implementations 14 Jan 2025 Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo

Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for the ability to adapt pre-trained VLMs to specific downstream tasks.

Domain Generalization Prompt Learning

Reversing Flow for Image Restoration

no code implementations CVPR 2025 Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu

Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation.

Image Restoration

Referencing Where to Focus: Improving Visual Grounding with Referential Query

no code implementations 26 Dec 2024 Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

It consists of a query adaptation module, which can be seamlessly integrated into CLIP and generates the referential query to provide prior context for the decoder, along with a task-specific decoder.

Decoder Visual Grounding

Cross-View Image Set Geo-Localization

no code implementations 25 Dec 2024 Qiong Wu, Panwang Xia, Lei Yu, Yi Liu, Mingtao Xiong, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang, Yi Wan

Therefore, we propose a novel task: Cross-View Image Set Geo-Localization (Set-CVGL), which gathers multiple images with diverse perspectives as a query set for localization.

geo-localization

GraphicsDreamer: Image to 3D Generation with Physical Consistency

no code implementations 18 Dec 2024 Pei Chen, Fudong Wang, Yixuan Tong, Jingdong Chen, Ming Yang, Minghui Yang

Recently, the surge of efficient and automated 3D AI-generated content (AIGC) methods has increasingly illuminated the path of transforming human imagination into complex 3D structures.

3D Generation Image to 3D

Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings

2 code implementations 16 Dec 2024 Panwang Xia, Lei Yu, Yi Wan, Qiong Wu, Peiqi Chen, Liheng Zhong, Yongxiang Yao, Dong Wei, Xinyi Liu, Lixiang Ru, Yingying Zhang, Jiangwei Lao, Jingdong Chen, Ming Yang, Yongjun Zhang

To address this limitation, we introduce DReSS (Decentrality Related Street-view and Satellite-view dataset), a novel dataset designed to evaluate cross-view geo-localization with a large geographic scope and diverse landscapes, emphasizing the decentrality issue.

Disaster Response geo-localization +1

STDHL: Spatio-Temporal Dynamic Hypergraph Learning for Wind Power Forecasting

no code implementations 16 Dec 2024 Xiaochong Dong, Xuemin Zhang, Ming Yang, Shengwei Mei

This model uses a hypergraph structure to represent spatial features among wind farms.

Decoder

Cross-Modal Visual Relocalization in Prior LiDAR Maps Utilizing Intensity Textures

no code implementations 2 Dec 2024 Qiyuan Shen, Hengwang Zhao, Weihao Yan, Chunxiang Wang, Tong Qin, Ming Yang

In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization.

3D geometry Pose Estimation +1

LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis

no code implementations 29 Nov 2024 Tianqi Li, Ruobing Zheng, Bonan Li, ZiCheng Zhang, Meng Wang, Jingdong Chen, Ming Yang

Despite significant progress in talking head synthesis since the introduction of Neural Radiance Fields (NeRF), visual artifacts and high training costs persist as major obstacles to large-scale commercial adoption.

NeRF Transfer Learning

Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

2 code implementations 29 Nov 2024 Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang

Recent advances in diffusion models have endowed talking head synthesis with subtle expressions and vivid head movements, but have also led to slow inference speed and insufficient control over generated results.

Disentanglement Motion Generation +1

Panther: Illuminate the Sight of Multimodal LLMs with Instruction-Guided Visual Prompts

no code implementations 21 Nov 2024 Honglin Li, Yuting Gao, Chenglu Zhu, Jingdong Chen, Ming Yang, Lin Yang

Multimodal large language models (MLLMs) are rapidly closing the gap to human visual perception, yet they still lag behind in attending to subtle image details and in locating small objects precisely.

Decoder

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding

no code implementations CVPR 2025 Yudong Han, Qingpei Guo, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang

This suggests the possibility of adopting dynamic encoding to balance detailed video information preservation with token budget reduction.

Question Answering Video Understanding

Try-On-Adapter: A Simple and Flexible Try-On Paradigm

no code implementations 15 Nov 2024 Hanzhong Guo, Jianfeng Zhang, Cheng Zou, Jun Li, Meng Wang, Ruxue Wen, Pingzhong Tang, Jingdong Chen, Ming Yang

A key challenge of try-on is to generate realistic images of the model wearing the garments while preserving the details of the garments.

Virtual Try-on

LumiSculpt: A Consistency Lighting Control Network for Video Generation

no code implementations 30 Oct 2024 Yuxin Zhang, Dandan Zheng, Biao Gong, Jingdong Chen, Ming Yang, WeiMing Dong, Changsheng Xu

Lighting plays a pivotal role in ensuring the naturalness of video generation, significantly influencing the aesthetic quality of the generated content.

Video Generation

Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay

no code implementations 16 Oct 2024 Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Jian Zhang, Xiaoguang Niu

P2Value comprehensively considers the possibility of transformers' output and pass rate and can make use of the redundant resources caused by the problem that most programs collected by LLMs fail to pass any tests.

Code Generation

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

no code implementations 14 Oct 2024 Shuai Tan, Biao Gong, Xiang Wang, Shiwei Zhang, Dandan Zheng, Ruobing Zheng, Kecheng Zheng, Jingdong Chen, Ming Yang

Our in-depth analysis attributes this limitation to their insufficient modeling of motion, which fails to comprehend the movement pattern of the driving video and thus imposes a pose sequence rigidly onto the target character.

Attribute Image Animation

HOTVCOM: Generating Buzzworthy Comments for Videos

no code implementations 23 Sep 2024 Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Ming Yang, Qingpei Guo

In the era of social media video platforms, popular "hot-comments" play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purposes.

Descriptive Marketing

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

1 code implementation 4 Sep 2024 Wen Li, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang

To tackle these challenges, we introduce StyleTokenizer, a zero-shot style control image generation method that aligns style representation with text representation using a style tokenizer.

Denoising Text to Image Generation +1

Social Debiasing for Fair Multi-modal LLMs

no code implementations 13 Aug 2024 Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie

Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.

counterfactual

Egocentric Vision Language Planning

no code implementations 11 Aug 2024 Zhirui Fang, Ming Yang, Weishuai Zeng, Boyu Li, Junpeng Yue, Ziluo Ding, Xiu Li, Zongqing Lu

LMMs excel in planning long-horizon tasks over symbolic abstractions but struggle with grounding in the physical world, often failing to accurately identify object positions in images.

Decision Making Optical Flow Estimation +1

ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning

1 code implementation 4 Aug 2024 Changze Li, Ziheng Ji, Zhe Chen, Tong Qin, Ming Yang

Real-vehicle experiments further validate the feasibility and effectiveness of the method proposed in this paper.

Decoder Imitation Learning

POA: Pre-training Once for Models of All Sizes

1 code implementation 2 Aug 2024 Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang

Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks.

All Representation Learning

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

no code implementations 22 Jul 2024 Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).

MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps

no code implementations 11 Jul 2024 Hang Wu, Zhenghao Zhang, Siyuan Lin, Xiangru Mu, Qiang Zhao, Ming Yang, Tong Qin

Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors.

Autonomous Driving Image Registration

BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

no code implementations 11 Jul 2024 Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang

In this paper, we propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m.

Autonomous Driving BEV Segmentation +1

Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention

no code implementations 2 Jul 2024 Yuquan Xie, Wanqi Yang, Jinyu Wei, Ming Yang, Yang Gao

To address this issue, we propose a domain generalization approach for knowledge tracing, where existing education systems are considered source domains, and new education systems with limited data are considered target domains.

Domain Generalization Knowledge Tracing +1

Multi-level Reliable Guidance for Unpaired Multi-view Clustering

no code implementations 1 Jul 2024 Like Xin, Wanqi Yang, Lei Wang, Ming Yang

In cross-view learning, reliable view guidance enhances the confidence of the cluster structures in other views.

Clustering Incomplete multi-view clustering

Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

no code implementations 24 Jun 2024 Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

This approach addresses the key problem of large-scale reconstruction, namely where the data comes from and how to use it.

3D Reconstruction NeRF +1

SS-ADA: A Semi-Supervised Active Domain Adaptation Framework for Semantic Segmentation

1 code implementation 17 Jun 2024 Weihao Yan, Yeqiang Qian, Yueyuan Li, Tao Li, Chunxiang Wang, Ming Yang

In this paper, we propose a novel semi-supervised active domain adaptation (SS-ADA) framework for semantic segmentation that employs an image-level acquisition strategy.

Active Learning Domain Adaptation +2

Monocular Localization with Semantics Map for Autonomous Vehicles

no code implementations 6 Jun 2024 Jixiang Wan, Xudong Zhang, Shuzhou Dong, Yuwei Zhang, Yuchen Yang, Ruoxi Wu, Ye Jiang, Jijunnan Li, Jinquan Lin, Ming Yang

To balance efficiency and accuracy, we propose a novel lightweight visual semantic localization algorithm that employs stable semantic features instead of low-level texture features.

Autonomous Driving Computational Efficiency +1

DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

1 code implementation 1 Jun 2024 Zhi Zhou, Ming Yang, Jiang-Xin Shi, Lan-Zhe Guo, Yu-Feng Li

In this paper, we explore a problem setting called Open-world Prompt Tuning (OPT), which involves tuning prompts on base classes and evaluating on a combination of base and new classes.

Out-of-Distribution Detection

HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios

2 code implementations 31 May 2024 Mingyang Jiang, Yueyuan Li, Songan Zhang, Siyuan Chen, Chunxiang Wang, Ming Yang

This novel solution integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios.

Autonomous Driving reinforcement-learning +1

AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection

1 code implementation 21 May 2024 Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang

This increased inference time has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems.

Knowledge Distillation Pedestrian Detection

Unpaired Multi-view Clustering via Reliable View Guidance

no code implementations 27 Apr 2024 Like Xin, Wanqi Yang, Lei Wang, Ming Yang

We assume that the view with a good cluster structure is the reliable view, which acts as a supervisor to guide the clustering of the other views.

Clustering Incomplete multi-view clustering

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

no code implementations 22 Apr 2024 Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo

In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.

Retrieval Video Retrieval

Cross-to-merge training with class balance strategy for learning with noisy labels

1 code implementation Expert Systems with Applications 2024 Qian Zhang, Yi Zhu, Ming Yang, Ge Jin, YingWen Zhu, Qiu Chen

Although sample selection is a mainstream method in the field of learning with noisy labels, which aims to mitigate the impact of noisy labels during model training, the testing performance of these methods exhibits significant fluctuations across different noise rates and types.

Learning with noisy labels

Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model

no code implementations 17 Mar 2024 Kangyang Xie, BinBin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen

Beyond the superiority of the text-to-image diffusion model in generating high-quality images, recent studies have attempted to uncover its potential for adapting the learned semantic knowledge to visual perception tasks.

Image Generation

One-Step Multi-View Clustering Based on Transition Probability

no code implementations 3 Mar 2024 Wenhui Zhao, Quanxue Gao, Guangfei Li, Cheng Deng, Ming Yang

Despite their successes, current methods lack interpretability in the clustering process and do not sufficiently consider the complementary information across different views.

Clustering

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

1 code implementation CVPR 2024 ZiCheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang

Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.

NeRF

Anchor-free Clustering based on Anchor Graph Factorization

no code implementations 24 Feb 2024 Shikun Mei, Fangfang Li, Quanxue Gao, Ming Yang

Additionally, we evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.

Clustering

A Survey for Foundation Models in Autonomous Driving

no code implementations 2 Feb 2024 Haoxiang Gao, Zhongruo Wang, Yaqian Li, Kaiwen Long, Ming Yang, Yiqing Shen

The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD).

3D Object Detection Autonomous Driving +4

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

1 code implementation 31 Jan 2024 Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo

We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features.

Text Retrieval Video-Text Retrieval

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

no code implementations 4 Jan 2024 Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo

To achieve multimodal alignment from both global and local perspectives, this paper proposes Symmetrizing Contrastive Captioners (SyCoCa), which introduces bidirectional interactions on images and texts across the global and local representation levels.

Image Captioning image-classification +9

Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

no code implementations 2 Jan 2024 Shuang Li, Ke Li, Wei Li, Ming Yang

Constrained multi-objective optimization problems (CMOPs) pervade real-world applications in science, engineering, and design.

Towards Better Vision-Inspired Vision-Language Models

no code implementations CVPR 2024 Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang, Chuanyang Zheng, Jiajia Liu, Jian Wang, Jingdong Chen, Ming Yang

In this paper, we present a vision-inspired vision-language connection module, dubbed VIVL, which efficiently exploits the vision cue for VL models.

Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making

2 code implementations 18 Nov 2023 Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Yeqiang Qian, Chunxiang Wang, Ming Yang

Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems.

Autonomous Driving Decision Making +4

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

1 code implementation CVPR 2024 Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Referring Expression

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

1 code implementation 20 Sep 2023 Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi

We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs.

Contrastive Learning Retrieval +4

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

no code implementations 21 Aug 2023 Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo

Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing.

Video Editing

Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms

no code implementations 7 Jul 2023 Ming Yang, Xiyuan Wei, Tianbao Yang, Yiming Ying

Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC.

Learning Theory Meta-Learning

Low-Light Image Enhancement by Learning Contrastive Representations in Spatial and Frequency Domains

no code implementations 23 Mar 2023 Yi Huang, Xiaoguang Tu, Gui Fu, Tingting Liu, Bokai Liu, Ming Yang, Ziliang Feng

Images taken under low-light conditions tend to suffer from poor visibility, which can decrease image quality and even reduce the performance of the downstream tasks.

Contrastive Learning Low-Light Image Enhancement

Spatial Attention and Syntax Rule Enhanced Tree Decoder for Offline Handwritten Mathematical Expression Recognition

no code implementations 13 Mar 2023 Zihao Lin, Jinrong Li, Fan Yang, Shuangping Huang, Xu Yang, Jianmin Lin, Ming Yang

In this paper, we propose a novel model called Spatial Attention and Syntax Rule Enhanced Tree Decoder (SS-TD), which is equipped with a spatial attention mechanism to alleviate prediction errors in the tree structure and uses syntax masks (obtained from the transformation of syntax rules) to constrain the occurrence of ungrammatical mathematical expressions.

Decoder

High-level semantic feature matters few-shot unsupervised domain adaptation

no code implementations 5 Jan 2023 Lei Yu, Wanqi Yang, Shengqi Huang, Lei Wang, Ming Yang

However, the goals of FS-UDA and FSL are related yet distinct, since FS-UDA aims to classify the samples in the target domain rather than the source domain.

Few-Shot Learning Unsupervised Domain Adaptation +1

Efficient Generalization Improvement Guided by Random Weight Perturbation

1 code implementation 21 Nov 2022 Tao Li, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Ming Yang, Xiaolin Huang

To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability.

FedSiam-DA: Dual-aggregated Federated Learning via Siamese Network under Non-IID Data

no code implementations 17 Nov 2022 Ming Yang, Yanhan Wang, Xin Wang, Zhenyong Zhang, Xiaoming Wu, Peng Cheng

Federated learning is a distributed learning paradigm that allows each client to keep the original data locally and only upload the parameters of the local model to the server.

Contrastive Learning Federated Learning

Tensor Robust PCA with Nonconvex and Nonlocal Regularization

1 code implementation 4 Nov 2022 Xiaoyu Geng, Qiang Guo, Shuaixiong Hui, Ming Yang, Caiming Zhang

To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model.

Double-Ended Palindromic Trees: A Linear-Time Data Structure and Its Applications

no code implementations 5 Oct 2022 Qisheng Wang, Ming Yang, Xinrui Zhu

The palindromic tree (a.k.a. eertree) is a linear-size data structure that provides access to all palindromic substrings of a string.

SUNet: Scale-aware Unified Network for Panoptic Segmentation

no code implementations 7 Sep 2022 Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang

Panoptic segmentation combines the advantages of semantic and instance segmentation, which can provide both pixel-level and instance-level environmental perception information for intelligent vehicles.

Instance Segmentation Panoptic Segmentation +1

Threshold-adaptive Unsupervised Focal Loss for Domain Adaptation of Semantic Segmentation

1 code implementation 23 Aug 2022 Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang

In stage one, we design a threshold-adaptive unsupervised focal loss to regularize the prediction in the target domain, which has a mild gradient neutralization mechanism and mitigates the problem that hard samples are barely optimized in entropy-based methods.

Data Augmentation Segmentation +2

BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection

no code implementations 4 Dec 2021 Xiaoxiao Yang, Yeqian Qiang, Huijie Zhu, Chunxiang Wang, Ming Yang

Thermal infrared (TIR) images have proven effective in providing temperature cues to the RGB features for multispectral pedestrian detection.

Pedestrian Detection Specificity

Self-supervised Contrastive Attributed Graph Clustering

no code implementations 15 Oct 2021 Wei Xia, Quanxue Gao, Ming Yang, Xinbo Gao

Thus, for the OOS nodes, SCAGC can directly calculate their clustering labels.

Attribute Clustering +3

Group-based Interleaved Pipeline Parallelism for Large-scale DNN Training

1 code implementation ICLR 2022 Pengcheng Yang, XiaoMing Zhang, Wenpeng Zhang, Ming Yang, Hong Wei

The recent trend of using large-scale deep neural networks (DNN) to boost performance has propelled the development of the parallel pipelining technique for efficient DNN training, which has resulted in the development of several prominent pipelines such as GPipe, PipeDream, and PipeDream-2BW.

Recall and Learn: A Memory-augmented Solver for Math Word Problems

1 code implementation Findings (EMNLP) 2021 Shifeng Huang, Jiawei Wang, Jiao Xu, Da Cao, Ming Yang

Specifically, given a math word problem, the model first retrieves similar questions by a memory module and then encodes the unsolved problem and each retrieved question using a representation module.

Math Math Word Problem Solving

Few-shot Unsupervised Domain Adaptation with Image-to-class Sparse Similarity Encoding

no code implementations 6 Aug 2021 Shengqi Huang, Wanqi Yang, Lei Wang, Luping Zhou, Ming Yang

Inspired by the recent local descriptor based few-shot learning (FSL), our general UDA model is fully built upon local descriptors (LDs) for image classification and domain adaptation.

Few-Shot Learning image-classification +2

Momentum Accelerates the Convergence of Stochastic AUPRC Maximization

no code implementations 2 Jul 2021 Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang

In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity, which enjoy faster convergence in practice.

imbalanced classification Stochastic Optimization

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

1 code implementation CVPR 2021 Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu

Inspired by the back-tracing strategy in conventional Hough voting methods, in this work we introduce a new 3D object detection method, named Back-tracing Representative Points Network (BRNet), which generatively back-traces the representative points from the vote centers and also revisits complementary seed points around these generated points, so as to better capture the fine local structural features surrounding the potential objects from the raw point clouds.

3D Object Detection Object +1

Evolved Massive Stars at Low-metallicity IV. Using 1.6 $\mu$m "H-bump" to identify red supergiant stars: a case study of NGC 6822

no code implementations 21 Jan 2021 Ming Yang, Alceste Z. Bonanos, Biwei Jiang, Man I Lam, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Shu Wang, Xiao-Dian Chen, Frank Tramper, Yi Ren, Zoi T. Spetsieri

RSG candidates are further separated from the rest of the LSG candidates using semi-empirical criteria on NIR CMDs, resulting in 323 RSG candidates.

Solar and Stellar Astrophysics Astrophysics of Galaxies

Deep View Synthesis via Self-Consistent Generative Network

1 code implementation19 Jan 2021 Zhuoman Liu, Wei Jia, Ming Yang, Peiyao Luo, Yong Guo, Mingkui Tan

To address the above issues, in this paper, we propose a novel deep generative model, called Self-Consistent Generative Network (SCGN), which synthesizes novel views from the given input views without explicitly exploiting the geometric information.

Stacked Homography Transformations for Multi-View Pedestrian Detection

no code implementations ICCV 2021 Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan

This task is confronted with two challenges: how to establish the 3D correspondences from views to the BEV map and how to assemble occupancy information across views.

Ranked #2 on Multiview Detection on CVCS (MODA (1m) metric)

Multiview Detection Pedestrian Detection

Probabilistic Latent Factor Model for Collaborative Filtering with Bayesian Inference

1 code implementation7 Dec 2020 Jiansheng Fang, Xiaoqing Zhang, Yan Hu, Yanwu Xu, Ming Yang, Jiang Liu

Latent Factor Model (LFM) is one of the most successful methods for Collaborative filtering (CF) in the recommendation system, in which both users and items are projected into a joint latent factor space.

Bayesian Inference Collaborative Filtering +1

MAFF-Net: Filter False Positive for 3D Vehicle Detection with Multi-modal Adaptive Feature Fusion

no code implementations23 Sep 2020 Zehan Zhang, Ming Zhang, Zhidong Liang, Xian Zhao, Ming Yang, Wenming Tan, ShiLiang Pu

Experimental results on the KITTI dataset demonstrate significant improvement in filtering false positive over the approach using only point cloud data.

Autonomous Driving vehicle detection

Class Distribution Alignment for Adversarial Domain Adaptation

no code implementations20 Apr 2020 Wanqi Yang, Tong Ling, Chengmei Yang, Lei Wang, Yinghuan Shi, Luping Zhou, Ming Yang

To address this issue, we propose a novel approach called Conditional ADversarial Image Translation (CADIT) to explicitly align the class distributions given samples between the two domains.

General Classification Translation +1

Map-Enhanced Ego-Lane Detection in the Missing Feature Scenarios

no code implementations2 Apr 2020 Xiaoliang Wang, Yeqiang Qian, Chunxiang Wang, Ming Yang

As one of the most important tasks in autonomous driving systems, ego-lane detection has been extensively studied and has achieved impressive results in many scenarios.

Autonomous Driving Lane Detection

Monocular Pedestrian Orientation Estimation Based on Deep 2D-3D Feedforward

1 code implementation24 Sep 2019 Chenchen Zhao, Yeqiang Qian, Ming Yang

The 2D and 3D dimensions of pedestrians are determined from the camera captures and further utilized through two feedforward links connected to the orientation estimator.

Autonomous Driving Collision Avoidance

SSAP: Single-Shot Instance Segmentation With Affinity Pyramid

2 code implementations ICCV 2019 Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, Yinan Yu, Ming Yang, Kaiqi Huang

Moreover, incorporating with the learned affinity pyramid, a novel cascaded graph partition module is presented to sequentially generate instances from coarse to fine.

Instance Segmentation Segmentation +1

Attention Guided Network for Retinal Image Segmentation

2 code implementations25 Jul 2019 Shihao Zhang, Huazhu Fu, Yuguang Yan, Yubing Zhang, Qingyao Wu, Ming Yang, Mingkui Tan, Yanwu Xu

Learning structural information is critical for producing an ideal result in retinal image segmentation.

Image Segmentation Segmentation +1

RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation

no code implementations29 Jun 2019 Liuyuan Deng, Ming Yang, Tianyi Li, Yuesheng He, Chunxiang Wang

To instantiate this structure, the paper proposes a residual fusion block (RFB) to formulate the interdependences of the encoders.

Semantic Segmentation

Resolution-invariant Person Re-Identification

1 code implementation24 Jun 2019 Shunan Mao, Shiliang Zhang, Ming Yang

RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively.

Person Re-Identification Super-Resolution

A Novel Demodulation and Estimation Algorithm for Blackout Communication: Extract Principal Components with Deep Learning

no code implementations27 May 2019 Haoyan Liu, Yanming Liu, Ming Yang, Xiaoping Li

For reentry or near space communication, owing to the influence of the time-varying plasma sheath channel environment, the received IQ baseband signals are severely rotated on the constellation.

Bi-Directional Cascade Network for Perceptual Edge Detection

2 code implementations CVPR 2019 Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang

Exploiting multi-scale representations is critical to improve edge detection for objects at different scales.

Edge Detection

The Period-Luminosity Relations of Red Supergiants in M33 and M31

no code implementations20 Feb 2019 Yi Ren, B. W. Jiang, Ming Yang, Jian Gao

The period-luminosity (P-L) relation is analyzed for the RSGs in the fundamental mode.

Solar and Stellar Astrophysics Astrophysics of Galaxies

Generating Synthesized Computed Tomography (CT) from Cone-Beam Computed Tomography (CBCT) using CycleGAN for Adaptive Radiation Therapy

no code implementations31 Oct 2018 Xiao Liang, Liyuan Chen, Dan Nguyen, Zhiguo Zhou, Xuejun Gu, Ming Yang, Jing Wang, Steve Jiang

Dose calculation accuracy using sCT images has been improved over the original CBCT images, with the average Gamma Index passing rate increased from 95.4% to 97.4% for 1 mm/1% criteria.

Medical Physics

Deep Reinforcement Learning with Iterative Shift for Visual Tracking

no code implementations ECCV 2018 Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou

Visual tracking is confronted by the dilemma to locate a target both accurately and efficiently, and make decisions online whether and how to adapt the appearance model or even restart tracking.

Deep Reinforcement Learning Motion Estimation +5

Instance-level Human Parsing via Part Grouping Network

1 code implementation ECCV 2018 Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.

Edge Detection Human Parsing +2

BSN: Boundary Sensitive Network for Temporal Action Proposal Generation

17 code implementations ECCV 2018 Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang

Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content.

Action Detection Temporal Action Proposal Generation

Conditional Generative Adversarial Network for Structured Domain Adaptation

no code implementations CVPR 2018 Weixiang Hong, Zhenzhen Wang, Ming Yang, Junsong Yuan

In recent years, deep neural nets have triumphed over many computer vision problems, including semantic segmentation, which is a critical task in emerging autonomous driving and medical image diagnostics applications.

Autonomous Driving Domain Adaptation +2

Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras

no code implementations2 Jan 2018 Liuyuan Deng, Ming Yang, Hao Li, Tianyi Li, Bing Hu, Chunxiang Wang

Finally, an RDC based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images.

Autonomous Driving Multi-Task Learning +2

Joint Calibration of Panoramic Camera and Lidar Based on Supervised Learning

no code implementations9 Sep 2017 Mingwei Cao, Ming Yang, Chunxiang Wang, Yeqiang Qian, Bing Wang

In view of contemporary panoramic camera-laser scanner system, the traditional calibration method is not suitable for panoramic cameras whose imaging model is extremely nonlinear.

Translation

A Multi-model Combination Approach for Probabilistic Wind Power Forecasting

no code implementations13 Feb 2017 You Lin, Ming Yang, Can Wan, Jianhui Wang, Yonghua Song

Therefore, a novel multi-model combination (MMC) approach for short-term probabilistic wind generation forecasting is proposed in this paper to exploit the advantages of different forecasting models.

Density Estimation

A Survey of Multi-View Representation Learning

no code implementations3 Oct 2016 Yingming Li, Ming Yang, Zhongfei Zhang

Consequently, we first review the representative methods and theories of multi-view representation learning based on the perspective of alignment, such as correlation-based alignment.

Representation Learning Survey

Top-N Recommendation on Graphs

1 code implementation27 Sep 2016 Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng

To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and intrinsic structural information of the user-item matrix.

Collaborative Filtering Recommendation Systems

Compressing Deep Convolutional Networks using Vector Quantization

no code implementations18 Dec 2014 Yunchao Gong, Liu Liu, Ming Yang, Lubomir Bourdev

In this paper, we tackle this model storage issue by investigating information theoretical vector quantization methods for compressing the parameters of CNNs.

Classification Clustering +7

Web-Scale Training for Face Identification

no code implementations CVPR 2015 Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web.

Face Identification Face Recognition +1
