no code implementations • 27 May 2025 • Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, MingYu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen
However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities.
no code implementations • 26 May 2025 • Chen Sang, Yeqiang Qian, Jiale Zhang, Chunxiang Wang, Ming Yang
For tasks such as urban digital twins, VR/AR/game scene design, or creating synthetic films, the traditional industrial approach often involves manually modeling scenes and using various rendering engines to complete the rendering process.
1 code implementation • 26 May 2025 • Shi-Yu Tian, Zhi Zhou, Wei Dong, Ming Yang, Kun-Yang Yu, Zi-Jian Cheng, Lan-Zhe Guo, Yu-Feng Li
Reasoning with tabular data holds increasing importance in modern applications, yet comprehensive evaluation methodologies for reasoning-intensive Table Question Answering (QA) tasks remain nascent.
no code implementations • 22 May 2025 • Ming Yang, Haoran Li
We present GMatch, a learning-free feature matcher designed for robust 6DoF object pose estimation, addressing common local ambiguities in sparse feature matching.
no code implementations • 14 May 2025 • Xiaoyang Yu, Xiaoming Wu, Xin Wang, Dongrun Li, Ming Yang, Peng Cheng
To overcome this challenge, we propose a novel federated segmentation framework that achieves class consistency, termed FedSaaS.
no code implementations • 10 May 2025 • Ziluo Ding, Haobin Jiang, Yuxuan Wang, Zhenguo Sun, Yu Zhang, Xiaojie Niu, Ming Yang, Weishuai Zeng, Xinrun Xu, Zongqing Lu
This paper presents JAEGER, a dual-level whole-body controller for humanoid robots that addresses the challenges of training a more robust and versatile policy.
no code implementations • 25 Apr 2025 • Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang
In the second stage, we optimize the text encoder using a small amount of synthetic triplet data, enabling it to effectively extract compositional semantics by combining pseudo-word tokens with modification text for accurate target image retrieval.
no code implementations • 21 Apr 2025 • Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang
Constrained optimization with multiple functional inequality constraints has significant applications in machine learning.
no code implementations • 28 Mar 2025 • Juwei Guan, Xiaolin Fang, Donghyun Kim, Haotian Gong, Tongxin Zhu, Zhen Ling, Ming Yang
Low-quality data often suffer from insufficient image details, introducing an extra implicit aspect of camouflage that complicates camouflaged object detection (COD).
no code implementations • 17 Mar 2025 • Liewen Liao, Weihao Yan, Ming Yang, Songan Zhang
Learning-based 3D reconstruction has emerged as a transformative technique in autonomous driving, enabling precise modeling of both dynamic and static environments through advanced neural representations.
1 code implementation • CVPR 2025 • Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen
While MLLMs have demonstrated adequate image understanding capabilities, they still struggle with pixel-level comprehension, limiting their practical applications.
no code implementations • 10 Mar 2025 • Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang
AI-generated content faces similar requirements, where users not only need automatic generation of lip synchronization and basic gestures from audio input but also desire semantically accurate and expressive body movement that can be "directly guided" through text descriptions.
no code implementations • 26 Feb 2025 • Qingpei Guo, Kaiyou Song, Zipeng Feng, Ziping Ma, Qinglong Zhang, Sirui Gao, Xuzheng Yu, Yunxiao Sun, Tai-Wei Chang, Jingdong Chen, Ming Yang, Jun Zhou
We present M2-omni, a cutting-edge, open-source omni-MLLM that achieves performance competitive with GPT-4o.
no code implementations • 23 Feb 2025 • Xiaofeng Han, Xiaochen Chu, Tao Chao, Ming Yang, Miqing Li
ATM-MOEA/D uses an archive to gradually approximate the shape of the Pareto front during the search.
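The archive idea can be illustrated with a generic non-dominated archive for minimization problems. This is only a minimal sketch of the general technique, not the ATM-MOEA/D update rule, which the snippet above does not describe; the function names `dominates` and `update_archive` are hypothetical.

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive, candidate):
    """Insert candidate into a non-dominated archive (hypothetical helper).

    The candidate is rejected if an archive member dominates or equals it;
    otherwise every member it dominates is removed and it is appended.
    """
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


archive = []
for point in [(3, 3), (1, 4), (4, 1), (2, 2), (5, 5)]:
    archive = update_archive(archive, point)
```

As more candidates are offered during the search, the surviving points trace an increasingly fine approximation of the front's shape.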
no code implementations • 21 Feb 2025 • Yue Sun, Yeqiang Qian, Chunxiang Wang, Ming Yang
Safety and reliability are crucial for the public acceptance of autonomous driving.
no code implementations • 14 Jan 2025 • Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo
Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for the ability to adapt pre-trained VLMs to specific downstream tasks.
no code implementations • CVPR 2025 • Qi Zhu, Jiangwei Lao, Deyi Ji, Junwei Luo, Kang Wu, Yingying Zhang, Lixiang Ru, Jian Wang, Jingdong Chen, Ming Yang, Dong Liu, Feng Zhao
Open-world interpretation aims to accurately localize and recognize all objects within images by vision-language models (VLMs).
no code implementations • CVPR 2025 • Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu
Image restoration aims to recover high-quality (HQ) images from degraded low-quality (LQ) ones by reversing the effects of degradation.
no code implementations • 26 Dec 2024 • Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang
It consists of a query adaption module, which can be seamlessly integrated into CLIP and generates the referential query that provides prior context for the decoder, along with a task-specific decoder.
no code implementations • 25 Dec 2024 • Qiong Wu, Panwang Xia, Lei Yu, Yi Liu, Mingtao Xiong, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang, Yi Wan
Therefore, we propose a novel task: Cross-View Image Set Geo-Localization (Set-CVGL), which gathers multiple images with diverse perspectives as a query set for localization.
no code implementations • 18 Dec 2024 • Pei Chen, Fudong Wang, Yixuan Tong, Jingdong Chen, Ming Yang, Minghui Yang
Recently, the surge of efficient and automated 3D AI-generated content (AIGC) methods has increasingly illuminated the path of transforming human imagination into complex 3D structures.
2 code implementations • 16 Dec 2024 • Panwang Xia, Lei Yu, Yi Wan, Qiong Wu, Peiqi Chen, Liheng Zhong, Yongxiang Yao, Dong Wei, Xinyi Liu, Lixiang Ru, Yingying Zhang, Jiangwei Lao, Jingdong Chen, Ming Yang, Yongjun Zhang
To address this limitation, we introduce DReSS (Decentrality Related Street-view and Satellite-view dataset), a novel dataset designed to evaluate cross-view geo-localization with a large geographic scope and diverse landscapes, emphasizing the decentrality issue.
no code implementations • 16 Dec 2024 • Xiaochong Dong, Xuemin Zhang, Ming Yang, Shengwei Mei
This model uses a hypergraph structure to represent spatial features among wind farms.
no code implementations • CVPR 2025 • Shuwei Shi, Biao Gong, Xi Chen, Dandan Zheng, Shuai Tan, Zizheng Yang, Yuyuan Li, Jingwen He, Kecheng Zheng, Jingdong Chen, Ming Yang, Yinqiang Zheng
We then present a new I2V model, named MotionStone, developed with the decoupled motion estimator.
no code implementations • CVPR 2025 • Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang
Text serves as the key control signal in video generation due to its narrative nature.
no code implementations • 2 Dec 2024 • Qiyuan Shen, Hengwang Zhao, Weihao Yan, Chunxiang Wang, Tong Qin, Ming Yang
In this paper, we propose a cross-modal visual relocalization system in prior LiDAR maps utilizing intensity textures, which consists of three main modules: map projection, coarse retrieval, and fine relocalization.
no code implementations • 29 Nov 2024 • Tianqi Li, Ruobing Zheng, Bonan Li, ZiCheng Zhang, Meng Wang, Jingdong Chen, Ming Yang
Despite significant progress in talking head synthesis since the introduction of Neural Radiance Fields (NeRF), visual artifacts and high training costs persist as major obstacles to large-scale commercial adoption.
2 code implementations • 29 Nov 2024 • Tianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang
Recent advances in diffusion models have endowed talking head synthesis with subtle expressions and vivid head movements, but have also led to slow inference speed and insufficient control over generated results.
no code implementations • 21 Nov 2024 • Honglin Li, Yuting Gao, Chenglu Zhu, Jingdong Chen, Ming Yang, Lin Yang
Multimodal large language models (MLLMs) are rapidly closing the gap to human visual perception, yet they still lag behind in attending to subtle image details and in precisely locating small objects.
no code implementations • CVPR 2025 • Yudong Han, Qingpei Guo, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang
This suggests the possibility of adopting dynamic encoding to balance detailed video information preservation with token budget reduction.
no code implementations • 15 Nov 2024 • Hanzhong Guo, Jianfeng Zhang, Cheng Zou, Jun Li, Meng Wang, Ruxue Wen, Pingzhong Tang, Jingdong Chen, Ming Yang
A key challenge of try-on is to generate realistic images of the model wearing the garments while preserving the details of the garments.
no code implementations • 11 Nov 2024 • Xiaolong Wang, Lei Yu, Yingying Zhang, Jiangwei Lao, Lixiang Ru, Liheng Zhong, Jingdong Chen, Yu Zhang, Ming Yang
To address this limitation, this paper concentrates on enhancing the fine-matching module in the semi-dense matching framework.
no code implementations • 30 Oct 2024 • Yuxin Zhang, Dandan Zheng, Biao Gong, Jingdong Chen, Ming Yang, WeiMing Dong, Changsheng Xu
Lighting plays a pivotal role in ensuring the naturalness of video generation, significantly influencing the aesthetic quality of the generated content.
2 code implementations • 21 Oct 2024 • Kamal Al-Sabahi, Kang Yang, Wangwang Liu, Guanyu Jiang, Xian Li, Ming Yang
For this reason, attention has shifted to non-autoregressive or sequence tagging models.
no code implementations • 16 Oct 2024 • Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Jian Zhang, Xiaoguang Niu
P2Value comprehensively considers the possibility of transformers' output and pass rate and can make use of the redundant resources caused by the problem that most programs collected by LLMs fail to pass any tests.
no code implementations • 14 Oct 2024 • Shuai Tan, Biao Gong, Xiang Wang, Shiwei Zhang, Dandan Zheng, Ruobing Zheng, Kecheng Zheng, Jingdong Chen, Ming Yang
Our in-depth analysis attributes this limitation to their insufficient modeling of motion: they fail to comprehend the movement pattern of the driving video and thus rigidly impose a pose sequence onto the target character.
no code implementations • 3 Oct 2024 • Yueyuan Li, Mingyang Jiang, Songan Zhang, Wei Yuan, Chunxiang Wang, Ming Yang
Dynamic and interactive traffic scenarios pose significant challenges for autonomous driving systems.
no code implementations • 23 Sep 2024 • Yuyan Chen, Yiwen Qian, Songzhou Yan, Jiyuan Jia, Zhixu Li, Yanghua Xiao, Xiaobo Li, Ming Yang, Qingpei Guo
In the era of social media video platforms, popular "hot-comments" play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purposes.
1 code implementation • 4 Sep 2024 • Wen Li, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang
To tackle these challenges, we introduce StyleTokenizer, a zero-shot style control image generation method that aligns style representation with text representation using a style tokenizer.
no code implementations • 13 Aug 2024 • Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.
no code implementations • 11 Aug 2024 • Zhirui Fang, Ming Yang, Weishuai Zeng, Boyu Li, Junpeng Yue, Ziluo Ding, Xiu Li, Zongqing Lu
LMMs excel in planning long-horizon tasks over symbolic abstractions but struggle with grounding in the physical world, often failing to accurately identify object positions in images.
1 code implementation • 4 Aug 2024 • Changze Li, Ziheng Ji, Zhe Chen, Tong Qin, Ming Yang
Real-vehicle experiments further validate the feasibility and effectiveness of the method proposed in this paper.
1 code implementation • 2 Aug 2024 • Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang
Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks.
no code implementations • 22 Jul 2024 • Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang
This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).
no code implementations • 11 Jul 2024 • Hang Wu, Zhenghao Zhang, Siyuan Lin, Xiangru Mu, Qiang Zhao, Ming Yang, Tong Qin
Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors.
no code implementations • 11 Jul 2024 • Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang
In this paper, we propose BLOS-BEV, a novel BEV segmentation model that incorporates SD maps for accurate beyond line-of-sight perception, up to 200m.
no code implementations • 4 Jul 2024 • Xuerong Zhang, Li Huang, Jing Lv, Ming Yang
Semi-supervised learning is attracting growing attention owing to its success in exploiting unlabeled data.
no code implementations • 2 Jul 2024 • Yuquan Xie, Wanqi Yang, Jinyu Wei, Ming Yang, Yang Gao
To address this issue, we propose a domain generalization approach for knowledge tracing, where existing education systems are considered source domains, and new education systems with limited data are considered target domains.
no code implementations • 2 Jul 2024 • Shulei Qiu, Wanqi Yang, Ming Yang
In HFRP, we fuse the channel features and the spatial features.
no code implementations • 1 Jul 2024 • Like Xin, Wanqi Yang, Lei Wang, Ming Yang
In cross-view learning, reliable view guidance enhances the confidence of the cluster structures in other views.
no code implementations • 1 Jul 2024 • Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang
In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
no code implementations • 24 Jun 2024 • Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang
This approach solves the key problem of large-scale reconstruction, namely where the data come from and how to use them.
1 code implementation • 17 Jun 2024 • Weihao Yan, Yeqiang Qian, Yueyuan Li, Tao Li, Chunxiang Wang, Ming Yang
In this paper, we propose a novel semi-supervised active domain adaptation (SS-ADA) framework for semantic segmentation that employs an image-level acquisition strategy.
no code implementations • 6 Jun 2024 • Jixiang Wan, Xudong Zhang, Shuzhou Dong, Yuwei Zhang, Yuchen Yang, Ruoxi Wu, Ye Jiang, Jijunnan Li, Jinquan Lin, Ming Yang
To balance efficiency and accuracy, we propose a novel lightweight visual semantic localization algorithm that employs stable semantic features instead of low-level texture features.
1 code implementation • 1 Jun 2024 • Zhi Zhou, Ming Yang, Jiang-Xin Shi, Lan-Zhe Guo, Yu-Feng Li
In this paper, we explore a problem setting called Open-world Prompt Tuning (OPT), which involves tuning prompts on base classes and evaluating on a combination of base and new classes.
2 code implementations • 31 May 2024 • Mingyang Jiang, Yueyuan Li, Songan Zhang, Siyuan Chen, Chunxiang Wang, Ming Yang
This novel solution integrates a reinforcement learning agent with Reeds-Shepp curves, enabling effective planning across diverse scenarios.
no code implementations • 25 May 2024 • Zhenzhong Wang, Zehui Lin, WanYu Lin, Ming Yang, Minggang Zeng, Kay Chen Tan
Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and material science.
1 code implementation • 21 May 2024 • Zizhao Chen, Yeqiang Qian, Xiaoxiao Yang, Chunxiang Wang, Ming Yang
This increased inference time has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems.
no code implementations • 27 Apr 2024 • Like Xin, Wanqi Yang, Lei Wang, Ming Yang
We assume that the view with a good cluster structure is the reliable view, which acts as a supervisor to guide the clustering of the other views.
no code implementations • 22 Apr 2024 • Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo
In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.
1 code implementation • Expert Systems with Applications 2024 • Qian Zhang, Yi Zhu, Ming Yang, Ge Jin, YingWen Zhu, Qiu Chen
Although sample selection is a mainstream method in the field of learning with noisy labels, which aims to mitigate the impact of noisy labels during model training, the testing performance of these methods exhibits significant fluctuations across different noise rates and types.
Ranked #3 on Learning with noisy labels on Clothing1M
no code implementations • 17 Mar 2024 • Kangyang Xie, BinBin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen
Beyond the superiority of the text-to-image diffusion model in generating high-quality images, recent studies have attempted to uncover its potential for adapting the learned semantic knowledge to visual perception tasks.
no code implementations • 3 Mar 2024 • Wenhui Zhao, Quanxue Gao, Guangfei Li, Cheng Deng, Ming Yang
Despite their successes, current methods lack interpretability in the clustering process and do not sufficiently consider the complementary information across different views.
1 code implementation • CVPR 2024 • ZiCheng Zhang, Ruobing Zheng, Ziwen Liu, Congying Han, Tianqi Li, Meng Wang, Tiande Guo, Jingdong Chen, Bonan Li, Ming Yang
Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.
no code implementations • 24 Feb 2024 • Shikun Mei, Fangfang Li, Quanxue Gao, Ming Yang
Additionally, we evolve the concept of the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples.
no code implementations • 2 Feb 2024 • Haoxiang Gao, Zhongruo Wang, Yaqian Li, Kaiwen Long, Ming Yang, Yiqing Shen
The advent of foundation models has revolutionized the fields of natural language processing and computer vision, paving the way for their application in autonomous driving (AD).
1 code implementation • 31 Jan 2024 • Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo
We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features.
1 code implementation • 29 Jan 2024 • Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.
Ranked #1 on Zero-Shot Transfer Image Classification on ImageNet (using extra training data)
Zero-Shot Cross-Modal Retrieval, Zero-shot Image Retrieval
no code implementations • 4 Jan 2024 • Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo
To achieve multimodal alignment from both global and local perspectives, this paper proposes Symmetrizing Contrastive Captioners (SyCoCa), which introduces bidirectional interactions on images and texts across the global and local representation levels.
no code implementations • 2 Jan 2024 • Shuang Li, Ke Li, Wei Li, Ming Yang
Constrained multi-objective optimization problems (CMOPs) pervade real-world applications in science, engineering, and design.
no code implementations • CVPR 2024 • Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang, Chuanyang Zheng, Jiajia Liu, Jian Wang, Jingdong Chen, Ming Yang
In this paper, we present a vision-inspired vision-language connection module, dubbed VIVL, which efficiently exploits the vision cue for VL models.
1 code implementation • CVPR 2024 • Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li
Prior studies on Remote Sensing Foundation Model (RSFM) reveal immense potential towards a generic model for Earth Observation.
Ranked #1 on Zero-shot Classification (unified classes) on AID
1 code implementation • 22 Nov 2023 • Weihao Yan, Yeqiang Qian, Xingyuan Chen, Hanyang Zhuang, Chunxiang Wang, Ming Yang
It involves Semantic-Guided Mask Labeling, which assigns semantic labels to unlabeled SAM masks using UDA pseudo-labels.
no code implementations • 18 Nov 2023 • Yueyuan Li, Wei Yuan, Songan Zhang, Weihao Yan, Qiyuan Shen, Chunxiang Wang, Ming Yang
Simulators play a crucial role in autonomous driving, offering significant time, cost, and labor savings.
2 code implementations • 18 Nov 2023 • Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Yeqiang Qian, Chunxiang Wang, Ming Yang
Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems.
1 code implementation • CVPR 2024 • Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang
Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
1 code implementation • 20 Sep 2023 • Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi
We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs.
Ranked #4 on Video Retrieval on MSR-VTT-1kA
no code implementations • 21 Aug 2023 • Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo
Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing.
no code implementations • 7 Jul 2023 • Ming Yang, Xiyuan Wei, Tianbao Yang, Yiming Ying
Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC.
no code implementations • 23 Mar 2023 • Yi Huang, Xiaoguang Tu, Gui Fu, Tingting Liu, Bokai Liu, Ming Yang, Ziliang Feng
Images taken under low-light conditions tend to suffer from poor visibility, which can decrease image quality and even reduce the performance of the downstream tasks.
no code implementations • 21 Mar 2023 • Xiangchen Cheng, Wei Tang, Ming Yang, Li Jin
Signal-free intersections are a representative application of smart and connected vehicle technologies.
no code implementations • 13 Mar 2023 • Zihao Lin, Jinrong Li, Fan Yang, Shuangping Huang, Xu Yang, Jianmin Lin, Ming Yang
In this paper, we propose a novel model called Spatial Attention and Syntax Rule Enhanced Tree Decoder (SS-TD), which is equipped with a spatial attention mechanism to alleviate prediction errors in the tree structure and uses syntax masks (obtained by transforming syntax rules) to constrain the occurrence of ungrammatical mathematical expressions.
no code implementations • 5 Jan 2023 • Lei Yu, Wanqi Yang, Shengqi Huang, Lei Wang, Ming Yang
However, the goals of FS-UDA and FSL are related yet distinct, since FS-UDA aims to classify samples in the target domain rather than the source domain.
1 code implementation • 21 Nov 2022 • Tao Li, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Ming Yang, Xiaolin Huang
To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability.
no code implementations • 17 Nov 2022 • Ming Yang, Yanhan Wang, Xin Wang, Zhenyong Zhang, Xiaoming Wu, Peng Cheng
Federated learning is a distributed learning paradigm that allows each client to keep its original data locally and upload only the parameters of its local model to the server.
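The protocol described above can be sketched with a toy one-parameter model. This is a minimal illustration that assumes a sample-size-weighted average on the server (in the style of federated averaging); the snippet does not say which aggregation rule the paper uses, and the names `local_update`, `server_aggregate`, and `federated_round` are hypothetical.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Client side: gradient descent on private (x, y) pairs for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def server_aggregate(client_weights, client_sizes):
    """Server side: average the uploaded parameters, weighted by local sample count."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


def federated_round(w_global, clients):
    """One round: only parameters travel to the server; raw data stays on each client."""
    updates = [local_update(w_global, data) for data in clients]
    sizes = [len(data) for data in clients]
    return server_aggregate(updates, sizes)


# Two clients holding private samples of the same relation y = 3x (assumed toy data).
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

The server never sees any `(x, y)` pair, only the scalar each client returns, which is the property the description emphasizes.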
1 code implementation • 4 Nov 2022 • Xiaoyu Geng, Qiang Guo, Shuaixiong Hui, Ming Yang, Caiming Zhang
To this end, we integrate nonlocal self-similarity into N-TRPCA, and further develop a nonconvex and nonlocal TRPCA (NN-TRPCA) model.
no code implementations • 5 Oct 2022 • Qisheng Wang, Ming Yang, Xinrui Zhu
The palindromic tree (a.k.a. eertree) is a linear-size data structure that provides access to all palindromic substrings of a string.
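For readers unfamiliar with the structure, the following is a minimal sketch of the standard online eertree construction, not the variant studied in the paper. Each node stands for one distinct palindromic substring; the class layout and the helper `distinct_palindromes` are illustrative choices.

```python
class Eertree:
    """Palindromic tree built online, one character at a time."""

    def __init__(self):
        # Node 0: imaginary root of length -1; node 1: empty palindrome of length 0.
        self.length = [-1, 0]
        self.link = [0, 0]      # suffix link: longest proper palindromic suffix
        self.edges = [{}, {}]   # edges[v][c] is the node for c + palindrome(v) + c
        self.text = []
        self.last = 1           # node of the longest palindromic suffix of text

    def _get_link(self, v):
        # Follow suffix links until the palindrome at v can be extended
        # on both sides by the most recently appended character.
        i = len(self.text) - 1
        while True:
            j = i - self.length[v] - 1
            if j >= 0 and self.text[j] == self.text[i]:
                return v
            v = self.link[v]

    def add(self, ch):
        """Append ch; return True if a new distinct palindrome appeared."""
        self.text.append(ch)
        v = self._get_link(self.last)
        if ch in self.edges[v]:
            self.last = self.edges[v][ch]
            return False
        new = len(self.length)
        self.length.append(self.length[v] + 2)
        self.edges.append({})
        if self.length[new] == 1:
            self.link.append(1)  # a single character links to the empty palindrome
        else:
            self.link.append(self.edges[self._get_link(self.link[v])][ch])
        self.edges[v][ch] = new
        self.last = new
        return True


def distinct_palindromes(s):
    """List the distinct palindromic substrings of s in order of first appearance."""
    tree, found = Eertree(), []
    for i, ch in enumerate(s):
        if tree.add(ch):
            n = tree.length[tree.last]
            found.append(s[i - n + 1 : i + 1])
    return found
```

Because each appended character creates at most one new node, the tree has at most `len(s) + 2` nodes, which is where the linear size comes from.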
no code implementations • 7 Sep 2022 • Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang
Panoptic segmentation combines the advantages of semantic and instance segmentation, which can provide both pixel-level and instance-level environmental perception information for intelligent vehicles.
1 code implementation • 23 Aug 2022 • Weihao Yan, Yeqiang Qian, Chunxiang Wang, Ming Yang
In stage one, we design a threshold-adaptative unsupervised focal loss to regularize the prediction in the target domain, which has a mild gradient neutralization mechanism and mitigates the problem that hard samples are barely optimized in entropy-based methods.
no code implementations • 4 Dec 2021 • Xiaoxiao Yang, Yeqian Qiang, Huijie Zhu, Chunxiang Wang, Ming Yang
Thermal infrared (TIR) images have proven effective in providing temperature cues to the RGB features for multispectral pedestrian detection.
no code implementations • 15 Oct 2021 • Wei Xia, Quanxue Gao, Ming Yang, Xinbo Gao
Thus, for the OOS nodes, SCAGC can directly calculate their clustering labels.
1 code implementation • ICLR 2022 • Pengcheng Yang, XiaoMing Zhang, Wenpeng Zhang, Ming Yang, Hong Wei
The recent trend of using large-scale deep neural networks (DNN) to boost performance has propelled the development of the parallel pipelining technique for efficient DNN training, which has resulted in the development of several prominent pipelines such as GPipe, PipeDream, and PipeDream-2BW.
1 code implementation • Findings (EMNLP) 2021 • Shifeng Huang, Jiawei Wang, Jiao Xu, Da Cao, Ming Yang
Specifically, given a math word problem, the model first retrieves similar questions by a memory module and then encodes the unsolved problem and each retrieved question using a representation module.
Ranked #8 on Math Word Problem Solving on Math23K
no code implementations • 6 Aug 2021 • Shengqi Huang, Wanqi Yang, Lei Wang, Luping Zhou, Ming Yang
Inspired by the recent local descriptor based few-shot learning (FSL), our general UDA model is fully built upon local descriptors (LDs) for image classification and domain adaptation.
no code implementations • 2 Jul 2021 • Guanghui Wang, Ming Yang, Lijun Zhang, Tianbao Yang
In this paper, we further improve the stochastic optimization of AUPRC by (i) developing novel stochastic momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution; and (ii) designing a novel family of stochastic adaptive methods with the same iteration complexity, which enjoy faster convergence in practice.
1 code implementation • CVPR 2021 • Bowen Cheng, Lu Sheng, Shaoshuai Shi, Ming Yang, Dong Xu
Inspired by the back-tracing strategy in conventional Hough voting methods, we introduce a new 3D object detection method, named Back-tracing Representative Points Network (BRNet), which generatively back-traces the representative points from the vote centers and also revisits complementary seed points around these generated points, so as to better capture the fine local structural features surrounding the potential objects from the raw point clouds.
Ranked #19 on 3D Object Detection on SUN-RGBD val
1 code implementation • CVPR 2021 • Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking.
Ranked #1 on Instance Segmentation on nuScenes
no code implementations • 11 Mar 2021 • Xingyu Jiang, Mingyang Qin, Xinjian Wei, Zhongpei Feng, Jiezun Ke, Haipeng Zhu, Fucong Chen, Liping Zhang, Li Xu, Xu Zhang, Ruozhou Zhang, Zhongxu Wei, Peiyu Xiong, Qimei Liang, Chuanying Xi, Zhaosheng Wang, Jie Yuan, Beiyi Zhu, Kun Jiang, Ming Yang, Junfeng Wang, Jiangping Hu, Tao Xiang, Brigitte Leridon, Rong Yu, Qihong Chen, Kui Jin, Zhongxian Zhao
Iron selenide (FeSe), the structurally simplest iron-based superconductor, has attracted tremendous interest in recent years.
Superconductivity
no code implementations • 21 Jan 2021 • Ming Yang, Alceste Z. Bonanos, Biwei Jiang, Man I Lam, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Shu Wang, Xiao-Dian Chen, Frank Tramper, Yi Ren, Zoi T. Spetsieri
RSG candidates are further separated from the rest of the LSG candidates using semi-empirical criteria on NIR CMDs, resulting in 323 RSG candidates.
Solar and Stellar Astrophysics, Astrophysics of Galaxies
1 code implementation • 19 Jan 2021 • Zhuoman Liu, Wei Jia, Ming Yang, Peiyao Luo, Yong Guo, Mingkui Tan
To address the above issues, in this paper, we propose a novel deep generative model, called Self-Consistent Generative Network (SCGN), which synthesizes novel views from the given input views without explicitly exploiting the geometric information.
no code implementations • ICCV 2021 • Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan
This task is confronted with two challenges: how to establish the 3D correspondences from views to the BEV map and how to assemble occupancy information across views.
Ranked #2 on Multiview Detection on CVCS (MODA (1m) metric)
1 code implementation • 7 Dec 2020 • Jiansheng Fang, Xiaoqing Zhang, Yan Hu, Yanwu Xu, Ming Yang, Jiang Liu
The Latent Factor Model (LFM) is one of the most successful methods for collaborative filtering (CF) in recommender systems, in which both users and items are projected into a joint latent factor space.
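A generic latent factor model can be sketched as matrix factorization trained with stochastic gradient descent: each user and each item gets a small vector, and a rating is predicted by their dot product. This is a textbook baseline under assumed toy ratings and hyperparameters, not the method proposed in the paper; `train_latent_factors` and `rmse` are hypothetical names.

```python
import random


def train_latent_factors(ratings, n_users, n_items, k=2, lr=0.01,
                         reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Prediction is the dot product in the joint latent space.
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the dot-product predictions on the given triples."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


# Assumed toy data: 4 users, 3 items, ratings on a 1 to 5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P0, Q0 = train_latent_factors(ratings, 4, 3, epochs=0)
P, Q = train_latent_factors(ratings, 4, 3)
```

Comparing `rmse(ratings, P0, Q0)` with `rmse(ratings, P, Q)` shows how far training moves the factors from their random initialization.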
no code implementations • 23 Sep 2020 • Zehan Zhang, Ming Zhang, Zhidong Liang, Xian Zhao, Ming Yang, Wenming Tan, ShiLiang Pu
Experimental results on the KITTI dataset demonstrate significant improvement in filtering false positives over the approach using only point cloud data.
no code implementations • 20 Apr 2020 • Wanqi Yang, Tong Ling, Chengmei Yang, Lei Wang, Yinghuan Shi, Luping Zhou, Ming Yang
To address this issue, we propose a novel approach called Conditional ADversarial Image Translation (CADIT) to explicitly align the class distributions given samples between the two domains.
no code implementations • 2 Apr 2020 • Xiaoliang Wang, Yeqiang Qian, Chunxiang Wang, Ming Yang
As one of the most important tasks in autonomous driving systems, ego-lane detection has been extensively studied and has achieved impressive results in many scenarios.
1 code implementation • 24 Sep 2019 • Chenchen Zhao, Yeqiang Qian, Ming Yang
The 2D and 3D dimensions of pedestrians are determined from the camera captures and further utilized through two feedforward links connected to the orientation estimator.
2 code implementations • ICCV 2019 • Naiyu Gao, Yanhu Shan, Yupei Wang, Xin Zhao, Yinan Yu, Ming Yang, Kaiqi Huang
Moreover, building on the learned affinity pyramid, a novel cascaded graph partition module is presented to sequentially generate instances from coarse to fine.
2 code implementations • 25 Jul 2019 • Shihao Zhang, Huazhu Fu, Yuguang Yan, Yubing Zhang, Qingyao Wu, Ming Yang, Mingkui Tan, Yanwu Xu
Learning structural information is critical for producing an ideal result in retinal image segmentation.
no code implementations • 29 Jun 2019 • Liuyuan Deng, Ming Yang, Tianyi Li, Yuesheng He, Chunxiang Wang
To instantiate this structure, the paper proposes a residual fusion block (RFB) to formulate the interdependences of the encoders.
Ranked #3 on Semantic Segmentation on ScanNetV2
1 code implementation • 24 Jun 2019 • Shunan Mao, Shiliang Zhang, Ming Yang
RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively.
no code implementations • 27 May 2019 • Haoyan Liu, Yanming Liu, Ming Yang, Xiaoping Li
For reentry or near space communication, owing to the influence of the time-varying plasma sheath channel environment, the received IQ baseband signals are severely rotated on the constellation.
2 code implementations • CVPR 2019 • Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang
Exploiting multi-scale representations is critical to improving edge detection for objects at different scales.
Ranked #2 on Edge Detection on BRIND
no code implementations • 20 Feb 2019 • Yi Ren, B. W. Jiang, Ming Yang, Jian Gao
The period-luminosity (P-L) relation is analyzed for the RSGs in the fundamental mode.
Solar and Stellar Astrophysics • Astrophysics of Galaxies
no code implementations • 14 Feb 2019 • Zhidong Liang, Ming Yang, Chunxiang Wang
As a result, our framework can output both the semantic prediction and the instance prediction.
3D Instance Segmentation
3D Semantic Instance Segmentation
no code implementations • 31 Oct 2018 • Xiao Liang, Liyuan Chen, Dan Nguyen, Zhiguo Zhou, Xuejun Gu, Ming Yang, Jing Wang, Steve Jiang
Dose calculation accuracy using sCT images has been improved over the original CBCT images, with the average Gamma Index passing rate increased from 95.4% to 97.4% for 1 mm/1% criteria.
Medical Physics
no code implementations • ECCV 2018 • Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou
Visual tracking is confronted by the dilemma to locate a target both accurately and efficiently, and make decisions online whether and how to adapt the appearance model or even restart tracking.
1 code implementation • ECCV 2018 • Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin
Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass.
Ranked #6 on Human Part Segmentation on CIHP
17 code implementations • ECCV 2018 • Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang
Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content.
Ranked #3 on Temporal Action Proposal Generation on THUMOS'14
no code implementations • CVPR 2018 • Weixiang Hong, Zhenzhen Wang, Ming Yang, Junsong Yuan
In recent years, deep neural nets have triumphed over many computer vision problems, including semantic segmentation, which is a critical task in emerging autonomous driving and medical image diagnostics applications.
no code implementations • CVPR 2018 • Jingwen Chen, Jia-Wei Chen, Hongyang Chao, Ming Yang
In this paper, we consider a typical image blind denoising problem, which is to remove unknown noise from noisy images.
no code implementations • 2 Jan 2018 • Liuyuan Deng, Ming Yang, Hao Li, Tianyi Li, Bing Hu, Chunxiang Wang
Finally, an RDC based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images.
no code implementations • 9 Sep 2017 • Mingwei Cao, Ming Yang, Chunxiang Wang, Yeqiang Qian, Bing Wang
For contemporary panoramic camera-laser scanner systems, the traditional calibration method is not suitable for panoramic cameras whose imaging model is extremely nonlinear.
no code implementations • 13 Feb 2017 • You Lin, Ming Yang, Can Wan, Jianhui Wang, Yonghua Song
Therefore, a novel multi-model combination (MMC) approach for short-term probabilistic wind generation forecasting is proposed in this paper to exploit the advantages of different forecasting models.
no code implementations • 3 Oct 2016 • Yingming Li, Ming Yang, Zhongfei Zhang
Consequently, we first review the representative methods and theories of multi-view representation learning based on the perspective of alignment, such as correlation-based alignment.
1 code implementation • 27 Sep 2016 • Zhao Kang, Chong Peng, Ming Yang, Qiang Cheng
To alleviate this problem, this paper proposes a simple recommendation algorithm that fully exploits the similarity information among users and items and intrinsic structural information of the user-item matrix.
no code implementations • 18 Dec 2014 • Yunchao Gong, Liu Liu, Ming Yang, Lubomir Bourdev
In this paper, we tackle this model storage issue by investigating information theoretical vector quantization methods for compressing the parameters of CNNs.
4 code implementations • Conference on Computer Vision and Pattern Recognition (CVPR) 2014 • Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify.
Ranked #1 on 3D Face Modelling on LFW
no code implementations • CVPR 2015 • Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf
Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web.