no code implementations • 17 Apr 2025 • Yongqian Peng, Yuxi Ma, Mengmeng Wang, Yuxuan Wang, Yizhou Wang, Chi Zhang, Yixin Zhu, Zilong Zheng
The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence.
no code implementations • 31 Dec 2024 • Yipeng Kang, Junqi Wang, Yexin Li, Fangwei Zhong, Xue Feng, Mengmeng Wang, Wenming Tu, Quansen Wang, Hengli Li, Zilong Zheng
As large language models (LLMs) become increasingly integrated into critical applications, aligning their behavior with human values presents significant challenges.
no code implementations • 14 Dec 2024 • Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang
Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks.
no code implementations • 13 Dec 2024 • Mengmeng Wang, Teli Ma, Shuo Xin, Xiaojun Hou, Jiazheng Xing, Guang Dai, Jingdong Wang, Yong Liu
Specifically, we first review three types of mainstream single-modal VOT, including RGB, thermal infrared and point cloud tracking.
no code implementations • 19 Nov 2024 • Teli Ma, Zifan Wang, Jiaming Zhou, Mengmeng Wang, Junwei Liang
To address these limitations, we propose GLOVER, a unified Generalizable Open-Vocabulary Affordance Reasoning framework, which fine-tunes Large Language Models (LLMs) to predict the visual affordance of graspable object parts within the RGB feature space.
Common Sense Reasoning
Human-Object Interaction Detection
no code implementations • 24 Oct 2024 • Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang
To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing.
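The entry does not give the schedule's functional form; the sketch below is a minimal illustration, assuming the cumulative signal coefficient follows a logistic (sigmoid) curve so that it never reaches exactly 0 or 1 at the endpoints, which is what removes the singularities. The steepness `k` and midpoint `t0` are hypothetical parameters, not values from the paper.

```python
import numpy as np

def logistic_alpha_bar(num_steps: int, k: float = 10.0, t0: float = 0.5) -> np.ndarray:
    """Hypothetical logistic noise schedule: the cumulative signal level
    alpha_bar(t) follows a sigmoid in normalized time t in [0, 1].
    Because the sigmoid stays strictly inside (0, 1), alpha_bar never hits
    0 or 1, avoiding singular endpoints."""
    t = np.linspace(0.0, 1.0, num_steps)
    alpha_bar = 1.0 / (1.0 + np.exp(k * (t - t0)))   # monotonically decreasing in t
    return alpha_bar

alpha_bar = logistic_alpha_bar(1000)
betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]         # per-step noise rates
print(alpha_bar[0], alpha_bar[-1], betas.min(), betas.max())
```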
no code implementations • 29 Sep 2024 • Haonan Lin, Wenbin An, Jiahao Wang, Yan Chen, Feng Tian, Mengmeng Wang, Guang Dai, Qianying Wang, Jingdong Wang
Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD).
no code implementations • 23 Sep 2024 • Chun Xu, Mengmeng Wang, Yan Ren, Shaolin Zhu
Additionally, there is an F1 improvement of 0.05% and 1.06% on the Rest15 and Rest16 datasets, respectively.
Aspect-Based Sentiment Analysis
Aspect-Category-Opinion-Sentiment Quadruple Extraction
no code implementations • 7 Sep 2024 • Jiahao Wang, Caixia Yan, Weizhan Zhang, Haonan Lin, Mengmeng Wang, Guang Dai, Tieliang Gong, Hao Sun, Jingdong Wang
For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts.
1 code implementation • 29 Aug 2024 • Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu
While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, and practical application demands highlight the importance of simpler model designs in medical image segmentation.
1 code implementation • 24 Jun 2024 • Zixia Jia, Mengmeng Wang, Baichen Tong, Song-Chun Zhu, Zilong Zheng
Recent advances in Large Language Models (LLMs) have shown inspiring achievements in constructing autonomous agents that rely on language descriptions as inputs.
no code implementations • 16 Apr 2024 • Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun
To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance.
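"Semantic interpolation" is only named here; as a minimal sketch, assume it blends the tuned concept's embedding with an auxiliary sample's embedding, e.g., by spherical linear interpolation. All names and dimensions below are illustrative, not taken from the paper.

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two embedding vectors."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * a + t * b   # vectors nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# e.g., blend the tuned subject embedding with an auxiliary sample embedding
subject = np.random.randn(768)
auxiliary = np.random.randn(768)
blended = slerp(subject, auxiliary, t=0.3)
```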
no code implementations • CVPR 2024 • Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang
While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context.
1 code implementation • CVPR 2024 • Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, JunHao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness.
Ranked #29 on RGB-T Tracking on RGBT234
no code implementations • 22 Jan 2024 • Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu
In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework to address these challenges, preserving both high supervised performance and robust transferability.
1 code implementation • 10 Dec 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Junyu Zhu, Jongwon Ra, Yukai Ma, Laijian Li, Yong Liu
In this paper, we adopt the dense-sparse-dense design and propose a one-stage camera-based SSC framework, termed SGN, to propagate semantics from the semantic-aware seed voxels to the whole scene based on spatial geometry cues.
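SGN's propagation is learned from spatial geometry cues; purely to illustrate the dense-sparse-dense idea of spreading semantics from sparse seed voxels to the whole scene, the sketch below uses a non-learned nearest-seed assignment. It is a stand-in, not the paper's module.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_from_seeds(voxel_coords: np.ndarray,
                         seed_coords: np.ndarray,
                         seed_labels: np.ndarray) -> np.ndarray:
    """Assign every voxel the label of its nearest semantic-aware seed voxel:
    a non-learned stand-in for propagating semantics from sparse seeds."""
    tree = cKDTree(seed_coords)
    _, nearest = tree.query(voxel_coords, k=1)
    return seed_labels[nearest]

# toy example: 1000 voxels, 50 labeled seeds
voxels = np.random.rand(1000, 3)
seeds = np.random.rand(50, 3)
labels = np.random.randint(0, 20, size=50)
dense_labels = propagate_from_seeds(voxels, seeds, labels)
```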
no code implementations • 4 Dec 2023 • Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang
To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts.
1 code implementation • 8 Nov 2023 • Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang
In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding.
no code implementations • ICCV 2023 • Teli Ma, Mengmeng Wang, Jimin Xiao, Huifeng Wu, Yong Liu
In this paper, we forsake the conventional Siamese paradigm and propose SyncTrack, a novel single-branch framework that synchronizes feature extraction and matching, avoiding a second forward pass of the encoder for the template and search regions as well as the extra parameters of a separate matching network.
no code implementations • 21 Aug 2023 • Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong Liu
This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold.
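The entry only names the optimizer; for background, the sketch below shows the two Stiefel-manifold primitives any Riemannian (conjugate) gradient method needs, tangent-space projection and a QR retraction, applied in a single generic gradient step. It is not the decentralized DRCGD update itself.

```python
import numpy as np

def stiefel_project(X: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Project a Euclidean gradient G onto the tangent space of the
    Stiefel manifold {X : X^T X = I} at X (embedded metric)."""
    sym = 0.5 * (X.T @ G + G.T @ X)
    return G - X @ sym

def qr_retraction(X: np.ndarray, xi: np.ndarray) -> np.ndarray:
    """Retract the tangent vector xi back onto the manifold via QR."""
    Q, R = np.linalg.qr(X + xi)
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)   # fix column signs

# one Riemannian gradient step for f(X) = 0.5 * ||X - A||_F^2
n, p = 10, 3
A = np.random.randn(n, p)
X, _ = np.linalg.qr(np.random.randn(n, p))
egrad = X - A
rgrad = stiefel_project(X, egrad)
X_next = qr_retraction(X, -0.1 * rgrad)
print(np.allclose(X_next.T @ X_next, np.eye(p), atol=1e-8))
```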
no code implementations • 20 Aug 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang
Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls.
1 code implementation • ICCV 2023 • Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong Liu
Class prototype construction and matching are core aspects of few-shot action recognition.
no code implementations • 3 Aug 2023 • Jiazheng Xing, Chao Xu, Mengmeng Wang, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang, Jian Zhao
To tackle these issues, we introduce MA-FSAR, a framework that employs the Parameter-Efficient Fine-Tuning (PEFT) technique to enhance the CLIP visual encoder in terms of action-related temporal and semantic representations.
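The excerpt does not describe MA-FSAR's PEFT modules; the sketch shows a generic bottleneck adapter of the kind commonly attached to a frozen CLIP visual encoder, as one illustration of parameter-efficient fine-tuning. Dimensions and placement are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter: down-project, nonlinearity,
    up-project, then add the result back to the frozen backbone feature."""
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(2, 197, 768)        # (batch, tokens, dim) from a frozen ViT block
adapted = BottleneckAdapter()(tokens)
```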
no code implementations • 2 Jul 2023 • Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Yong Liu
In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process.
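As context for what ultra-low-precision quantization does to a model, a minimal symmetric uniform (fake) quantizer is sketched below; the DF-MPC compensation itself, which recovers accuracy without data or fine-tuning, is not reproduced.

```python
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 2) -> torch.Tensor:
    """Fake-quantize a weight tensor to num_bits with a symmetric uniform grid."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g., 1 for 2-bit, 7 for 4-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale                           # de-quantized (simulated) weights

w = torch.randn(256, 256)
w_q2 = quantize_symmetric(w, num_bits=2)
print((w - w_q2).abs().mean())                 # quantization error to be compensated
```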
1 code implementation • 27 Jun 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Tianxin Huang, Xuemeng Yang, Yong Liu
However, how to effectively exploit the relationships between the semantic context in semantic segmentation and geometric structure in scene completion remains under exploration.
1 code implementation • 27 Jun 2023 • Jianbiao Mei, Yu Yang, Mengmeng Wang, Xiaojun Hou, Laijian Li, Yong Liu
Firstly, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently.
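The SIP module's exact steps are not given in this excerpt; the sketch below only illustrates the "sampling-shifting-grouping" pattern with random seed sampling, a crude non-learned shift toward local centroids, and radius-based grouping, so every name and threshold is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def sip_like_proposals(points: np.ndarray, num_seeds: int = 64,
                       k: int = 16, radius: float = 0.3):
    """Toy sampling-shifting-grouping: sample seed points, shift every point
    toward the mean of its k nearest neighbours (a stand-in for shifting
    toward instance centres), then group shifted points around each seed."""
    tree = cKDTree(points)
    _, knn = tree.query(points, k=k)
    shifted = points[knn].mean(axis=1)                                   # shifting
    seed_idx = np.random.choice(len(points), num_seeds, replace=False)   # sampling
    shifted_tree = cKDTree(shifted)
    return [shifted_tree.query_ball_point(shifted[i], r=radius)          # grouping
            for i in seed_idx]

proposals = sip_like_proposals(np.random.rand(2048, 3))
```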
no code implementations • 16 May 2023 • Mengmeng Wang, Teli Ma, Xingxing Zuo, Jiajun Lv, Yong Liu
Additionally, considering the sparsity characteristics of the point clouds, we design a lateral correlation pyramid structure for the encoder to keep as many points as possible by integrating hierarchical correlated features.
1 code implementation • ICCV 2023 • Zizhang Li, Xiaoyang Lyu, Yuanyuan Ding, Mengmeng Wang, Yiyi Liao, Yong Liu
Recently, neural implicit surfaces have become popular for multi-view reconstruction.
1 code implementation • 10 Feb 2023 • Yuanxin Ye, Mengmeng Wang, Liang Zhou, Guangyang Lei, Jianwei Fan, Yao Qin
First, through the inner fusion property of 3D convolution, we design a new feature fusion way that can simultaneously extract and fuse the feature information from bi-temporal images.
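A minimal sketch of the inner-fusion idea, assuming the bi-temporal images are stacked along a temporal axis so a single 3-D convolution extracts and fuses their features in one pass; channel sizes and kernel shape are illustrative.

```python
import torch
import torch.nn as nn

# Stack the two epochs along a temporal depth axis: (B, C, T=2, H, W).
t1 = torch.randn(1, 3, 256, 256)    # image at time 1
t2 = torch.randn(1, 3, 256, 256)    # image at time 2
x = torch.stack([t1, t2], dim=2)    # (1, 3, 2, 256, 256)

# A 3-D convolution whose kernel spans both epochs extracts and fuses
# bi-temporal features simultaneously ("inner fusion").
fuse = nn.Conv3d(in_channels=3, out_channels=32,
                 kernel_size=(2, 3, 3), padding=(0, 1, 1))
fused = fuse(x).squeeze(2)          # (1, 32, 256, 256) fused feature map
```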
no code implementations • 10 Feb 2023 • Mengmeng Wang, Zhiqiang Han, Peizhen Yang, Bai Zhu, Ming Hao, Jianwei Fan, Yuanxin Ye
In this letter, a novel method for change detection is proposed using neighborhood structure correlation.
no code implementations • 7 Feb 2023 • Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong Liu
By introducing a partial differential equation on metrics, i.e., the Ricci flow, we establish the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation.
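For reference, the Ricci flow mentioned here is the standard evolution equation on metrics; the LNE-specific stability analysis is in the paper itself.

```latex
% Ricci flow: the metric g(t) evolves by (minus twice) its Ricci curvature
\frac{\partial}{\partial t} g_{ij}(t) = -2\,\mathrm{Ric}_{ij}\bigl(g(t)\bigr),
\qquad g(0) = g_0 .
```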
no code implementations • 19 Jan 2023 • Jiazheng Xing, Mengmeng Wang, Yong Liu, Boyu Mu
In this paper, we propose SloshNet, a new framework that revisits the spatial and temporal modeling for few-shot action recognition in a finer manner.
no code implementations • 17 Jan 2023 • Haoxin Chen, Mengmeng Wang, Yong Liu
The locality of lane representation is the ability to modify lanes locally, which can simplify parameter optimization.
1 code implementation • 17 Jul 2022 • Zizhang Li, Mengmeng Wang, Huaijin Pi, Kechun Xu, Jianbiao Mei, Yong Liu
However, the redundant parameters within the network structure can cause a large model size when scaling up for desirable performance.
Ranked #5 on Video Reconstruction on UVG
no code implementations • 21 Dec 2021 • Jun Chen, Yuang Liu, Xiangrui Zhao, Mengmeng Wang, Yong Liu
As a result, we prove that, if initial metrics have an $L^2$-norm perturbation which deviates from the Hyperbolic metric on the Poincaré ball, the scaled Ricci-DeTurck flow of such metrics smoothly and exponentially converges to the Hyperbolic metric.
1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao
Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.
Ranked #5 on Long-tail Learning on Places-LT (using extra training data)
no code implementations • 21 Nov 2021 • Zizhang Li, Mengmeng Wang, Jianbiao Mei, Yong Liu
Referring image segmentation is a typical multi-modal task, which aims at generating a binary mask for the referent described in a given language expression.
Ranked #1 on Referring Expression Segmentation on G-Ref test B
no code implementations • 28 Oct 2021 • Mengmeng Wang, Xiaoqian Yang, Yong Liu
Visual object tracking performance has been dramatically improved in recent years, but some severe challenges remain open, like distractors and occlusions.
2 code implementations • 17 Sep 2021 • Mengmeng Wang, Jiazheng Xing, Yong Liu
Moreover, to handle the deficiency of label texts and make use of the tremendous amount of web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".
Ranked #2 on Action Recognition In Videos on Kinetics-400
2 code implementations • ICCV 2021 • Lina Liu, Xibin Song, Mengmeng Wang, Yong Liu, Liangjun Zhang
Meanwhile, to guarantee that the day and night images contain the same information, the domain-separated network takes the day-time images and the corresponding night-time images (generated by a GAN) as input, and the private and invariant feature extractors are learned with orthogonality and similarity losses. This alleviates the domain gap, so better depth maps can be expected.
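A minimal sketch of the two losses mentioned above, assuming the private and invariant branches output one feature vector per image; the exact loss forms and weights in the paper may differ.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(private: torch.Tensor, invariant: torch.Tensor) -> torch.Tensor:
    """Push private and invariant features of the same images apart:
    penalize the squared Frobenius norm of their cross-correlation."""
    p = F.normalize(private, dim=1)
    s = F.normalize(invariant, dim=1)
    return (p.transpose(0, 1) @ s).pow(2).sum()

def similarity_loss(inv_day: torch.Tensor, inv_night: torch.Tensor) -> torch.Tensor:
    """Pull the invariant features of a day image and its GAN-generated
    night counterpart together (cosine similarity -> 1)."""
    return (1.0 - F.cosine_similarity(inv_day, inv_night, dim=1)).mean()

day_inv, day_priv = torch.randn(4, 128), torch.randn(4, 128)
night_inv = torch.randn(4, 128)
loss = orthogonality_loss(day_priv, day_inv) + similarity_loss(day_inv, night_inv)
```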
1 code implementation • 1 Jun 2021 • Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yi Yuan, Yong Liu
Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS).
no code implementations • 8 Feb 2021 • Guangming Yao, Yi Yuan, Tianjia Shao, Shuang Li, Shanqi Liu, Yong Liu, Mengmeng Wang, Kun Zhou
The paper proposes a novel generative adversarial network for one-shot face reenactment, which can animate a single face image to a different pose-and-expression (provided by a driving image) while keeping its original appearance.
no code implementations • 5 Feb 2021 • Jilin Tang, Yi Yuan, Tianjia Shao, Yong Liu, Mengmeng Wang, Kun Zhou
In this paper we tackle the problem of pose guided person image generation, which aims to transfer a person image from the source pose to a novel target pose while maintaining the source appearance.
no code implementations • ICCV 2021 • Tianxin Huang, Hao Zou, Jinhao Cui, Xuemeng Yang, Mengmeng Wang, Xiangrui Zhao, Jiangning Zhang, Yi Yuan, Yifan Xu, Yong Liu
The RFE extracts multiple global features from the incomplete point clouds for different recurrent levels, and the FDC generates point clouds in a coarse-to-fine pipeline.
no code implementations • 15 Dec 2020 • Lina Liu, Xibin Song, Xiaoyang Lyu, Junwei Diao, Mengmeng Wang, Yong Liu, Liangjun Zhang
Then, a refined depth map is further obtained using a residual learning strategy in the coarse-to-fine stage with a coarse depth map and color image as input.
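A minimal sketch of the residual, coarse-to-fine refinement described above, assuming the refinement network takes the coarse depth map concatenated with the color image and predicts an additive correction; the architecture is illustrative.

```python
import torch
import torch.nn as nn

class ResidualDepthRefiner(nn.Module):
    """Predict a residual correction from (coarse depth, RGB) and add it
    back to the coarse depth map (residual learning)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, coarse_depth: torch.Tensor, rgb: torch.Tensor) -> torch.Tensor:
        residual = self.net(torch.cat([coarse_depth, rgb], dim=1))
        return coarse_depth + residual        # refined depth

refined = ResidualDepthRefiner()(torch.rand(1, 1, 192, 640), torch.rand(1, 3, 192, 640))
```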
1 code implementation • 14 Dec 2020 • Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, Yi Yuan
To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information.
Ranked #7 on Unsupervised Monocular Depth Estimation on KITTI-C
no code implementations • 15 Sep 2020 • Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao
Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher.
1 code implementation • 26 Aug 2020 • Xin Kong, Xuemeng Yang, Guangyao Zhai, Xiangrui Zhao, Xianfang Zeng, Mengmeng Wang, Yong Liu, Wanlong Li, Feng Wen
First, we propose a novel semantic graph representation for the point cloud scenes by reserving the semantic and topological information of the raw point cloud.
1 code implementation • ECCV 2020 • Jiangning Zhang, Chao Xu, Liang Liu, Mengmeng Wang, Xia Wu, Yong Liu, Yunliang Jiang
The proposed DTVNet consists of two submodules: Optical Flow Encoder (OFE) and Dynamic Video Generator (DVG).
no code implementations • 26 May 2020 • Qinghua Chen, Yan Wang, Mengmeng Wang, Xiaomeng Li
In addition, we collected Chinese literature corpora for different historical periods from the Tang Dynasty to the present, and we dismantled the Chinese written language into three kinds of basic particles: characters, strokes and constructive parts.
no code implementations • 29 Mar 2020 • Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu
On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations.
1 code implementation • 16 Mar 2020 • Chunfang Deng, Mengmeng Wang, Liang Liu, Yong Liu
Small object detection remains an unsolved challenge because it is hard to extract information of small objects with only a few pixels.
no code implementations • ICCV 2019 • Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan
Spatiotemporal and motion features are two complementary and crucial information for video action recognition.
Ranked #1 on Action Recognition In Videos on HMDB-51
1 code implementation • CVPR 2020 • Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong Liu, Yu Ding, Changjie Fan
This paper presents a novel multi-identity face reenactment framework, named FReeNet, to transfer facial expressions from an arbitrary source face to a target face with a shared model.
no code implementations • CVPR 2017 • Mengmeng Wang, Yong Liu, Zeyi Huang
Structured output support vector machine (SVM) based tracking algorithms have shown favorable performance recently.
no code implementations • 15 Mar 2017 • Mengmeng Wang, Daobilige Su, Lei Shi, Yong Liu, Jaime Valls Miro
An ultrasonic sensor array is employed to provide the range information from the target person to the robot and Gaussian Process Regression is used for partial location estimation (2-D).
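A minimal sketch of using Gaussian Process Regression for the partial (2-D) location estimate, assuming the ultrasonic range readings are the regression inputs and previously observed target positions the targets; the kernel and the toy data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy training set: ultrasonic range readings (4 sensors) -> target (x, y).
rng = np.random.default_rng(0)
ranges = rng.uniform(0.5, 4.0, size=(200, 4))      # sensor measurements
positions = rng.uniform(-2.0, 2.0, size=(200, 2))  # observed 2-D positions

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2),
                               normalize_y=True)
gpr.fit(ranges, positions)

new_reading = rng.uniform(0.5, 4.0, size=(1, 4))
xy_estimate, xy_std = gpr.predict(new_reading, return_std=True)   # partial (2-D) location
```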
no code implementations • 23 Sep 2015 • Mengmeng Wang, Yong Liu
A discriminative model which accounts for the matching degree of local patches is adopted via a bottom ensemble layer, and a generative model which exploits holistic templates is used to search for the object through the middle ensemble layer as well as an adaptive Kalman filter.
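A minimal constant-velocity Kalman filter, included only to illustrate the filtering step behind the tracker; the adaptive tuning of the noise covariances and the ensemble layers themselves are not reproduced.

```python
import numpy as np

# Constant-velocity model for a 2-D target: state = [x, y, vx, vy].
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)     # we observe position only
Q = 0.01 * np.eye(4)                                   # process noise
R = 0.10 * np.eye(2)                                   # measurement noise

x = np.zeros(4)           # state estimate
P = np.eye(4)             # state covariance

def kalman_step(x, P, z):
    """One predict/update cycle with measurement z = [x_obs, y_obs]."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x, P = kalman_step(x, P, np.array([1.0, 2.0]))
```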