no code implementations • 15 Jun 2025 • Zhuoying Li, Zhu Xu, Yuxin Peng, Yang Liu
To tackle this, we introduce a new metric called Balancing Preservation and Modification (BPM), tailored for instruction-based image editing by explicitly disentangling the image into editing-relevant and irrelevant regions for separate consideration.
no code implementations • CVPR 2025 • HsiaoYuan Hsu, Yuxin Peng
To further explore PosterO's abilities under generalized settings, we built PStylish7, the first dataset with multi-purpose posters and various-shaped elements, offering a challenging test for advanced research.
1 code implementation • CVPR 2025 • Geng Li, Jinglin Xu, Yunzhen Zhao, Yuxin Peng
Humans can effortlessly locate desired objects in cluttered environments, relying on a cognitive mechanism known as visual search to efficiently filter out irrelevant information and focus on task-related regions.
1 code implementation • 21 Apr 2025 • Hong-Tao Yu, Xiu-Shen Wei, Yuxin Peng, Serge Belongie
Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal perception capabilities, garnering significant attention.
1 code implementation • CVPR 2025 • Jiayi Gao, Zijin Yin, Changcheng Hua, Yuxin Peng, Kongming Liang, Zhanyu Ma, Jun Guo, Yang Liu
The development of Text-to-Video (T2V) generation has made motion transfer possible, enabling the control of video motion based on existing footage.
1 code implementation • CVPR 2025 • Zichen Liu, Kunlun Xu, Bing Su, Xu Zou, Yuxin Peng, Jiahuan Zhou
Recent video prompting methods attempt to adapt CLIP for video tasks by introducing learnable prompts, but they typically rely on a single static prompt for all video sequences, overlooking the diverse temporal dynamics and spatial variations that exist across frames.
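To make the single-static-prompt setup being criticized concrete, here is a minimal NumPy sketch of prompt tuning: learnable prompt tokens prepended to a frozen backbone's input token sequence. The names and dimensions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def prepend_prompts(frame_tokens, prompt_tokens):
    """Concatenate learnable prompt tokens before the frame token sequence."""
    return np.concatenate([prompt_tokens, frame_tokens], axis=0)

d_model = 8                                      # embedding width (assumed)
frame_tokens = rng.normal(size=(16, d_model))    # 16 frame tokens from a video
prompt_tokens = rng.normal(size=(4, d_model))    # 4 learnable prompt vectors

# The same static prompt_tokens would be reused for every video sequence,
# which is exactly the limitation the paper's per-sequence prompts address.
inputs = prepend_prompts(frame_tokens, prompt_tokens)
print(inputs.shape)  # (20, 8)
```

During prompt tuning, only `prompt_tokens` would receive gradients while the backbone stays frozen.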
1 code implementation • CVPR 2025 • Chenyu Zhang, Kunlun Xu, Zichen Liu, Yuxin Peng, Jiahuan Zhou
To address these limitations, we propose a novel transductive TTA framework, Supportive Clique-based Attribute Prompting (SCAP), which effectively combines visual and textual information to enhance adaptation by generating fine-grained attribute prompts across test batches.
1 code implementation • 25 Jan 2025 • Hulingxiao He, Geng Li, Zijun Geng, Jinglin Xu, Yuxin Peng
Multi-modal large language models (MLLMs) have shown remarkable abilities in various visual understanding tasks.
1 code implementation • CVPR 2025 • Zhenyu Cui, Jiahuan Zhou, Yuxin Peng
However, due to the long-term nature of lifelong learning, the inevitable changes in human clothes prevent the model from relying on unified discriminative information (e.g., clothing style) to match the same person in the streaming data, demanding differentiated cloth-irrelevant information.
1 code implementation • 12 Dec 2024 • Yifeng Yao, Zichen Liu, Zhenyu Cui, Yuxin Peng, Jiahuan Zhou
To prevent the loss of discriminative information during state space propagation, SVP employs lightweight selective prompters for token-wise prompt generation, ensuring adaptive activation of the update and forget gates within Mamba blocks to promote discriminative information propagation.
1 code implementation • 12 Dec 2024 • Kunlun Xu, Chenghao Jiang, Peixi Xiong, Yuxin Peng, Jiahuan Zhou
To overcome these challenges, we propose a novel paradigm that models and rehearses the distribution of the old domains to enhance knowledge consolidation during the new data learning, possessing a strong anti-forgetting capacity without storing any exemplars.
no code implementations • 10 Oct 2024 • Hulingxiao He, Yaqi Zhang, Jinglin Xu, Yuxin Peng
Plant counting is essential in every stage of agriculture, including seed breeding, germination, cultivation, fertilization, pollination, yield estimation, and harvesting.
1 code implementation • 10 Oct 2024 • Hulingxiao He, Xiangteng He, Yuxin Peng, Zifei Shan, Xin Su
Recommendation models utilizing unique identities (IDs) to represent distinct users and items have dominated the recommender systems literature for over a decade.
1 code implementation • 8 Oct 2024 • Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng
To mitigate the inaccurate region classification in OVD, we propose a new Shape-Invariant Adapter named SIA-OVD to bridge the image-region gap in the OVD task.
Ranked #10 on Open Vocabulary Object Detection on MSCOCO
no code implementations • journal 2024 • Hongbo Sun, Xiangteng He, Jinglin Xu, Yuxin Peng
To address the above issue, we propose a Structure Information Mining and Object-aware Feature Enhancement (SIM-OFE) method for fine-grained visual categorization, which explores the visual object's internal structure composition and appearance traits.
Ranked #5 on Fine-Grained Image Classification on Stanford Dogs
1 code implementation • 29 Aug 2024 • Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu
In this paper, we propose a Training-Free Video Temporal Grounding (TFVTG) approach that leverages the ability of pre-trained large models.
1 code implementation • 29 Aug 2024 • Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu
This method generates additional training data by synthesizing images containing multiple objects of the same category and pseudo queries based on their spatial relationships.
1 code implementation • 5 Aug 2024 • Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu
This approach enhances the generalization of large foundation models, such as CLIP, when fine-tuned for HOI detection.
1 code implementation • CVPR 2024 • Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng
We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique.
Ranked #2 on Action Quality Assessment on FineDiving
1 code implementation • CVPR 2024 • Jinglin Xu, Yijie Guo, Yuxin Peng
We further extend FinePOSE to multi-human pose estimation.
1 code implementation • CVPR 2024 • Jinglin Xu, Guohao Zhao, Sibo Yin, Wenhao Zhou, Yuxin Peng
Fine-grained action analysis in multi-person sports is complex due to athletes' quick movements and intense physical confrontations, which result in severe visual obstructions in most scenes.
1 code implementation • CVPR 2024 • Kunlun Xu, Xu Zou, Yuxin Peng, Jiahuan Zhou
Then the Distribution-oriented Prototype Generation algorithm transforms the instance-level diversity into identity-level distributions as prototypes, which are further explored by the designed Prototype-based Knowledge Transfer module to enhance the knowledge anti-forgetting and acquisition capacity of the LReID model.
1 code implementation • CVPR 2024 • Qiwei Li, Yuxin Peng, Jiahuan Zhou
Non-Exemplar Class Incremental Learning (NECIL) involves learning a classification model on a sequence of data without access to exemplars from previously encountered old classes.
1 code implementation • CVPR 2024 • Zhenyu Cui, Jiahuan Zhou, Xun Wang, Manyu Zhu, Yuxin Peng
To this end, we propose a Continual Compatible Representation (C2R) method, which enables the query feature computed by the continuously updated model to effectively retrieve the gallery feature computed by the old model in a compatible manner.
1 code implementation • 21 Nov 2023 • Xiu-Shen Wei, Yang shen, Xuhao Sun, Peng Wang, Yuxin Peng
Our work focuses on tackling large-scale fine-grained image retrieval as ranking the images depicting the concepts of interest (i.e., the same sub-category labels) highest based on the fine-grained details in the query.
1 code implementation • ICCV 2023 • Ting Lei, Fabian Caba, Qingchao Chen, Hailin Jin, Yuxin Peng, Yang Liu
This observation motivates us to design an HOI detector that can be trained even with long-tailed labeled data and can leverage existing knowledge from pre-trained models.
1 code implementation • 24 Jul 2023 • Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang
In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios.
no code implementations • 19 Jun 2023 • Mingshi Yan, Zhiyong Cheng, Jing Sun, Fuming Sun, Yuxin Peng
In this paper, we propose MB-HGCN, a novel multi-behavior recommendation model that uses a hierarchical graph convolutional network to learn user and item embeddings from coarse-grained on the global level to fine-grained on the behavior-specific level.
1 code implementation • 28 Mar 2023 • Zhiyong Cheng, Sai Han, Fan Liu, Lei Zhu, Zan Gao, Yuxin Peng
Most existing multi-behavior models fail to capture such dependencies in a behavior chain for embedding learning.
1 code implementation • CVPR 2023 • HsiaoYuan Hsu, Xiangteng He, Yuxin Peng, Hao Kong, Qing Zhang
Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements, including text, logo, and underlay, which is key to automatic template-free creative graphic design.
1 code implementation • ICCV 2023 • Yulin Pan, Xiangteng He, Biao Gong, Yiliang Lv, Yujun Shen, Yuxin Peng, Deli Zhao
Video temporal grounding aims to pinpoint a video segment that matches the query description.
1 code implementation • ICCV 2023 • Yang Liu, Jiahua Zhang, Qingchao Chen, Yuxin Peng
Visual grounding aims at localizing the target object in an image that is most related to the given free-form natural language query.
1 code implementation • ICCV 2023 • Zijing Zhao, Sitong Wei, Qingchao Chen, Dehui Li, Yifan Yang, Yuxin Peng, Yang Liu
This helps the student model capture target domain characteristics and become a more data-efficient learner to gain knowledge from the limited number of pseudo boxes.
3 code implementations • 28 Sep 2022 • Xiu-Shen Wei, He-Yang Xu, Faen Zhang, Yuxin Peng, Wei Zhou
Semi-supervised few-shot learning involves training a classifier to adapt to new tasks with limited labeled data and a fixed quantity of unlabeled data.
1 code implementation • 31 Aug 2022 • Hongbo Sun, Xiangteng He, Yuxin Peng
To address the above limitations, we propose the Structure Information Modeling Transformer (SIM-Trans), which incorporates object structure information into the transformer to enhance discriminative representation learning, capturing both appearance and structure information.
no code implementations • 6 Jul 2022 • Minghang Zheng, Dejie Yang, Zhongjie Ye, Ting Lei, Yuxin Peng, Yang Liu
In this technical report, we briefly introduce the solutions of our team 'PKU-WICT-MIPL' for the PIC Makeup Temporal Video Grounding (MTVG) Challenge at ACM-MM 2022.
1 code implementation • 10 Jan 2022 • Ansong Li, Zhiyong Cheng, Fan Liu, Zan Gao, Weili Guan, Yuxin Peng
The session embedding is then generated by aggregating the item embeddings with attention weights of each item's factors.
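The aggregation step described above can be sketched generically: item embeddings weighted by softmax attention scores and summed into one session vector. This is a minimal illustration under assumed shapes and a hypothetical learnable query vector, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def session_embedding(item_embs, query):
    """Weight each item by its attention score to a query, then sum."""
    weights = softmax(item_embs @ query)   # one scalar weight per item
    return weights @ item_embs             # weighted sum of item embeddings

items = rng.normal(size=(5, 8))   # 5 items in the session, dim 8 (assumed)
query = rng.normal(size=8)        # hypothetical learnable attention query
emb = session_embedding(items, query)
print(emb.shape)  # (8,)
```

In a trained model the attention scores would come from each item's learned factors rather than a random query, but the aggregation arithmetic is the same.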
1 code implementation • CVPR 2022 • Minghang Zheng, Yanjie Huang, Qingchao Chen, Yuxin Peng, Yang Liu
Moreover, they train their model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video.
Ranked #8 on Temporal Sentence Grounding on Charades-STA
no code implementations • 11 Nov 2021 • Xiu-Shen Wei, Yi-Zhe Song, Oisin Mac Aodha, Jianxin Wu, Yuxin Peng, Jinhui Tang, Jian Yang, Serge Belongie
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications.
1 code implementation • 10 Jul 2019 • Xiangteng He, Yuxin Peng, Liu Xie
To the best of our knowledge, it is the first benchmark with 4 media types for fine-grained cross-media retrieval.
no code implementations • CVPR 2019 • Junchao Zhang, Yuxin Peng
The main novelties and advantages are: (1) Bidirectional temporal graph: A bidirectional temporal graph is constructed along and reversely along the temporal order, which provides complementary ways to capture the temporal trajectories for each salient object.
no code implementations • 21 Aug 2018 • Mingkuan Yuan, Yuxin Peng
To address these problems, we exploit the excellent capability of generic discriminative models (e.g., VGG19), which can guide the training process of a new generative model on multiple levels to bridge the two gaps.
no code implementations • 26 Apr 2018 • Chenrui Zhang, Yuxin Peng
Video representation learning is a vital problem for classification tasks.
no code implementations • 26 Apr 2018 • Chenrui Zhang, Yuxin Peng
First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in joint visual-semantic distribution via feature-level and label-level semantic inference.
1 code implementation • 25 Apr 2018 • Jinwei Qi, Yuxin Peng, Yuxin Yuan
First, we propose a visual-language relation attention model to explore both fine-grained patches and their relations across different media types.
no code implementations • CVPR 2018 • Xin Huang, Yuxin Peng
For achieving the goal, this paper proposes deep cross-media knowledge transfer (DCKT) approach, which transfers knowledge from a large-scale cross-media dataset to promote the model training on another small-scale cross-media dataset.
no code implementations • 7 Feb 2018 • Jian Zhang, Yuxin Peng, Mingkuan Yuan
(2) They ignore the rich information contained in the large amount of unlabeled data across different modalities, especially the margin examples that are easily retrieved incorrectly, which can help to model the correlations.
no code implementations • 7 Feb 2018 • Yuxin Peng, Jian Zhang, Zhaoda Ye
Inspired by the sequential decision ability of deep reinforcement learning, we propose a new Deep Reinforcement Learning approach for Image Hashing (DRLIH).
no code implementations • 1 Dec 2017 • Jian Zhang, Yuxin Peng, Mingkuan Yuan
To address the above problem, in this paper we propose an Unsupervised Generative Adversarial Cross-modal Hashing approach (UGACH), which makes full use of GAN's ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data.
no code implementations • 9 Nov 2017 • Yuxin Peng, Yunzhen Zhao, Junchao Zhang
Recently, researchers generally adopt the deep networks to capture the static and motion information separately, which mainly has two limitations: (1) Ignoring the coexistence relationship between spatial and temporal attention, while they should be jointly modelled as the spatial and temporal evolutions of video, thus discriminative video features can be extracted.
no code implementations • 14 Oct 2017 • Yuxin Peng, Jinwei Qi, Yuxin Yuan
They can not only exploit cross-modal correlation for learning common representation, but also preserve reconstruction information for capturing semantic consistency within each modality.
no code implementations • 30 Sep 2017 • Xiangteng He, Yuxin Peng, Junjie Zhao
Therefore, we propose a weakly supervised discriminative localization approach (WSDL) for fast fine-grained image classification to address the two limitations at the same time, and its main advantages are: (1) n-pathway end-to-end discriminative localization network is designed to improve classification speed, which simultaneously localizes multiple different discriminative regions for one image to boost classification accuracy, and shares full-image convolutional features generated by region proposal network to accelerate the process of generating region proposals as well as reduce the computation of convolutional operation.
no code implementations • 25 Sep 2017 • Xiangteng He, Yuxin Peng, Junjie Zhao
Existing methods generally adopt a two-stage learning framework: The first stage is to localize the discriminative regions of objects, and the second is to encode the discriminative features for training classifiers.
1 code implementation • 31 Aug 2017 • Xiangteng He, Yuxin Peng
When we describe the object of an image via textual descriptions, we mainly focus on its pivotal characteristics and rarely pay attention to common characteristics or the background areas.
1 code implementation • 16 Aug 2017 • Yuxin Peng, Jinwei Qi, Yuxin Yuan
Effectively measuring the similarity between different modalities of data is the key of cross-modal retrieval.
no code implementations • 8 Aug 2017 • Xin Huang, Yuxin Peng, Mingkuan Yuan
Transfer learning aims to relieve the problem of insufficient training data, but it mainly focuses on knowledge transfer from large-scale datasets as a single-modal source domain to a single-modal target domain.
no code implementations • CVPR 2017 • Xiangteng He, Yuxin Peng
Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy.
no code implementations • 1 Jun 2017 • Xin Huang, Yuxin Peng, Mingkuan Yuan
Knowledge in source domain cannot be directly transferred to both two different modalities in target domain, and the inherent cross-modal correlation contained in target domain provides key hints for cross-modal retrieval which should be preserved during transfer process.
no code implementations • 14 Apr 2017 • Jinwei Qi, Xin Huang, Yuxin Peng
Motivated by the strong ability of deep neural networks in feature representation and comparison function learning, we propose the Unified Network for Cross-media Similarity Metric (UNCSM) to associate cross-media shared representation learning with distance metric learning in a unified framework.
no code implementations • 10 Apr 2017 • Xiangteng He, Yuxin Peng
Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy.
1 code implementation • 6 Apr 2017 • Yuxin Peng, Xiangteng He, Junjie Zhao
Both are jointly employed to exploit the subtle and local differences for distinguishing the subcategories.
no code implementations • 23 Mar 2017 • Yunzhen Zhao, Yuxin Peng
Then two streams of 3D CNN are trained individually for raw frames and optical flow on salient areas, and another 2D CNN is trained for raw frames on non-salient areas.
no code implementations • 21 Mar 2017 • Xin Huang, Yuxin Peng
The quadruplet ranking loss can model the semantically similar and dissimilar constraints to preserve cross-modal relative similarity ranking information.
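A standard quadruplet margin loss (a common formulation in metric learning, not necessarily the exact loss used in this paper) pulls an anchor toward a similar sample and pushes it away from two dissimilar ones; the toy vectors below are hypothetical.

```python
import numpy as np

def quadruplet_loss(a, p, n1, n2, m1=1.0, m2=0.5):
    """Hinge loss over a quadruplet (anchor, positive, negative1, negative2)."""
    d = lambda x, y: ((x - y) ** 2).sum()          # squared Euclidean distance
    # Term 1: anchor-positive must be closer than anchor-negative by margin m1.
    # Term 2: anchor-positive must be closer than negative-negative by margin m2.
    return (max(0.0, m1 + d(a, p) - d(a, n1)) +
            max(0.0, m2 + d(a, p) - d(n1, n2)))

a  = np.array([0.0, 0.0])
p  = np.array([0.1, 0.0])   # semantically similar to the anchor
n1 = np.array([2.0, 0.0])   # dissimilar sample
n2 = np.array([0.0, 2.0])   # another dissimilar sample

print(quadruplet_loss(a, p, n1, n2))  # → 0.0 (both margins already satisfied)
```

When the negatives sit too close to the anchor, the hinge terms become positive and the loss drives them apart, which is how relative similarity ranking information gets preserved.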
no code implementations • 8 Dec 2016 • Jian Zhang, Yuxin Peng
On the other hand, different hash bits actually contribute to image retrieval differently, and treating them equally greatly degrades retrieval accuracy.
no code implementations • 28 Jul 2016 • Jian Zhang, Yuxin Peng
(2) A semi-supervised deep hashing network is designed to extensively exploit both labeled and unlabeled data, in which we propose an online graph construction method to benefit from the evolving deep features during training to better capture semantic neighbors.
no code implementations • CVPR 2015 • Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, Zheng Zhang
Our pipeline integrates three types of attention: the bottom-up attention that proposes candidate patches, the object-level top-down attention that selects patches relevant to a certain object, and the part-level top-down attention that localizes discriminative parts.
no code implementations • 22 Sep 2011 • Zhiwu Lu, Horace H. S. Ip, Yuxin Peng
This paper presents a novel pairwise constraint propagation approach by decomposing the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs.
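The building block the paper composes — label propagation on a k-nearest-neighbor graph — can be sketched in a few lines. The graph construction, normalization, and toy data below are standard textbook choices (Zhou et al.-style symmetric normalization), assumed for illustration rather than taken from the paper.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetrized k-nearest-neighbor affinity matrix with 0/1 weights."""
    d = ((X[:, None] - X[None]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d, np.inf)                 # exclude self-loops
    W = np.zeros_like(d)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :k]):
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)                   # symmetrize (union of edges)

def propagate(W, Y, alpha=0.9, iters=50):
    """Iterative label propagation with symmetric degree normalization."""
    D = W.sum(1)
    S = W / np.sqrt(np.outer(D, D))             # normalized affinity
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y   # spread labels, keep seeds
    return F.argmax(1)

# two tight toy clusters; only the first point of each cluster is labeled
A = np.array([[0, 0], [0, .1], [.1, 0], [.1, .1], [0, .2]])
X = np.vstack([A, A + [3, 0]])
Y = np.zeros((10, 2)); Y[0, 0] = 1; Y[5, 1] = 1
print(propagate(knn_graph(X, 2), Y))  # → [0 0 0 0 0 1 1 1 1 1]
```

Because the two clusters share no k-NN edges, labels only spread within each cluster, which is the quadratic-time per-subproblem behavior the abstract refers to.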