Search Results for author: Xu Yang

Found 114 papers, 46 papers with code

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction +1

Distribution-Conditional Generation: From Class Distribution to Creative Generation

no code implementations 6 May 2025 Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng

Building on this, we propose DisTok, an encoder-decoder framework that maps class distributions into a latent space and decodes them into tokens of creative concept.

Image Generation

Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization

no code implementations 19 Apr 2025 Huiyi Chen, Jiawei Peng, Kaihua Tang, Xin Geng, Xu Yang

By leveraging untapped samples from the support set, we update the keys of selected coreset samples, enabling the randomly initialized coreset to evolve into a more informative coreset under low computational cost.

Fine-Grained Image Classification In-Context Learning +2

A Graph-Enhanced DeepONet Approach for Real-Time Estimating Hydrogen-Enriched Natural Gas Flow under Variable Operations

no code implementations 9 Apr 2025 Sicheng Liu, Hongchang Huang, Bo Yang, Mingxuan Cai, Xu Yang, Xinping Guan

Second, a graph-enhanced branch network is proposed to incorporate pipeline topology, improving the estimation accuracy in large-scale pipeline networks.

PlatMetaX: An Integrated MATLAB platform for Meta-Black-Box Optimization

1 code implementation 26 Mar 2025 Xu Yang, Rui Wang, Kaiwen Li, Wenhua Li, Tao Zhang, Fujun He

The landscape of optimization problems has become increasingly complex, necessitating the development of advanced optimization techniques.

Meta-Learning

DTA: Dual Temporal-channel-wise Attention for Spiking Neural Networks

1 code implementation 13 Mar 2025 Minje Kim, Minjun Kim, Xu Yang

To the best of our knowledge, this is the first attempt to concentrate on both the correlation and dependency of temporal-channel using both identical and non-identical attention operations.

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

1 code implementation 10 Mar 2025 Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment.

Logical Reasoning Multimodal Reasoning +1

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

no code implementations 4 Mar 2025 Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Chi Zhang

This allows the agent to focus on tasks requiring more complex reasoning, while simplifying routine actions.

Speculative Ensemble: Fast Large Language Model Ensemble via Speculation

1 code implementation 1 Feb 2025 Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang

Ensemble methods enhance Large Language Models (LLMs) by combining multiple models but suffer from high computational costs.

Language Modeling Language Modelling +1

Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization

no code implementations 22 Jan 2025 Xu Yang, Rui Wang, Kaiwen Li, Ling Wang

To address this challenge, we introduce a novel framework that employs reinforcement learning (RL) to automatically design DE for black-box optimization through meta-learning.

Evolutionary Algorithms Meta-Learning +1

STHFL: Spatio-Temporal Heterogeneous Federated Learning

no code implementations 10 Jan 2025 Shunxin Guo, Hongsong Wang, Shuxia Lin, Xu Yang, Xin Geng

Federated learning is a new framework that protects data privacy and allows multiple devices to cooperate in training machine learning models.

Federated Learning

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

1 code implementation 12 Dec 2024 Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang

The demand for producing short-form videos for sharing on social media platforms has experienced significant growth in recent times.

Highlight Detection Video Summarization

BPQP: A Differentiable Convex Optimization Framework for Efficient End-to-End Learning

no code implementations 28 Nov 2024 Jianming Pan, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Lewen Wang, Jiang Bian

This reformulation enables the use of first-order optimization algorithms in calculating the backward pass gradients, allowing our framework to potentially utilize any state-of-the-art solver.

Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

no code implementations 5 Nov 2024 Tao Huang, Qingyu Huang, Xin Shi, Jiayang Meng, Guolong Zheng, Xu Yang, Xun Yi

In this paper, we introduce an enhanced version of DP-SGD, named Differentially Private Per-sample Adaptive Scaling Clipping (DP-PSASC).

Gradient-Guided Conditional Diffusion Models for Private Image Reconstruction: Analyzing Adversarial Impacts of Differential Privacy and Denoising

no code implementations 5 Nov 2024 Tao Huang, Jiayang Meng, Hong Chen, Guolong Zheng, Xu Yang, Xun Yi, Hua Wang

We investigate the construction of gradient-guided conditional diffusion models for reconstructing private images, focusing on the adversarial interplay between differential privacy noise and the denoising capabilities of diffusion models.

Denoising Image Generation +1

Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks

no code implementations 31 Oct 2024 Yingzhe Peng, Xiaoting Qin, Zhiyang Zhang, Jue Zhang, Qingwei Lin, Xu Yang, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

The rise of large language models (LLMs) has revolutionized user interactions with knowledge-based systems, enabling chatbots to synthesize vast amounts of information and assist with complex, exploratory tasks.

Chatbot

Optimal Hardening Strategy for Electricity-Hydrogen Networks with Hydrogen Leakage Risk Control against Extreme Weather

no code implementations 27 Oct 2024 Sicheng Liu, Bo Yang, Xin Li, Xu Yang, Zhaojian Wang, Dafeng Zhu, Xinping Guan

However, for electricity-hydrogen distribution networks (EHDNs), the leakage risk of hydrogen should be controlled to avoid severe incidents such as explosions.

Computational Efficiency

Flexible Operation of Electricity-HCNG Networks with Variable Hydrogen Fraction: A Distributionally Robust Joint Chance-Constrained Approach

no code implementations 13 Oct 2024 Sicheng Liu, Bo Yang, Xu Yang, Xin Li, Zhaojian Wang, Xinping Guan

Hydrogen-enriched compressed natural gas (HCNG) is a promising way to utilize surplus renewable energy through hydrogen electrolysis and blending it into natural gas.

Boosting Open-Vocabulary Object Detection by Handling Background Samples

no code implementations 11 Oct 2024 Ruizhe Zeng, Lu Zhang, Xu Yang, Zhiyong Liu

This limitation results in suboptimal performance for open-vocabulary detectors that rely on CLIP when processing background samples.

object-detection Open-vocabulary object detection +2

GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning

no code implementations 3 Oct 2024 Jiale Fu, Yaqing Wang, Simeng Han, Jiaming Fan, Chen Si, Xu Yang

However, the effectiveness of ICL heavily relies on the selection of ICEs, and conventional text-based embedding methods are often inadequate for tasks that require multi-step reasoning, such as mathematical and logical problem solving.

Code Generation In-Context Learning +3

First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge

no code implementations 20 Sep 2024 Yingzhe Peng, Yixiao Yuan, Zitian Ao, Huapeng Zhou, Kangqi Wang, Qipeng Zhu, Xu Yang

In this report, we present our first-place solution to the Multiple-choice Video Question Answering (QA) track of The Second Perception Test Challenge.

Multiple-choice Question Answering +2

Exploring RAG-based Vulnerability Augmentation with LLMs

1 code implementation 7 Aug 2024 Seyed Shayan Daneshvar, Yu Nong, Xu Yang, Shaowei Wang, Haipeng Cai

More specifically, we explore three strategies to augment both single and multi-statement vulnerabilities, with LLMs, namely Mutation, Injection, and Extension.

Ranked #1 on Vulnerability Detection on VulScribeR (using extra training data)

Code Generation Data Augmentation +2

Collaborative Evolving Strategy for Automatic Data-Centric Development

no code implementations 26 Jul 2024 Xu Yang, Haotian Chen, Wenjun Feng, Haoxue Wang, Zeqi Ye, Xinjie Shen, Xiao Yang, Shizhao Sun, Weiqing Liu, Jiang Bian

By leveraging the strong complex problem-solving capabilities of large language models (LLMs), we propose an LLM-based autonomous agent, equipped with a strategy named Collaborative Knowledge-STudying-Enhanced Evolution by Retrieval (Co-STEER), to simultaneously address all the challenges.

Scheduling

Generalized Averaging Method for Power Electronics Modeling from DC to above Half the Switching Frequency

no code implementations 27 Jun 2024 Hongchang Li, Kangping Wang, Jingyang Fang, Wenjie Chen, Xu Yang

Modeling power electronic converters at frequencies close to or above half the switching frequency has been difficult due to the time-variant and discontinuous switching actions.

Zero-Shot Long-Form Video Understanding through Screenplay

no code implementations 25 Jun 2024 Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information.

Form Question Answering +2

SimClone: Detecting Tabular Data Clones using Value Similarity

no code implementations 24 Jun 2024 Xu Yang, Gopi Krishnan Rajbahadur, Dayi Lin, Shaowei Wang, Zhen Ming, Jiang

In this paper, we propose a novel method called SimClone for data clone detection in tabular datasets without relying on structural information.

Clone Detection

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

no code implementations 24 Jun 2024 Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks.

Machine Unlearning

LIVE: Learnable In-Context Vector for Visual Question Answering

1 code implementation 19 Jun 2024 Yingzhe Peng, Chenduo Hao, Xu Yang, Jiawei Peng, Xinting Hu, Xin Geng

However, applying ICL usually faces two major challenges: 1) using more ICDs will largely increase the inference time and 2) the performance is sensitive to the selection of ICDs.

In-Context Learning Question Answering +1

Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification

no code implementations 24 May 2024 Chak Fong Chong, Jielong Guo, Xu Yang, Wei Ke, Yapeng Wang

However, the powerful Mixup sample-mixing data augmentation cannot be well utilized to address this challenge, as it cannot perform linear interpolation on the unknown labels to construct augmented samples.

Benchmarking Data Augmentation +2

Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models

no code implementations 26 Apr 2024 Yuhang Huang, Zihan Wu, Chongyang Gao, Jiawei Peng, Xu Yang

Large Vision-Language Models (LVLMs) are gaining traction for their remarkable ability to process and integrate visual and textual data.

Retrieval

Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models

no code implementations 25 Apr 2024 Shi-Yu Xia, Wenxuan Zhu, Xu Yang, Xin Geng

When initializing variable-sized models adapting for different resource constraints, SWS achieves better results while reducing around 20x parameters stored to initialize these models and around 10x pre-training costs, in contrast to the pre-training and fine-tuning approach.

FedMPQ: Secure and Communication-Efficient Federated Learning with Multi-codebook Product Quantization

no code implementations 21 Apr 2024 Xu Yang, Jiapeng Zhang, Qifeng Zhang, Zhuo Tang

In federated learning, particularly in cross-device scenarios, secure aggregation has recently gained popularity as it effectively defends against inference attacks by malicious aggregators.

Federated Learning Quantization

Towards Data-Centric Automatic R&D

no code implementations 17 Apr 2024 Haotian Chen, Xinjie Shen, Zeqi Ye, Wenjun Feng, Haoxue Wang, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian

We call on future work to develop techniques for tackling automatic R&D, which could bring a potentially revolutionary upgrade to human productivity.

Language Modelling Large Language Model +1

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

1 code implementation CVPR 2024 Xu Yang, Changxing Ding, Zhibin Hong, Junhao Huang, Jin Tao, Xiangmin Xu

Second, we propose a novel diffusion-based method that predicts a precise inpainting mask based on the person and reference garment images, further enhancing the reliability of the try-on results.

Denoising Image Generation +1

DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning

no code implementations 14 Mar 2024 Xu Yang, Jiyuan Feng, Songyue Guo, Ye Wang, Ye Ding, Binxing Fang, Qing Liao

In this paper, we propose a novel Dynamic Affinity-based Personalized Federated Learning model (DA-PFL) to alleviate the class imbalanced problem during federated learning.

Personalized Federated Learning

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

1 code implementation 5 Mar 2024 Hongyu Zhang, Dongyi Zheng, Lin Zhong, Xu Yang, Jiyuan Feng, Yunqing Feng, Qing Liao

Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features.

Contrastive Learning Data Augmentation +7

MemoNav: Working Memory Model for Visual Navigation

1 code implementation CVPR 2024 Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation.

Decision Making Graph Attention +3

A Lightweight Inception Boosted U-Net Neural Network for Routability Prediction

1 code implementation 7 Feb 2024 Hailiang Li, Yan Huo, Yan Wang, Xu Yang, Miaohui Hao, Xiao Wang

As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design.

Avg SSIM

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

no code implementations 24 Jan 2024 Zhe Xu, Kun Wei, Xu Yang, Cheng Deng

Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses.

Long-Tail Class Incremental Learning via Independent Sub-prototype Construction

no code implementations CVPR 2024 Xi Wang, Xu Yang, Jie Yin, Kun Wei, Cheng Deng

In this paper, we construct two parallel spaces simultaneously: 1) a sub-prototype space and 2) a reminiscence space, to learn robust representations while alleviating forgetfulness.

class-incremental learning Class Incremental Learning +2

Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation

1 code implementation CVPR 2024 Fuli Wan, Han Zhao, Xu Yang, Cheng Deng

In contrast, this paper advocates that exploring unknown classes can better identify known ones and proposes a domain adaptation model to transfer knowledge on known and unknown classes jointly.

Source-Free Domain Adaptation Transfer Learning

Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

2 code implementations 15 Dec 2023 Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

As Archimedes famously said, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." In this study, we propose to use a tiny Language Model (LM), e.g., a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters.

Image Captioning In-Context Learning +4

Building Variable-sized Models via Learngene Pool

no code implementations 10 Dec 2023 Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool.

Transformer as Linear Expansion of Learngene

1 code implementation 9 Dec 2023 Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

Under the situation where we need to produce models of varying depths adapting for different resource constraints, TLEG achieves comparable results while reducing around 19x parameters stored to initialize these models and around 5x pre-training costs, in contrast to the pre-training and fine-tuning approach.

How to Configure Good In-Context Sequence for Visual Question Answering

1 code implementation CVPR 2024 Li Li, Jiawei Peng, Huiyi Chen, Chongyang Gao, Xu Yang

Inspired by the success of Large Language Models in dealing with new tasks via In-Context Learning (ICL) in NLP, researchers have also developed Large Vision-Language Models (LVLMs) with ICL capabilities.

In-Context Learning Question Answering +2

Manipulating the Label Space for In-Context Classification

no code implementations 1 Dec 2023 Haokun Chen, Xu Yang, Yuhang Huang, Zihan Wu, Jing Wang, Xin Geng

Specifically, using our approach on ImageNet, we increase accuracy from 74.70% in a 4-shot setting to 76.21% with just 2 shots.

Classification Contrastive Learning +2

Category-Wise Fine-Tuning for Image Multi-label Classification with Partial Labels

2 code implementations International Conference on Neural Information Processing 2023 Chak Fong Chong, Xu Yang, Tenglong Wang, Wei Ke, Yapeng Wang

A single model submitted to the competition server for the official evaluation achieves mAUC 91.82% on the test set, which is the highest single model score in the leaderboard and literature.

Binary Classification Multi-Label Classification +1

Rethinking Residual Connection in Training Large-Scale Spiking Neural Networks

no code implementations 9 Nov 2023 Yudong Li, Yunlin Lei, Xu Yang

Spiking Neural Network (SNN) is known as the most famous brain-inspired model, but the non-differentiable spiking mechanism makes it hard to train large-scale SNNs.

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

1 code implementation 6 Nov 2023 Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.

Segmentation

Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle

no code implementations 17 Oct 2023 Xu Yang, Xiao Yang, Weiqing Liu, Jinhui Li, Peng Yu, Zeqi Ye, Jiang Bian

In the wake of relentless digital transformation, data-driven solutions are emerging as powerful tools to address multifarious industrial tasks such as forecasting, anomaly detection, planning, and even complex decision-making.

Anomaly Detection Decision Making +3

SeisT: A foundational deep learning model for earthquake monitoring tasks

1 code implementation 2 Oct 2023 Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu

The most significant improvements, in comparison to existing models, are observed in phase-P picking, phase-S picking, and magnitude estimation, with gains of 1.7%, 9.5%, and 8.0%, respectively.

Deep Learning Out-of-Distribution Generalization

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

1 code implementation 15 Sep 2023 Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL.

Data Augmentation Disentanglement +3

Temporal Difference Learning for High-Dimensional PIDEs with Jumps

no code implementations 6 Jul 2023 Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu

In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning.

Genes in Intelligent Agents

1 code implementation 17 Jun 2023 Fu Feng, Jing Wang, Xu Yang, Xin Geng

Inspired by biological intelligence, artificial intelligence (AI) has been devoted to building machine intelligence.

reinforcement-learning Reinforcement Learning +1

Exploring Diverse In-Context Configurations for Image Captioning

1 code implementation NeurIPS 2023 Xu Yang, Yongliang Wu, Mingzhuo Yang, Haokun Chen, Xin Geng

After discovering that Language Models (LMs) can be good in-context few-shot learners, numerous strategies have been proposed to optimize in-context sequence configurations.

Image Captioning In-Context Learning

Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models

no code implementations 3 May 2023 Qiufeng Wang, Xu Yang, Shuxia Lin, Jing Wang, Xin Geng

(i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model.

Lifelong learning

Transforming Visual Scene Graphs to Image Captions

1 code implementation 3 May 2023 Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Songfang Huang, Fei Huang, Zhangzikang Li, Yu Zhang

In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.

Attribute Decoder +3

SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering

no code implementations 4 Apr 2023 Xinyao Shu, ShiYang Yan, Xu Yang, Ziheng Wu, Zhongfeng Chen, Zhenyu Lu

Unfortunately, language bias is a common problem in VQA, which refers to the model generating answers only by associating with the questions while ignoring the visual content, resulting in biased results.

counterfactual Metric Learning +2

Spatial Attention and Syntax Rule Enhanced Tree Decoder for Offine Handwritten Mathematical Expression Recognition

no code implementations 13 Mar 2023 Zihao Lin, Jinrong Li, Fan Yang, Shuangping Huang, Xu Yang, Jianmin Lin, Ming Yang

In this paper, we propose a novel model called Spatial Attention and Syntax Rule Enhanced Tree Decoder (SS-TD), which is equipped with a spatial attention mechanism to alleviate the prediction error of the tree structure and uses syntax masks (obtained from the transformation of syntax rules) to constrain the occurrence of ungrammatical mathematical expressions.

Decoder

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations ICCV 2023 Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.

Question Answering Retrieval +4

Adaptively Clustering Neighbor Elements for Image-Text Generation

1 code implementation 5 Jan 2023 Zihua Wang, Xu Yang, Hanwang Zhang, Haiyang Xu, Ming Yan, Fei Huang, Yu Zhang

In this gradual clustering process, a parsing tree is generated which embeds the hierarchical knowledge of the input sequence.

Clustering Decoder +5

Spikeformer: A Novel Architecture for Training High-Performance Low-Latency Spiking Neural Network

1 code implementation 19 Nov 2022 Yudong Li, Yunlin Lei, Xu Yang

Spiking neural networks (SNNs) have made great progress on both performance and efficiency over the last few years, but their unique working pattern makes it hard to train a high-performance low-latency SNN. Thus the development of SNNs still lags behind traditional artificial neural networks (ANNs). To close this gap, many extraordinary works have been proposed. Nevertheless, these works are mainly based on the same kind of network structure (i.e., CNN) and their performance is worse than their ANN counterparts, which limits the applications of SNNs. To this end, we propose a novel Transformer-based SNN, termed "Spikeformer", which outperforms its ANN counterpart on both static and neuromorphic datasets and may be an alternative architecture to CNN for training high-performance SNNs. First, to deal with the problem of "data hungry" and the unstable training period exhibited in the vanilla model, we design the Convolutional Tokenizer (CT) module, which improves the accuracy of the original model on DVS-Gesture by more than 16%. Besides, in order to better incorporate the attention mechanism inside Transformer and the spatio-temporal information inherent to SNN, we adopt spatio-temporal attention (STA) instead of spatial-wise or temporal-wise attention. With our proposed method, we achieve competitive or state-of-the-art (SOTA) SNN performance on DVS-CIFAR10, DVS-Gesture, and ImageNet datasets with the least simulation time steps (i.e., low latency). Remarkably, our Spikeformer outperforms other SNNs on ImageNet by a large margin (i.e., more than 5%) and even outperforms its ANN counterpart by 3.1% and 2.2% on DVS-Gesture and ImageNet respectively, indicating that Spikeformer is a promising architecture for training large-scale SNNs and may be more suitable for SNNs compared to CNN. We believe that this work shall keep the development of SNNs in step with ANNs as much as possible. Code will be available.

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

1 code implementation 4 Oct 2022 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.

Image Captioning Sentence +2

MemoNav: Selecting Informative Memories for Visual Navigation

no code implementations 20 Aug 2022 Hongxin Li, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

To address this limitation, we present the MemoNav, a novel memory mechanism for image-goal navigation, which retains the agent's informative short-term memory and long-term memory to improve the navigation performance on a multi-goal task.

Action Generation Graph Attention +2

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

1 code implementation 1 Aug 2022 Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu

In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories.

Representation Learning Self-Supervised Learning

Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

1 code implementation CVPR 2022 Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang

Some methods recognize state and object with two trained classifiers, ignoring the impact of the interaction between object and state; the other methods try to learn the joint representation of the state-object compositions, leading to the domain gap between seen and unseen composition sets.

Compositional Zero-Shot Learning Diversity +1

iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition

1 code implementation 27 Jun 2022 Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee

In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that can not only use face detection to assist invigilators in real-time student identification, but also be able to detect common abnormal behaviors (including face disappearing, rotating faces, and replacing with a different person during the exams) via a face recognition-based post-exam video analysis.

Face Detection Face Recognition +2

Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation

no code implementations 21 Apr 2022 Lu Zhang, Siqi Zhang, Xu Yang, Hong Qiao, Zhiyong Liu

In this paper, we emphasize the adaptation process across sim2real domains and model it as a learning problem on the BatchNorm parameters of a simulation-trained model.

Knowledge Distillation Segmentation +4

Weakly Aligned Feature Fusion for Multimodal Object Detection

no code implementations 21 Apr 2022 Lu Zhang, Zhiyong Liu, Xiangyu Zhu, Zhan Song, Xu Yang, Zhen Lei, Hong Qiao

In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.

Object object-detection +2

Show, Deconfound and Tell: Image Captioning With Causal Inference

1 code implementation CVPR 2022 Bing Liu, Dong Wang, Xu Yang, Yong Zhou, Rui Yao, Zhiwen Shao, Jiaqi Zhao

In the encoding stage, the IOD is able to disentangle the region-based visual features by deconfounding the visual confounder.

Causal Inference Decoder +1

Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency

1 code implementation CVPR 2022 Yanan Gu, Xu Yang, Kun Wei, Cheng Deng

Unfortunately, these methods only focus on selecting samples from the memory bank for replay and ignore the adequate exploration of semantic information in the single-pass data stream, leading to poor classification accuracy.

Continual Learning

Towards End-to-End Image Compression and Analysis with Transformers

1 code implementation 17 Dec 2021 Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, YaoWei Wang, Xiangyang Ji, Wen Gao

Meanwhile, we propose a feature aggregation module to fuse the compressed features with the selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction.

Classification Image Classification +3

Auto-Encoding Score Distribution Regression for Action Quality Assessment

3 code implementations 22 Nov 2021 Boyu Zhang, Jiayuan Chen, Yinfei Xu, Hui Zhang, Xu Yang, Xin Geng

Traditionally, AQA is treated as a regression problem to learn the underlying mappings between videos and action scores.

Action Quality Assessment regression

Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

no code implementations 28 Oct 2021 Hao Zhou, Dongchun Ren, Xu Yang, Mingyu Fan, Hai Huang

First, with the continuation of time, the prediction error at each time step increases significantly, causing the final displacement error to be impossible to ignore.

Autonomous Driving Pedestrian Trajectory Prediction +4

Can AI detect pain and express pain empathy? A review from emotion recognition and a human-centered AI perspective

no code implementations 8 Oct 2021 Siqi Cao, Di Fu, Xu Yang, Stefan Wermter, Xun Liu, Haiyan Wu

Furthermore, we discuss challenges for responsible evaluation of cognitive methods and computational techniques and show approaches to future work to contribute to affective assistants capable of empathy.

Emotion Recognition

Text-Driven Image Manipulation via Semantic-Aware Knowledge Transfer

no code implementations 29 Sep 2021 Ziqi Zhang, Cheng Deng, Kun Wei, Xu Yang

On this basis, a novel attribute transfer method, the semantic directional decomposition network (SDD-Net), is proposed to achieve semantic-level facial attribute transfer through latent semantic direction decomposition, improving the interpretability and editability of our method.

Attribute Image Manipulation +1

Open Set Domain Adaptation with Zero-shot Learning on Graph

no code implementations 29 Sep 2021 Xinyue Zhang, Xu Yang, Zhi-Yong Liu

Thus the classification ability of the source domain is transferred to the target domain and the model can distinguish the unknown classes with prior knowledge.

Domain Adaptation Zero-Shot Learning

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of Transformer-based vision-language systems.

Image Captioning Question Answering +1

Towards Unbiased Visual Emotion Recognition via Causal Intervention

1 code implementation 26 Jul 2021 Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder that misleads the system into learning spurious correlations.

Causal Inference Emotion Recognition

Nearest Neighbor Matching for Deep Clustering

1 code implementation CVPR 2021 Zhiyuan Dang, Cheng Deng, Xu Yang, Kun Wei, Heng Huang

Specifically, at the local level, we match nearest neighbors based on batch embedded features; at the global level, we match neighbors from the overall embedded features.

Clustering Deep Clustering

SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network

1 code implementation CVPR 2021 Xu Yang, Cheng Deng, Zhiyuan Dang, Kun Wei, Junchi Yan

Specifically, Identity Aggregation is applied to extract semantic features from labeled nodes, and Semantic Alignment is used to align node features obtained from different aspects using the class central similarity.

Representation Learning

Doubly Contrastive Deep Clustering

1 code implementation 9 Mar 2021 Zhiyuan Dang, Cheng Deng, Xu Yang, Heng Huang

In this paper, we present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views to obtain more discriminative features and competitive results.

Clustering Contrastive Learning +2

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

A Distributed Implementation of Steady-State Kalman Filter

no code implementations 26 Jan 2021 Jiaqi Yan, Xu Yang, Yilin Mo, Keyou You

This paper studies distributed state estimation in a sensor network, where $m$ sensors are deployed to infer the $n$-dimensional state of a linear time-invariant (LTI) Gaussian system.

Incremental Embedding Learning via Zero-Shot Translation

1 code implementation 31 Dec 2020 Kun Wei, Cheng Deng, Xu Yang, Maosen Li

Unlike in traditional incremental classification networks, the semantic gap between the embedding spaces of two adjacent tasks is the main challenge for embedding networks under the incremental learning setting.

Face Recognition Image Retrieval +4

Adversarial Learning for Robust Deep Clustering

1 code implementation NeurIPS 2020 Xu Yang, Cheng Deng, Kun Wei, Junchi Yan, Wei Liu

Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding.

Adversarial Attack Clustering +1

Cloud Cover and Aurora Contamination at Dome A in 2017 from KLCAM

no code implementations 7 Oct 2020 Xu Yang, Zhaohui Shang, Keliang Hu, Yi Hu, Bin Ma, Yongjiang Wang, Zihuang Cao, Michael C. B. Ashley, Wei Wang

Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz.

Instrumentation and Methods for Astrophysics

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations ECCV 2020 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Reinforcement Learning (RL)

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

Dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Causal Inference Image Captioning

Classical limit for the varying-mass Schrödinger equation with random inhomogeneities

no code implementations 12 Feb 2020 Shi Chen, Qin Li, Xu Yang

The varying-mass Schrödinger equation (VMSE) has been successfully applied to model electronic properties of semiconductor hetero-structures, for example, quantum dots and quantum wells.

Numerical Analysis

Automated Pavement Crack Segmentation Using U-Net-based Convolutional Neural Network

no code implementations 7 Jan 2020 Stephen L. H. Lau, Edwin K. P. Chong, Xu Yang, Xin Wang

In this paper, we propose a deep learning technique based on a convolutional neural network to perform segmentation tasks on pavement crack images.

Crack Segmentation Feature Engineering +2

mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation

2 code implementations 24 May 2019 Dayiheng Liu, Xu Yang, Feng He, YuanYuan Chen, Jiancheng Lv

It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious problem of uninformative latent variables.

Language Modeling Language Modelling +1

Deep Spectral Clustering using Dual Autoencoder Network

no code implementations CVPR 2019 Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu

In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering.

Clustering Deep Clustering +1

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design: one for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, which make the visual reasoning robust under partial observation; 3) a linguistic loss that keeps the module controller faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Decoder Image Captioning +3

Auto-Encoding Scene Graphs for Image Captioning

2 code implementations CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Decoder Image Captioning +2

Face Photo Sketch Synthesis via Larger Patch and Multiresolution Spline

no code implementations 19 Sep 2015 Xu Yang

In order to get a smoother sketch, we propose a new method to reduce such jagged parts and mottled points.

A Weighted Common Subgraph Matching Algorithm

no code implementations 4 Nov 2014 Xu Yang, Hong Qiao, Zhi-Yong Liu

We propose a weighted common subgraph (WCS) matching algorithm to find the most similar subgraphs in two labeled weighted graphs.

Combinatorial Optimization Graph Matching
