Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

A Lightweight Inception Boosted U-Net Neural Network for Routability Prediction

1 code implementation 7 Feb 2024 Hailiang Li, Yan Huo, Yan Wang, Xu Yang, Miaohui Hao, Xiao Wang

As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design.


Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

no code implementations 24 Jan 2024 Zhe Xu, Kun Wei, Xu Yang, Cheng Deng

Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses.

ICD-LM: Configuring Vision-Language In-Context Demonstrations by Language Modeling

1 code implementation 15 Dec 2023 Yingzhe Peng, Xu Yang, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

Moreover, during data construction, we use the LVLM intended for ICL implementation to validate the strength of each ICD sequence. This yields a model-specific dataset, so the ICD-LM trained on this dataset is also model-specific.

Building Variable-sized Models via Learngene Pool

no code implementations 10 Dec 2023 Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool.

Transformer as Linear Expansion of Learngene

1 code implementation 9 Dec 2023 Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng

When models of varying depths must be produced to fit different resource constraints, TLEG achieves comparable results while reducing the parameters stored to initialize these models by around 19x and the pre-training costs by around 5x, in contrast to the pre-training and fine-tuning approach.

How to Configure Good In-Context Sequence for Visual Question Answering

1 code implementation 4 Dec 2023 Li Li, Jiawei Peng, Huiyi Chen, Chongyang Gao, Xu Yang

Inspired by the success of Large Language Models in dealing with new tasks via In-Context Learning (ICL) in NLP, researchers have also developed Large Vision-Language Models (LVLMs) with ICL capabilities.

Manipulating the Label Space for In-Context Classification

no code implementations 1 Dec 2023 Haokun Chen, Xu Yang, Yuhang Huang, Zihan Wu, Jing Wang, Xin Geng

Specifically, using our approach on ImageNet, we increase accuracy from 74.70% in a 4-shot setting to 76.21% with just 2 shots.

Category-Wise Fine-Tuning for Image Multi-label Classification with Partial Labels

2 code implementations International Conference on Neural Information Processing 2023 Chak Fong Chong, Xu Yang, Tenglong Wang, Wei Ke, Yapeng Wang

A single model submitted to the competition server for the official evaluation achieves mAUC 91.82% on the test set, which is the highest single model score in the leaderboard and literature.

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

no code implementations 27 Nov 2023 Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.

Rethinking Residual Connection in Training Large-Scale Spiking Neural Networks

no code implementations 9 Nov 2023 Yudong Li, Yunlin Lei, Xu Yang

Spiking Neural Network (SNN) is known as the most famous brain-inspired model, but the non-differentiable spiking mechanism makes it hard to train large-scale SNNs.

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

1 code implementation 6 Nov 2023 Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

We benchmarked the proposed evaluation metrics on 12 open-vocabulary methods of three segmentation tasks.


Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle

no code implementations 17 Oct 2023 Xu Yang, Xiao Yang, Weiqing Liu, Jinhui Li, Peng Yu, Zeqi Ye, Jiang Bian

In the wake of relentless digital transformation, data-driven solutions are emerging as powerful tools to address multifarious industrial tasks such as forecasting, anomaly detection, planning, and even complex decision-making.

SeisT: A foundational deep learning model for earthquake monitoring tasks

1 code implementation 2 Oct 2023 Sen Li, Xu Yang, Anye Cao, Changbin Wang, Yaoqi Liu, Yapeng Liu, Qiang Niu

The most significant improvements, in comparison to existing models, are observed in phase-P picking, phase-S picking, and magnitude estimation, with gains of 1.7%, 9.5%, and 8.0%, respectively.

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

1 code implementation 15 Sep 2023 Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL.

Temporal Difference Learning for High-Dimensional PIDEs with Jumps

no code implementations 6 Jul 2023 Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu

In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning.

Genes in Intelligent Agents

1 code implementation 17 Jun 2023 Fu Feng, Jing Wang, Xu Yang, Xin Geng

Inspired by biological intelligence, artificial intelligence (AI) has been devoted to building machine intelligence.

Exploring Diverse In-Context Configurations for Image Captioning

1 code implementation NeurIPS 2023 Xu Yang, Yongliang Wu, Mingzhuo Yang, Haokun Chen, Xin Geng

After discovering that Language Models (LMs) can be good in-context few-shot learners, numerous strategies have been proposed to optimize in-context sequence configurations.

Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models

no code implementations 3 May 2023 Qiufeng Wang, Xu Yang, Shuxia Lin, Jing Wang, Xin Geng

(i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model.

Transforming Visual Scene Graphs to Image Captions

1 code implementation 3 May 2023 Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Songfang Huang, Fei Huang, Zhangzikang Li, Yu Zhang

In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.

SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering

no code implementations 4 Apr 2023 Xinyao Shu, ShiYang Yan, Xu Yang, Ziheng Wu, Zhongfeng Chen, Zhenyu Lu

Unfortunately, language bias is a common problem in VQA, which refers to the model generating answers only by associating with the questions while ignoring the visual content, resulting in biased results.

Spatial Attention and Syntax Rule Enhanced Tree Decoder for Offine Handwritten Mathematical Expression Recognition

no code implementations 13 Mar 2023 Zihao Lin, Jinrong Li, Fan Yang, Shuangping Huang, Xu Yang, Jianmin Lin, Ming Yang

In this paper, we propose a novel model called Spatial Attention and Syntax Rule Enhanced Tree Decoder (SS-TD), which is equipped with a spatial attention mechanism to alleviate prediction errors in the tree structure and uses syntax masks (obtained from the transformation of syntax rules) to constrain the occurrence of ungrammatical mathematical expressions.

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations ICCV 2023 Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.

Spikeformer: A Novel Architecture for Training High-Performance Low-Latency Spiking Neural Network

1 code implementation 19 Nov 2022 Yudong Li, Yunlin Lei, Xu Yang

Spiking neural networks (SNNs) have made great progress in both performance and efficiency over the last few years, but their unique working pattern makes it hard to train a high-performance, low-latency SNN. Thus the development of SNNs still lags behind that of traditional artificial neural networks (ANNs). To close this gap, many notable works have been proposed. Nevertheless, these works are mainly based on the same kind of network structure (i.e., CNN), and their performance is worse than that of their ANN counterparts, which limits the applications of SNNs. To this end, we propose a novel Transformer-based SNN, termed "Spikeformer", which outperforms its ANN counterpart on both static and neuromorphic datasets and may be an alternative architecture to CNN for training high-performance SNNs. First, to deal with the "data hungry" problem and the unstable training period exhibited by the vanilla model, we design the Convolutional Tokenizer (CT) module, which improves the accuracy of the original model on DVS-Gesture by more than 16%. In addition, to better combine the attention mechanism inside the Transformer with the spatio-temporal information inherent to SNNs, we adopt spatio-temporal attention (STA) instead of spatial-wise or temporal-wise attention. With our proposed method, we achieve competitive or state-of-the-art (SOTA) SNN performance on the DVS-CIFAR10, DVS-Gesture, and ImageNet datasets with the fewest simulation time steps (i.e., low latency). Remarkably, our Spikeformer outperforms other SNNs on ImageNet by a large margin (i.e., more than 5%) and even outperforms its ANN counterpart by 3.1% and 2.2% on DVS-Gesture and ImageNet respectively, indicating that Spikeformer is a promising architecture for training large-scale SNNs and may be more suitable for SNNs than CNN. We believe that this work will keep the development of SNNs in step with ANNs as much as possible. Code will be available.

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

1 code implementation 4 Oct 2022 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.

MemoNav: Selecting Informative Memories for Visual Navigation

no code implementations 20 Aug 2022 Hongxin Li, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

To address this limitation, we present MemoNav, a novel memory mechanism for image-goal navigation, which retains the agent's informative short-term memory and long-term memory to improve the navigation performance on a multi-goal task.

Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning

1 code implementation 1 Aug 2022 Lu Zhang, Lu Qi, Xu Yang, Hong Qiao, Ming-Hsuan Yang, Zhiyong Liu

In the first stage, we obtain a robust feature extractor, which could serve for all images with base and novel categories.

Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

1 code implementation CVPR 2022 Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang

Some methods recognize state and object with two trained classifiers, ignoring the impact of the interaction between object and state; the other methods try to learn the joint representation of the state-object compositions, leading to the domain gap between seen and unseen composition sets.

iExam: A Novel Online Exam Monitoring and Analysis System Based on Face Detection and Recognition

1 code implementation 27 Jun 2022 Xu Yang, Daoyuan Wu, Xiao Yi, Jimmy H. M. Lee, Tan Lee

In this paper, we propose iExam, an intelligent online exam monitoring and analysis system that not only uses face detection to assist invigilators in real-time student identification, but also detects common abnormal behaviors (including face disappearing, rotating faces, and replacing with a different person during the exams) via a face recognition-based post-exam video analysis.

Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation

no code implementations 21 Apr 2022 Lu Zhang, Siqi Zhang, Xu Yang, Hong Qiao, Zhiyong Liu

In this paper, we emphasize the adaptation process across sim2real domains and model it as a learning problem on the BatchNorm parameters of a simulation-trained model.

Weakly Aligned Feature Fusion for Multimodal Object Detection

no code implementations 21 Apr 2022 Lu Zhang, Zhiyong Liu, Xiangyu Zhu, Zhan Song, Xu Yang, Zhen Lei, Hong Qiao

In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.

Show, Deconfound and Tell: Image Captioning With Causal Inference

1 code implementation CVPR 2022 Bing Liu, Dong Wang, Xu Yang, Yong Zhou, Rui Yao, Zhiwen Shao, Jiaqi Zhao

In the encoding stage, the IOD is able to disentangle the region-based visual features by deconfounding the visual confounder.

Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency

1 code implementation CVPR 2022 Yanan Gu, Xu Yang, Kun Wei, Cheng Deng

Unfortunately, these methods only focus on selecting samples from the memory bank for replay and ignore the adequate exploration of semantic information in the single-pass data stream, leading to poor classification accuracy.

Towards End-to-End Image Compression and Analysis with Transformers

1 code implementation 17 Dec 2021 Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, YaoWei Wang, Xiangyang Ji, Wen Gao

Meanwhile, we propose a feature aggregation module to fuse the compressed features with the selected intermediate features of the Transformer, and feed the aggregated features to a deconvolutional neural network for image reconstruction.

Auto-Encoding Score Distribution Regression for Action Quality Assessment

2 code implementations 22 Nov 2021 Boyu Zhang, Jiayuan Chen, Yinfei Xu, HUI ZHANG, Xu Yang, Xin Geng

Traditionally, AQA is treated as a regression problem to learn the underlying mappings between videos and action scores.

Sliding Sequential CVAE with Time Variant Socially-aware Rethinking for Trajectory Prediction

no code implementations 28 Oct 2021 Hao Zhou, Dongchun Ren, Xu Yang, Mingyu Fan, Hai Huang

First, with the continuation of time, the prediction error at each time step increases significantly, causing the final displacement error to be impossible to ignore.

Can AI detect pain and express pain empathy? A review from emotion recognition and a human-centered AI perspective

no code implementations 8 Oct 2021 Siqi Cao, Di Fu, Xu Yang, Stefan Wermter, Xun Liu, Haiyan Wu

Furthermore, we discuss challenges for responsible evaluation of cognitive methods and computational techniques and show approaches to future work to contribute to affective assistants capable of empathy.

Open Set Domain Adaptation with Zero-shot Learning on Graph

no code implementations 29 Sep 2021 Xinyue Zhang, Xu Yang, Zhi-Yong Liu

Thus the classification ability of the source domain is transferred to the target domain and the model can distinguish the unknown classes with prior knowledge.

Text-Driven Image Manipulation via Semantic-Aware Knowledge Transfer

no code implementations 29 Sep 2021 Ziqi Zhang, Cheng Deng, Kun Wei, Xu Yang

And on this basis, a novel attribute transfer method, named semantic directional decomposition network (SDD-Net), is proposed to achieve semantic-level facial attribute transfer by latent semantic direction decomposition, improving the interpretability and editability of our method.

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Towards Unbiased Visual Emotion Recognition via Causal Intervention

1 code implementation 26 Jul 2021 Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder that misleads the system into learning a spurious correlation.

SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network

1 code implementation CVPR 2021 Xu Yang, Cheng Deng, Zhiyuan Dang, Kun Wei, Junchi Yan

Specifically, the Identity Aggregation is applied to extract semantic features from labeled nodes, and the Semantic Alignment is utilized to align node features obtained from different aspects using the class central similarity.

Nearest Neighbor Matching for Deep Clustering

1 code implementation CVPR 2021 Zhiyuan Dang, Cheng Deng, Xu Yang, Kun Wei, Heng Huang

Specifically, for the local level, we match the nearest neighbors based on batch embedded features; for the global level, we match neighbors from the overall embedded features.
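
The local versus global distinction can be illustrated with a minimal sketch. It assumes cosine similarity as the matching criterion and uses a hypothetical six-sample feature list; the helper names `cosine` and `nearest_neighbor` are illustrative and are not taken from the paper or its code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbor(query_idx, features, candidates):
    """Return the candidate index (excluding the query) most similar to the query."""
    return max(
        (j for j in candidates if j != query_idx),
        key=lambda j: cosine(features[query_idx], features[j]),
    )

# Hypothetical embedded features for six samples (illustrative values only).
features = [
    [1.0, 0.0],   # sample 0
    [0.9, 0.1],   # sample 1
    [0.0, 1.0],   # sample 2
    [0.1, 0.9],   # sample 3
    [1.0, 0.05],  # sample 4
    [0.7, 0.7],   # sample 5
]

batch = [0, 2, 5]                        # indices present in the current mini-batch
everything = list(range(len(features)))  # indices of all embedded features

# Local level: the neighbor is searched only inside the batch.
local_match = nearest_neighbor(0, features, batch)
# Global level: the neighbor is searched over the overall embedded features.
global_match = nearest_neighbor(0, features, everything)
```

In this toy setting the batch-level search can only return a sample that happens to be in the batch, while the global search is free to pick a closer sample outside it.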

Doubly Contrastive Deep Clustering

1 code implementation 9 Mar 2021 Zhiyuan Dang, Cheng Deng, Xu Yang, Heng Huang

In this paper, we present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views to obtain more discriminative features and competitive results.
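
One way to read "contrastive loss over both sample and class views" is sketched below. The sketch assumes that each of two augmented views yields an N x K soft assignment matrix, that rows serve as sample views and columns as class views, and that an InfoNCE-style loss with cosine similarity is applied to each. The function names, the temperature value, and the toy matrices are assumptions made for illustration, not the authors' implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss where row i of view_a is the positive of row i of view_b."""
    a = [normalize(r) for r in view_a]
    b = [normalize(r) for r in view_b]
    total = 0.0
    for i, row in enumerate(a):
        logits = [dot(row, other) / temperature for other in b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[i]
    return total / len(a)

def transpose(m):
    return [list(col) for col in zip(*m)]

def doubly_contrastive_loss(p1, p2, temperature=0.5):
    """Sum of a row-wise (sample view) and a column-wise (class view) contrastive loss."""
    sample_loss = info_nce(p1, p2, temperature)
    class_loss = info_nce(transpose(p1), transpose(p2), temperature)
    return sample_loss + class_loss

# Hypothetical soft assignments of 3 samples to 2 clusters under two augmentations.
p1 = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]
p2 = [[0.8, 0.2], [0.1, 0.9], [0.9, 0.1]]

aligned = doubly_contrastive_loss(p1, p2)
# Swapping two rows of the second view breaks the sample correspondence.
shuffled = doubly_contrastive_loss(p1, [p2[1], p2[0], p2[2]])
```

With consistent assignments across the two views the combined loss is lower than when the sample correspondence is broken, which is the behavior a contrastive objective of this kind is meant to reward.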

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.
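
The combination of the two attentions can be sketched as follows. The sketch assumes plain scaled dot-product attention for both branches, a hypothetical global dictionary holding features drawn from other samples, and concatenation as the way the two outputs are combined. All names and values are illustrative assumptions rather than the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def causal_attention(query, in_sample_feats, cross_sample_dict):
    """Concatenate in-sample attention with attention over other samples' features."""
    is_out = attend(query, in_sample_feats, in_sample_feats)      # IS-ATT branch
    cs_out = attend(query, cross_sample_dict, cross_sample_dict)  # CS-ATT branch
    return is_out + cs_out  # list concatenation (assumed fusion)

# Hypothetical features: two from the current sample, three from a global dictionary.
query = [1.0, 0.0]
in_sample = [[1.0, 0.0], [0.0, 1.0]]
global_dict = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

out = causal_attention(query, in_sample, global_dict)
```

The first half of `out` depends only on the current sample, while the second half is computed from features of other samples, which is the part the snippet above describes as mimicking the causal intervention.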

no code implementations 26 Jan 2021 Jiaqi Yan, Xu Yang, Yilin Mo, Keyou You

This paper studies distributed state estimation in a sensor network, where $m$ sensors are deployed to infer the $n$-dimensional state of a linear time-invariant (LTI) Gaussian system.

Incremental Embedding Learning via Zero-Shot Translation

1 code implementation 31 Dec 2020 Kun Wei, Cheng Deng, Xu Yang, Maosen Li

Different from traditional incremental classification networks, the semantic gap between the embedding spaces of two adjacent tasks is the main challenge for embedding networks under the incremental learning setting.

Adversarial Learning for Robust Deep Clustering

1 code implementation NeurIPS 2020 Xu Yang, Cheng Deng, Kun Wei, Junchi Yan, Wei Liu

Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding.

Cloud Cover and Aurora Contamination at Dome A in 2017 from KLCAM

no code implementations 7 Oct 2020 Xu Yang, Zhaohui Shang, Keliang Hu, Yi Hu, Bin Ma, Yongjiang Wang, Zihuang Cao, Michael C. B. Ashley, Wei Wang

Dome A in Antarctica has many characteristics that make it an excellent site for astronomical observations, from the optical to the terahertz.

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations ECCV 2020 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

Dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Classical limit for the varying-mass Schrödinger equation with random inhomogeneities

no code implementations 12 Feb 2020 Shi Chen, Qin Li, Xu Yang

The varying-mass Schrödinger equation (VMSE) has been successfully applied to model electronic properties of semiconductor hetero-structures, for example, quantum dots and quantum wells.

Automated Pavement Crack Segmentation Using U-Net-based Convolutional Neural Network

no code implementations 7 Jan 2020 Stephen L. H. Lau, Edwin K. P. Chong, Xu Yang, Xin Wang

In this paper, we propose a deep learning technique based on a convolutional neural network to perform segmentation tasks on pavement crack images.

mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation

2 code implementations 24 May 2019 Dayiheng Liu, Xu Yang, Feng He, YuanYuan Chen, Jiancheng Lv

It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious uninformative latent variables problem.

Deep Spectral Clustering using Dual Autoencoder Network

no code implementations CVPR 2019 Xu Yang, Cheng Deng, Feng Zheng, Junchi Yan, Wei Liu

In this paper, we propose a joint learning framework for discriminative embedding and spectral clustering.

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design: one module for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation; 3) a linguistic loss that keeps the module controller faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection

no code implementations ICCV 2019 Lu Zhang, Xiangyu Zhu, Xiangyu Chen, Xu Yang, Zhen Lei, Zhi-Yong Liu

In this paper, we propose a novel Aligned Region CNN (AR-CNN) to handle the weakly aligned multispectral data in an end-to-end way.


Auto-Encoding Scene Graphs for Image Captioning

2 code implementations CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

1 code implementation ECCV 2018 Xu Yang, Hanwang Zhang, Jianfei Cai

By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.


Face Photo Sketch Synthesis via Larger Patch and Multiresolution Spline

no code implementations 19 Sep 2015 Xu Yang

In order to get a smoother sketch, we propose a new method to reduce such jagged parts and mottled points.

A Weighted Common Subgraph Matching Algorithm

no code implementations 4 Nov 2014 Xu Yang, Hong Qiao, Zhi-Yong Liu

We propose a weighted common subgraph (WCS) matching algorithm to find the most similar subgraphs in two labeled weighted graphs.

