Search Results for author: Richang Hong

Found 114 papers, 53 papers with code

Learning Speaker-Invariant Visual Features for Lipreading

no code implementations • 9 Jun 2025 • Yu Li, Feng Xue, Shujie Li, Jinrui Zhang, Shuang Yang, Dan Guo, Richang Hong

Lipreading is a challenging cross-modal task that aims to convert visual lip movements into spoken text.

Disentanglement Lipreading +1

Align is not Enough: Multimodal Universal Jailbreak Attack against Multimodal Large Language Models

no code implementations • 2 Jun 2025 • Youze Wang, WenBo Hu, Yinpeng Dong, Jing Liu, Hanwang Zhang, Richang Hong

Large Language Models (LLMs) have evolved into Multimodal Large Language Models (MLLMs), significantly enhancing their capabilities by integrating visual information and other types, thus aligning more closely with the nature of human intelligence, which processes a variety of data forms beyond just text.

Safety Alignment

Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval

1 code implementation • 18 May 2025 • Jian Xiao, Zijie Song, Jialong Hu, Hao Cheng, Zhenzhen Hu, Jia Li, Richang Hong

To mitigate this, we propose GARE, a Gap-Aware Retrieval framework that introduces a learnable, pair-specific increment $\Delta_{ij}$ between text $t_i$ and video $v_j$ to offload the tension from the global anchor representation.

Contrastive Learning Retrieval +1

VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection

1 code implementation • 5 May 2025 • Hao Cheng, Zhiwei Zhao, Yichao He, Zhenzhen Hu, Jia Li, Meng Wang, Richang Hong

Audiovisual emotion recognition (AVER) aims to infer human emotions from nonverbal visual-audio (VA) cues, offering modality-complementary and language-agnostic advantages.

Contrastive Learning Dynamic Facial Expression Recognition +3

Invariance Matters: Empowering Social Recommendation via Graph Invariant Learning

1 code implementation • 14 Apr 2025 • Yonghui Yang, Le Wu, Yuxin Liao, Zhuangzhuang He, Pengyang Shao, Richang Hong, Meng Wang

Graph-based social recommendation systems have shown significant promise in enhancing recommendation performance, particularly in addressing the issue of data sparsity in user behaviors.

Denoising Recommendation Systems

Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation

no code implementations • 31 Mar 2025 • Yongle Li, Bo Liu, Sheng Huang, Zheng Zhang, Xiaotong Yuan, Richang Hong

In federated learning, fine-tuning pre-trained foundation models poses significant challenges, particularly regarding high communication cost and suboptimal model performance due to data heterogeneity between the clients.

Federated Learning

EgoBlind: Towards Egocentric Visual Assistance for the Blind People

no code implementations • 11 Mar 2025 • Junbin Xiao, Nanxin Huang, Hao Qiu, Zhulin Tao, Xun Yang, Richang Hong, Meng Wang, Angela Yao

We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs).

SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding

1 code implementation • 24 Feb 2025 • Liangtao Shi, Ting Liu, Xiantao Hu, Yue Hu, Quanjun Yin, Richang Hong

Most existing methods transfer visual/linguistic knowledge separately by fully fine-tuning uni-modal pre-trained models, followed by a simple stack of visual-language transformers for multimodal fusion.

cross-modal alignment Visual Grounding

MoLoRec: A Generalizable and Efficient Framework for LLM-Based Recommendation

no code implementations • 12 Feb 2025 • Min Hou, Chenxi Bai, Le Wu, Hao Liu, Kun Zhang, Kai Zhang, Richang Hong, Meng Wang

The first paradigm designs multi-domain or multi-task instruction data for generalizable recommendation, so as to align LLMs with general recommendation areas and deal with cold-start recommendation.

parameter-efficient fine-tuning Recommendation Systems +1

Less is More: Information Bottleneck Denoised Multimedia Recommendation

no code implementations • 21 Jan 2025 • Yonghui Yang, Le Wu, Zhuangzhuang He, Zhengwei Wu, Richang Hong, Meng Wang

This is achieved by maximizing the mutual information between multimedia representation and recommendation tasks, while concurrently minimizing it between multimedia representation and pre-trained multimedia features.

Multimedia recommendation

Biomedical Relation Extraction via Adaptive Document-Relation Cross-Mapping and Concept Unique Identifier

no code implementations • 9 Jan 2025 • Yufei Shang, Yanrong Guo, Shijie Hao, Richang Hong

Specifically, we propose a document-level Bio-RE framework via LLM Adaptive Document-Relation Cross-Mapping (ADRCM) Fine-Tuning and Concept Unique Identifier (CUI) Retrieval-Augmented Generation (RAG).

RAG Relation +4

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations

no code implementations • CVPR 2025 • Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong

To address this, we propose a novel framework, Sign-D2C, that employs a conditional diffusion model to synthesize contextually smooth transition frames, enabling the seamless construction of continuous sign language sequences.

Denoising

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer

no code implementations • CVPR 2025 • Shaofei Huang, Rui Ling, Tianrui Hui, Hongyu Li, Xu Zhou, Shifeng Zhang, Si Liu, Richang Hong, Meng Wang

Audio-Visual Segmentation (AVS) aims to segment sound-producing objects in video frames based on the associated audio signal.

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

no code implementations • 22 Dec 2024 • Xu Wang, Shengeng Tang, Peipei Song, Shuo Wang, Dan Guo, Richang Hong

Sign Language Production (SLP) aims to generate sign videos corresponding to spoken language sentences, where the conversion of sign Glosses to Poses (G2P) is the key step.

Sign Language Production

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

1 code implementation • 19 Dec 2024 • Zijun Chen, WenBo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong

This paper investigates representative MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning, as well as before and after multimodal training of the base LLMs.

Autonomous Driving Image Captioning +2

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

1 code implementation • 18 Dec 2024 • Shengeng Tang, Jiayi He, Dan Guo, Yanyan Wei, Feng Li, Richang Hong

Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints.

Attribute Disentanglement +1

Moderating the Generalization of Score-based Generative Model

no code implementations • 10 Dec 2024 • Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong

To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs.

Image Inpainting Machine Unlearning +1

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observation

no code implementations • 25 Nov 2024 • Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong

To address this, we propose a novel framework, Sign-D2C, that employs a conditional diffusion model to synthesize contextually smooth transition frames, enabling the seamless construction of continuous sign language sequences.

Denoising

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

no code implementations • CVPR 2025 • Yuan Zhou, Qingshan Xu, Jiequan Cui, Junbao Zhou, Jing Zhang, Richang Hong, Hanwang Zhang

In this paper, we propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism, revealing that features' decoupling and interaction can fully unleash the power of linear attention.

Inductive Bias

Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model

1 code implementation • 16 Nov 2024 • Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang

In this paper, we propose Multi-stage Token Dropping (MustDrop) to measure the importance of each token from the whole lifecycle, including the vision encoding stage, prefilling stage, and decoding stage.

Language Modeling Language Modelling +2

DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation

1 code implementation • 11 Oct 2024 • Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong

Engagement estimation plays a crucial role in understanding human social behaviors, attracting increasing research interest in fields such as affective computing and human-computer interaction.

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval

1 code implementation • 9 Oct 2024 • Jian Xiao, Zhenzhen Hu, Jia Li, Richang Hong

By replacing a single text query with a series of text proxies, TV-ProxyNet not only broadens the query scope but also achieves a more precise expansion.

Text Retrieval Video-Text Retrieval

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

1 code implementation • 30 Sep 2024 • Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun

In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting.

knowledge editing

Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data

1 code implementation • 10 Sep 2024 • Yin Chen, Jia Li, Yu Zhang, Zhenzhen Hu, Shiguang Shan, Meng Wang, Richang Hong

Dynamic facial expression recognition (DFER) infers emotions from the temporal evolution of expressions, unlike static facial expression recognition (SFER), which relies solely on a single snapshot.

Dynamic Facial Expression Recognition Facial Expression Recognition +1

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

no code implementations • 9 Sep 2024 • Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong

Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN).

Autonomous Navigation Diversity +2

Exploring Robust Face-Voice Matching in Multilingual Environments

no code implementations • 29 Jul 2024 • Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong

We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and a score polarization strategy.

Data Augmentation

Graph Bottlenecked Social Recommendation

1 code implementation • 12 Jun 2024 • Yonghui Yang, Le Wu, Zihan Wang, Zhuangzhuang He, Richang Hong, Meng Wang

In this paper, we focus on learning the denoised social structure to facilitate recommendation tasks from an information bottleneck perspective.

Denoising

Path-Specific Causal Reasoning for Fairness-aware Cognitive Diagnosis

1 code implementation • 5 Jun 2024 • Dacao Zhang, Kun Zhang, Le Wu, Mi Tian, Richang Hong, Meng Wang

Then, we design a novel attribute-oriented predictor to decouple the sensitive attributes, in which fairness-related sensitive features will be eliminated and other useful information will be retained.

Attribute cognitive diagnosis +1

Double Correction Framework for Denoising Recommendation

1 code implementation • 18 May 2024 • Zhuangzhuang He, Yifan Wang, Yonghui Yang, Peijie Sun, Le Wu, Haoyue Bai, Jinqi Gong, Richang Hong, Min Zhang

To tackle the above limitations, we propose a Double Correction Framework for Denoising Recommendation (DCF), which contains two correction components from views of more precise sample dropping and avoiding more sparse data.

Denoising Model Optimization +1

Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning

no code implementations • 17 Mar 2024 • Yuan Zhou, Richang Hong, Yanrong Guo, Lin Liu, Shijie Hao, Hanwang Zhang

In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL by disentangling spurious relations between categories.

class-incremental learning Disentanglement +3

Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

1 code implementation • 14 Mar 2024 • Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

Additionally, we find that most solutions to long-tailed problems are still biased towards head classes in the end, and we propose a simple post hoc prediction re-balancing strategy to further mitigate the bias toward head classes.

Doubly Abductive Counterfactual Inference for Text-based Image Editing

1 code implementation • CVPR 2024 • Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang

Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.

counterfactual Counterfactual Inference +2

Few-shot Learner Parameterization by Diffusion Time-steps

1 code implementation • CVPR 2024 • Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent.

Few-Shot Learning Inductive Bias

Group Multi-View Transformer for 3D Shape Analysis with Spatial Encoding

1 code implementation • 27 Dec 2023 • Lixiang Xu, Qingzhe Cui, Richang Hong, Wei Xu, Enhong Chen, Xin Yuan, Chenglong Li, Yuanyan Tang

The large model GMViT achieves excellent 3D classification and retrieval results on the benchmark datasets ModelNet, ShapeNetCore55, and MCB.

3D Classification 3D Shape Recognition +2

Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series

no code implementations • 3 Dec 2023 • Ying Liu, Peng Cui, WenBo Hu, Richang Hong

Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.

Imputation Missing Values +2

One-bit Supervision for Image Classification: Problem, Solution, and Beyond

no code implementations • 26 Nov 2023 • Hengtong Hu, Lingxi Xie, Xinyue Hue, Richang Hong, Qi Tian

An intriguing property of the setting is that the burden of annotation is largely alleviated in comparison to providing the accurate label.

Active Learning image-classification +3

Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement

no code implementations • 20 Nov 2023 • Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang

The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes.

Image Restoration Language Modelling

Boundary Discretization and Reliable Classification Network for Temporal Action Detection

1 code implementation • 10 Oct 2023 • Zhenying Fang, Jun Yu, Richang Hong

Furthermore, the reliable classification module (RCM) predicts reliable global action categories to reduce false positives.

Action Detection

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

no code implementations • 19 Jul 2023 • Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang

The crucial issue in this task is to model the global and the local matching between the image and different languages.

Image Captioning

Generative Contrastive Graph Learning for Recommendation

1 code implementation • 11 Jul 2023 • Yonghui Yang, Zhengwei Wu, Le Wu, Kun Zhang, Richang Hong, Zhiqiang Zhang, Jun Zhou, Meng Wang

Second, feature augmentation imposes the same scale noise augmentation on each node, which neglects the unique characteristics of nodes on the graph.

Collaborative Filtering Contrastive Learning +3

Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation

no code implementations • 18 May 2023 • Yuan Zhou, Xin Chen, Yanrong Guo, Shijie Hao, Richang Hong, Qi Tian

Incremental few-shot semantic segmentation (IFSS) aims to incrementally extend a semantic segmentation model to novel classes according to only a few pixel-level annotated data, while preserving its segmentation capability on previously learned base categories.

Few-Shot Semantic Segmentation Incremental Learning +3

Iterative Adversarial Attack on Image-guided Story Ending Generation

no code implementations • 16 May 2023 • Youze Wang, WenBo Hu, Richang Hong

Multimodal learning involves developing models that can integrate information from various sources like images and texts.

Adversarial Robustness Adversarial Text +4

Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples

1 code implementation • 16 May 2023 • Wan Jiang, Yunfeng Diao, He Wang, Jianxin Sun, Meng Wang, Richang Hong

Unfortunately, we find UEs provide a false sense of security, because they cannot stop unauthorized users from utilizing other unprotected data to remove the protection by turning unlearnable data into learnable data again.

Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

1 code implementation • 16 Mar 2023 • Jia Li, Yin Chen, Xuesong Zhang, Jiantao Nie, Ziqiang Li, Yangchen Yu, Yan Zhang, Richang Hong, Meng Wang

In this paper, we present our advanced solutions to the two sub-challenges of Affective Behavior Analysis in the wild (ABAW) 2023: the Emotional Reaction Intensity (ERI) Estimation Challenge and Expression (Expr) Classification Challenge.

Classification

Adaptive Data-Free Quantization

1 code implementation • CVPR 2023 • Biao Qian, Yang Wang, Richang Hong, Meng Wang

Data-free quantization (DFQ) recovers the performance of a quantized network (Q) without the original data, but generates fake samples via a generator (G) by learning from a full-precision network (P), which, however, is totally independent of Q, overlooking the adaptability of the knowledge from generated samples, i.e., whether it is informative or not to the learning process of Q, resulting in the overflow of generalization error.

Data Free Quantization

Contrastive Video Question Answering via Video Graph Transformer

1 code implementation • 27 Feb 2023 • Junbin Xiao, Pan Zhou, Angela Yao, Yicong Li, Richang Hong, Shuicheng Yan, Tat-Seng Chua

CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning.

Ranked #34 on Video Question Answering on NExT-QA (using extra training data)

Contrastive Learning Question Answering +1

Rethinking Data-Free Quantization as a Zero-Sum Game

1 code implementation • 19 Feb 2023 • Biao Qian, Yang Wang, Richang Hong, Meng Wang

How to generate the samples with desirable adaptability to benefit the quantized network?

Data Free Quantization

LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers

no code implementations • 4 Feb 2023 • Feng Xue, Yu Li, Deyin Liu, Yincen Xie, Lin Wu, Richang Hong

However, generalizing these methods to unseen speakers incurs catastrophic performance degradation due to the limited number of speakers in the training bank and the evident visual variations caused by the shape/color of lips for different speakers.

Lipreading Sentence

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation • CVPR 2023 • Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of human body into consideration.

3D Human Pose Estimation

Global Temporal Difference Network for Action Recognition

no code implementations • TMM 2022 • Zhao Xie, Jiansong Chen, Kewei Wu, Dan Guo, Richang Hong

In the global aggregation module, the global prior knowledge is learned by aggregating the visual feature sequence of video into a global vector.

Action Recognition

Stereo Image Rain Removal via Dual-View Mutual Attention

no code implementations • 18 Nov 2022 • Yanyan Wei, Zhao Zhang, ZhongQiu Zhao, Yang Zhao, Richang Hong, Yi Yang

Stereo images, containing left and right view images with disparity, have recently been utilized in solving low-vision tasks, e.g., rain removal and super-resolution.

Disparity Estimation Image Restoration +2

Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in The Dark

no code implementations • 2 Nov 2022 • Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan

Specifically, we present a decoupled interaction module (DIM) that aims for sufficient dual-view information interaction.

Image Enhancement

MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation

1 code implementation • 14 Oct 2022 • Kang Liu, Feng Xue, Dan Guo, Le Wu, Shujie Li, Richang Hong

This paper aims at solving the mismatch problem between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences.

Collaborative Filtering image-classification +1

Joint Multi-grained Popularity-aware Graph Convolution Collaborative Filtering for Recommendation

1 code implementation • 10 Oct 2022 • Kang Liu, Feng Xue, Xiangnan He, Dan Guo, Richang Hong

In this work, we propose to model multi-grained popularity features and jointly learn them together with high-order connectivity, to match the differentiation of user preferences exhibited in popularity features.

Collaborative Filtering Recommendation Systems

Seeing Through the Noisy Dark: Towards Real-world Low-Light Image Enhancement and Denoising

no code implementations • 2 Oct 2022 • Jiahuan Ren, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, Shuicheng Yan

Low-light image enhancement (LLIE) aims at improving the illumination and visibility of dark images with lighting noise.

Attribute Denoising +1

Switchable Online Knowledge Distillation

1 code implementation • 12 Sep 2022 • Biao Qian, Yang Wang, Hongzhi Yin, Richang Hong, Meng Wang

Instead of focusing on the accuracy gap at the test phase, as existing arts do, the core idea of SwitOKD is to adaptively calibrate the gap at the training phase, namely the distillation gap, via a switching strategy between two modes: expert mode (pause the teacher while keeping the student learning) and learning mode (restart the teacher).

Knowledge Distillation

Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers

no code implementations • 22 Jul 2022 • Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang

PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face, without the need for paired images.

Disentanglement Face Generation +2

Towards Feature Distribution Alignment and Diversity Enhancement for Data-Free Quantization

no code implementations • 30 Apr 2022 • Yangcheng Gao, Zhao Zhang, Richang Hong, Haijun Zhang, Jicong Fan, Shuicheng Yan

To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics to imitate the distribution of real data, so that the performance degradation is alleviated.

Data Free Quantization Diversity +2

A Review-aware Graph Contrastive Learning Framework for Recommendation

1 code implementation • 26 Apr 2022 • Jie Shuai, Kun Zhang, Le Wu, Peijie Sun, Richang Hong, Meng Wang, Yong Li

Second, while most current models suffer from limited user behaviors, can we exploit the unique self-supervised signals in the review-aware graph to guide two recommendation components better?

Contrastive Learning Recommendation Systems +1

Vibration-based Uncertainty Estimation for Learning from Limited Supervision

no code implementations • 29 Sep 2021 • Hengtong Hu, Lingxi Xie, Yinquan Wang, Richang Hong, Meng Wang, Qi Tian

We investigate the problem of estimating uncertainty for training data, so that deep neural networks can make use of the results for learning from limited supervision.

Active Learning

Few-shot Learning with Global Relatedness Decoupled-Distillation

no code implementations • 12 Jul 2021 • Yuan Zhou, Yanrong Guo, Shijie Hao, Richang Hong, ZhengJun Zha, Meng Wang

To overcome these problems, we propose a new Global Relatedness Decoupled-Distillation (GRDD) method using the global category knowledge and the Relatedness Decoupled-Distillation (RDD) strategy.

Few-Shot Learning Metric Learning

Privileged Graph Distillation for Cold Start Recommendation

no code implementations • 31 May 2021 • Shuai Wang, Kun Zhang, Le Wu, Haiping Ma, Richang Hong, Meng Wang

The teacher model is composed of a heterogeneous graph structure for warm users and items with privileged CF links.

Attribute Collaborative Filtering +1

Set2setRank: Collaborative Set to Set Ranking for Implicit Feedback based Recommendation

1 code implementation • 16 May 2021 • Lei Chen, Le Wu, Kun Zhang, Richang Hong, Meng Wang

Despite the performance gain of these implicit feedback based models, the recommendation results are still far from satisfactory due to the sparsity of the observed item set for each user.

Collaborative Filtering

Few-shot Partial Multi-view Learning

no code implementations • 5 May 2021 • Yuan Zhou, Yanrong Guo, Shijie Hao, Richang Hong, Jiebo Luo

The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited amount of data exacerbates information scarcity, thus making it harder to address the view-missing issue in turn.

Few-Shot Learning MULTI-VIEW LEARNING

Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

1 code implementation • 6 Apr 2021 • Jianfeng Dong, Zhe Ma, Xiaofeng Mao, Xun Yang, Yuan He, Richang Hong, Shouling Ji

In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute between fashion items.

Attribute Reranking

Revisiting Local Descriptor for Improved Few-Shot Classification

1 code implementation • 30 Mar 2021 • Jun He, Richang Hong, Xueliang Liu, Mingliang Xu, Qianru Sun

Few-shot classification studies the problem of quickly adapting a deep learner to understanding novel classes based on few support images.

Classification Decision Making +1

Learning Fair Representations for Recommendation: A Graph-based Perspective

1 code implementation • 18 Feb 2021 • Le Wu, Lei Chen, Pengyang Shao, Richang Hong, Xiting Wang, Meng Wang

For each user, this transformation is achieved under the adversarial learning of a user-centric graph, in order to obfuscate each sensitive feature between both the filtered user embedding and the sub graph structures of this user.

Fairness Recommendation Systems

One-bit Supervision for Image Classification

1 code implementation • NeurIPS 2020 • Hengtong Hu, Lingxi Xie, Zewei Du, Richang Hong, Qi Tian

Instead of training a model upon the accurate label of each sample, our setting requires the model to query with a predicted label of each sample and learn from the answer whether the guess is correct.

Classification General Classification +2

RGCF: Refined Graph Convolution Collaborative Filtering with concise and expressive embedding

1 code implementation • 7 Jul 2020 • Kang Liu, Feng Xue, Richang Hong

In this work, we develop a new GCN-based Collaborative Filtering model, named Refined Graph convolution Collaborative Filtering (RGCF), where the construction of the embeddings of users (items) is delicately redesigned from several aspects during the aggregation on the graph.

Collaborative Filtering

Learning to Transfer Graph Embeddings for Inductive Graph based Recommendation

no code implementations • 24 May 2020 • Le Wu, Yonghui Yang, Lei Chen, Defu Lian, Richang Hong, Meng Wang

The transfer network is designed to approximate the learned item embeddings from graph neural networks by taking each item's visual content as input, in order to tackle the new segment problem in the test phase.

Graph Neural Network Transfer Learning

Memory-Augmented Relation Network for Few-Shot Learning

no code implementations • 9 May 2020 • Jun He, Richang Hong, Xueliang Liu, Mingliang Xu, Zheng-Jun Zha, Meng Wang

Metric-based few-shot learning methods concentrate on learning a transferable feature embedding that generalizes well from seen categories to unseen categories under the supervision of a limited number of labelled instances.

Few-Shot Learning Metric Learning +2

Real-world Person Re-Identification via Degradation Invariance Learning

no code implementations • CVPR 2020 • Yukun Huang, Zheng-Jun Zha, Xueyang Fu, Richang Hong, Liang Li

Person re-identification (Re-ID) in real-world scenarios usually suffers from various degradation factors, e.g., low-resolution, weak illumination, blurring and adverse weather.

Image Restoration Person Re-Identification +2

Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

1 code implementation • CVPR 2020 • Hengtong Hu, Lingxi Xie, Richang Hong, Qi Tian

In recent years, cross-modal hashing (CMH) has attracted increasing attention, mainly because of its potential ability to map contents from different modalities, especially in vision and language, into the same space, so that it becomes efficient in cross-modal data retrieval.

Knowledge Distillation Retrieval

Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems

no code implementations • 21 Feb 2020 • Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, Tat-Seng Chua

Recommender systems are embracing conversational technologies to obtain user preferences dynamically, and to overcome inherent limitations of their static models.

Recommendation Systems

Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach

2 code implementations • 28 Jan 2020 • Lei Chen, Le Wu, Richang Hong, Kun Zhang, Meng Wang

Second, we propose a residual network structure that is specifically designed for CF with user-item interaction modeling, which alleviates the over smoothing problem in graph convolution aggregation operation with sparse user-item interaction data.

Collaborative Filtering Recommendation Systems +1

DiffNet++: A Neural Influence and Interest Diffusion Network for Social Recommendation

4 code implementations • 15 Jan 2020 • Le Wu, Junwei Li, Peijie Sun, Richang Hong, Yong Ge, Meng Wang

Recently, we proposed a preliminary work, a neural influence diffusion network (i.e., DiffNet) for social recommendation, which models the recursive social diffusion process to capture the higher-order relationships for each user.

Collaborative Filtering

Diversifying Inference Path Selection: Moving-Mobile-Network for Landmark Recognition

no code implementations • 1 Dec 2019 • Biao Qian, Yang Wang, Zhao Zhang, Richang Hong, Meng Wang, Ling Shao

We intuitively find that M$^2$Net can essentially promote the diversity of the inference path (selected blocks subset) selection, so as to enhance the recognition accuracy.

Diversity Landmark Recognition

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video

1 code implementation ACM International Conference on Multimedia 2019 Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, Tat-Seng Chua

Existing works on multimedia recommendation largely exploit multi-modal contents to enrich item representations, while less effort is made to leverage information interchange between users and items to enhance user representations and further capture users' fine-grained preferences on different modalities.

Microvideo Recommendation Micro-video recommendations +5

A Coarse-to-Fine Multi-stream Hybrid Deraining Network for Single Image Deraining

no code implementations 28 Aug 2019 Yanyan Wei, Zhao Zhang, Haijun Zhang, Richang Hong, Meng Wang

To obtain the negative rain streaks more accurately during the training process, we present a new module named the dual path residual dense block, i.e., a Residual path and a Dense path.

Single Image Deraining SSIM

Robust Subspace Discovery by Block-diagonal Adaptive Locality-constrained Representation

no code implementations 4 Aug 2019 Zhao Zhang, Jiahuan Ren, Sheng Li, Richang Hong, Zheng-Jun Zha, Meng Wang

Leveraging the Frobenius-norm-based latent low-rank representation model, rBDLR jointly learns the coding coefficients and salient features, and improves the results by enhancing the robustness to outliers and errors in the given data, preserving local information of salient features adaptively, and ensuring the block-diagonal structures of the coefficients.

Representation Learning

Joint Subspace Recovery and Enhanced Locality Driven Robust Flexible Discriminative Dictionary Learning

no code implementations 11 Jun 2019 Zhao Zhang, Jiahuan Ren, Weiming Jiang, Zheng Zhang, Richang Hong, Shuicheng Yan, Meng Wang

We propose a joint subspace recovery and enhanced locality based robust flexible label consistent dictionary learning method called Robust Flexible Discriminative Dictionary Learning (RFDDL).

Dictionary Learning

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

no code implementations 5 Jun 2019 Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang

Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space.

Visual Grounding Visual Reasoning

Personalized Multimedia Item and Key Frame Recommendation

no code implementations 1 Jun 2019 Le Wu, Lei Chen, Yonghui Yang, Richang Hong, Yong Ge, Xing Xie, Meng Wang

We argue that the key challenge of this problem lies in discovering users' visual profiles for key frame recommendation, as most recommendation models would fail without any users' fine-grained image behavior.

Online Filter Clustering and Pruning for Efficient Convnets

no code implementations 28 May 2019 Zhengguang Zhou, Wengang Zhou, Richang Hong, Houqiang Li

Pruning filters is an effective method for accelerating deep neural networks (DNNs), but most existing approaches prune filters directly on a pre-trained network, which limits the achievable acceleration.

Clustering

A Neural Influence Diffusion Model for Social Recommendation

2 code implementations 20 Apr 2019 Le Wu, Peijie Sun, Yanjie Fu, Richang Hong, Xiting Wang, Meng Wang

The key idea of our proposed model is that we design a layer-wise influence propagation structure to model how users' latent embeddings evolve as the social diffusion process continues.

Collaborative Filtering Recommendation Systems

Cross-Entropy Adversarial View Adaptation for Person Re-identification

no code implementations 3 Apr 2019 Lin Wu, Richang Hong, Yang Wang, Meng Wang

The main contribution is to learn coupled asymmetric mappings regarding view characteristics which are adversarially trained to address the view discrepancy by optimising the cross-entropy view confusion objective.

Person Re-Identification

Deep Item-based Collaborative Filtering for Top-N Recommendation

1 code implementation 11 Nov 2018 Feng Xue, Xiangnan He, Xiang Wang, Jiandong Xu, Kai Liu, Richang Hong

In this work, we propose a more expressive ICF solution by accounting for the nonlinear and higher-order relationship among items.

Collaborative Filtering Decision Making +1

Fast Matrix Factorization with Non-Uniform Weights on Missing Data

1 code implementation 11 Nov 2018 Xiangnan He, Jinhui Tang, Xiaoyu Du, Richang Hong, Tongwei Ren, Tat-Seng Chua

This poses an imbalanced learning problem, since the scale of missing entries is usually much larger than that of observed entries, but they cannot be ignored due to the valuable negative signal.

SocialGCN: An Efficient Graph Convolutional Network based Model for Social Recommendation

no code implementations 7 Nov 2018 Le Wu, Peijie Sun, Richang Hong, Yanjie Fu, Xiting Wang, Meng Wang

Based on a classical CF model, the key idea of our proposed model is that we borrow the strengths of GCNs to capture how users' preferences are influenced by the social diffusion process in social networks.

Collaborative Filtering Recommendation Systems

A Hierarchical Attention Model for Social Contextual Image Recommendation

1 code implementation 3 Jun 2018 Le Wu, Lei Chen, Richang Hong, Yanjie Fu, Xing Xie, Meng Wang

After that, we design a hierarchical attention network that naturally mirrors the hierarchical relationship (elements at each aspect's level, and the aspect level) of users' latent interests with the identified key aspects.

Interleaved Structured Sparse Convolutional Neural Networks

no code implementations CVPR 2018 Guotian Xie, Jingdong Wang, Ting Zhang, Jian-Huang Lai, Richang Hong, Guo-Jun Qi

In this paper, we study the problem of designing efficient convolutional neural network architectures with the interest in eliminating the redundancy in convolution kernels.

Multi-Cue Correlation Filters for Robust Visual Tracking

1 code implementation CVPR 2018 Ning Wang, Wengang Zhou, Qi Tian, Richang Hong, Meng Wang, Houqiang Li

By combining different types of features, our approach constructs multiple experts through Discriminative Correlation Filter (DCF) and each of them tracks the target independently.

Visual Tracking

IGCV$2$: Interleaved Structured Sparse Convolutional Neural Networks

2 code implementations 17 Apr 2018 Guotian Xie, Jingdong Wang, Ting Zhang, Jian-Huang Lai, Richang Hong, Guo-Jun Qi

In this paper, we study the problem of designing efficient convolutional neural network architectures with the interest in eliminating the redundancy in convolution kernels.

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

no code implementations 7 Feb 2018 Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.

Binarization Decoder +2

Enhancing Person Re-identification in a Self-trained Subspace

1 code implementation 20 Apr 2017 Xun Yang, Meng Wang, Richang Hong, Qi Tian, Yong Rui

To address this problem, in this paper, we propose a self-trained subspace learning paradigm for person re-ID which effectively utilizes both labeled and unlabeled data to learn a discriminative subspace where person images across disjoint camera views can be easily matched.

Person Re-Identification

Point-of-Interest Recommendations: Learning Potential Check-ins from Friends

1 code implementation 1 Aug 2016 Huayu Li, Yong Ge, Richang Hong, Hengshu Zhu

The emergence of Location-based Social Network (LBSN) services provides a wonderful opportunity to build personalized Point-of-Interest (POI) recommender systems.

Decision Making Recommendation Systems

Cascaded Interactional Targeting Network for Egocentric Video Analysis

no code implementations CVPR 2016 Yang Zhou, Bingbing Ni, Richang Hong, Xiaokang Yang, Qi Tian

Firstly, a novel EM-like learning framework is proposed to train the pixel-level deep convolutional neural network (DCNN) by seamlessly integrating weakly supervised data (i.e., massive bounding box annotations) with a small set of strongly supervised data (i.e., fully annotated hand segmentation maps) to achieve state-of-the-art hand segmentation performance.

Action Recognition Foreground Segmentation +4

Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

no code implementations 6 Nov 2015 Shichao Zhao, Yanbin Liu, Yahong Han, Richang Hong

It achieves an accuracy of 93.78% on UCF101, which is the state of the art, and an accuracy of 65.62% on HMDB51, which is comparable to the state of the art.

Action Recognition image-classification +2

Interaction Part Mining: A Mid-Level Approach for Fine-Grained Action Recognition

no code implementations CVPR 2015 Yang Zhou, Bingbing Ni, Richang Hong, Meng Wang, Qi Tian

Secondly, these object regions are matched and tracked across frames to form a large spatio-temporal graph based on the appearance matching and the dense motion trajectories through them.

Fine-grained Action Recognition Human-Object Interaction Detection +2

Crowded Scene Analysis: A Survey

no code implementations 6 Feb 2015 Teng Li, Huan Chang, Meng Wang, Bingbing Ni, Richang Hong, Shuicheng Yan

Then, existing models, popular algorithms, evaluation protocols, as well as system performance are provided corresponding to different aspects of crowded scene analysis.

Anomaly Detection Survey
