Search Results for author: Yongdong Zhang

Found 114 papers, 59 papers with code

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

1 code implementation 11 Mar 2024 Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

The Q-Formers are trained using paired images rather than the identical target, in which the reference image and the ground-truth image share the same style or semantics.

Disentanglement

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

no code implementations 1 Mar 2024 Mengqi Huang, Zhendong Mao, Mingcong Liu, Qian He, Yongdong Zhang

However, the inherently entangled influence scope of pseudo-words with the given text results in a dual-optimum paradox, i.e., the similarity of the given subjects and the controllability of the given text cannot be optimal simultaneously.

Alleviating Structural Distribution Shift in Graph Anomaly Detection

1 code implementation 25 Jan 2024 Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, Yongdong Zhang

Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes -- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes.

Binary Classification Graph Anomaly Detection

Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification

no code implementations 3 Jan 2024 Yulin Li, Tianzhu Zhang, Yongdong Zhang

Visible-infrared person re-identification (VI-ReID) is challenging due to the significant cross-modality discrepancies between visible and infrared images.

Metric Learning Person Re-Identification

Grammatical Error Correction via Mixed-Grained Weighted Training

no code implementations 23 Nov 2023 Jiahao Li, Quan Wang, Chiwei Zhu, Zhendong Mao, Yongdong Zhang

In this paper, the inherent discrepancies are manifested in two aspects, namely, accuracy of data annotation and diversity of potential annotations.

Grammatical Error Correction Sentence

On the Calibration of Large Language Models and Alignment

no code implementations 22 Nov 2023 Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

As large language models attract increasing attention and find widespread application, challenges of reliability arise concurrently.

Causality is all you need

no code implementations 21 Nov 2023 Ning Xu, YiFei Gao, Hongshuo Tian, Yongdong Zhang, An-An Liu

In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data.

Document Classification

CARIS: Context-Augmented Referring Image Segmentation

1 code implementation ACM MM 2023 Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

Technically, CARIS develops a context-aware mask decoder with sequential bidirectional cross-modal attention to integrate the linguistic features with visual context, which are then aligned with pixel-wise visual features.

Image Segmentation Segmentation +1

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation

1 code implementation 23 Oct 2023 Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao

Then we design a novel attribute distribution reconstruction method to balance the obtained distributions and use the reconstructed distributions to guide language models for generation, effectively avoiding the issue of Attribute Collapse.

Attribute Text Generation

Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation

no code implementations 22 Oct 2023 Haoyang Liu, Yufei Kuang, Jie Wang, Xijun Li, Yongdong Zhang, Feng Wu

To tackle this problem, we propose a novel approach, called Adversarial Instance Augmentation, which does not require knowing the problem type for new instance generation, to promote data diversity for learning-based branching modules in branch-and-bound (B&B) Solvers (AdaSolver).

Imitation Learning

Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning

no code implementations 18 Oct 2023 Yufei Kuang, Xijun Li, Jie Wang, Fangzhou Zhu, Meng Lu, Zhihai Wang, Jia Zeng, Houqiang Li, Yongdong Zhang, Feng Wu

Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently.

reinforcement-learning Reinforcement Learning (RL)

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

no code implementations 12 Oct 2023 Pandeng Li, Hongtao Xie, Jiannan Ge, Lei Zhang, Shaobo Min, Yongdong Zhang

Hence, we address this problem by decomposing video information into reconstruction-dependent and semantic-dependent information, which disentangles the semantic extraction from reconstruction constraint.

Retrieval Semantic Retrieval +3

Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition

1 code implementation 8 Oct 2023 Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang

In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP) model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature Distillation framework (named CLIP-OCR) to leverage both visual and linguistic knowledge in CLIP.

Optical Character Recognition (OCR) Scene Text Recognition

A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability

1 code implementation NeurIPS 2023 Zijie Geng, Xijun Li, Jie Wang, Xiao Li, Yongdong Zhang, Feng Wu

In the past few years, there has been an explosive surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs).

Combinatorial Optimization

T2IW: Joint Text to Image & Watermark Generation

no code implementations 7 Sep 2023 An-An Liu, Guokai Zhang, Yuting Su, Ning Xu, Yongdong Zhang, Lanjun Wang

Furthermore, we strengthen the watermark robustness of our approach by subjecting the compound image to various post-processing attacks, with minimal pixel distortion observed in the revealed watermark.

Image Generation

A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design

1 code implementation 22 Aug 2023 Zhihai Wang, Lei Chen, Jie Wang, Xing Li, Yinqi Bai, Xijun Li, Mingxuan Yuan, Jianye Hao, Yongdong Zhang, Feng Wu

In particular, we notice that the runtime of the Resub and Mfs2 operators often dominates the overall runtime of LS optimization processes.

Domain Generalization

Balanced Classification: A Unified Framework for Long-Tailed Object Detection

1 code implementation 4 Aug 2023 Tianhao Qi, Hongtao Xie, Pandeng Li, Jiannan Ge, Yongdong Zhang

In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories.

Hallucination Long-tailed Object Detection +1

MomentDiff: Generative Video Moment Retrieval from Random to Real

1 code implementation NeurIPS 2023 Pandeng Li, Chen-Wei Xie, Hongtao Xie, Liming Zhao, Lei Zhang, Yun Zheng, Deli Zhao, Yongdong Zhang

Video moment retrieval pursues an efficient and generalized solution to identify the specific temporal segments within an untrimmed video that correspond to a given language description.

Moment Retrieval Retrieval

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

no code implementations 1 Jul 2023 Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images.

Image Generation

ExpertPrompting: Instructing Large Language Models to be Distinguished Experts

1 code implementation 24 May 2023 Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Yongdong Zhang, Zhendong Mao

The answering quality of an aligned large language model (LLM) can be drastically improved if treated with proper crafting of prompts.

In-Context Learning Instruction Following +2

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

1 code implementation CVPR 2023 Mengqi Huang, Zhendong Mao, Quan Wang, Yongdong Zhang

Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.

Image Generation Image Reconstruction +1

Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

1 code implementation CVPR 2023 Mengqi Huang, Zhendong Mao, Zhuowei Chen, Yongdong Zhang

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook.

Image Generation Position +1

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

1 code implementation 9 May 2023 Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Yongdong Zhang

Vision models have gained increasing attention due to their simplicity and efficiency in the Scene Text Recognition (STR) task.

Scene Text Recognition

Reformulating CTR Prediction: Learning Invariant Feature Interactions for Recommendation

1 code implementation 26 Apr 2023 Yang Zhang, Tianhao Shi, Fuli Feng, Wenjie Wang, Dingxian Wang, Xiangnan He, Yongdong Zhang

However, such a manner inevitably learns unstable feature interactions, i.e., the ones that exhibit strong correlations in historical data but generalize poorly for future serving.

Click-Through Rate Prediction Disentanglement +1

$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference

1 code implementation 24 Mar 2023 Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She, Yongdong Zhang

In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing utilization of LLMs.

In-Context Learning

Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution

1 code implementation 19 Feb 2023 Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou, Shuiwang Ji, Bin Li, Yongdong Zhang, Feng Wu

The appealing features of RSD-OA include that: (1) RSD-OA is invariant to visual distractions, as it is conditioned on the predefined subsequent action sequence without task-irrelevant information from transition dynamics, and (2) the reward sequence captures long-term task-relevant information in both rewards and transition dynamics.

reinforcement-learning Reinforcement Learning (RL) +1

De Novo Molecular Generation via Connection-aware Motif Mining

1 code implementation 2 Feb 2023 Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu

The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other.

Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model

no code implementations 1 Feb 2023 Zhihai Wang, Xijun Li, Jie Wang, Yufei Kuang, Mingxuan Yuan, Jia Zeng, Yongdong Zhang, Feng Wu

Cut selection -- which aims to select a proper subset of the candidate cuts to improve the efficiency of solving MILPs -- heavily depends on (P1) which cuts should be preferred, and (P2) how many cuts should be selected.

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

1 code implementation CVPR 2023 Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

POP builds a set of orthogonal prototypes, each of which represents a semantic class, and makes the prediction for each class separately based on the features projected onto its prototype.

Generalized Few-Shot Semantic Segmentation

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

1 code implementation ICCV 2023 Pandeng Li, Chen-Wei Xie, Liming Zhao, Hongtao Xie, Jiannan Ge, Yun Zheng, Deli Zhao, Yongdong Zhang

In the event-sentence prototype matching phase, we design a temporal prototype generation mechanism to associate intra-frame objects and interact inter-frame temporal relations.

Object Retrieval +2

Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images

no code implementations ICCV 2023 Yuwen Pan, Naisong Luo, Rui Sun, Meng Meng, Tianzhu Zhang, Zhiwei Xiong, Yongdong Zhang

Mitochondria, as tiny structures within the cell, are of significant importance for studying cell functions in biological and clinical analysis.

Dynamic Generative Targeted Attacks With Pattern Injection

no code implementations CVPR 2023 Weiwei Feng, Nanqing Xu, Tianzhu Zhang, Yongdong Zhang

Concretely, the former adopts a dynamic convolution kernel and a static convolution kernel for the specific instance and the global dataset, respectively, which can inherit the advantages of both instance-specific and instance-agnostic attacks.

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations CVPR 2023 Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Learning Semantic Relationship Among Instances for Image-Text Matching

1 code implementation CVPR 2023 Zheren Fu, Zhendong Mao, Yan Song, Yongdong Zhang

Image-text matching, a bridge connecting image and language, is an important task, which generally learns a holistic cross-modal embedding to achieve a high-quality semantic alignment between the two modalities.

Image Retrieval Image-text matching +8

Exploring Stroke-Level Modifications for Scene Text Editing

1 code implementation 5 Dec 2022 Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang

Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public evaluation datasets.

Attribute Scene Text Editing

Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning

1 code implementation 29 Nov 2022 Zheren Fu, Zhendong Mao, Bo Hu, An-An Liu, Yongdong Zhang

They have overlooked the wide characteristic changes of different classes and cannot model abundant intra-class variations for generations.

Image Augmentation Image Retrieval +5

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

1 code implementation 19 Nov 2022 Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang

In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.

Blocking Language Modelling +2

Cross-Modality Transformer for Visible-Infrared Person Re-Identification

no code implementations ECCV 2022 Kongzhu Jiang, Tianzhu Zhang, Xiang Liu, Bingqiao Qian, Yongdong Zhang, Feng Wu

To alleviate the above issues, we propose a novel Cross-Modality Transformer (CMT) to jointly explore a modality-level alignment module and an instance-level module for VI-ReID.

Person Re-Identification

Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity

1 code implementation 20 Oct 2022 Jiahao Li, Quan Wang, Zhendong Mao, Junbo Guo, Yanyan Yang, Yongdong Zhang

In this paper, we consider introducing an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC, and, for the first time, systematically discuss the adaptivity and granularity of this auxiliary task.

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets

1 code implementation 12 Oct 2022 Zhiying Lu, Hongtao Xie, Chuanbin Liu, Yongdong Zhang

On the channel aspect, we introduce a dynamic feature aggregation module in the MLP and a brand-new "head token" design in the multi-head self-attention module to help re-calibrate channel representations and make different channel group representations interact with each other.

Inductive Bias

REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer

no code implementations 1 Sep 2022 Quanwei Yang, Xinchen Liu, Wu Liu, Hongtao Xie, Xiaoyan Gu, Lingyun Yu, Yongdong Zhang

Human Video Motion Transfer (HVMT) aims to, given an image of a source person, generate his/her video that imitates the motion of the driving person.

MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition

no code implementations 1 Sep 2022 Xiaodong Chen, Wu Liu, Xinchen Liu, Yongdong Zhang, Jungong Han, Tao Mei

In DestFormer, the spatial and temporal dimensions of the 4D point cloud videos are decoupled to achieve efficient self-attention for learning both long-term and short-term features.

Action Recognition

MFAN: Multi-modal Feature-enhanced Attention Networks for Rumor Detection

1 code implementation 2022 Jiaqi Zheng, Xi Zhang, Sanchuan Guo, Quan Wang, Wenyu Zang, Yongdong Zhang

Rumor spreaders are increasingly taking advantage of multimedia content to attract and mislead news consumers on social media.

Addressing Confounding Feature Issue for Causal Recommendation

1 code implementation 13 May 2022 Xiangnan He, Yang Zhang, Fuli Feng, Chonggang Song, Lingling Yi, Guohui Ling, Yongdong Zhang

We demonstrate DCR on the backbone model of neural factorization machine (NFM), showing that DCR leads to more accurate prediction of user preference with small inference time cost.

Recommendation Systems

Rumor Detection with Self-supervised Learning on Texts and Social Graph

no code implementations 19 Apr 2022 Yuan Gao, Xiang Wang, Xiangnan He, Huamin Feng, Yongdong Zhang

The core is to model the rumor characteristics inherent in rich information, such as propagation patterns in social networks and semantic patterns in post content, and differentiate them from the truth.

Self-Supervised Learning

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

no code implementations 9 Mar 2022 Xiaodong Chen, Xinchen Liu, Wu Liu, Kun Liu, Dong Wu, Yongdong Zhang, Tao Mei

Therefore, researchers start to focus on a new task, Part-level Action Parsing (PAP), which aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.

Action Parsing Action Recognition

Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition

no code implementations CVPR 2022 Jiamin Wu, Tianzhu Zhang, Zhe Zhang, Feng Wu, Yongdong Zhang

To address this issue, we propose an end-to-end Motion-modulated Temporal Fragment Alignment Network (MTFAN) by jointly exploring the task-specific motion modulation and the multi-level temporal fragment alignment for Few-Shot Action Recognition (FSAR).

Few-Shot action recognition Few Shot Action Recognition +1

Partial Class Activation Attention for Semantic Segmentation

1 code implementation CVPR 2022 Sun-Ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian

Current attention-based methods for semantic segmentation mainly model pixel relation through pairwise affinity and coarse segmentation.

Relation Segmentation +1

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network

4 code implementations ICCV 2021 Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang

Such an operation guides the vision model to use not only the visual texture of characters, but also the linguistic information in visual context for recognition when the visual cues are confused (e.g., occlusion, noise, etc.).

Language Modelling Scene Text Recognition

Causal Incremental Graph Convolution for Recommender System Retraining

1 code implementation 16 Aug 2021 Sihao Ding, Fuli Feng, Xiangnan He, Yong Liao, Jun Shi, Yongdong Zhang

Towards the goal, we propose a Causal Incremental Graph Convolution approach, which consists of two new operators named Incremental Graph Convolution (IGC) and Colliding Effect Distillation (CED) to estimate the output of full graph convolution.

Causal Inference Recommendation Systems

PERT: A Progressively Region-based Network for Scene Text Removal

1 code implementation 24 Jun 2021 Yuxin Wang, Hongtao Xie, Shancheng Fang, Yadong Qu, Yongdong Zhang

However, there exist two problems: 1) the implicit erasure guidance causes excessive erasure of non-text areas; 2) the one-stage erasure lacks exhaustive removal of the text region.

Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection

no code implementations CVPR 2021 Wenfei Yang, Tianzhu Zhang, Xiaoyuan Yu, Tian Qi, Yongdong Zhang, Feng Wu

To alleviate this problem, we propose a novel Uncertainty Guided Collaborative Training (UGCT) strategy, which mainly includes two key designs: (1) The first design is an online pseudo label generation module, in which the RGB and FLOW streams work collaboratively to learn from each other.

Action Detection Pseudo Label

Lesion-Aware Transformers for Diabetic Retinopathy Grading

no code implementations CVPR 2021 Rui Sun, Yihao Li, Tianzhu Zhang, Zhendong Mao, Feng Wu, Yongdong Zhang

First, to the best of our knowledge, this is the first work to formulate lesion discovery as a weakly supervised lesion localization problem via a transformer decoder.

Diabetic Retinopathy Grading

Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer

no code implementations CVPR 2021 Yulin Li, Jianfeng He, Tianzhu Zhang, Xiang Liu, Yongdong Zhang, Feng Wu

To address these issues, we propose a novel end-to-end Part-Aware Transformer (PAT) for occluded person Re-ID through diverse part discovery via a transformer encoder-decoder architecture, including a pixel-context-based transformer encoder and a part-prototype-based transformer decoder.

Person Re-Identification

Causal Intervention for Leveraging Popularity Bias in Recommendation

1 code implementation 13 May 2021 Yang Zhang, Fuli Feng, Xiangnan He, Tianxin Wei, Chonggang Song, Guohui Ling, Yongdong Zhang

This work studies an unexplored problem in recommendation -- how to leverage popularity bias to improve the recommendation accuracy.

Collaborative Filtering Recommendation Systems

Action Unit Memory Network for Weakly Supervised Temporal Action Localization

no code implementations CVPR 2021 Wang Luo, Tianzhu Zhang, Wenfei Yang, Jingen Liu, Tao Mei, Feng Wu, Yongdong Zhang

In this paper, we present an Action Unit Memory Network (AUMN) for weakly supervised temporal action localization, which can mitigate the above two challenges by learning an action unit memory bank.

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

no code implementations CVPR 2021 Jiaming Li, Hongtao Xie, Jiahong Li, Zhongyuan Wang, Yongdong Zhang

Face forgery detection is raising ever-increasing interest in computer vision since facial manipulation technologies cause serious worries.

Task-Aware Part Mining Network for Few-Shot Learning

no code implementations ICCV 2021 Jiamin Wu, Tianzhu Zhang, Yongdong Zhang, Feng Wu

The task-aware part filters can adapt to any individual task and automatically mine task-related local parts even for an unseen task.

Few-Shot Learning

Meta-Attack: Class-Agnostic and Model-Agnostic Physical Adversarial Attack

no code implementations ICCV 2021 Weiwei Feng, Baoyuan Wu, Tianzhu Zhang, Yong Zhang, Yongdong Zhang

To tackle these issues, we propose a class-agnostic and model-agnostic physical adversarial attack model (Meta-Attack), which is able to not only generate robust physical adversarial examples by simulating color and shape distortions, but also generalize to attacking novel images and novel DNN models by accessing a few digital and physical images.

Adversarial Attack Few-Shot Learning

Foreground Activation Maps for Weakly Supervised Object Localization

no code implementations ICCV 2021 Meng Meng, Tianzhu Zhang, Qi Tian, Yongdong Zhang, Feng Wu

To the best of our knowledge, this is the first work that can achieve remarkable performance for both tasks by optimizing them jointly via FAM for WSOL.

Classification Object +1

Hierarchical Granularity Transfer Learning

no code implementations NeurIPS 2020 Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang

In this paper, we introduce a new task, named Hierarchical Granularity Transfer Learning (HGTL), to recognize sub-level categories with basic-level annotations and semantic descriptions for hierarchical categories.

Transfer Learning

CatGCN: Graph Convolutional Networks with Categorical Node Features

1 code implementation 11 Sep 2020 Weijian Chen, Fuli Feng, Qifan Wang, Xiangnan He, Chonggang Song, Guohui Ling, Yongdong Zhang

In this paper, we propose a new GCN model named CatGCN, which is tailored for graph learning when the node features are categorical.

Graph Learning Node Classification +1

Depth image denoising using nuclear norm and learning graph model

no code implementations 9 Aug 2020 Chenggang Yan, Zhisheng Li, Yongbing Zhang, Yutao Liu, Xiangyang Ji, Yongdong Zhang

Depth image denoising is increasingly becoming a hot research topic because depth images reflect the three-dimensional (3D) scene and can be applied in various fields of computer vision.

Image Denoising Image Restoration

Curriculum Learning for Natural Language Understanding

no code implementations ACL 2020 Benfeng Xu, Licheng Zhang, Zhendong Mao, Quan Wang, Hongtao Xie, Yongdong Zhang

With the great success of pre-trained language models, the pretrain-finetune paradigm has now become the undoubtedly dominant solution for natural language understanding (NLU) tasks.

Natural Language Understanding

Attribute-Induced Bias Eliminating for Transductive Zero-Shot Learning

no code implementations 31 May 2020 Hantao Yao, Shaobo Min, Yongdong Zhang, Changsheng Xu

Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, which utilizes the graph operation to capture the semantic relationship between categories.

Attribute Transfer Learning +1

How to Retrain Recommender System? A Sequential Meta-Learning Method

1 code implementation 27 May 2020 Yang Zhang, Fuli Feng, Chenxu Wang, Xiangnan He, Meng Wang, Yan Li, Yongdong Zhang

Nevertheless, normal training on new data only may easily cause overfitting and forgetting issues, since the new data is of a smaller scale and contains less information on long-term user preference.

Meta-Learning Recommendation Systems

ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection

1 code implementation CVPR 2020 Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Then a novel Local Orthogonal Texture-aware Module (LOTM) models the local texture information of proposal features in two orthogonal directions and represents text region with a set of contour points.

Region Proposal Scene Text Detection +1

Graph Structured Network for Image-Text Matching

1 code implementation CVPR 2020 Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang

The GSMN explicitly models object, relation and attribute as a structured phrase, which not only allows learning the correspondence of object, relation and attribute separately, but also benefits learning the fine-grained correspondence of structured phrases.

Attribute Image-text matching +3

Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning

1 code implementation CVPR 2020 Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang

Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem.

Generalized Zero-Shot Learning

Multi-Objective Matrix Normalization for Fine-grained Visual Recognition

1 code implementation 30 Mar 2020 Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.

Fine-Grained Visual Recognition

Bilinear Graph Neural Network with Neighbor Interactions

1 code implementation 10 Feb 2020 Hongmin Zhu, Fuli Feng, Xiangnan He, Xiang Wang, Yan Li, Kai Zheng, Yongdong Zhang

We term this framework as Bilinear Graph Neural Network (BGNN), which improves GNN representation ability with bilinear interactions between neighbor nodes.

General Classification Node Classification

LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

16 code implementations 6 Feb 2020 Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang

We propose a new model named LightGCN, including only the most essential component in GCN -- neighborhood aggregation -- for collaborative filtering.

Collaborative Filtering Graph Classification +1

Asymmetric GAN for Unpaired Image-to-image Translation

no code implementations 25 Dec 2019 Yu Li, Sheng Tang, Rui Zhang, Yongdong Zhang, Jintao Li, Shuicheng Yan

While in situations where two domains are asymmetric in complexity, i.e., the amount of information between two domains is different, these approaches pose problems of poor generation quality, mapping ambiguity, and model sensitivity.

Image-to-Image Translation Translation

Scheduled Differentiable Architecture Search for Visual Recognition

no code implementations 23 Sep 2019 Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei

Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.

Video Recognition

ACE-Net: Biomedical Image Segmentation with Augmented Contracting and Expansive Paths

no code implementations 23 Aug 2019 Yanhao Zhu, Zhineng Chen, Shuai Zhao, Hongtao Xie, Wenming Guo, Yongdong Zhang

Nowadays U-net-like FCNs predominate various biomedical image segmentation applications and attain promising performance, largely due to their elegant architectures, e.g., symmetric contracting and expansive paths as well as lateral skip-connections.

Image Segmentation Segmentation +1

Domain-Specific Embedding Network for Zero-Shot Recognition

1 code implementation 12 Aug 2019 Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang

In contrast to previous methods, the DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between two domains.

Zero-Shot Learning

Consensus Feature Network for Scene Parsing

no code implementations 29 Jul 2019 Tianyi Wu, Sheng Tang, Rui Zhang, Guodong Guo, Yongdong Zhang

However, classification networks are dominated by the discriminative portion, so directly applying classification networks to scene parsing will result in inconsistent parsing predictions within one instance and among instances of the same category.

General Classification Scene Parsing

Dense Scale Network for Crowd Counting

1 code implementation24 Jun 2019 Feng Dai, Hao Liu, Yike Ma, Juan Cao, Qiang Zhao, Yongdong Zhang

The key component of our network is the dense dilated convolution block, in which each dilation layer is densely connected with the others to preserve information from continuously varied scales.

Crowd Counting

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

1 code implementation6 Jun 2019 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.

Image Captioning Image Paragraph Captioning +2

Relational Collaborative Filtering: Modeling Multiple Item Relations for Recommendation

2 code implementations29 Apr 2019 Xin Xin, Xiangnan He, Yongfeng Zhang, Yongdong Zhang, Joemon Jose

In this work, we propose Relational Collaborative Filtering (RCF), a general framework to exploit multiple relations between items in recommender system.

Collaborative Filtering Recommendation Systems +1

Not All Words are Equal: Video-specific Information Loss for Video Captioning

no code implementations1 Jan 2019 Jiarong Dong, Ke Gao, Xiaokai Chen, Junbo Guo, Juan Cao, Yongdong Zhang

To address this issue, we propose a novel learning strategy called Information Loss, which focuses on the relationship between the video-specific visual content and corresponding representative words.

Video Captioning

CGNet: A Light-weight Context Guided Network for Semantic Segmentation

4 code implementations20 Nov 2018 Tianyi Wu, Sheng Tang, Rui Zhang, Yongdong Zhang

To tackle this problem, we propose a novel Context Guided Network (CGNet), which is a light-weight and efficient network for semantic segmentation.

Segmentation Semantic Segmentation

CA3Net: Contextual-Attentional Attribute-Appearance Network for Person Re-Identification

no code implementations19 Nov 2018 Jiawei Liu, Zheng-Jun Zha, Hongtao Xie, Zhiwei Xiong, Yongdong Zhang

An appearance network is developed to learn appearance features from the full body, horizontal and vertical body parts of pedestrians with spatial dependencies among body parts.

Attribute Multi-Task Learning +1

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

1 code implementation16 Aug 2018 Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu

To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.

Image Captioning Reinforcement Learning (RL)

A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels

no code implementations31 Jul 2018 Shaobo Min, Xuejin Chen, Zheng-Jun Zha, Feng Wu, Yongdong Zhang

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation.

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

1 code implementation12 Dec 2017 Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei

With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space.

Social Media Popularity Prediction

Time Matters: Multi-scale Temporalization of Social Media Popularity

no code implementations12 Dec 2017 Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Tao Mei

We evaluate our approach on two large-scale Flickr image datasets with over 1.8 million photos in total, for the task of popularity prediction.

Social Media Popularity Prediction

Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs

no code implementations Mountain View, CA, USA 2017 Zhiwei Jin, Juan Cao, Han Guo, Yongdong Zhang

In this paper, we propose a novel Recurrent Neural Network with an attention mechanism (att-RNN) to fuse multimodal features for effective rumor detection.

Scale-Adaptive Convolutions for Scene Parsing

no code implementations ICCV 2017 Rui Zhang, Sheng Tang, Yongdong Zhang, Jintao Li, Shuicheng Yan

Through adding a new scale regression layer, we can dynamically infer the position-adaptive scale coefficients which are adopted to resize the convolutional patches.

regression Scene Parsing

APE-GAN: Adversarial Perturbation Elimination with GAN

3 code implementations18 Jul 2017 Shiwei Shen, Guoqing Jin, Ke Gao, Yongdong Zhang

Although neural networks could achieve state-of-the-art performance while recognizing images, they often suffer a tremendous defeat from adversarial examples--inputs generated by utilizing imperceptible but intentional perturbation to clean samples from the datasets.

One-Shot Fine-Grained Instance Retrieval

no code implementations4 Jul 2017 Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

Aiming to conquer this issue, we propose a retrieval task named One-Shot Fine-Grained Instance Retrieval (OSFGIR).

Fine-Grained Visual Categorization Image Retrieval +1

Deep Representation Learning with Part Loss for Person Re-Identification

no code implementations4 Jul 2017 Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

The representation learning risk is evaluated by the proposed part loss, which automatically generates several parts for an image, and computes the person classification loss on each part separately.

Classification General Classification +2

Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description

no code implementations CVPR 2017 Xishan Zhang, Ke Gao, Yongdong Zhang, Dongming Zhang, Jintao Li, Qi Tian

This paper contributes to: 1) The first in-depth study of the weakness inherent in data-driven static fusion methods for video captioning.

Video Captioning Video Description

DR2-Net: Deep Residual Reconstruction Network for Image Compressive Sensing

1 code implementation19 Feb 2017 Hantao Yao, Feng Dai, Dongming Zhang, Yike Ma, Shiliang Zhang, Yongdong Zhang, Qi Tian

Accordingly, DR2-Net consists of two components, i.e., linear mapping network and residual network, respectively.

Compressive Sensing Image Reconstruction

Image Credibility Analysis with Effective Domain Transferred Deep Networks

no code implementations16 Nov 2016 Zhiwei Jin, Juan Cao, Jiebo Luo, Yongdong Zhang

In order to overcome the scarcity of training samples of fake images, we first construct a large-scale auxiliary dataset indirectly related to this task.

Image Classification Transfer Learning

AC-BLSTM: Asymmetric Convolutional Bidirectional LSTM Networks for Text Classification

1 code implementation7 Nov 2016 Depeng Liang, Yongdong Zhang

Recently deep learning models have been shown to be capable of making remarkable performance in sentences and documents classification tasks.

General Classification Sentence Embeddings +3

Scene-adaptive Coded Apertures Imaging

no code implementations19 Jun 2015 Xuehui Wang, Jinli Suo, Jingyi Yu, Yongdong Zhang, Qionghai Dai

Firstly, we capture the scene with a pinhole and analyze the scene content to determine primary edge orientations.

Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection

no code implementations CVPR 2015 Wu Liu, Tao Mei, Yongdong Zhang, Cherry Che, Jiebo Luo

Given the tremendous growth of online videos, video thumbnail, as the common visualization form of video content, is becoming increasingly important to influence user's browsing and searching experience.

Multi-Task Learning

Binary Code Ranking with Weighted Hamming Distance

no code implementations CVPR 2013 Lei Zhang, Yongdong Zhang, Jinhu Tang, Ke Lu, Qi Tian

In this paper, we propose a weighted Hamming distance ranking algorithm (WhRank) to rank the binary codes of hashing methods.
