Search Results for author: Yong Rui

Found 25 papers, 5 papers with code

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

1 code implementation • 29 Mar 2024 • Haipeng Liu, Yang Wang, Biao Qian, Meng Wang, Yong Rui

Denoising diffusion probabilistic models for image inpainting add noise to the image texture during the forward process and recover the masked regions from the unmasked ones via the reverse denoising process. Although they generate meaningful semantics, existing methods suffer from a semantic discrepancy between masked and unmasked regions: the semantically dense unmasked texture is not completely degraded, while the masked regions turn to pure noise during the diffusion process.

Denoising • Image Inpainting


DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization

no code implementations • 5 Mar 2024 • Feng Hou, Jin Yuan, Ying Yang, Yang Liu, Yang Zhang, Cheng Zhong, Zhongchao Shi, Jianping Fan, Yong Rui, Zhiqiang He

With the recent advances in vision-language models (VLMs), viewed as natural source models, the cross-domain task becomes one of directly adapting the pre-trained source model to arbitrary target domains equipped with prior domain knowledge; we name this task Adaptive Domain Generalization (ADG).

Domain Generalization


A Survey on Video Moment Localization

no code implementations • 13 Jun 2023 • Meng Liu, Liqiang Nie, Yunxiao Wang, Meng Wang, Yong Rui

Video moment localization, also known as video moment retrieval, aims to search for a target segment within a video described by a given natural language query.

Moment Retrieval • Retrieval • +1


Epistemic Graph: A Plug-And-Play Module For Hybrid Representation Learning

no code implementations • 30 May 2023 • Jin Yuan, Yang Zhang, Yangzhou Du, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

In this paper, a novel Epistemic Graph Layer (EGLayer) is introduced to enable hybrid learning, enhancing the exchange of information between deep features and a structured knowledge graph.

Few-Shot Learning • Knowledge Graphs • +1


Learning cross space mapping via DNN using large scale click-through logs

no code implementations • 26 Feb 2023 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui

The image and query are mapped to a common vector space via these two parts respectively, and image-query similarity is naturally defined as an inner product of their mappings in the space.

Image Classification • Image Retrieval • +1

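The scoring rule described in the abstract above (map image and query into a common vector space, then take an inner product) can be sketched as follows. This is only a toy illustration, not the authors' implementation: the two deep networks are replaced by small fixed linear maps, and the names `linear_map`, `similarity`, `IMAGE_WEIGHTS`, and `QUERY_WEIGHTS`, along with all numbers, are hypothetical.

```python
def linear_map(weights, x):
    """Apply a linear map (a matrix given as a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def similarity(image_vec, query_vec):
    """Inner product of two vectors that live in the same space."""
    return sum(a * b for a, b in zip(image_vec, query_vec))


# Hypothetical stand-ins for the two learned parts of the model.
# Assumption: each part is reduced to a single linear layer for brevity.
IMAGE_WEIGHTS = [[1.0, 0.0, 0.0],
                 [0.0, 1.0, 1.0]]   # 3-d image feature -> 2-d common space
QUERY_WEIGHTS = [[0.5, 0.5],
                 [1.0, -1.0]]       # 2-d query feature -> 2-d common space

image_feature = [2.0, 1.0, 3.0]     # made-up raw image feature
query_feature = [4.0, 2.0]          # made-up raw query feature

image_embedding = linear_map(IMAGE_WEIGHTS, image_feature)  # [2.0, 4.0]
query_embedding = linear_map(QUERY_WEIGHTS, query_feature)  # [3.0, 2.0]

# Image-query similarity is the inner product in the common space.
score = similarity(image_embedding, query_embedding)        # 2*3 + 4*2 = 14.0
print(score)
```

In a retrieval setting, the same `similarity` score would be computed between one query embedding and many image embeddings, and the images ranked by score.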

Delving Globally into Texture and Structure for Image Inpainting

1 code implementation • 17 Sep 2022 • Haipeng Liu, Yang Wang, Meng Wang, Yong Rui

Our model is orthogonal to currently popular approaches, such as Convolutional Neural Networks (CNNs), Attention and Transformer models, from the perspective of texture and structure information for image inpainting.

Image Inpainting


Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation

1 code implementation • 8 Apr 2022 • Jin Yuan, Feng Hou, Yangzhou Du, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

Domain adaptation (DA) tackles scenarios in which the test data does not fully follow the same distribution as the training data, and multi-source domain adaptation (MSDA) is very attractive for real-world applications.

Domain Adaptation • Self-Supervised Learning • +1


Graph Attention Transformer Network for Multi-Label Image Classification

1 code implementation • 8 Mar 2022 • Jin Yuan, Shikai Chen, Yao Zhang, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

Subsequently, we design the graph attention transformer layer to transfer this adjacency matrix to adapt to the current domain.

Classification • Graph Attention • +2


A Distributed Approach towards Discriminative Distance Metric Learning

no code implementations • 11 May 2019 • Jun Li, Xun Lin, Xiaoguang Rui, Yong Rui, Dacheng Tao

Distance metric learning is successful in discovering intrinsic relations in data.

Metric Learning


A Survey on Food Computing

no code implementations • 22 Aug 2018 • Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, Ramesh Jain

This is the first comprehensive survey that targets the study of computing technology for the food area and also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields.

Computers and Society • Multimedia


AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond

no code implementations • 5 Dec 2017 • Ling-Yu Duan, Yihang Lou, Shiqi Wang, Wen Gao, Yong Rui

Deploying deep neural network models for large-scale video analysis in practice still poses unprecedented challenges for large-scale video data management.

Management


Multi-Level Attention Networks for Visual Question Answering

no code implementations • CVPR 2017 • Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

To solve the challenges, we propose a multi-level attention network for visual question answering that can simultaneously reduce the semantic gap by semantic attention and benefit fine-grained spatial inference by visual attention.

Question Answering • Visual Question Answering


Enhancing Person Re-identification in a Self-trained Subspace

1 code implementation • 20 Apr 2017 • Xun Yang, Meng Wang, Richang Hong, Qi Tian, Yong Rui

To address this problem, in this paper, we propose a self-trained subspace learning paradigm for person re-ID which effectively utilizes both labeled and unlabeled data to learn a discriminative subspace where person images across disjoint camera views can be easily matched.

Person Re-Identification


MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

no code implementations • CVPR 2016 • Jun Xu, Tao Mei, Ting Yao, Yong Rui

In this paper we present MSR-VTT (standing for "MSR-Video to Text"), a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text.

Image Captioning • Sentence • +2


Joint Multiview Segmentation and Localization of RGB-D Images Using Depth-Induced Silhouette Consistency

no code implementations • CVPR 2016 • Chi Zhang, Zhiwei Li, Rui Cai, Hongyang Chao, Yong Rui

In this paper, we propose an RGB-D camera localization approach which takes an effective geometry constraint, i.e., silhouette consistency, into consideration.

Camera Localization • Image Segmentation • +2


Highlight Detection With Pairwise Deep Ranking for First-Person Video Summarization

no code implementations • CVPR 2016 • Ting Yao, Tao Mei, Yong Rui

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos.

Highlight Detection • Video Summarization


Network Morphism

no code implementations • 5 Mar 2016 • Tao Wei, Changhu Wang, Yong Rui, Chang Wen Chen

The second requirement for this network morphism is its ability to deal with non-linearity in a network.

MORPH


Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging

no code implementations • ICCV 2015 • Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui

The development of deep learning has given machines a capability comparable to that of human beings in recognizing a limited set of image categories.


MeshStereo: A Global Stereo Model With Mesh Alignment Regularization for View Interpolation

no code implementations • ICCV 2015 • Chi Zhang, Zhiwei Li, Yanhua Cheng, Rui Cai, Hongyang Chao, Yong Rui

We present a novel global stereo model designed for view interpolation.

Stereo Matching • Stereo Matching Hand


Query Adaptive Similarity Measure for RGB-D Object Recognition

no code implementations • ICCV 2015 • Yanhua Cheng, Rui Cai, Chi Zhang, Zhiwei Li, Xin Zhao, Kaiqi Huang, Yong Rui

The reasons are twofold: (1) existing similarity measures are sensitive to object pose and scale changes, as well as intra-class variations; and (2) effectively fusing RGB and depth cues is still an open problem.

Object • Object Recognition


Automatically Solving Number Word Problems by Semantic Parsing and Reasoning

no code implementations • EMNLP 2015 • Shuming Shi, Yuehui Wang, Chin-Yew Lin, Xiaojiang Liu, Yong Rui

Semantic Parsing


Jointly Modeling Embedding and Translation to Bridge Video and Language

no code implementations • CVPR 2016 • Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui

Our proposed LSTM-E consists of three components: 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.

Sentence • Translation