RBF-Softmax: Learning Deep Representative Prototypes with Radial Basis Function Softmax

1 code implementation ECCV 2020 Xiao Zhang, Rui Zhao, Yu Qiao, Hongsheng Li

To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, such that it can adaptively assign losses to regularize the intra-class and inter-class distances by reshaping the relative differences, thus creating more representative class prototypes to improve optimization.
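As a rough illustration of the idea in this summary, the sketch below swaps the inner-product logits of a softmax loss for an RBF kernel on the squared Euclidean distance between features and class prototypes. The kernel form `exp(-d / gamma)`, the `gamma` and `scale` hyperparameters, and the function name are assumptions made for this example; the summary above does not specify them, and the paper's exact formulation may differ.

```python
import numpy as np

def rbf_softmax_loss(features, prototypes, labels, gamma=1.0, scale=1.0):
    """Cross-entropy over RBF-kernel logits (illustrative sketch, not the paper's code).

    features:   (batch, dim) array of embeddings
    prototypes: (classes, dim) array, one prototype per class
    labels:     (batch,) integer class indices
    gamma, scale: assumed hyperparameters (kernel width and logit scale)
    """
    # Squared Euclidean distance between every feature and every prototype.
    diff = features[:, None, :] - prototypes[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)            # shape (batch, classes)

    # The RBF kernel replaces the inner product; values lie in (0, scale].
    logits = scale * np.exp(-sq_dist / gamma)

    # Numerically stable log-softmax followed by the mean negative log-likelihood.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the kernel is bounded, the logits cannot grow without limit as features move away from the origin, which is one way to read the "reshaping the relative differences" phrase above.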

FastFlow: Unsupervised Anomaly Detection and Localization via 2D Normalizing Flows

no code implementations 15 Nov 2021 Jiawei Yu, Ye Zheng, Xiang Wang, Wei Li, Yushuang Wu, Rui Zhao, Liwei Wu

However, current methods cannot effectively map image features to a tractable base distribution, and they ignore the relationship between local and global features, which is important for identifying anomalies.

Ranked #1 on Anomaly Detection on MVTec AD (using extra training data)

Unsupervised Anomaly Detection

Boundary Distribution Estimation to Precise Object Detection

no code implementations 2 Nov 2021 Haoran Zhou, Hang Huang, Rui Zhao, Wei Wang, Qingguo Zhou

In principal modern detectors, the task of object localization is implemented by the box subnet which concentrates on bounding box regression.

Object Detection Object Localization

Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization

no code implementations 9 Oct 2021 Ye Zheng, Xiang Wang, Rui Deng, Tianpeng Bao, Rui Zhao, Liwei Wu

To facilitate the learning with only normal images, we propose a new pretext task called non-contrastive learning for the fine alignment stage.

Ranked #7 on Anomaly Detection on MVTec AD (using extra training data)

Contrastive Learning Unsupervised Anomaly Detection

SCFlow: Optical Flow Estimation for Spiking Camera

no code implementations 8 Oct 2021 Liwen Hu, Rui Zhao, Ziluo Ding, Ruiqin Xiong, Lei Ma, Tiejun Huang

Optical flow estimation has achieved remarkable success in image-based and event-based vision, but existing methods cannot be directly applied to spike streams from a spiking camera.

Event-based vision Motion Estimation +1

Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way

no code implementations 3 Oct 2021 Rui Zhao, Malcolm Atkinson, Petros Papapanagiotou, Federica Magnoni, Jacques Fleuriot

It depends on federations sharing data that often have governance rules or external regulations restricting their use.

Multi-Source Video Domain Adaptation with Temporal Attentive Moment Alignment

no code implementations 21 Sep 2021 Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Min Wu, Rui Zhao, Zhenghua Chen

Multi-Source Domain Adaptation (MSDA) is a more practical domain adaptation setting for real-world scenarios.

Unsupervised Domain Adaptation

An Automated Framework for Supporting Data-Governance Rule Compliance in Decentralized MIMO Contexts

no code implementations 2 Sep 2021 Rui Zhao

We propose Dr. Aid, a logic-based AI framework for automated compliance checking of data governance rules over data-flow graphs.

MST: Masked Self-Supervised Transformer for Visual Representation

no code implementations NeurIPS 2021 Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.

Language Modelling Object Detection +1

Improving Facial Attribute Recognition by Group and Graph Learning

no code implementations 28 May 2021 Zhenghao Chen, Shuhang Gu, Feng Zhu, Jing Xu, Rui Zhao

For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature.

Graph Learning

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification

no code implementations 16 May 2021 Shijie Yu, Dapeng Chen, Rui Zhao, Haobin Chen, Yu Qiao

Person images captured by surveillance cameras are often occluded by various obstacles, which leads to defective feature representations and harms person re-identification (Re-ID) performance.

Person Re-Identification

On Addressing Practical Challenges for RNN-Transducer

no code implementations 27 Apr 2021 Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.

Speech Recognition

Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval

no code implementations 29 Mar 2021 Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo

The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.

Video-Text Retrieval

Mutual Information State Intrinsic Control

2 code implementations ICLR 2021 Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

Reinforcement learning has been shown to be highly successful at many challenging tasks.

Progressive Correspondence Pruning by Consensus Learning

no code implementations ICCV 2021 Chen Zhao, Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann

Correspondence selection aims to correctly select the consistent matches (inliers) from an initial set of putative correspondences.

Denoising Pose Estimation

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations 3 Nov 2020 Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.

automatic-speech-recognition End-To-End Speech Recognition +2

The Vulnerability of the Neural Networks Against Adversarial Examples in Deep Learning Algorithms

no code implementations 2 Nov 2020 Rui Zhao

Based on the current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, and surveys and classifies the existing black-box and white-box attack and defense methods.

Enhancing and Learning Denoiser without Clean Reference

no code implementations 9 Sep 2020 Rui Zhao, Daniel P. K. Lun, Kin-Man Lam

Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks.

Image Denoising

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System

no code implementations 12 Aug 2020 Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li

Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems to transfer knowledge from a source language to a target language.

automatic-speech-recognition End-To-End Speech Recognition +2

Deep Reinforcement Learning Based Mobile Edge Computing for Intelligent Internet of Things

no code implementations 1 Aug 2020 Rui Zhao, Xinjie Wang, Junjuan Xia, Liseng Fan

In particular, the system cost of latency and energy consumption can be reduced significantly by the proposed deep reinforcement learning based algorithm.


Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

no code implementations 30 Jul 2020 Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.

automatic-speech-recognition Speech Recognition

Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score

1 code implementation 2 Jul 2020 Zhiliang Wu, Yinchong Yang, Yunpu Ma, Yushan Liu, Rui Zhao, Michael Moor, Volker Tresp

Randomized controlled trials typically analyze the effectiveness of treatments with the goal of making treatment recommendations for patient subgroups.

Enhancement of a CNN-Based Denoiser Based on Spatial and Spectral Analysis

no code implementations 28 Jun 2020 Rui Zhao, Kin-Man Lam, Daniel P. K. Lun

Since most of the content or energy of natural images resides in the low-frequency spectrum, their transformed coefficients in the frequency domain are highly imbalanced.

Image Denoising

Continual Representation Learning for Biometric Identification

1 code implementation 8 Jun 2020 Bo Zhao, Shixiang Tang, Dapeng Chen, Hakan Bilen, Rui Zhao

With the explosion of digital data in recent years, continuously learning new tasks from a stream of data without forgetting previously acquired knowledge has become increasingly important.

Continual Learning Knowledge Distillation +1

Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

1 code implementation ECCV 2020 Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset.

Image Retrieval

Bayesian Adversarial Human Motion Synthesis

1 code implementation CVPR 2020 Rui Zhao, Hui Su, Qiang Ji

By explicitly capturing the distribution of the data and parameters, our model has a more compact parameterization compared to GAN-based generative models.

Bayesian Inference Data Augmentation +1

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

no code implementations CVPR 2020 Shijie Yu, Shihua Li, Dapeng Chen, Rui Zhao, Junjie Yan, Yu Qiao

To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named ClOthes ChAnging Person Set (COCAS), which provides multiple images of the same identity with different clothes.

Person Re-Identification

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

no code implementations 1 May 2020 Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research because it is capable of online streaming speech recognition.

automatic-speech-recognition End-To-End Speech Recognition +2

Stacked Convolutional Deep Encoding Network for Video-Text Retrieval

no code implementations 10 Apr 2020 Rui Zhao, Kecheng Zheng, Zheng-Jun Zha

The dominant existing approaches to the cross-modal video-text retrieval task learn a joint embedding space in which to measure cross-modal similarity.

Fine-tuning Language Modelling +1

Learning to Cluster Faces via Confidence and Connectivity Estimation

2 code implementations CVPR 2020 Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin

With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.

Connectivity Estimation Face Clustering

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

no code implementations 17 Mar 2020 Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.

automatic-speech-recognition Speech Recognition

Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

2 code implementations 14 Mar 2020 Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li

An improved pseudo-label-based encoder can therefore be obtained by jointly training the source-to-target translated images with ground-truth identities and target-domain images with pseudo identities.

Translation Unsupervised Domain Adaptation +1

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

no code implementations 5 Feb 2020 Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.

Towards a computer-interpretable actionable formal model to encode data governance rules

no code implementations 19 Nov 2019 Rui Zhao, Malcolm Atkinson

Driven by the needs of science and business, data sharing and re-use have become intensive activities in various areas.

Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition

no code implementations ICCV 2019 Rui Zhao, Kang Wang, Hui Su, Qiang Ji

Finally, the whole model is extended under the Bayesian framework to a probabilistic model in order to better capture the stochasticity and variation in the data.

Action Recognition Bayesian Inference +1

Self-Supervised State-Control through Intrinsic Mutual Information Rewards

1 code implementation 25 Sep 2019 Rui Zhao, Volker Tresp, Wei Xu

Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards.

OpenAI Gym

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

2 code implementations 21 May 2019 Rui Zhao, Xudong Sun, Volker Tresp

This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals.

Multi-Goal Reinforcement Learning OpenAI Gym

Neural Networks for Modeling Source Code Edits

no code implementations 4 Apr 2019 Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow

In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.

Curiosity-Driven Experience Prioritization via Density Estimation

no code implementations 20 Feb 2019 Rui Zhao, Volker Tresp

In Reinforcement Learning (RL), an agent explores the environment and collects trajectories into the memory buffer for later learning.

Density Estimation OpenAI Gym

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

no code implementations 31 Dec 2018 Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.

Language Modelling voice assistant

Energy-Based Hindsight Experience Prioritization

2 code implementations 2 Oct 2018 Rui Zhao, Volker Tresp

We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation.

Efficient Dialog Policy Learning via Positive Memory Retention

2 code implementations 2 Oct 2018 Rui Zhao, Volker Tresp

This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning.

Goal-Oriented Dialog Object Discovery

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

1 code implementation 2 Jul 2018 Rui Zhao, Volker Tresp

Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.

Policy Gradient Methods Visual Dialog

Bilateral Ordinal Relevance Multi-Instance Regression for Facial Action Unit Intensity Estimation

no code implementations CVPR 2018 Yong Zhang, Rui Zhao, Wei-Ming Dong, Bao-Gang Hu, Qiang Ji

The majority of methods directly apply supervised learning techniques to AU intensity estimation, while few methods exploit unlabeled samples to improve performance.

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations 14 Apr 2018 Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Advancing Acoustic-to-Word CTC Model

no code implementations 15 Mar 2018 Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong

However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words into an OOV output node.
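The OOV behaviour described in this summary can be illustrated with a small sketch: keep only the most frequent words as output units and map every other word to a single OOV target. The token name `<OOV>`, the helper names, and the frequency-based vocabulary selection are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

OOV = "<OOV>"  # hypothetical name for the single out-of-vocabulary output node

def build_vocab(transcripts, max_words):
    """Keep the max_words most frequent words as output-layer units."""
    counts = Counter(word for line in transcripts for word in line.split())
    return {word for word, _ in counts.most_common(max_words)}

def to_targets(transcript, vocab):
    """Map each word to itself if it is in the vocabulary, otherwise to OOV."""
    return [word if word in vocab else OOV for word in transcript.split()]

# Usage: with a two-word vocabulary, every other word collapses into OOV.
train = ["play music", "play the radio", "play music now"]
vocab = build_vocab(train, max_words=2)
targets = to_targets("play jazz music", vocab)
```

Once a word has been mapped to the OOV node, its identity can no longer be recovered from the model output, which is the limitation the summary points to.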

Language Modelling voice assistant

Advancing Connectionist Temporal Classification With Attention Modeling

no code implementations 15 Mar 2018 Amit Das, Jinyu Li, Rui Zhao, Yifan Gong

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.

Classification General Classification +3

Acoustic-To-Word Model Without OOV

no code implementations 28 Nov 2017 Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words into an OOV output node.

voice assistant

Improved training for online end-to-end speech recognition systems

1 code implementation 6 Nov 2017 Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.

Curriculum Learning End-To-End Speech Recognition +1

Large-Scale Domain Adaptation via Teacher-Student Learning

no code implementations 17 Aug 2017 Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong

High accuracy speech recognition requires a large amount of transcribed data for supervised training.

Domain Adaptation Speech Recognition

A Nuclear-norm Model for Multi-Frame Super-Resolution Reconstruction from Video Clips

no code implementations 17 Apr 2017 Rui Zhao, Raymond H. Chan

Then a low-rank model is used to construct the reference frame in high-resolution by incorporating the information of the low-resolution frames.

Multi-Frame Super-Resolution Optical Flow Estimation

Two-Stream RNN/CNN for Action Recognition in 3D Videos

no code implementations 22 Mar 2017 Rui Zhao, Haider Ali, Patrick van der Smagt

The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes.

Action Recognition

Deep Learning and Its Applications to Machine Health Monitoring: A Survey

1 code implementation 16 Dec 2016 Rui Zhao, Ruqiang Yan, Zhenghua Chen, Kezhi Mao, Peng Wang, Robert X. Gao

Since 2006, deep learning (DL) has become a rapidly growing research direction, redefining state-of-the-art performances in a wide range of areas such as object recognition, image segmentation, speech recognition and machine translation.

Machine Translation Object Recognition +3

Saliency Detection by Multi-Context Deep Learning

no code implementations CVPR 2015 Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with a confusing visual appearance.

Image Classification RGB Salient Object Detection +2

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

no code implementations 15 Dec 2014 Hongsheng Li, Rui Zhao, Xiaogang Wang

The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.

Classification General Classification +3

Nilpotent matrices having a given Jordan type as maximum commuting nilpotent orbit

1 code implementation 8 Sep 2014 Anthony Iarrobino, Leila Khatami, Bart Van Steirteghem, Rui Zhao

In 2012 P. Oblak formulated a conjecture concerning the cardinality of the set of partitions $P$ such that ${\mathcal Q}(P)$ is a given stable partition $Q$ with two parts, and proved some special cases.

Rings and Algebras Commutative Algebra Representation Theory 15A27 (Primary), 05E40 (Secondary), 13E10, 15A21

Learning Mid-level Filters for Person Re-identification

no code implementations CVPR 2014 Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.

Patch Matching Person Re-Identification

DeepReID: Deep Filter Pairing Neural Network for Person Re-Identification

no code implementations CVPR 2014 Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang

In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter.

Person Re-Identification

Unsupervised Salience Learning for Person Re-identification

no code implementations CVPR 2013 Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.

Patch Matching Person Re-Identification

