1 code implementation • ECCV 2020 • Xiao Zhang, Rui Zhao, Yu Qiao, Hongsheng Li
To address this problem, this paper introduces novel Radial Basis Function (RBF) distances to replace the commonly used inner products in the softmax loss function, so that the loss can adaptively regularize the intra-class and inter-class distances by reshaping their relative differences, thus creating more representative class prototypes and improving optimization.
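As a rough illustration of the idea in the entry above, the sketch below swaps inner-product logits for logits derived from an RBF kernel on the squared Euclidean distance between a feature and each class prototype. The Gaussian form `scale * exp(-d / gamma)`, the function names, and the `gamma` and `scale` parameters are assumptions made for illustration; the listing does not state the paper's exact formulation.

```python
import math


def rbf_logits(feature, prototypes, gamma=1.0, scale=1.0):
    """Return one logit per class from an RBF kernel on squared distances.

    Assumed form (hypothetical, for illustration): scale * exp(-d / gamma),
    where d is the squared Euclidean distance to the class prototype.
    """
    logits = []
    for proto in prototypes:
        sq_dist = sum((f - p) ** 2 for f, p in zip(feature, proto))
        logits.append(scale * math.exp(-sq_dist / gamma))
    return logits


def softmax_cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# A feature that coincides with prototype 0 gets the maximum logit for class 0.
logits = rbf_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
loss = softmax_cross_entropy(logits, target=0)
```

With this assumed kernel, every logit lies in the interval (0, `scale`], so the gap between any two logits is bounded no matter how far a feature drifts from a prototype. An inner product has no such bound.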
no code implementations • 25 Mar 2023 • Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li
Using the HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human aesthetic preferences.
no code implementations • 24 Mar 2023 • Yulin Luo, Rui Zhao, Xiaobao Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang
Our MoWE achieves SOTA performance in the upstream task on the proposed dataset and two public datasets, i.e., All-Weather and Rain/Fog-Cityscapes, and also has better perceptual results in the downstream segmentation task compared to other methods.
1 code implementation • 23 Mar 2023 • Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li
To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching.
Ranked #1 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
no code implementations • 23 Mar 2023 • Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao
To construct a representative synthetic training dataset, we maximize the diversity of the selected images via a sample-based and cluster-based method.
1 code implementation • 21 Mar 2023 • Yajing Zheng, Jiyuan Zhang, Rui Zhao, Jianhao Ding, Shiyan Chen, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang
SpikeCV focuses on encapsulation for spike data, standardization for dataset interfaces, modularization for vision tasks, and real-time applications for challenging scenes.
no code implementations • 15 Mar 2023 • Guoqiang Jin, Fan Yang, Mingshan Sun, Ruyi Zhao, Yakun Liu, Wei Li, Tianpeng Bao, Liwei Wu, Xingyu Zeng, Rui Zhao
To this end, we propose SeqCo-DETR, a novel Sequence Consistency-based self-supervised method for object DEtection with TRansformers.
no code implementations • 10 Mar 2023 • Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang
Specifically, we propose HumanBench, based on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.
1 code implementation • 6 Mar 2023 • Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang
When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.
no code implementations • 2 Mar 2023 • Rui Zhao, Wei Li, Zhipeng Hu, Lincheng Li, Zhengxia Zou, Zhenwei Shi, Changjie Fan
In our method, harnessing the power of large-scale pre-trained multi-modal CLIP and neural rendering, T2P searches both continuous and discrete facial parameters in a unified framework.
no code implementations • 28 Feb 2023 • Shaobo Lin, Kun Wang, Xingyu Zeng, Rui Zhao
Specifically, we first discover the base images that contain the FP of novel categories and select a certain number of samples from them to balance the base and novel categories.
no code implementations • 28 Feb 2023 • Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
However, its high random mask ratio would result in two serious problems: 1) the data are not efficiently exploited, which brings inefficient pre-training (e.g., 1600 epochs for MAE vs. 300 epochs for the supervised counterpart), and 2) the high uncertainty and inconsistency of the pre-trained model, i.e., the prediction of the same patch may be inconsistent under different mask rounds.
no code implementations • 22 Feb 2023 • Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang
Despite being feasible, recent works have largely overlooked discovering the most discriminative regions for contrastive learning of object representations in scene images.
no code implementations • 26 Jan 2023 • Shaobo Lin, Xingyu Zeng, Rui Zhao
The generalization power of the pre-trained model is the key for few-shot deep learning.
no code implementations • 30 Dec 2022 • Hangyu Mao, Rui Zhao, Hao Chen, Jianye Hao, Yiqun Chen, Dong Li, Junge Zhang, Zhen Xiao
Recent methods combine the Transformer with these modules for better performance.
no code implementations • 5 Dec 2022 • Rui Zhao, Jian Xue, Partha Parthasarathy, Veljko Miljanic, Jinyu Li
The neural transducer is now the most popular end-to-end model for speech recognition, due to its natural streaming ability.
1 code implementation • 3 Dec 2022 • Yu Qi, Fan Yang, Yousong Zhu, Yufei Liu, Liwei Wu, Rui Zhao, Wei Li
By introducing stochastic prediction and the parallel encoder-decoder, SAIM significantly improves the performance of autoregressive image modeling.
no code implementations • 27 Nov 2022 • Jinghui Lu, Rui Zhao, Brian Mac Namee, Fei Tan
In this work, we present a "versatile" model, the Prompting-based Unified NER system (PUnifiedNER), that works with data from different domains and can recognise up to 37 entity types simultaneously, and theoretically as many as possible.
no code implementations • 25 Nov 2022 • Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng
However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection and do not consider maintenance inspection, which is usually conducted in uncontrolled outdoor environments with varying camera viewpoints, messy backgrounds, and degradation of object surfaces after long-term use.
no code implementations • 17 Nov 2022 • Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian
This motivates us to leverage the factorized neural transducer structure, containing a real language model, the vocabulary predictor.
Automatic Speech Recognition (ASR)
no code implementations • 20 Oct 2022 • Jianqiu Chen, Mingshan Sun, Ye Zheng, Tianpeng Bao, Zhenyu He, Donghai Li, Guoqiang Jin, Rui Zhao, Liwei Wu, Xiaoke Jiang
Existing direct 6D pose estimation methods regress target 6D poses without the need for post-processing, making them effective and easy to develop.
no code implementations • 12 Oct 2022 • Shaobo Lin, Xingyu Zeng, Rui Zhao
Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations.
no code implementations • 8 Oct 2022 • Zhenyu Mao, Dongsheng Zhu, Jinghui Lu, Rui Zhao, Fei Tan
Contrastive learning methods achieve state-of-the-art results in unsupervised sentence representation learning.
no code implementations • 30 Sep 2022 • Jinghui Lu, Rui Zhao, Brian Mac Namee, Dongsheng Zhu, Weidong Han, Fei Tan
In this paper, we propose a theoretical framework to explain the efficacy of prompt learning in zero/few-shot scenarios.
2 code implementations • 28 Sep 2022 • Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang
Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.
1 code implementation • 15 Sep 2022 • Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu
Self-training has shown great potential in semi-supervised learning.
1 code implementation • 14 Sep 2022 • Zhenyu Mao, Ziyue Li, Dedong Li, Lei Bai, Rui Zhao
Unlike the existing cross-scale contrastive learning methods on graphs that only contrast a graph and its belonging nodes, the contrast between road segment and trajectory is elaborately tailored via novel positive sampling and adaptive weighting strategies.
no code implementations • 15 Aug 2022 • Mingshan Sun, Ye Zheng, Tianpeng Bao, Jianqiu Chen, Guoqiang Jin, Liwei Wu, Rui Zhao, Xiaoke Jiang
Uni6D is the first 6D pose estimation approach to employ a unified backbone network to extract features from both RGB and depth images.
no code implementations • 1 Aug 2022 • Xulin Li, Yan Lu, Bin Liu, Yating Liu, Guojun Yin, Qi Chu, Jinyang Huang, Feng Zhu, Rui Zhao, Nenghai Yu
However, we find that existing graph-based methods for the visible-infrared person re-identification (VI-ReID) task generalize poorly because of two issues: 1) the train-test modality balance gap, which is a property of the VI-ReID task.
no code implementations • 22 Jun 2022 • Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao
In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework.
no code implementations • 10 May 2022 • Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang
While recent self-supervised learning methods perform well when the evaluation set is from the same domain as the training set, their performance drops undesirably when they are tested on a different domain.
no code implementations • 28 Apr 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang
Scalability is an important consideration for deep graph neural networks.
no code implementations • CVPR 2022 • Xiaoke Jiang, Donghai Li, Hao Chen, Ye Zheng, Rui Zhao, Liwei Wu
They use a 2D CNN for RGB images and a per-pixel point cloud network for depth data, as well as a fusion network for feature fusion.
no code implementations • CVPR 2022 • Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
Furthermore, our method can also exploit single-centric-object datasets such as ImageNet; it outperforms BYOL by 2.5% with the same pre-training epochs in linear probing and surpasses current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.
1 code implementation • CVPR 2022 • Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le
A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.
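The common practice described in the entry above can be sketched as a simple confidence threshold over per-pixel class probabilities. This is a minimal illustration of that baseline only, not the method the paper proposes; the function name `select_pseudo_labels` and the default `threshold` value are hypothetical.

```python
def select_pseudo_labels(probs, threshold=0.9):
    """Keep the argmax class for confident pixels; mark the rest as unused.

    probs: list of per-pixel class-probability lists.
    Returns a list with a class index per pixel, or None where the top
    probability falls below the threshold (the pixel is left unused).
    """
    labels = []
    for pixel_probs in probs:
        best = max(range(len(pixel_probs)), key=pixel_probs.__getitem__)
        labels.append(best if pixel_probs[best] >= threshold else None)
    return labels


# Three pixels, two classes: the middle pixel is too uncertain to be used.
pseudo = select_pseudo_labels([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]])
unused = sum(label is None for label in pseudo)
```

Raising `threshold` makes the retained labels more reliable but increases the number of pixels that contribute nothing to training, which is the trade-off the entry points out.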
no code implementations • CVPR 2022 • Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang
Existing symmetric contrastive learning methods suffer from collapses (complete and dimensional) or quadratic complexity of objectives.
1 code implementation • 22 Dec 2021 • Rui Zhao, Jinming Song, Yufeng Yuan, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun, Yang Wei
We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using any human data.
1 code implementation • CVPR 2022 • Zhikang Wang, Feng Zhu, Shixiang Tang, Rui Zhao, Lihuo He, Jiangning Song
With the guidance of the occlusion scores from OEM, the feature diffusion process is mainly conducted on visible body parts, which guarantees the quality of the synthesized NTP characteristics.
Ranked #1 on Person Re-Identification on Occluded REID (Rank-1 metric)
no code implementations • CVPR 2022 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang
The pretrain-finetune paradigm is a classical pipeline in visual learning.
3 code implementations • 15 Nov 2021 • Jiawei Yu, Ye Zheng, Xiang Wang, Wei Li, Yushuang Wu, Rui Zhao, Liwei Wu
However, current methods cannot effectively map image features to a tractable base distribution, and they ignore the relationship between local and global features, which is important for identifying anomalies.
Ranked #6 on Anomaly Detection on MVTec AD (using extra training data)
Unsupervised Anomaly Detection
Weakly Supervised Defect Detection
no code implementations • 2 Nov 2021 • Haoran Zhou, Hang Huang, Rui Zhao, Wei Wang, Qingguo Zhou
In principal modern detectors, the task of object localization is implemented by the box subnet which concentrates on bounding box regression.
no code implementations • 9 Oct 2021 • Ye Zheng, Xiang Wang, Rui Deng, Tianpeng Bao, Rui Zhao, Liwei Wu
To facilitate the learning with only normal images, we propose a new pretext task called non-contrastive learning for the fine alignment stage.
Ranked #27 on Anomaly Detection on MVTec AD (using extra training data)
1 code implementation • CVPR 2022 • Liwen Hu, Rui Zhao, Ziluo Ding, Lei Ma, Boxin Shi, Ruiqin Xiong, Tiejun Huang
Further, for training SCFlow, we synthesize two sets of optical flow data for the spiking camera, SPIkingly Flying Things and Photo-realistic High-speed Motion, denoted as SPIFT and PHM respectively, corresponding to random high-speed and well-designed scenes.
no code implementations • 3 Oct 2021 • Rui Zhao, Malcolm Atkinson, Petros Papapanagiotou, Federica Magnoni, Jacques Fleuriot
It depends on federations sharing data that often have governance rules or external regulations restricting their use.
no code implementations • 29 Sep 2021 • Shaobo Lin, Xingyu Zeng, Rui Zhao
Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations.
no code implementations • ICLR 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang
The two proposed methods (FCL, ICL) can be combined into a single method, called Zero-CL, where "Zero" means negative samples are zero relevant, which allows Zero-CL to completely discard negative pairs, i.e., with zero negative samples.
no code implementations • 29 Sep 2021 • Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao
Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function.
no code implementations • 29 Sep 2021 • Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang
The pretrain-finetune paradigm is a classical pipeline in visual learning.
no code implementations • 21 Sep 2021 • Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Min Wu, Rui Zhao, Zhenghua Chen
Multi-Source Domain Adaptation (MSDA) is a more practical domain adaptation setting for real-world scenarios.
1 code implementation • 10 Sep 2021 • Ziluo Ding, Rui Zhao, Jiyuan Zhang, Tianxiao Gao, Ruiqin Xiong, Zhaofei Yu, Tiejun Huang
Recently, many deep learning methods have shown great success in providing promising solutions to many event-based problems, such as optical flow estimation.
no code implementations • 2 Sep 2021 • Rui Zhao
We propose Dr. Aid, a logic-based AI framework for automated compliance checking of data governance rules over data-flow graphs.
no code implementations • NeurIPS 2021 • Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.
no code implementations • 28 May 2021 • Zhenghao Chen, Shuhang Gu, Feng Zhu, Jing Xu, Rui Zhao
For the spatial correlation, we aggregate attributes with spatial similarity into a part-based group and then introduce a Group Attention Learning to generate the group attention and the part-based group feature.
no code implementations • 26 May 2021 • Shijie Yu, Feng Zhu, Dapeng Chen, Rui Zhao, Haobin Chen, Shixiang Tang, Jinguo Zhu, Yu Qiao
In UDCL, a universal expert supervises the learning of domain experts and continuously gathers knowledge from all domain experts.
no code implementations • 16 May 2021 • Shijie Yu, Dapeng Chen, Rui Zhao, Haobin Chen, Yu Qiao
Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance.
no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong
The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.
no code implementations • 27 Apr 2021 • Yixiao Ge, Xiao Zhang, Ching Lam Choi, Ka Chun Cheung, Peipei Zhao, Feng Zhu, Xiaogang Wang, Rui Zhao, Hongsheng Li
In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network.
no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo
The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.
2 code implementations • ICLR 2021 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
Reinforcement learning has been shown to be highly successful at many challenging tasks.
no code implementations • ICCV 2021 • Chen Zhao, Yixiao Ge, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann
Correspondence selection aims to correctly select the consistent matches (inliers) from an initial set of putative correspondences.
no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong
Integrating external language models (LMs) remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.
Automatic Speech Recognition (ASR)
no code implementations • 2 Nov 2020 • Rui Zhao
Based on current security threats faced by deep learning, this paper introduces the problem of adversarial examples in deep learning, sorts out the existing attack and defense methods of the black box and white box, and classifies them.
no code implementations • 9 Sep 2020 • Rui Zhao, Daniel P. K. Lun, Kin-Man Lam
Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks.
no code implementations • 12 Aug 2020 • Vikas Joshi, Rui Zhao, Rupesh R. Mehta, Kshitiz Kumar, Jinyu Li
Transfer learning (TL) is widely used in conventional hybrid automatic speech recognition (ASR) systems to transfer knowledge from a source language to a target language.
Automatic Speech Recognition (ASR)
no code implementations • 11 Aug 2020 • Yongchao Liu, Yue Jin, Yong Chen, Teng Teng, Hang Ou, Rui Zhao, Yao Zhang
Accelerating deep model training and inference is crucial in practice.
no code implementations • 1 Aug 2020 • Rui Zhao, Xinjie Wang, Junjuan Xia, Liseng Fan
In particular, the system cost of latency and energy consumption can be reduced significantly by the proposed deep reinforcement learning based algorithm.
no code implementations • 30 Jul 2020 • Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong
Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 9 Jul 2020 • Rui Zhao, Tianshan Liu, Jun Xiao, Daniel P. K. Lun, Kin-Man Lam
Multi-task learning is an effective learning strategy for deep-learning-based facial expression recognition tasks.
1 code implementation • 2 Jul 2020 • Zhiliang Wu, Yinchong Yang, Yunpu Ma, Yushan Liu, Rui Zhao, Michael Moor, Volker Tresp
Randomized controlled trials typically analyze the effectiveness of treatments with the goal of making treatment recommendations for patient subgroups.
no code implementations • 28 Jun 2020 • Rui Zhao, Kin-Man Lam, Daniel P. K. Lun
Since most of the content or energy of natural images resides in the low-frequency spectrum, their transformed coefficients in the frequency domain are highly imbalanced.
1 code implementation • 8 Jun 2020 • Bo Zhao, Shixiang Tang, Dapeng Chen, Hakan Bilen, Rui Zhao
With the explosion of digital data in recent years, continuously learning new tasks from a stream of data without forgetting previously acquired knowledge has become increasingly important.
3 code implementations • ECCV 2020 • Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li
The task of large-scale retrieval-based image localization is to estimate the geographical location of a query image by recognizing its nearest reference images from a city-scale dataset.
3 code implementations • NeurIPS 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li
To solve these problems, we propose a novel self-paced contrastive learning framework with hybrid memory.
Ranked #3 on Unsupervised Domain Adaptation on Market to MSMT
1 code implementation • CVPR 2020 • Rui Zhao, Hui Su, Qiang Ji
By explicitly capturing the distribution of the data and parameters, our model has a more compact parameterization compared to GAN-based generative models.
1 code implementation • 28 May 2020 • Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu
Among all three E2E models, transformer-AED achieved the best accuracy in both streaming and non-streaming mode.
Automatic Speech Recognition (ASR)
no code implementations • CVPR 2020 • Shijie Yu, Shihua Li, Dapeng Chen, Rui Zhao, Junjie Yan, Yu Qiao
To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named ClOthes ChAnging Person Set (COCAS), which provides multiple images of the same identity with different clothes.
no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong
Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research because it is capable of online streaming speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 10 Apr 2020 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha
The dominant existing approaches to the cross-modal video-text retrieval task learn a joint embedding space to measure cross-modal similarity.
3 code implementations • CVPR 2020 • Lei Yang, Dapeng Chen, Xiaohang Zhan, Rui Zhao, Chen Change Loy, Dahua Lin
With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters.
no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong
While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.
Automatic Speech Recognition (ASR)
4 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.
Ranked #4 on Unsupervised Domain Adaptation on Market to MSMT
no code implementations • 5 Feb 2020 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
no code implementations • 19 Nov 2019 • Rui Zhao, Malcolm Atkinson
Driven by the needs of science and business, data sharing and re-use have become intensive activities in various areas.
no code implementations • ICCV 2019 • Rui Zhao, Kang Wang, Hui Su, Qiang Ji
Finally, the whole model is extended under the Bayesian framework to a probabilistic model in order to better capture the stochasticity and variation in the data.
Ranked #72 on Skeleton Based Action Recognition on NTU RGB+D
1 code implementation • 26 Sep 2019 • Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong
In this paper, we improve the RNN-T training in two aspects.
Automatic Speech Recognition (ASR)
1 code implementation • 25 Sep 2019 • Rui Zhao, Volker Tresp, Wei Xu
Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards.
no code implementations • ICCV 2019 • Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao
Learning discriminative image feature embeddings is of great importance to visual recognition.
1 code implementation • CVPR 2019 • Rui Zhao, Wanru Xu, Hui Su, Qiang Ji
Human action recognition remains a challenging task, partly due to the large variations in how actions are executed.
Ranked #2 on Skeleton Based Action Recognition on MSR Action3D
3 code implementations • 21 May 2019 • Rui Zhao, Xudong Sun, Volker Tresp
This objective encourages the agent to maximize the expected return, as well as to achieve more diverse goals.
no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Cosine-based softmax losses significantly improve the performance of deep face recognition networks.
3 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.
Ranked #5 on Face Verification on MegaFace
no code implementations • 4 Apr 2019 • Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow
In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.
no code implementations • 20 Feb 2019 • Rui Zhao, Volker Tresp
In Reinforcement Learning (RL), an agent explores the environment and collects trajectories into the memory buffer for later learning.
no code implementations • 31 Dec 2018 • Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong
In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.
2 code implementations • 2 Oct 2018 • Rui Zhao, Volker Tresp
This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning.
2 code implementations • 2 Oct 2018 • Rui Zhao, Volker Tresp
We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation.
1 code implementation • 2 Jul 2018 • Rui Zhao, Volker Tresp
Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic.
no code implementations • CVPR 2018 • Kang Wang, Rui Zhao, Qiang Ji
Through a top-down inference, the HGM can synthesize eye images consistent with the given eye gaze.
no code implementations • CVPR 2018 • Yong Zhang, Rui Zhao, Wei-Ming Dong, Bao-Gang Hu, Qiang Ji
The majority of methods directly apply supervised learning techniques to AU intensity estimation while few methods exploit unlabeled samples to improve the performance.
no code implementations • CVPR 2018 • Jing Xu, Rui Zhao, Feng Zhu, Huaming Wang, Wanli Ouyang
AACN consists of two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC).
16 code implementations • ICLR 2018 • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models.
Ranked #27 on Question Answering on SQuAD1.1 dev
no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong
In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.
no code implementations • 15 Mar 2018 • Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.
no code implementations • 15 Mar 2018 • Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong
However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words to an OOV output node.
4 code implementations • 10 Jan 2018 • Daniel A. Abolafia, Mohammad Norouzi, Jonathan Shen, Rui Zhao, Quoc V. Le
Models and examples built with TensorFlow
no code implementations • 28 Nov 2017 • Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong
However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words to an OOV output node.
1 code implementation • 6 Nov 2017 • Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao
Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.
no code implementations • 17 Aug 2017 • Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong
High accuracy speech recognition requires a large amount of transcribed data for supervised training.
no code implementations • 17 Apr 2017 • Rui Zhao, Raymond H. Chan
Then a low-rank model is used to construct the reference frame in high-resolution by incorporating the information of the low-resolution frames.
no code implementations • 22 Mar 2017 • Rui Zhao, Haider Ali, Patrick van der Smagt
The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes.
no code implementations • 8 Feb 2017 • Pei Wang, Guochao Bu, Ronghao Li, Rui Zhao
The new scanner, named BEE, can scan forest trees in three dimensions.
1 code implementation • 16 Dec 2016 • Rui Zhao, Ruqiang Yan, Zhenghua Chen, Kezhi Mao, Peng Wang, Robert X. Gao
Since 2006, deep learning (DL) has become a rapidly growing research direction, redefining state-of-the-art performances in a wide range of areas such as object recognition, image segmentation, speech recognition and machine translation.
no code implementations • CVPR 2016 • Rui Zhao, Quan Gan, Shangfei Wang, Qiang Ji
In the fully supervised case, all the frames are provided with intensity annotations.
no code implementations • CVPR 2015 • Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with confusing visual appearance.
no code implementations • 15 Dec 2014 • Hongsheng Li, Rui Zhao, Xiaogang Wang
The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.
no code implementations • 5 Dec 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
(3) saliency matching is proposed based on patch matching.
1 code implementation • 8 Sep 2014 • Anthony Iarrobino, Leila Khatami, Bart Van Steirteghem, Rui Zhao
In 2012 P. Oblak formulated a conjecture concerning the cardinality of the set of partitions $P$ such that ${\mathcal Q}(P)$ is a given stable partition $Q$ with two parts, and proved some special cases.
Rings and Algebras Commutative Algebra Representation Theory 15A27 (Primary), 05E40 (Secondary), 13E10, 15A21
no code implementations • CVPR 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.
no code implementations • CVPR 2014 • Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang
In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter.
no code implementations • CVPR 2013 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.