Conclusions: Compared with the results of our previous studies and related work, the method in this paper achieves highly competitive sensitivity with less training data while maintaining a low false-positive rate. As the only method currently using a 3D U-Net for aneurysm detection, it demonstrates the feasibility and strong performance of this network for the task, and also explores the potential of the channel attention mechanism in this setting.
In this paper, we present a novel speaker diarization system for streaming on-device applications.
With additional techniques such as pronunciation and silence probability modeling, plus multi-style training, we achieve 5.42% and 3.18% relative WER improvements on the development and evaluation sets of the Fearless Steps Corpus, respectively.
Despite their impressive clustering performance and efficiency in characterizing both the relationships among data and the cluster structure, existing graph-based multi-view clustering methods still have the following drawbacks.
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.
We design a new instance-to-track matching objective to learn appearance embeddings that compare a candidate detection to the embeddings of the tracks persisted in the tracker.
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
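The level-wise merge step described above can be sketched as a plain union-find pass over the edges a GNN level predicts as links; the function name, edge format, and relabeling scheme are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of one hierarchy level: edges predicted as "link" are
# merged into connected components, and each component becomes one node of
# the next level's graph.

def merge_components(num_nodes, predicted_edges):
    """Union-find over the edges a GNN level predicts as links."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in predicted_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # union the two components

    # Relabel components 0..k-1; this mapping defines the next level's nodes.
    roots, labels = {}, []
    for n in range(num_nodes):
        labels.append(roots.setdefault(find(n), len(roots)))
    return labels

# Example: 6 detections with predicted links (0,1), (1,2), (4,5) collapse
# into components {0,1,2}, {3}, {4,5}, i.e. 3 nodes at the next level.
```

Each new node would then aggregate the features of its component before the next GNN level runs.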
The common implementation of face recognition systems as a cascade of a detection stage and a recognition or verification stage can cause problems beyond failures of the detector.
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao
It is the largest face anti-spoofing dataset in terms of both the amount of data and the number of subjects.
2 code implementations • 18 Feb 2021 • Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan
This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection.
Exotic phenomena can be realized in quantum materials by confining electronic states to two dimensions.
We propose a new method to detect deepfake images using the cue of the source feature inconsistency within the forged images.
First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.
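The queue of negatives mentioned above can be sketched in a few lines; the queue size, embedding dimension, and helper names below are illustrative assumptions, not the paper's configuration.

```python
from collections import deque

# Minimal sketch of a MoCo-style negative queue: one positive key (the same
# speaker, from the momentum encoder) is scored against the query, alongside
# a FIFO queue of keys from earlier batches serving as negatives.

QUEUE_SIZE = 4  # real systems keep thousands of negatives

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_logits(query, positive_key, queue):
    """Similarity of the query to its positive, then to each queued negative."""
    return [dot(query, positive_key)] + [dot(query, k) for k in queue]

def enqueue(queue, keys):
    """After each step, the batch's momentum-encoded keys replace the oldest."""
    for k in keys:
        queue.append(k)  # deque(maxlen=...) evicts the oldest automatically
```

The logits would normally be divided by a temperature and fed to a cross-entropy loss with the positive at index 0.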
Although speaker verification has achieved significant performance improvements with the development of deep neural networks, domain mismatch remains a challenging problem in this field.
Reducing inconsistencies in the behavior of different versions of an AI system can be as important in practice as reducing its overall error.
Data augmentation has been highly effective in narrowing the data gap and reducing the cost for human annotation, especially for tasks where ground truth labels are difficult and expensive to acquire.
In forensic applications, it is very common that only small naturalistic datasets consisting of short utterances in complex or unknown acoustic environments are available.
Forensic audio analysis for speaker verification offers unique challenges due to location/scenario uncertainty and diversity mismatch between reference and naturalistic field recordings.
In this study, we propose the global context guided channel and time-frequency transformations to model the long-range, non-local time-frequency dependencies and channel variances in speaker representations.
Fast IPv6 scanning is challenging in the field of network measurement because it requires exploring the whole IPv6 address space, which is infeasible with current computational power.
Recent studies have successfully demonstrated the reliability of fully convolutional networks (FCNs) in many speech applications.
To address this problem, we develop an experimental method for measuring the algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change.
In this paper, we focus on improving the online face liveness detection system to enhance the security of the downstream face recognition system.
Backward compatibility is critical to quickly deploy new embedding models that leverage ever-growing large-scale training datasets and improvements in deep learning architectures and training methods.
Speaker verification systems often degrade significantly when there is a language mismatch between training and testing data.
In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.
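A Squeeze-and-Excitation block of the kind extended here can be sketched as squeeze (global pooling), excitation (a small bottleneck network with a sigmoid gate), and channel-wise rescaling; the sketch below shows only the standard channel branch with tiny hand-set weights, not the tfc-SE module's full time-frequency variant.

```python
import math

# Simplified channel-wise Squeeze-and-Excitation: pool each channel to a
# scalar descriptor, pass the descriptors through two small fully connected
# layers (ReLU, then sigmoid) to get per-channel gates, and rescale.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_channel(feature_map, w1, w2):
    """feature_map: list of channels, each a flat list of activations."""
    # Squeeze: global average pooling per channel -> channel descriptor z.
    z = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid yields channel gates.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: recalibrate each channel by its learned gate.
    return [[g * x for x in ch] for g, ch in zip(gates, feature_map)]
```

The tfc-SE idea described above would apply the same squeeze-gate-rescale pattern along time and frequency slices as well as channels.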
These advantages of thermal infrared cameras make the segmentation of semantic objects feasible both day and night.
The usability study on several real-world corpora illustrates that cLDA not only explores the underlying topics automatically but also models their connections and variations across multiple collections.
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks.
In this paper, we introduce a subcategory-aware object classification framework to boost category level object classification performance.