The second component is the Temporal Aggregation Module (TAM), which separates embeddings into trend and seasonal components and extracts meaningful temporal correlations to identify the primary components while filtering out random noise.
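The trend/seasonal split described above can be illustrated with a classical moving-average decomposition. This is only a minimal sketch of the general idea: the TAM operates on learned embeddings and its exact decomposition is not specified here, so the function name and the moving-average choice below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def decompose(x, window=11):
    """Split a 1-D series into a smooth trend and a seasonal/residual
    part using a centered moving average (illustrative sketch only).
    `window` should be odd so the average is centered."""
    pad = window // 2
    # Edge-pad so the smoothed trend has the same length as the input.
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend  # everything the trend does not explain
    return trend, seasonal

# Toy series: linear trend plus a periodic component.
t = np.arange(48, dtype=float)
x = 0.1 * t + np.sin(2 * np.pi * t / 12)
trend, seasonal = decompose(x)
```

By construction the two parts sum back to the original series, which is the defining property of an additive decomposition.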
In this paper, we first verify through an experiment that style factors are a vital part of domain bias.
Contrastive learning has shown great potential in video representation learning.
During inference, a pixel-wise association procedure is proposed to recover object connections across frames based on the pixel-wise predictions.
Natural language (NL) based vehicle retrieval aims to search for a specific vehicle given a text description.
Human-Object Interaction (HOI) detection is a fundamental task in high-level human-centric scene understanding.
Person Search is a relevant task that aims to jointly solve Person Detection and Person Re-identification (re-ID).
By taking advantage of both dense detection and sparse set detection, Efficient DETR leverages dense priors to initialize the object containers and bridges the gap between the 1-decoder structure and the 6-decoder structure.
We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner.
19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
There are a total of $470K$ human instances from the train and validation subsets, and an average of $\sim 22.6$ persons per image, with various kinds of occlusions in the dataset.
A new dataset called 4K-Face is also introduced to evaluate the performance of face detection under extremely large scale variations.