no code implementations • ECCV 2020 • Fang-Yu Wu, Jeremy S. Smith, Wenjin Lu, Chaoyi Pang, Bai-Ling Zhang
Few-shot learning, namely recognizing novel categories with a very small number of training examples, is a challenging area of machine learning research.
no code implementations • 20 Feb 2024 • Zhe Tang, Ruocheng Gu, Sihao Li, Kyeong Soo Kim, Jeremy S. Smith
As a case study, we have constructed a dynamic database covering three floors of the IR building of XJTLU based on RSSI measurements over 44 days and investigated the differences between static and dynamic databases in terms of statistical characteristics and localization performance.
1 code implementation • 22 Jul 2022 • Rui Qiu, Ming Xu, Yuyao Yan, Jeremy S. Smith, Xi Yang
Although deep-learning-based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions.
Ranked #2 on Multiview Detection on Wildtrack (using extra training data)
no code implementations • 13 Nov 2018 • Shi-Yang Yan, Yuan Xie, Fang-Yu Wu, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
Automatically generating descriptions of an image, i.e., image captioning, is an important and fundamental topic in artificial intelligence, which bridges the gap between computer vision and natural language processing.
no code implementations • 25 Aug 2017 • Shi-Yang Yan, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
Through visualization of what has been learnt by the networks, it can be observed that both the attention regions of images and the hierarchical temporal structure can be captured by HM-AN.
no code implementations • 24 Jul 2017 • Fang-Yu Wu, Shi-Yang Yan, Jeremy S. Smith, Bai-Ling Zhang
In this paper, we attempted to solve the traffic scene recognition problem by combining the feature representation capabilities of CNNs with the VLAD encoding scheme.
no code implementations • 9 May 2017 • Shi-Yang Yan, Jeremy S. Smith, Wenjin Lu, Bai-Ling Zhang
This paper presents improvements to the soft attention model by combining a convolutional LSTM with a hierarchical system architecture to recognize action categories in videos.