Then, we design a spatiotemporal Transformer architecture that formulates object detection as an end-to-end sequence prediction problem, where a novel temporal Transformer module leverages rich temporal cues from two visual streams to improve detection performance.
Spiking neural networks (SNNs) are brain-inspired energy-efficient models that encode information in spatiotemporal dynamics.
Uncertainty estimation is critical for numerous applications of deep neural networks and draws growing attention from researchers.
Event cameras, offering high temporal resolutions and high dynamic ranges, have brought a new perspective to address common challenges (e.g., motion blur and low light) in monocular depth estimation.
In this paper, we propose a unified framework named BEV-LGKD to transfer knowledge in a teacher-student manner.
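The snippet does not describe BEV-LGKD's loss, so the sketch below shows only the generic teacher-student idea: a Hinton-style soft-label distillation loss, KL(teacher || student) on temperature-softened logits scaled by the squared temperature. This is a swapped-in textbook technique, not BEV-LGKD's objective, and the temperature value is an assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-label KD loss (not BEV-LGKD's loss): T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as their softened distributions diverge.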
Neuromorphic spike data, an emerging modality with high temporal resolution, has shown promising potential in real-world applications owing to its inherent ability to overcome the motion blur caused by high-velocity motion.
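To make the idea of spike data concrete, here is a minimal sketch of an idealized integrate-and-fire pixel, the sampling model commonly used to describe spike cameras: each pixel accumulates incoming light and emits a spike whenever the accumulator reaches a threshold. The threshold, the reset-by-subtraction rule, and the "at most one spike per readout step" simplification are assumptions for illustration, not a specification of any particular sensor.

```python
def spike_camera_pixel(intensities, threshold=1.0):
    """Idealized integrate-and-fire pixel.

    intensities: per-step light intensity (assumed >= 0 and <= threshold).
    Returns a binary spike train with one entry per readout step.
    """
    accumulator = 0.0
    spikes = []
    for value in intensities:
        accumulator += value
        if accumulator >= threshold:
            spikes.append(1)
            accumulator -= threshold  # reset by subtraction keeps residual charge
        else:
            spikes.append(0)
    return spikes


def estimate_intensity(spikes, threshold=1.0):
    """Recover mean intensity from the firing rate over a window."""
    return threshold * sum(spikes) / len(spikes)
```

Brighter pixels fire more often, so texture can be recovered from firing rates without waiting for a fixed exposure, which is why fast motion does not smear the signal the way it does in a frame camera.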
In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike cameras.
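The snippet does not say how UGDF combines the two predictions. One simple, well-known way to let uncertainty guide a fusion is per-pixel inverse-variance weighting, sketched below under the assumption that each network outputs a depth and a variance per pixel. The function name and inputs are hypothetical, and this is not presented as the UGDF fusion rule.

```python
def fuse_depths(mono_depth, mono_var, stereo_depth, stereo_var, eps=1e-8):
    """Hypothetical inverse-variance fusion of two per-pixel depth estimates.

    All inputs are flat lists of equal length; variances are assumed >= 0.
    The estimate with lower variance (higher confidence) gets more weight.
    """
    fused = []
    for dm, vm, ds, vs in zip(mono_depth, mono_var, stereo_depth, stereo_var):
        wm = 1.0 / (vm + eps)  # confidence of the monocular prediction
        ws = 1.0 / (vs + eps)  # confidence of the stereo prediction
        fused.append((wm * dm + ws * ds) / (wm + ws))
    return fused
```

With equal variances this reduces to the midpoint; when stereo is three times less certain, the result moves toward the monocular estimate (10 and 20 fuse to 12.5).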
We propose a novel Event-based Video reconstruction framework based on a fully Spiking Neural Network (EVSNN), which utilizes Leaky-Integrate-and-Fire (LIF) neurons and Membrane Potential (MP) neurons.
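A minimal discrete-time sketch of the two neuron types follows. The LIF neuron leaks, integrates its input, and emits a binary spike with a hard reset when the membrane potential crosses a threshold. For the MP neuron we assume a non-spiking integrator whose membrane potential is read out directly, which suits producing continuous intensity values. The decay, threshold, and reset rule are illustrative choices, not EVSNN's exact equations.

```python
def lif_neuron(inputs, decay=0.5, threshold=1.0):
    """Discrete-time Leaky-Integrate-and-Fire neuron with hard reset."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x  # leak, then integrate the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def mp_neuron(inputs, decay=0.5):
    """Assumed MP neuron: leaky integration with no firing; outputs the potential."""
    v = 0.0
    potentials = []
    for x in inputs:
        v = decay * v + x
        potentials.append(v)
    return potentials
```

A constant input of 0.6 makes the LIF neuron fire on the third step (0.6, 0.9, 1.05), while the MP neuron simply exposes the decaying potential, for example `[1.0, 0.5, 0.25]` for a single unit pulse.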
no code implementations • 23 Jan 2022 • Tiejun Huang, Yajing Zheng, Zhaofei Yu, Rui Chen, Yuan Li, Ruiqin Xiong, Lei Ma, Junwei Zhao, Siwei Dong, Lin Zhu, Jianing Li, Shanshan Jia, Yihua Fu, Boxin Shi, Si Wu, Yonghong Tian
By treating vidar as spike trains in biological vision, we have further developed a spiking neural network-based machine vision system that combines the speed of machines with the mechanisms of biological vision, achieving high-speed object detection and tracking 1,000x faster than human vision.
Short peptides with antimicrobial activity have therapeutic potential for treating bacterial infections.
Unlike visible-light cameras, which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous and sparse events with much lower latency.
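The following is a minimal sketch of the idealized event-generation model often used in event-camera simulators: a pixel emits an event of polarity +1 or -1 each time its log intensity moves by a contrast threshold away from the level at the last event. The contrast value is an assumption, and stamping every event with the sample time (instead of interpolating between samples) is a simplification.

```python
import math


def generate_events(timestamps, intensities, contrast=0.2):
    """Idealized single-pixel event generator.

    intensities must be > 0 (the model works in log space).
    Returns a list of (timestamp, polarity) tuples.
    """
    events = []
    ref = math.log(intensities[0])  # log intensity at the last event
    for t, value in zip(timestamps[1:], intensities[1:]):
        diff = math.log(value) - ref
        # A large change may cross the threshold several times.
        while abs(diff) >= contrast:
            polarity = 1 if diff > 0 else -1
            events.append((t, polarity))
            ref += polarity * contrast
            diff = math.log(value) - ref
    return events
```

A static pixel produces no output at all, which is where the sparsity comes from; a log-intensity jump of 0.5 with a 0.2 threshold yields two events, and the remaining 0.1 is carried forward.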
Ranked #1 on Object Tracking on VisEvent
To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections.
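Here is a simplified sketch of the core idea: annealed importance sampling (AIS) along a geometric path from a normalized base density to an unnormalized target, where each transition is an unadjusted Langevin step with no Metropolis-Hastings accept/reject. Dropping the accept/reject branch is what makes every operation differentiable, at the cost of a small bias that shrinks with the step size. This is not the DAIS algorithm itself: the paper's transition kernel may differ (for example, Hamiltonian dynamics), and this plain-Python version has no autodiff. All names and hyperparameters are illustrative assumptions.

```python
import math
import random


def log_base(x):
    # Normalized standard normal, so Z_0 = 1 and the estimate is log Z_T.
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)


def grad_log_base(x):
    return -x


def ais_without_mh(log_target, grad_log_target, n_particles=1000,
                   n_temps=100, n_langevin=5, step=0.1, seed=0):
    """Estimate log Z of an unnormalized 1D target with MH-free AIS."""
    rng = random.Random(seed)
    betas = [k / n_temps for k in range(n_temps + 1)]
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for b_prev, b in zip(betas[:-1], betas[1:]):
        for i in range(n_particles):
            x = xs[i]
            # Incremental AIS weight for the path f_b = f_0^(1-b) * f_T^b.
            log_w[i] += (b - b_prev) * (log_target(x) - log_base(x))
            for _ in range(n_langevin):
                grad = (1 - b) * grad_log_base(x) + b * grad_log_target(x)
                # Unadjusted Langevin step: no accept/reject branch.
                x = x + step * grad + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
            xs[i] = x
    # log-mean-exp of the importance weights.
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_particles)
```

For the target f(x) = exp(-x^2/8), the true value is log Z = log 2 + 0.5 log(2π), approximately 1.612. The estimate should land close to it, with the remaining gap coming from Monte Carlo noise and the small Langevin discretization bias.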
In this paper, we propose a NeuSpike-Net to learn both the high dynamic range and high motion sensitivity of DVS and the full texture sampling of the spike camera to achieve high-speed and high-dynamic-range image reconstruction.
This paper tackles this challenge by jointly enforcing visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
We prove that under certain conditions, a linear combination of quality and diversity constitutes a divergence metric between the generated distribution and the real distribution.
Recently, due to the booming influence of online social networks, detecting fake news has drawn significant attention from both the academic community and the general public.
Long-term relations are captured by a temporal self-attention model to alleviate occlusion and noise in video sequences.
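Below is a minimal sketch of temporal self-attention over per-frame feature vectors: each frame's output is a softmax-weighted average of all frames, with weights given by scaled dot products. Query, key, and value projections are assumed to be identity maps here, whereas a real module learns them. The example shows how an occluded (all-zero) frame borrows information from the other frames in the sequence.

```python
import math


def temporal_self_attention(frames):
    """Scaled dot-product self-attention across time (identity Q/K/V assumed).

    frames: list of T feature vectors (lists of floats of equal length d).
    Returns T output vectors of length d.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in frames:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in frames]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a convex combination of all frames' features.
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return out
```

For frames `[[1, 0], [1, 0], [0, 0]]`, the occluded third frame attends uniformly and recovers a first feature of 2/3 instead of 0.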
A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN.
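As a rough sketch of how a multi-scale temporal layer can sit on top of 2D features, the code below sums an identity path with several dilated 1D temporal convolutions, each covering a different time span. The kernel size, the dilations (1, 2, 3), and the residual sum are assumptions loosely modeled on the M3D idea, not the paper's exact layer. The code works on a single channel at one spatial location, and "convolution" here means the cross-correlation used in deep learning.

```python
def dilated_temporal_conv(seq, kernel, dilation):
    """Same-length 1D temporal convolution (cross-correlation), zero padding."""
    T = len(seq)
    half = len(kernel) // 2
    out = []
    for t in range(T):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps spaced `dilation` frames apart
            if 0 <= idx < T:
                acc += w * seq[idx]
        out.append(acc)
    return out


def multiscale_temporal_block(seq, kernels, dilations=(1, 2, 3)):
    """Identity path plus one dilated temporal branch per scale."""
    out = list(seq)  # identity keeps the original 2D-CNN feature
    for kernel, dilation in zip(kernels, dilations):
        branch = dilated_temporal_conv(seq, kernel, dilation)
        out = [a + b for a, b in zip(out, branch)]
    return out
```

Larger dilations let a short kernel reach frames further away, so the branches together mix short- and long-range temporal context at little extra cost. With all-zero kernels, the block reduces to the identity and the 2D network's behavior is preserved.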
This paper establishes a large-scale Long sequence Video database for person re-IDentification (LVreID).
Our deep architecture explicitly leverages human part cues to mitigate pose variations and learn robust feature representations from both the global image and different local parts.
Ranked #103 on Person Re-Identification on Market-1501