By iteratively learning with the two strategies, the attentive regions are gradually shifted from the background to the foreground and the features become more discriminative.
To solve this problem, we propose the Annotation Efficient Person Re-Identification method to select image pairs from an alternative pair set according to the fallibility and diversity of pairs, and train the Re-ID model based on the annotation.
We evaluate our algorithm on the CIFAR-10, CIFAR-100 and ImageNet datasets and achieve state-of-the-art accuracy, using fewer time-steps.
We propose a novel Event-based Video reconstruction framework based on a fully Spiking Neural Network (EVSNN), which utilizes Leaky-Integrate-and-Fire (LIF) neuron and Membrane Potential (MP) neuron.
no code implementations • 23 Jan 2022 • Tiejun Huang, Yajing Zheng, Zhaofei Yu, Rui Chen, Yuan Li, Ruiqin Xiong, Lei Ma, Junwei Zhao, Siwei Dong, Lin Zhu, Jianing Li, Shanshan Jia, Yihua Fu, Boxin Shi, Si Wu, Yonghong Tian
By treating vidar as spike trains in biological vision, we have further developed a spiking neural network-based machine vision system that combines the speed of the machine and the mechanism of biological vision, achieving high-speed object detection and tracking 1, 000x faster than human vision.
With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence with less energy consumption.
More importantly, due to mimicking receptive field mechanism to collect regional information, RVSM can filter high intensity noise effectively and improves the problem that Spike camera is sensitive to noise largely.
Imperfect labels are ubiquitous in real-world datasets and seriously harm the model performance.
Ranked #1 on Image Classification on Clothing1M (using extra training data)
To elucidate the underlying mechanism clearly, we first study continuous attractor neural networks (CANNs), and find that noisy neural adaptation, exemplified by spike frequency adaptation (SFA) in this work, can generate Lévy flights representing transitions of the network state in the attractor space.
Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers.
Ranked #8 on 3D Point Cloud Classification on ScanObjectNN
Further, for training SCFlow, we synthesize two sets of optical flow data for the spiking camera, SPIkingly Flying Things and Photo-realistic High-speed Motion, denoted as SPIFT and PHM respectively, corresponding to random high-speed and well-designed scenes.
In this paper, we theoretically analyze ANN-SNN conversion error and derive the estimated activation function of SNNs.
Despite that spiking neural networks (SNNs) show strong advantages in information encoding, power consuming, and computational capability, the underdevelopment of supervised learning algorithms is still a hindrance for training SNN.
Agents determine the priority of decision-making by comparing the value of intention.
Recently, many deep learning methods have shown great success in providing promising solutions to many event-based problems, such as optical flow estimation.
Different from the conventional digital cameras that compact the photoelectric information within the exposure interval into a single snapshot, the spike camera produces a continuous spike stream to record the dynamic light intensity variation process.
Mimicking the sampling mechanism of the fovea, a retina-inspired camera, named spiking camera, is developed to record the external information with a sampling rate of 40, 000 Hz, and outputs asynchronous binary spike streams.
We show that the inference time can be reduced by optimizing the upper bound of the fit curve in the revised ANN to achieve fast inference.
Our key innovation is to redefine the gradient to a new synaptic parameter, allowing better exploration of network structures by taking full advantage of the competition between pruning and regrowth of connections.
Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers the degradation problem and can hardly implement residual learning.
In this paper, we propose a NeuSpike-Net to learn both the high dynamic range and high motion sensitivity of DVS and the full texture sampling of spike camera to achieve high-speed and high dynamic image reconstruction.
In this paper, we properly exploit the relative motion and derive the relationship between light intensity and each spike, so as to recover the external scene with both high temporal and high spatial resolution.
A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR).
In this process, one of the key challenges is to reduce the risk of generalizing the inherent characteristics of numerous unknown samples learned from a small amount of known data.
Computer vision technology is widely used in biological and medical data analysis and understanding.
In this study, we build a computational model to elucidate the computational advantages associated with the interactions between two pathways.
In this paper, we take inspiration from the observation that membrane-related parameters are different across brain regions, and propose a training algorithm that is capable of learning not only the synaptic weights but also the membrane time constants of SNNs.
Empirically, we show that I2C can not only reduce communication overhead but also improve the performance in a variety of multi-agent cooperative scenarios, comparing to existing methods.
Thus, KQ can represent the weight tensor in the convolution layer with low-bit indexes and a kernel codebook with limited size, which enables KQ to achieve significant compression ratio.
Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides the academic and industrial evidence to realize the collaborative compression of video and feature streams in a broad range of AI applications.
The proposed model consists of two alternate processes, progressive clustering and episodic training.
Experimental data has revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway.
By solving the problem with its closed-form solution on the fly with the setup of transduction, our approach efficiently tailors an episodic-wise metric for each task to adapt all features from a shared task-agnostic embedding space into a more discriminative task-specific metric space.
One way to compress these heavy models is knowledge transfer (KT), in which a light student network is trained through absorbing the knowledge from a powerful teacher network.
Using the benchmark dataset Omniglot, we show that our model outperforms other unsupervised few-shot learning methods to a large extend and approaches to the performances of supervised methods.
Conventional frame-based camera is not able to meet the demand of rapid reaction for real-time applications, while the emerging dynamic vision sensor (DVS) can realize high speed capturing for moving objects.
Predictors for new categories are added to the classification layer to "open" the deep neural networks to incorporate new categories dynamically.
The SID is an end-to-end decoder with one end as neural spikes and the other end as images, which can be trained directly such that visual scenes are reconstructed from spikes in a highly accurate fashion.
Exploiting multi-scale representations is critical to improve edge detection for objects at different scales.
Ranked #2 on Edge Detection on BIPED
Recent studies have suggested that the cognitive process of the human brain is realized as probabilistic inference and can be further modeled by probabilistic graphical models like Markov random fields.
A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network.
When a deep CNN with many layers is used for the visual system, it is not easy to compare the structure components of CNNs with possible neuroscience underpinnings due to highly complex circuits from the retina to higher visual cortex.
Furthermore, we show that STNMF can separate spikes of a ganglion cell into a few subsets of spikes where each subset is contributed by one presynaptic bipolar cell.
Taken together, our results suggest that the WTA circuit could be seen as the minimal inference unit of neuronal circuits.
Photoreceptors in the retina are coupled by electrical synapses called "gap junctions".
As compared with traditional video retargeting, stereo video retargeting poses new challenges because stereo video contains the depth information of salient objects and its time dynamics.
To conquer these issues, we propose an End-to-End BoWs (E$^2$BoWs) model based on Deep Convolutional Neural Network (DCNN).
This paper provides an overview of the on-going compact descriptors for video analysis standard (CDVA) from the ISO/IEC moving pictures experts group (MPEG).
Object detection aims to identify instances of semantic objects of a certain class in images or videos.
In this paper, we propose to leverage intra-class variance in metric learning of triplet network to improve the performance of fine-grained recognition.
We also introduce an attention mechanism on the temporal domain to capture the long-term dependence meanwhile finding the salient portions.
Despite a lot of research efforts devoted in recent years, how to efficiently learn long-term dependencies from sequences still remains a pretty challenging task.
Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion.
Most existing person re-identification (Re-ID) approaches follow a supervised learning framework, in which a large number of labelled matching pairs are required for training.
To further facilitate the future research on this problem, we also present a carefully-organized large-scale image database "VehicleID", which includes multiple images of the same vehicle captured by different real-world cameras in a city.