Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.
Modelling irregularly-sampled time series (ISTS) is challenging because of missing values.
Nearest neighbor (NN) sampling provides richer semantic variation than pre-defined transformations for self-supervised learning (SSL) based image recognition problems.
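A minimal sketch of the idea, assuming a support bank of previously computed embeddings: the positive for a query is taken to be its nearest neighbor in the bank (by cosine similarity) instead of a hand-crafted augmentation. The function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def nearest_neighbor_positive(query, support_bank):
    """Return the support embedding closest to the query (cosine similarity).

    In NN-based SSL, this neighbor replaces a pre-defined transformation
    as the positive sample paired with the query.
    """
    q = query / np.linalg.norm(query)
    bank = support_bank / np.linalg.norm(support_bank, axis=1, keepdims=True)
    sims = bank @ q                       # cosine similarity to every bank row
    return support_bank[np.argmax(sims)]

# toy usage: the neighbor of [1, 0] among the bank rows is [0.9, 0.1]
bank = np.array([[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
pos = nearest_neighbor_positive(np.array([1.0, 0.0]), bank)
```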
In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes) and stable modular framework for incorporating global context into existing architectures. LGA especially empowers small-scale architectures to approach the performance of much larger ones, making light-weight architectures more useful for edge devices with limited compute power and energy budgets.
Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., it contains missing features, features that become obsolete over time, new features that appear at later points in time, and no clarity on the total number of input features.
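One way to cope with such haphazard streams is to encode each instance as a dictionary of whichever features happen to be present and create model weights lazily the first time a feature is seen. The sketch below is an illustrative perceptron-style learner under that assumption; the class name and update rule are hypothetical, not the paper's method.

```python
from collections import defaultdict

class HaphazardLearner:
    """Minimal online linear learner for haphazard feature streams (sketch).

    Each instance is a dict {feature_name: value}; features may be missing,
    vanish, or appear for the first time at any step. Weights are created
    lazily, so the total feature count never needs to be known in advance.
    """
    def __init__(self, lr=0.1):
        self.w = defaultdict(float)   # one weight per feature, created on demand
        self.lr = lr

    def predict(self, x):
        score = sum(self.w[f] * v for f, v in x.items())
        return 1 if score >= 0 else -1

    def update(self, x, y):           # perceptron-style update, y in {-1, +1}
        if self.predict(x) != y:
            for f, v in x.items():
                self.w[f] += self.lr * y * v

# features "a" and "b" come and go across the stream
learner = HaphazardLearner()
stream = [({"a": 1.0}, 1), ({"a": 1.0, "b": -2.0}, 1), ({"b": 2.0}, -1)]
for x, y in stream:
    learner.update(x, y)
```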
We present a novel Master Assistant Buddy Network (MABNet) for image retrieval which incorporates both learning mechanisms.
We further show that when no pretraining is done or when the pretrained transformer models are used with non-natural images (e.g., medical data), CNNs tend to generalize better than transformers even at very small coreset sizes.
Ranked #3 on Image Classification on Tiny ImageNet
Fluorescence microscopy is a quintessential tool for observing cells and understanding the underlying mechanisms of life-sustaining processes of all living organisms.
Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot directly operate on large-scale images due to compute and memory constraints.
A comparison between SOTA trackers using CNNs, transformers as well as the combination of the two is presented to study their stability at various compression ratios.
To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking.
Based on the outlined issues, we introduce a novel research problem of training CNN models for very large images, and present the 'UltraMNIST' dataset, a simple yet representative benchmark dataset for this task.
Solving electromagnetic inverse scattering problems (ISPs) is challenging due to the intrinsic nonlinearity, ill-posedness, and expensive computational cost.
Performing artificial intelligence (AI) tasks such as segmentation, tracking, and analytics of small sub-cellular structures such as mitochondria in microscopy videos of living cells is a prime example.
Streaming classification methods assume that the number of input features is fixed and that all features are always received.
Learning to dehaze single hazy images, especially from a small training dataset, is quite challenging.
Ranked #1 on Image Dehazing on Dense-Haze
Artificial intelligence (AI) driven methods can be useful to predict the parameters, risks, and effects of such an epidemic.
In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we apply a Bayesian update on top of the individual image-frame analysis in the videos, which results in highly accurate Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases.
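The general pattern of a Bayesian update over per-frame predictions can be sketched as follows: the posterior over classes is the prior multiplied by each frame's likelihood, renormalized after every frame, so no single noisy frame dominates the video-level decision. This is a generic recursive-Bayes sketch with made-up numbers, not IRON-MAN's actual model.

```python
import numpy as np

def bayesian_video_update(prior, frame_likelihoods):
    """Fuse per-frame class likelihoods into one posterior via recursive Bayes.

    posterior ∝ prior * Π likelihood_t, renormalized after every frame.
    """
    post = np.asarray(prior, dtype=float)
    for lik in frame_likelihoods:
        post = post * np.asarray(lik, dtype=float)
        post = post / post.sum()      # keep a valid probability distribution
    return post

# three frames, two classes: evidence accumulates toward class 0
frames = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
posterior = bayesian_video_update([0.5, 0.5], frames)
```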
Preliminary results indicate that the domain is highly related to computer vision and pattern recognition research with several challenging avenues.
However, the conventional assessment metrics suitable for usual object detection are deficient in the maritime setting.
Three specific problems related to this topic have been studied, viz., polygonal approximation of digital curves, tangent estimation of digital curves, and ellipse fitting and detection from digital curves.
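As a concrete instance of the first problem, polygonal approximation is classically done with the Ramer-Douglas-Peucker algorithm: keep the two endpoints, find the interior point farthest from the chord joining them, and recurse on both halves while that deviation exceeds a tolerance. This is a textbook sketch, not necessarily the method studied in the thesis.

```python
import math

def douglas_peucker(points, eps):
    """Polygonal approximation of a digital curve (Ramer-Douglas-Peucker)."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # perpendicular distance of each point to the chord between the endpoints
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm for x, y in points]
    idx = max(range(len(points)), key=lambda i: dists[i])
    if dists[idx] > eps:
        left = douglas_peucker(points[: idx + 1], eps)
        right = douglas_peucker(points[idx:], eps)
        return left[:-1] + right      # drop the duplicated split point
    return [points[0], points[-1]]

# an L-shaped digital curve collapses to its three corner points
curve = [(0, 0), (1, 0.05), (2, 0), (2, 1), (2, 2)]
simplified = douglas_peucker(curve, 0.1)
```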