Precise and efficient cloud and cloud shadow masking methods are required for the automated use of this data.
Since remote sensing images contain extensive small-scale texture structures, it is important to effectively restore image details from hazy images.
This paper presents a comprehensive analysis of explainable fact-checking through a series of experiments, focusing on the ability of large language models to verify public health claims and provide explanations or justifications for their veracity assessments.
This paper explores a novel multi-modal alternating learning paradigm pursuing a reconciliation between the exploitation of uni-modal features and the exploration of cross-modal interactions.
To evaluate the generalization results, we build a novel MVS domain generalization benchmark including synthetic and real-world datasets.
We introduce Xmodel-VLM, a cutting-edge multimodal vision language model.
Significant progress has been made in the field of handwritten mathematical expression recognition, while existing encoder-decoder methods are usually difficult to model global information in \LaTeX.
Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike.
The effectiveness of the lens functions is demonstrated in two use cases and their computational cost is analysed in a synthetic benchmark.
To this end, we first develop OpenGait, a flexible and efficient gait recognition platform.