Self-Prompting Analogical Reasoning for UAV Object Detection
Unmanned Aerial Vehicle Object Detection (UAVOD) presents unique challenges due to varying altitudes, dynamic backgrounds, and the small size of objects. Traditional detection methods often struggle with these challenges, as they typically rely on visual features alone and fail to extract the semantic relations between objects. To address these limitations, we propose a novel approach named Self-Prompting Analogical Reasoning (SPAR). Our method utilizes a vision-language model (CLIP) to generate context-aware prompts based on image features, providing rich semantic information that guides analogical reasoning. SPAR includes two main modules: self-prompting and analogical reasoning. The self-prompting module, based on a learnable description and the CLIP text encoder, generates a context-aware prompt by incorporating specific image features; an objectness prompt score map is then produced by computing the similarity between pixel-level features and the context-aware prompt. With this score map, multi-scale image features are enhanced and pixel-level features are selected for graph construction. In the analogical reasoning module, graph nodes consist of category-level prompt nodes and pixel-level image feature nodes, and analogical inference is performed via graph convolution. Under the guidance of the category-level nodes, object features at different scales are enhanced, which helps achieve more accurate detection of challenging objects. Extensive experiments show that SPAR outperforms traditional methods, offering a more robust and accurate solution for UAVOD.
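The pipeline described in the abstract (score map from prompt similarity, pixel selection, graph convolution over prompt and pixel nodes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Random arrays stand in for CLIP image and text features, and the cosine similarity measure, the `1 + score` reweighting, the top-k selection rule, the similarity-based adjacency, and all function names are assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def objectness_score_map(pixel_feats, prompt):
    """Cosine similarity between each pixel feature (H, W, C) and a prompt (C,)."""
    return l2_normalize(pixel_feats) @ l2_normalize(prompt)

def select_pixel_nodes(pixel_feats, score_map, k):
    """Assumed selection rule: keep the k highest-scoring pixels as graph nodes."""
    flat_feats = pixel_feats.reshape(-1, pixel_feats.shape[-1])
    top_idx = np.argsort(score_map.ravel())[::-1][:k]
    return flat_feats[top_idx], top_idx

def graph_conv(nodes, weight):
    """One GCN-style layer: ReLU(D^-1/2 A D^-1/2 X W).

    Assumption: A is built from non-negative cosine similarity between nodes,
    so the diagonal (self-similarity) acts as a self-loop.
    """
    unit = l2_normalize(nodes)
    adj = np.maximum(unit @ unit.T, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1) + 1e-8)
    adj_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(adj_norm @ nodes @ weight, 0.0)

# Stand-in data (hypothetical shapes; real features would come from CLIP).
rng = np.random.default_rng(0)
H, W, C, num_classes, k = 8, 8, 16, 3, 10
pixel_feats = rng.normal(size=(H, W, C))             # pixel-level image features
context_prompt = rng.normal(size=C)                  # context-aware prompt embedding
category_prompts = rng.normal(size=(num_classes, C)) # category-level prompt nodes

score_map = objectness_score_map(pixel_feats, context_prompt)
enhanced = pixel_feats * (1.0 + score_map[..., None])  # assumed enhancement rule
pixel_nodes, top_idx = select_pixel_nodes(enhanced, score_map, k)

# Graph nodes: category-level prompt nodes followed by selected pixel nodes.
nodes = np.concatenate([category_prompts, pixel_nodes], axis=0)
weight = rng.normal(size=(C, C)) / np.sqrt(C)          # untrained layer weights
updated_nodes = graph_conv(nodes, weight)
```

Because category nodes and pixel nodes share one adjacency matrix in this sketch, each pixel node's updated feature mixes in information from the category prompts it resembles, which is one plausible reading of "guidance of category-level nodes".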