Structured data offers a sophisticated mechanism for the organization of information.
Automatic estimation of 3D human pose from monocular RGB images is a challenging and unsolved problem in computer vision.
Enhancing the domain generalization performance of Face Anti-Spoofing (FAS) techniques has emerged as a research focus.
In this study, we introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
Quantization has emerged as a promising direction for model compression.
To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr - a new CLIP-driven referring expression segmentation network.
Besides, we construct the OpenPCSeg codebase, which is the largest and most comprehensive outdoor LiDAR segmentation codebase.
Ranked #2 on 3D Semantic Segmentation on SemanticKITTI (using extra training data)
Extensive experiments on Waymo Open Dataset show our DetZero outperforms all state-of-the-art onboard and offboard 3D detection methods.
Text-to-image generative models such as Stable Diffusion and DALL$\cdot$E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones.
Camera localization is a classical computer vision task that serves various Artificial Intelligence and Robotics applications.
Notably, LoGoNet ranks 1st on Waymo 3D object detection leaderboard and obtains 81. 02 mAPH (L2) detection performance.
Moreover, the hierarchical representations at both instance level and channel level can be coordinated by the heterogeneous information aggregation under the guidance of global view.
Federated learning (FL) allows multiple clients to collaboratively train a deep learning model.
In this paper, we present an efficient client-server visual localization architecture that fuses global and local pose estimations to realize promising precision and efficiency.
For disease prediction tasks, most existing graph-based methods tend to define the graph manually based on specified modality (e. g., demographic information), and then integrated other modalities to obtain the patient representation by Graph Representation Learning (GRL).
For image exposure enhancement, the tasks of Single-Exposure Correction (SEC) and Multi-Exposure Fusion (MEF) are widely studied in the image processing community.
High-precision camera re-localization technology in a pre-established 3D environment map is the basis for many tasks, such as Augmented Reality, Robotics and Autonomous Driving.
The localization pipeline is designed as a coarse-to-fine paradigm.
For the vehicle detection and tracking module, we adopted YOLOv5 and multi-scale tracking to localize the anomalies.
The success of the former heavily depends on the quality of the shadow model, i. e., the transferability between the shadow and the target; the latter, given only blackbox probing access to the target model, cannot make an effective inference of unknowns, compared with MI attacks using shadow models, due to the insufficient number of qualified samples labeled with ground truth membership information.
The K-means algorithm is arguably the most popular data clustering method, commonly applied to processed datasets in some "feature spaces", as is in spectral clustering.
The paper presents a Traffic Sign Recognition (TSR) system, which can fast and accurately recognize traffic signs of different sizes in images.