Out-of-distribution (OOD) detection plays a crucial role in ensuring the security of neural networks.
Based on this process, we introduce SGAS, a model for part editing that employs two strategies: feature disentanglement and constraint.
While most of the conventional MIL methods use attention scores to estimate instance importance scores (IIS) which contribute to the prediction of the slide labels, these often lead to skewed attention distributions and inaccuracies in identifying crucial instances.
Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images.
Specifically, we first develop two specialized pre-trained diffusion models, i. e., Text-driven Diffusion Model (TDM) and Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively.
Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document.
Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence.
To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations.
This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization.
1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu
In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.
Retrieving charts from a large corpus is a fundamental task that can benefit numerous applications such as visualization recommendations. The retrieved results are expected to conform to both explicit visual attributes (e. g., chart type, colormap) and implicit user intents (e. g., design style, context information) that vary upon application scenarios.
Transformer is beneficial for image denoising tasks since it can model long-range dependencies to overcome the limitations presented by inductive convolutional biases.
Thus, label-efficient deep learning methods are developed to make comprehensive use of the labeled data as well as the abundance of unlabeled and weak-labeled data.
We present the Aggregated Text TRansformer(ATTR), which is designed to represent texts in scene images with a multi-scale self-attention mechanism.
Scene text erasing seeks to erase text contents from scene images and current state-of-the-art text erasing models are trained on large-scale synthetic data.
Spiking Neural Networks (SNNs) have piqued researchers' interest because of their capacity to process temporal information and low power consumption.
Besides, the missing patterns are diverse in reality, but existing methods can only handle fixed ones, which means a poor generalization ability.
As details are missing in most representations of structures, the lack of controllability to details is one of the major weaknesses in structure-based controllable point cloud generation.
The points generated by AXform do not have the strong 2-manifold constraint, which improves the generation of non-smooth surfaces.
To measure the proposed image layer modeling method, we propose a manually-labeled non-Manhattan layout fine-grained segmentation dataset named FPD.
We describe the architecture and algorithms of the Adaptive Charging Network (ACN), which was first deployed on the Caltech campus in early 2016 and is currently operating at over 100 other sites in the United States.
In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.
Segmentation of infected areas in chest CT volumes is of great significance for further diagnosis and treatment of COVID-19 patients.
As an effective way of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person-ReID, leading to many states of the arts.