This study explores the capacity for self-correction in a neural transition-based parser.
To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch.
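As a rough sketch (not the authors' implementation), relation reasoning with graph convolutions over a region graph amounts to symmetric-normalized neighborhood aggregation of region features; the feature, adjacency, and weight names below are illustrative assumptions:

```python
import numpy as np

def region_graph_conv(H, A, W):
    """One graph-convolution step over a region graph.

    H: (N, d) region features, A: (N, N) region adjacency,
    W: (d, d_out) learnable weights. All names are illustrative.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric normalization
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)            # ReLU activation

# toy example: 3 regions with 4-dim features, chain-connected
H = np.random.randn(3, 4)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
W = np.random.randn(4, 8)
out = region_graph_conv(H, A, W)
```

Stacking such layers lets information propagate between progressively more distant regions, which is the intuition behind reasoning over part relations on the graph.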
On the other hand, mutual instance selection further selects reliable and informative instances for training according to the peer-confidence and relationship disagreement of the networks.
Visual-based 3D semantic occupancy perception (also known as 3D semantic scene completion) is a new perception paradigm for robotic applications like autonomous driving.
1 code implementation • 27 Jul 2023 • Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, Minglei Li, Di Xu, Changpeng Yang, Yuanqi Yao, Gang Wu, Jian Kuai, Xianming Liu, Junjun Jiang, Jiamian Huang, Baojun Li, Jiale Chen, Shuang Zhang, Sun Ao, Zhenyu Li, Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu
In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation.
Once the anchor transformations are found, per-vertex nonlinear displacements of the garment template can be regressed in a canonical space, which reduces the complexity of deformation space learning.
Driven by our explored distance-based losses that explicitly model the motion semantics and geometry, these two modules can learn residual motion modifications on the source motion to generate plausible retargeted motion in a single inference without post-processing.
Particularly, we propose i) semantic contrast to drive network learning by contrasting massive categorical object regions, leading to a more holistic object pattern understanding, and ii) semantic aggregation to gather diverse relational contexts in the memory to enrich semantic representations.
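Semantic contrast of this kind is typically driven by an InfoNCE-style objective that pulls same-category region embeddings together and pushes other categories apart. The function below is a minimal sketch under that assumption; the names and the temperature value are illustrative, not the paper's exact formulation:

```python
import numpy as np

def semantic_contrast_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over categorical region embeddings.

    anchor, positive: (d,) embeddings from the same semantic category;
    negatives: (K, d) embeddings from other categories (e.g. drawn
    from a memory bank). tau is an illustrative temperature.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

# an anchor identical to its positive and orthogonal to the negative
# yields a near-zero loss
loss = semantic_contrast_loss(
    np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([[0.0, 1.0]])
)
```

The memory-based semantic aggregation mentioned above would supply the negatives (and extra positives) from stored relational contexts across the dataset.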
To tackle these challenges, we propose a meta-learning-based framework, MetaTTE, to continuously provide accurate travel time estimation over time by leveraging a well-designed deep neural network model called DED, which consists of a data preprocessing module and an encoder-decoder network module.
Traffic forecasting is crucial for public safety and resource optimization, yet it is very challenging due to three aspects: i) most existing works exploit intricate temporal patterns (e.g., short-term thunderstorms and long-term daily trends) within a single method, which fails to accurately capture spatio-temporal dependencies under different schemas; ii) the under-exploration of graph positional encoding limits the extraction of spatial information in the commonly used full graph attention network; iii) the quadratic complexity of full graph attention introduces heavy computational needs.
Besides, we propose a novel encoder-decoder architecture to incorporate the cross-time dynamic graph-based GCN for multi-step traffic forecasting.
In this paper, we propose a novel dynamic multi-graph convolution recurrent network (DMGCRN) to tackle the above issues, which can simultaneously model the spatial correlations of distance, the spatial correlations of structure, and the temporal correlations.
Traffic prediction has gradually attracted researchers' attention owing to the growth of traffic big data.
While single-view 3D reconstruction has made significant progress benefiting from deep shape representations in recent years, garment reconstruction is still not solved well due to open surfaces, diverse topologies and complex geometric details.
The comparative knowledge distillation experiments on six public datasets also demonstrate that SMLDist outperforms other advanced knowledge distillation methods in terms of student performance, which verifies the good generalization of SMLDist to other classification tasks, including but not limited to HAR.
We first use a coarse deraining network to reduce the rain streaks on the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image.
This paper proposes a human parsing based texture transfer model via cross-view consistency learning to generate the texture of 3D human body from a single image.
In this way, human annotations are no longer required, and it is scalable to large and diverse real-world datasets.
This paper proposes a novel lightweight module, the Attentive WaveBlock (AWB), which can be integrated into the dual networks of mutual learning to enhance their complementarity and further suppress noise in the pseudo-labels.
Ranked #2 on Unsupervised Domain Adaptation on Duke to MSMT
In this paper, we study the challenging unconstrained set-based face recognition problem where each subject face is instantiated by a set of media (images and videos) instead of a single image.
1 code implementation • 2 Sep 2018 • Jian Zhao, Yu Cheng, Yi Cheng, Yang Yang, Haochong Lan, Fang Zhao, Lin Xiong, Yan Xu, Jianshu Li, Sugiri Pranata, ShengMei Shen, Junliang Xing, Hengzhu Liu, Shuicheng Yan, Jiashi Feng
Benchmarking our model on IJB-C, one of the most popular unconstrained face recognition datasets, additionally verifies the promising generalizability of AIM in recognizing faces in the wild.
Ranked #1 on Age-Invariant Face Recognition on MORPH Album2
This paper proposes a novel Dynamic Conditional Convolutional Network (DCCN) to handle conditional few-shot learning, i.e., settings where only a few training samples are available for each condition.
In this paper, we propose a novel weakly supervised model, Multi-scale Anchored Transformer Network (MATN), to accurately localize free-form textual phrases with only image-level supervision.
To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties.
In particular, we employ an off-the-shelf 3D face model as a simulator to generate profile face images with varying poses.
Ranked #1 on Face Verification on IJB-A
Specifically, iFAN achieves an overall F-score of 91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81% on the MTFL dataset for facial landmark localization, and an accuracy of 45.73% on the BNU dataset for emotion recognition with a single model.
The first one, named multi-scale spatial LSTM encoder, reads facial patches of various scales sequentially to output a latent representation, and occlusion-robustness is achieved owing to the fact that the influence of occlusion is only upon some of the patches.
To address this essentially ill-posed problem, we introduce a Deep Edge Guided REcurrent rEsidual (DEGREE) network to progressively recover the high-frequency details.
Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels.
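Once such binary codes are learned, retrieval reduces to ranking by Hamming distance: semantically similar items should end up with nearby codes. The sketch below illustrates only this ranking step, with toy {0,1} codes standing in for codes a learned semantic hash would produce:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a query binary code.

    query_code: (d,) {0,1} vector; db_codes: (N, d) {0,1} matrix.
    Returns the ranking (closest first) and the per-item distances.
    """
    dists = (db_codes != query_code).sum(axis=1)  # bit disagreements
    return np.argsort(dists, kind="stable"), dists

# toy codes: the first database item matches the query exactly
query = np.array([1, 0, 1, 1])
db = np.array([[1, 0, 1, 1],   # Hamming distance 0
               [1, 1, 1, 1],   # Hamming distance 1
               [0, 1, 0, 0]])  # Hamming distance 4
order, d = hamming_rank(query, db)
```

The appeal of this scheme is that Hamming distance on packed bit codes can be computed with XOR and popcount, making large-scale semantic search cheap.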
Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data.