State-of-the-art object detectors exploit multi-branch structures and predict objects at several different scales. Although this substantially boosts accuracy, low efficiency is inevitable, as the fragmented structure is hardware-unfriendly.
It works in a purely data-driven manner and is thus capable of automatically creating a group of suitable convolutions for geometric shape modeling.
Previous methods for skeleton-based gesture recognition mostly arrange the skeleton sequence into a pseudo picture or spatial-temporal graph and apply deep Convolutional Neural Network (CNN) or Graph Convolutional Network (GCN) for feature extraction.
In recent years, deep learning methods have brought remarkable progress to the field of object detection.
Specifically, KTNet is constructed on a base detector with intrinsic knowledge mining and relational knowledge constraints.
To address these issues, we propose a novel framework named Structure Learning Convolution (SLC) that extends the traditional convolutional neural network (CNN) to graph domains and learns the graph structure for traffic forecasting.
Ranked #2 on Traffic Prediction on METR-LA
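The general idea of learning a graph structure directly from data can be sketched in a few lines of NumPy (an illustrative toy, not the SLC operator itself; the embedding matrix `M` and the single graph-convolution step are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, feat = 5, 3

X = rng.random((n_nodes, feat))   # node features (e.g. traffic sensor readings)
M = rng.random((n_nodes, 4))      # learnable node embeddings (hypothetical)

# Data-driven structure: a row-normalized similarity acts as a learned adjacency.
S = np.exp(M @ M.T)
A = S / S.sum(axis=1, keepdims=True)

W = rng.random((feat, feat))              # convolution weights
H = np.maximum(A @ X @ W, 0.0)            # one graph-convolution step with ReLU
```

In a trained model, `M` and `W` would be updated by backpropagation, so the adjacency itself is learned rather than predefined.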
In this paper, we begin by analyzing the design defects of the feature pyramid in FPN, and then introduce a new feature pyramid architecture named AugFPN to address these problems.
Neural architecture search (NAS) is inherently subject to the gap between the architectures used during searching and those used during validation.
Transferring existing image-based detectors to video is non-trivial, since the quality of frames is often degraded by partial occlusion, rare poses, and motion blur.
To address these problems, we propose an Adaptive Semantic Guidance Network (ASGN), which instantiates the whole video semantics into different POS-aware semantics under the supervision of part-of-speech (POS) tags.
In our model, we decouple character images into style representation and content representation, which facilitates more precise control of these two types of variables, thereby improving the quality of the generated results.
Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang
Results of 33 object detection algorithms are presented.
Point cloud processing is very challenging, as the diverse shapes formed by irregular points are often indistinguishable.
Ranked #9 on 3D Part Segmentation on ShapeNet-Part
For network architecture search (NAS), it is crucial but challenging to simultaneously guarantee both effectiveness and efficiency.
Traditional clustering methods often perform clustering with low-level, indiscriminative representations and ignore relationships between patterns, resulting in limited gains in the deep learning era.
Under the new scheme, the proposed method achieves superior accuracy (WIDER FACE Val/Test -- Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB -- discontinuous: 0.973, continuous: 0.724).
Ranked #7 on Face Detection on FDDB
Specifically, the convolutional weight for a local point set is forced to learn, from predefined geometric priors, a high-level relational expression between a sampled point from this point set and the other points.
Ranked #17 on 3D Part Segmentation on ShapeNet-Part (Instance Average IoU metric)
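Deriving a relation from predefined geometric priors can be sketched as follows (a toy illustration; the specific priors here, relative offset and Euclidean distance, and the weight shape are assumptions, not the paper's exact design):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((8, 3))      # a local point set in 3-D
center = pts[0]               # the sampled point

# Predefined geometric priors: relative position and distance to the sampled point.
rel = pts - center
dist = np.linalg.norm(rel, axis=1, keepdims=True)
priors = np.concatenate([rel, dist], axis=1)      # shape (8, 4)

W = rng.random((4, 16))                           # learnable relation weights
relation = np.maximum(priors @ W, 0.0)            # per-neighbor relation features
```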
Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features.
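The notion of a local region sampled with progressively sparser stride can be sketched in NumPy (a toy single-position version; the offset pattern, window sizes, and softmax weighting are assumptions, not the paper's exact module):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 9, 9
prev = rng.random((C, H, W))   # features from the previous frame
curr = rng.random((C, H, W))   # features from the current frame

# Progressively sparser offsets: stride 1 near the center, stride 2 farther out.
inner = [(dy, dx) for dy in range(-1, 2) for dx in range(-1, 2)]
outer = [(dy, dx) for dy in (-4, -2, 0, 2, 4) for dx in (-4, -2, 0, 2, 4)]
offsets = sorted(set(inner) | set(outer))

# Correspondence at one position: attend over the sparse local window.
y, x = 4, 4
q = curr[:, y, x]
ks = np.stack([prev[:, y + dy, x + dx] for dy, dx in offsets])
w = np.exp(ks @ q)
w /= w.sum()                     # attention weights over the window
propagated = w @ ks              # feature propagated from the previous frame
```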
Convolutional neural networks (CNNs) are inherently subject to invariable filters that can only aggregate local inputs with the same topological structures.
Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices.
Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.
Ranked #8 on Multi-Label Classification on NUS-WIDE
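A generic class-level distillation loss can illustrate how a teacher's predictions guide a student model (softmax KL is shown for simplicity; the actual framework is multi-label, and the temperature value here is an assumption):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student predictions."""
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))

teacher = np.array([2.0, 0.5, -1.0])   # e.g. the WSD model's class-level scores
student = np.array([1.5, 0.8, -0.5])   # the classification model's scores
loss = distill_loss(student, teacher)
```

Minimizing this term pulls the classification model's predictions toward the detector's, which is the essence of the knowledge-distillation guidance described above.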
This paper proposes a segment-free method for geometric rectification of a distorted document image captured by a hand-held camera.
Specifically, for confusing manmade objects, ScasNet improves the labeling coherence with sequential global-to-local contexts aggregation.
To obtain a comprehensive evaluation, we include both float-type features and binary ones.
(2) A multi-integer-embedding is employed for compressing the whole database, which is modeled by binary sparse representation with fixed sparsity.
Binary features have been incrementally popular in the past few years due to their low memory footprints and the efficient computation of Hamming distance between binary descriptors.
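The efficiency of the Hamming distance on binary descriptors comes down to XOR plus a population count, as a short sketch shows (illustrative only; the packed-byte layout is an assumption):

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary descriptors.

    a, b: uint8 arrays of equal length (each byte packs 8 descriptor bits).
    """
    # XOR marks the differing bits; unpacking and summing counts them.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two toy 32-bit (4-byte) descriptors.
d1 = np.array([0b10110100, 0b00001111, 0xFF, 0x00], dtype=np.uint8)
d2 = np.array([0b10110100, 0b00000000, 0xFF, 0xFF], dtype=np.uint8)
dist = hamming_distance(d1, d2)
```

On real hardware this maps to a handful of XOR and popcount instructions per descriptor, which is why binary features have such low matching cost.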
In this paper, we propose an efficient method for accurate extraction of these virtual visual cues from a curved document image.
Finally, to overcome the ineffectiveness of current methods at road intersections, a fitting-based road centerline connection algorithm is proposed.
A new approach to the problem has been proposed that matches features of different modalities directly.
Subset selection from massive data with noisy information is increasingly popular in various applications.
Ranked #6 on Named Entity Recognition on SciERC (using extra training data)
Based on this observation, we exploit a learning-based sparsity method to simultaneously learn the HU results and a sparse guidance map.
With this constraint, our method can learn a compact space, where highly similar pixels are grouped to share correlated sparse representations.
Hyperspectral unmixing, the process of estimating a common set of spectral bases and their corresponding composite percentages at each pixel, is an important task for hyperspectral analysis, visualization and understanding.
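The definition above corresponds to the common linear mixing model, which a few lines of NumPy can make concrete (a generic illustration, not this paper's method; the unconstrained least-squares recovery ignores the non-negativity constraint on abundances):

```python
import numpy as np

rng = np.random.default_rng(0)
bands, endmembers, pixels = 6, 3, 4

E = rng.random((bands, endmembers))     # spectral bases (endmember signatures)
A = rng.random((endmembers, pixels))
A /= A.sum(axis=0, keepdims=True)       # composite percentages: sum to one per pixel
X = E @ A                               # observed hyperspectral pixels

# Recover the abundances by plain least squares (constraints omitted for brevity).
A_hat, *_ = np.linalg.lstsq(E, X, rcond=None)
err = np.abs(A_hat - A).max()
```

Real unmixing must estimate `E` and `A` jointly from `X` alone, under non-negativity and sum-to-one constraints, which is what makes the problem hard.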