This task is challenging because 3D scenes exhibit diverse patterns, ranging from continuous ones, such as object sizes and the relative poses between pairs of shapes, to discrete patterns, such as occurrence and co-occurrence of objects with symmetrical relationships.
In this paper, we rethink how a DNN encodes visual concepts of different complexities from a new perspective, i. e. the game-theoretic multi-order interactions between pixels in an image.
no code implementations • 30 Mar 2021 • Florian Laurent, Manuel Schneider, Christian Scheller, Jeremy Watson, Jiaoyang Li, Zhe Chen, Yi Zheng, Shao-Hung Chan, Konstantin Makhnev, Oleg Svidchenko, Vladimir Egorov, Dmitry Ivanov, Aleksei Shpilman, Evgenija Spirovska, Oliver Tanevski, Aleksandar Nikov, Ramon Grunder, David Galevski, Jakov Mitrovski, Guillaume Sartoretti, Zhiyao Luo, Mehul Damani, Nilabha Bhattacharya, Shivam Agarwal, Adrian Egli, Erik Nygren, Sharada Mohanty
However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging and the Flatland environment used for the competition models these real-world properties in a simplified manner.
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language.
A general three-dimensional (3D) non-stationary massive multiple-input multiple-output (MIMO) geometry-based stochastic model (GBSM) for the sixth generation (6G) communication systems is proposed in the paper.
A new label smoothing method that makes use of prior knowledge of a language at human level, homophone, is proposed in this paper for automatic speech recognition (ASR).
Our text spotting framework, called UHTA, combines UHT with the state-of-the-art text recognition system ASTER.
Ranked #3 on Scene Text Detection on Total-Text
We also propose a scrape-by-keywords methodology and used it to scrape ~30, 000 images and their captions of 38 Kenyan food types.
Recent years have witnessed increasing attention in cartoon media, powered by the strong demands of industrial applications.
Furthermore, we draw a series of heuristic conclusions from the intrinsic information hidden in true images.
no code implementations • 19 Nov 2018 • Yuanliu Liu, Bo Peng, Peipei Shi, He Yan, Yong Zhou, Bing Han, Yi Zheng, Chao Lin, Jianbin Jiang, Yin Fan, Tingwei Gao, Ganwen Wang, Jian Liu, Xiangju Lu, Danming Xie
Multi-modal person identification is a more promising way that we can jointly utilize face, head, body, audio features, and so on.
In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA.