no code implementations • 30 Jun 2022 • Yuting Wang, Hangning Zhou, Zhigang Zhang, Chen Feng, Huadong Lin, Chaofei Gao, Yizhi Tang, Zhenting Zhao, Shiyu Zhang, Jie Guo, Xuefeng Wang, Ziyao Xu, Chi Zhang
This technical report presents an effective method for motion prediction in autonomous driving.
Ranked #15 on Motion Forecasting on Argoverse CVPR 2020
This paper describes a CNN-based multi-frame post-processing approach based on a perceptually-inspired Generative Adversarial Network architecture, CVEGAN.
We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision.
In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision with only a small set of calibration data and benchmark them on different tinyML use cases.
In recent years, deep learning techniques have been widely applied to video quality assessment (VQA), showing significant potential to achieve higher correlation performance with subjective opinions compared to conventional approaches.
Vehicle-to-everything (V2X), which denotes the collaboration between a vehicle and any entity in its surrounding, can fundamentally improve the perception in self-driving systems.
Our finding shows that a leading factor in determining recognition versus reconstruction is how dispersed the training data is.
Despite the large progress in supervised learning with Neural Networks, there are significant challenges in obtaining high-quality, large-scale and accurately labeled datasets.
Ranked #1 on Learning with noisy labels on ANIMAL
Our approach is validated on V2X-Sim 1. 0, a large-scale multi-agent perception dataset that we synthesized using CARLA and SUMO co-simulation.
Visual place recognition (VPR) is critical in not only localization and mapping for autonomous driving vehicles, but also assistive navigation for the visually impaired population.
Based on the fact that self-supervised learning is helpful when a large number of data are available, we propose two self-supervised learning approaches to improve the baseline performance for view consistency reasoning and camera pose reasoning tasks on the SPARE3D dataset.
We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design.
LiDAR point clouds collected from a moving vehicle are functions of its trajectories, because the sensor motion needs to be compensated to avoid distortions.
We need intelligent robots to perform mobile construction, the process of moving in an environment and modifying its geometry according to a design plan.
In the domain of visual tracking, most deep learning-based trackers highlight the accuracy but casting aside efficiency.
In recent years, video compression techniques have been significantly challenged by the rapidly increased demands associated with high quality and immersive video content.
By repressing the response of distractors in the regressor learning, we can dynamically and adaptively alter our regression target to leverage the tracking robustness as well as adaptivity.
Current unmanned aerial vehicle (UAV) visual tracking algorithms are primarily limited with respect to: (i) the kind of size variation they can deal with, (ii) the implementation speed which hardly meets the real-time requirement.
In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.
Ranked #1 on Face Alignment on MERL-RAV
Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses.
Recently, the speaker clustering model based on aggregation hierarchy cluster (AHC) is a common method to solve two main problems: no preset category number clustering and fix category number clustering.
First, we propose a new incentive analysis that takes the network capacity into account, showing that Bitcoin-NG can still maintain incentive compatibility against the microblock mining attack even under limited network capacity.
Cryptography and Security Distributed, Parallel, and Cluster Computing
Inspired by the Thomson problem in physics where the distribution of multiple propelling electrons on a unit sphere can be modeled via minimizing some potential energy, hyperspherical energy minimization has demonstrated its potential in regularizing neural networks and improving their generalization power.
0-1 knapsack is of fundamental importance in computer science, business, operations research, etc.
The experimental results show that (1) the proposed networks outperform the state-of-the-art methods in various tasks; (2) a graph topology can be inferred as auxiliary information without specific supervision on graph topology inference; and (3) graph filtering refines the reconstruction, leading to better performances.
In this paper we propose a novel O-D prediction framework combining heterogeneous prediction in graph neural networks and Kalman filter to recognize spatial and temporal patterns simultaneously.
no code implementations • 15 Apr 2019 • Sergei Alyamkin, Matthew Ardi, Alexander C. Berg, Achille Brighton, Bo Chen, Yiran Chen, Hsin-Pai Cheng, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Abhinav Goel, Alexander Goncharenko, Xuyang Guo, Soonhoi Ha, Andrew Howard, Xiao Hu, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Jong Gook Ko, Alexander Kondratyev, Junhyeok Lee, Seungjae Lee, Suwoong Lee, Zichao Li, Zhiyu Liang, Juzheng Liu, Xin Liu, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Hong Hanh Nguyen, Eunbyung Park, Denis Repin, Liang Shen, Tao Sheng, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo
In addition to mobile phones, many autonomous systems rely on visual data for making decisions and some of these systems have limited energy (such as unmanned aerial vehicles also called drones and mobile robots).
Soft bodies made from flexible and deformable materials are popular in many robotics applications, but their proprioceptive sensing has been a long-standing challenge.
The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015 that encourages joint hardware and software solutions for computer vision systems with low latency and power.
We propose DeepMapping, a novel registration framework using deep neural networks (DNNs) as auxiliary functions to align multiple point clouds from scratch to a globally consistent frame.
no code implementations • 3 Oct 2018 • Sergei Alyamkin, Matthew Ardi, Achille Brighton, Alexander C. Berg, Yiran Chen, Hsin-Pai Cheng, Bo Chen, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Jongkook Go, Alexander Goncharenko, Xuyang Guo, Hong Hanh Nguyen, Andrew Howard, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Alexander Kondratyev, Seungjae Lee, Suwoong Lee, Junhyeok Lee, Zhiyu Liang, Xin Liu, Juzheng Liu, Zichao Li, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Eunbyung Park, Denis Repin, Tao Sheng, Liang Shen, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo
The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing. ieee. org/lpirc) is an annual competition started in 2015.
To identify and fit geometric primitives (e. g., planes, spheres, cylinders, cones) in a noisy point cloud is a challenging yet beneficial task for fields such as robotics and reverse engineering.
Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications.
In this paper, we propose VLASE, a framework to use semantic edge features from images to achieve on-road localization.
As deep learning (DL) is being rapidly pushed to edge computing, researchers invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning, parameter compression, and etc.
Unlike on images, semantic learning on 3D point clouds using a deep network is challenging due to the naturally unordered data structure.
Recent deep networks that directly handle points in a point set, e. g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation.
Ranked #6 on 3D Point Cloud Linear Classification on ModelNet40
Often multiple instances of an object occur in the same scene, for example in a warehouse.
To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features.
Ranked #1 on Edge Detection on Cityscapes test
We use a general feature-extraction operator to represent application-dependent features and propose a general reconstruction error to evaluate the quality of resampling.