We demonstrate that learning different abstaining penalties, beyond a point-wise penalty, for different types of (synthesized) outliers can further improve performance.
We investigate the effects of these two types of domain gaps and propose a novel uncertainty-aware vision transformer to effectively relieve the Deployment Gap and an agent-based feature adaptation module with inter-agent and ego-agent discriminators to reduce the Feature Gap.
This paper introduces a method for robot action sequence generation from instruction videos using (1) an audio-visual Transformer that converts audio-visual features and instruction speech to a sequence of robot actions called dynamic movement primitives (DMPs) and (2) style-transfer-based training that employs multi-task learning with video captioning and weakly-supervised learning with a semantic classifier to exploit unpaired video-action data.
To the best of our knowledge, this paper represents the first comprehensive survey on deep transfer learning for intelligent vehicle perception.
While state-of-the-art NLP models have demonstrated excellent performance for aspect-based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness.
Comprehensive experiments demonstrate EfficientViT outperforms existing efficient models, striking a good trade-off between speed and accuracy.
no code implementations • 29 Apr 2023 • Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang
Prompts have been proven to play a crucial role in large language models, and in recent years, vision models have also been using prompts to improve scalability for multiple downstream tasks.
Thanks to the impressive progress of large-scale vision-language pretraining, recent recognition models can classify arbitrary objects in a zero-shot and open-set manner with surprisingly high accuracy.
To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision-making step of autonomous vehicular control and action.
Thanks to Vehicle-to-Vehicle (V2V) communication, deep-learning-based features from other agents can be shared with the ego vehicle to improve its perception.
no code implementations • 7 Dec 2022 • Yinpeng Dong, Peng Chen, Senyou Deng, Lianji L, Yi Sun, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yangyi Dong, Enhui Xu, Jincai Xu, Shu Xu, Xuelin Fu, Changfeng Sun, Haoliang Han, Xuchong Zhang, Shen Chen, Zhimin Sun, Junyi Cao, Taiping Yao, Shouhong Ding, Yu Wu, Jian Lin, Tianpeng Wu, Ye Wang, Yu Fu, Lin Feng, Kangkang Gao, Zeyu Liu, Yuanzhe Pang, Chengqi Duan, Huipeng Zhou, Yajie Wang, Yuhang Zhao, Shangbo Wu, Haoran Lyu, Zhiyu Lin, YiFei Gao, Shuang Li, Haonan Wang, Jitao Sang, Chen Ma, Junhao Zheng, Yijia Li, Chao Shen, Chenhao Lin, Zhichao Cui, Guoshuai Liu, Huafeng Shi, Kun Hu, Mengxin Zhang
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems.
Domain Adaptive Object Detection (DAOD) models a joint distribution of images and labels from an annotated source domain and learns a domain-invariant transformation to estimate the target labels with the given target domain images.
To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching.
In this paper, we propose to extend the Neural Architecture Search (NAS) technique to design an optimal model for depression recognition based on multiple facial attributes, which can be implemented efficiently and robustly on a small dataset.
Existing VQA models can answer a compositional question well, but lack reasoning consistency when answering the compositional question and its sub-questions.
We compare our method of mapping natural language task specifications to intermediate contextual queries against state-of-the-art CopyNet models capable of translating natural language to LTL, by evaluating whether correct LTL for manipulation and navigation task specifications can be output, and show that our method outperforms the CopyNet model on unseen object references.
1 code implementation • 7 Sep 2021 • Michaela Hardt, Xiaoguang Chen, Xiaoyi Cheng, Michele Donini, Jason Gelman, Satish Gollaprolu, John He, Pedro Larroy, Xinyu Liu, Nick McCarthy, Ashish Rathi, Scott Rees, Ankit Siva, ErhYuan Tsai, Keerthan Vasist, Pinar Yilmaz, Muhammad Bilal Zafar, Sanjiv Das, Kevin Haas, Tyler Hill, Krishnaram Kenthapadi
We present Amazon SageMaker Clarify, an explainability feature for Amazon SageMaker that launched in December 2020, providing insights into data and ML models by identifying biases and explaining predictions.
In this paper, a 3D-RegNet-based neural network is proposed for diagnosing the physical condition of patients with coronavirus (COVID-19) infection.
Lightweight or mobile neural networks used for real-time computer vision tasks contain fewer parameters than normal networks, which leads to constrained performance.
With the aim of providing a comprehensive overview for researchers who are interested in developing a deep-learning-based analysis system for power line inspection data, this paper conducts a thorough review of the current literature and identifies the challenges for future research.
This paper studies an unsupervised deep learning-based numerical approach for solving partial differential equations (PDEs).
Recent years have witnessed dramatic progress in neural machine translation (NMT); however, methods for manually guiding the translation procedure remain underexplored.
We propose a nested recurrent neural network (nested RNN) model for English spelling error correction and generate pseudo data based on phonetic similarity to train it.
Vacuum-based end effectors are widely used in industry and are often preferred over parallel-jaw and multifinger grippers due to their ability to lift objects with a single point of contact.
To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table.