Recently, significant progress has been made in the development of Vision Language Models (VLMs), expanding the capabilities of LLMs and enabling them to execute more diverse instructions.
In intent detection tasks, leveraging meaningful semantic information from intent labels can be particularly beneficial for few-shot scenarios.
In this work, we seek to address these challenges and propose Spectral Adversarial Training (SAT), a simple yet effective adversarial training approach for GNNs.
To address this issue, we propose a neural architecture search method named FlowNAS to automatically find the better encoder architecture for flow estimation task.
1 code implementation • 30 May 2022 • Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, TingTing Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang
There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR.
Fusing the camera and LiDAR information has become a de-facto standard for 3D object detection tasks.
Recently, conversational OpenQA is proposed to address these issues with the abundant contextual information in the conversation.
With multi-scale testing, we push the current best single model result to a new record of 60. 1% box AP and 52. 3% mask AP without using extra training data.
Ranked #6 on Object Detection on COCO-O (using extra training data)
Encouraged by the success, we propose a novel One-Shot Path Aggregation Network Architecture Search (OPANAS) algorithm, which significantly improves both searching efficiency and detection accuracy.
Our studies show that mixup is a domain-independent data augmentation technique to pre-trained language models, resulting in significant performance improvement for transformer-based models.
To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art methods integrate 1D temporal convolution into a conventional 2D CNN backbone.