no code implementations • 30 Oct 2024 • Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, James Guo, Dragomir Anguelov, Mingxing Tan
We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications.
no code implementations • 5 May 2024 • Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
no code implementations • 30 Apr 2024 • Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, CongCong Li
In this paper, we propose STT, a Stateful Tracking model built with Transformers, which can consistently track objects in a scene while also accurately predicting their states.
no code implementations • 28 Sep 2023 • Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
no code implementations • 7 Apr 2023 • Kan Chen, Runzhou Ge, Hang Qiu, Rami Al-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov
To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task.
no code implementations • 24 Oct 2022 • Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan
Data augmentations are important in training high-performance 3D object detectors for point clouds.
no code implementations • 24 Oct 2022 • Zhaoqi Leng, Shuyang Cheng, Benjamin Caine, Weiyue Wang, Xiao Zhang, Jonathon Shlens, Mingxing Tan, Dragomir Anguelov
To alleviate the cost of hyperparameter tuning and iterative pseudo labeling, we develop a population-based data augmentation framework for 3D detection, named AutoPseudoAugment.
no code implementations • 13 Oct 2022 • Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, Dragomir Anguelov
3D object detection in point clouds is a core component for modern robotics and autonomous driving systems.
no code implementations • 10 Oct 2022 • Chenxi Liu, Zhaoqi Leng, Pei Sun, Shuyang Cheng, Charles R. Qi, Yin Zhou, Mingxing Tan, Dragomir Anguelov
Developing neural models that accurately understand objects in 3D point clouds is essential for the success of robotics and autonomous driving.
8 code implementations • ICLR 2022 • Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Xiaojie Shi, Shuyang Cheng, Dragomir Anguelov
Cross-entropy loss and focal loss are the most common choices when training deep neural networks for classification problems.
Ranked #102 on Image Classification on ImageNet
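For reference, the two baseline losses named above can be written out directly. This is a minimal sketch of the standard binary formulations (not code from the paper): the focal-loss factor `(1 - pt)**gamma` down-weights well-classified examples, and setting `gamma=0` recovers cross-entropy.

```python
# Binary cross-entropy vs. focal loss, the two common classification losses
# this line of work builds on. p is the predicted probability of class 1.
import math

def cross_entropy(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def focal_loss(p, y, gamma=2.0):
    pt = p if y == 1 else 1 - p            # probability of the true class
    return -((1 - pt) ** gamma) * math.log(pt)

# A confident correct prediction contributes far less to the focal loss:
easy = focal_loss(0.9, 1)   # (1-0.9)^2 * -log(0.9) ~= 0.00105
hard = focal_loss(0.1, 1)   # (1-0.1)^2 * -log(0.1) ~= 1.865
```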
no code implementations • 23 Mar 2022 • Tianjian Meng, Golnaz Ghiasi, Reza Mahjourian, Quoc V. Le, Mingxing Tan
It is commonly believed that high internal resolution combined with expensive operations (e.g., atrous convolutions) is necessary for accurate semantic segmentation, resulting in slow speed and large memory usage.
1 code implementation • CVPR 2022 • Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan
In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.
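The rotation-inversion idea behind InverseAug can be illustrated with a toy 2D example (the function names here are hypothetical; the actual method handles full 3D lidar augmentations): record the augmentation, then apply its inverse before aligning points with image pixels.

```python
# Toy sketch: a geometric augmentation (2D rotation) is recorded at training
# time, then inverted before camera alignment so the augmented lidar points
# map back onto the original, un-augmented geometry.
import math

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

original = [(1.0, 0.0), (0.0, 2.0)]
theta = 0.7
augmented = rotate(original, theta)       # training-time augmentation
recovered = rotate(augmented, -theta)     # "InverseAug": undo before alignment
```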
no code implementations • 8 Mar 2022 • Reza Mahjourian, Jinkyu Kim, Yuning Chai, Mingxing Tan, Ben Sapp, Dragomir Anguelov
We propose Occupancy Flow Fields, a new representation for motion forecasting of multiple agents, an important task in autonomous driving.
no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le
Second, while increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well-understood.
14 code implementations • NeurIPS 2021 • Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
Transformers have attracted increasing interest in computer vision, but they still fall behind state-of-the-art convolutional networks.
Ranked #1 on Image Classification on GasHisSDB
21 code implementations • 1 Apr 2021 • Mingxing Tan, Quoc V. Le
By pretraining on the same ImageNet21k, our EfficientNetV2 achieves 87.3% top-1 accuracy on ImageNet ILSVRC2012, outperforming the recent ViT by 2.0% accuracy while training 5x-11x faster using the same computing resources.
Ranked #3 on Image Classification on Stanford Cars
1 code implementation • CVPR 2021 • Xiangning Chen, Cihang Xie, Mingxing Tan, Li Zhang, Cho-Jui Hsieh, Boqing Gong
Data augmentation has become a de facto component for training high-performance deep image classifiers, but its potential is under-explored for object detection.
Ranked #17 on Object Detection on COCO-O
3 code implementations • CVPR 2021 • Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.
Ranked #3 on Action Classification on Charades
no code implementations • 17 Feb 2021 • Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon
In our work, we target the optimization of hardware and software configurations on an industry-standard edge accelerator.
no code implementations • CVPR 2021 • Sheng Li, Mingxing Tan, Ruoming Pang, Andrew Li, Liqun Cheng, Quoc Le, Norman P. Jouppi
On top of our DC-accelerator-optimized neural architecture search space, we further propose latency-aware compound scaling (LACS), the first multi-objective compound scaling method to optimize both accuracy and latency.
8 code implementations • 7 Feb 2021 • Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh
The scalability of Nyströmformer enables application to longer sequences with thousands of tokens.
Ranked #13 on Semantic Textual Similarity on MRPC (F1 metric)
no code implementations • NeurIPS 2020 • Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Hanxiao Liu, Gabriel Bender, Adam Kraft, Chen Liang, Quoc V. Le
As a result, AutoML can be reformulated as an automated process of symbolic manipulation.
no code implementations • 1 Jan 2021 • Yanqi Zhou, Xuanyi Dong, Daiyi Peng, Ethan Zhu, Amir Yazdanbakhsh, Berkin Akin, Mingxing Tan, James Laudon
In this paper, we study the importance of co-designing neural architectures and hardware accelerators.
no code implementations • 30 Oct 2020 • Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc Le, Yang You, Sameer Kumar
EfficientNets are a family of state-of-the-art image classification models based on efficiently scaled convolutional neural networks.
no code implementations • ECCV 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc Le, Xiaodan Song
Furthermore, SpineNet is built with a uniform resource distribution over operations.
1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images containing conflicting shape and texture information (e.g., an image with the shape of a chimpanzee but the texture of a lemon) and, most importantly, provide the corresponding supervision from shape and texture simultaneously.
Ranked #618 on Image Classification on ImageNet
no code implementations • ICML 2020 • Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans
This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer.
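As a rough sketch (an assumed minimal form, not the authors' implementation), layerwise imitation can be scored as a per-layer mean-squared error between the thin student's and wide teacher's intermediate activations:

```python
# Layerwise imitation loss: average the per-layer MSE between corresponding
# student and teacher activations, so the thin network mimics the wide one
# layer by layer rather than only at the final output.
def layerwise_imitation_loss(student_acts, teacher_acts):
    total = 0.0
    for s_layer, t_layer in zip(student_acts, teacher_acts):
        total += sum((s - t) ** 2 for s, t in zip(s_layer, t_layer)) / len(s_layer)
    return total / len(student_acts)

loss = layerwise_imitation_loss([[0.5, 1.0], [2.0, 0.0]],
                                [[0.5, 1.0], [1.0, 0.0]])
# only the second layer disagrees: MSE = ((2-1)^2 + 0)/2 = 0.5; mean = 0.25
```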
1 code implementation • 25 Jun 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le
SAT also works well with larger networks: it helps EfficientNet-L1 to achieve 82.2% accuracy and 58.6% robustness on ImageNet, outperforming the previous state-of-the-art defense by 9.5% for accuracy and 11.6% for robustness.
no code implementations • 5 Jun 2020 • Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le
Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.
no code implementations • 1 May 2020 • Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong
Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions.
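The aggregation step can be sketched as simple probability averaging (an illustrative toy; real ensembles may instead average logits or use majority voting):

```python
# Average the class-probability outputs of several models (stand-in callables
# here) to form the ensemble prediction.
def ensemble_predict(models, x):
    probs = [m(x) for m in models]
    return [sum(p[i] for p in probs) / len(probs) for i in range(len(probs[0]))]

models = [lambda x: [0.6, 0.4], lambda x: [0.8, 0.2], lambda x: [0.7, 0.3]]
avg = ensemble_predict(models, None)   # ~= [0.7, 0.3]
```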
4 code implementations • CVPR 2021 • Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen
By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
1 code implementation • ECCV 2020 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le
Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.
Ranked #30 on Neural Architecture Search on ImageNet
13 code implementations • CVPR 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song
We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.
Ranked #9 on Image Classification on iNaturalist
6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
Ranked #4 on Domain Generalization on VizWiz-Classification
63 code implementations • CVPR 2020 • Mingxing Tan, Ruoming Pang, Quoc V. Le
Model efficiency has become increasingly important in computer vision.
Ranked #7 on Object Detection on COCO minival (APS metric)
no code implementations • CVPR 2020 • Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang
Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture.
no code implementations • 25 Sep 2019 • Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Kuang-Yu Samuel Chang, Andrea Gesmundo
We show that the Evo-NAS agent outperforms both neural and evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks.
no code implementations • 25 Sep 2019 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Quoc Le
In this work, we propose BigNAS, an approach that simplifies this workflow and scales up neural architecture search to target a wide range of model sizes simultaneously.
13 code implementations • 22 Jul 2019 • Mingxing Tan, Quoc V. Le
In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency.
Ranked #756 on Image Classification on ImageNet
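The idea of combining multiple kernel sizes can be illustrated with a toy 1D depthwise convolution in which each channel group uses a different kernel size (function names and layout are hypothetical; the paper's MixConv operates on 2D feature maps):

```python
# Each channel is convolved with its own kernel; "same" zero-padding keeps
# the output length equal to the input length.
def conv1d_same(signal, kernel):
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + signal + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(signal))]

def mix_conv(channels, kernels):
    """Channel i is convolved with kernels[i % len(kernels)]."""
    return [conv1d_same(ch, kernels[i % len(kernels)])
            for i, ch in enumerate(channels)]

out = mix_conv([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
               kernels=[[1.0], [1 / 3, 1 / 3, 1 / 3]])  # 1-tap vs. 3-tap
```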
2 code implementations • ICLR 2020 • Michael S. Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova
Learning to represent videos is a very challenging task both algorithmically and computationally.
137 code implementations • ICML 2019 • Mingxing Tan, Quoc V. Le
Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available.
Ranked #1 on Medical Image Classification on NCT-CRC-HE-100K
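The compound scaling rule EfficientNet uses to grow a base network can be sketched directly; the multipliers alpha=1.2, beta=1.1, gamma=1.15 below are the values reported in the paper, applied jointly through a single coefficient phi:

```python
# Compound scaling: depth, width, and resolution grow together, each as a
# fixed base multiplier raised to the scaling coefficient phi.
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    return {"depth": alpha ** phi,
            "width": beta ** phi,
            "resolution": gamma ** phi}

s = compound_scale(phi=2)
# depth x1.44, width x1.21, resolution x~1.32 relative to the base network
```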
63 code implementations • ICCV 2019 • Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
We achieve new state-of-the-art results for mobile classification, detection, and segmentation.
Ranked #9 on Dichotomous Image Segmentation on DIS-TE1
no code implementations • 27 Dec 2018 • Stanisław Jastrzębski, Quentin de Laroussilhe, Mingxing Tan, Xiao Ma, Neil Houlsby, Andrea Gesmundo
However, the success of NAS depends on the definition of the search space.
no code implementations • 24 Nov 2018 • Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Marin Georgiev, Andrea Gesmundo
We show that the Evo-NAS agent outperforms both neural and evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks.
28 code implementations • CVPR 2019 • Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le
In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.
Ranked #854 on Image Classification on ImageNet
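The latency-aware objective can be sketched in one line; MnasNet's soft-constraint form multiplies accuracy by (latency / target) raised to a small negative exponent (w = -0.07 in the paper), so models over the latency target are smoothly penalized rather than discarded:

```python
# Latency-aware search reward: accuracy scaled by a soft latency penalty.
def mnas_objective(acc, latency_ms, target_ms, w=-0.07):
    return acc * (latency_ms / target_ms) ** w

fast = mnas_objective(0.75, 70.0, 80.0)    # under target: slight bonus
slow = mnas_objective(0.76, 120.0, 80.0)   # over target: penalized
```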