Video understanding requires reasoning at multiple spatiotemporal resolutions -- from short fine-grained motions to events taking place over longer durations.
Ranked #1 on Action Classification on Kinetics-400 (using extra training data)
For the goal of automated design of high-performance deep convolutional neural networks (CNNs), Neural Architecture Search (NAS) methodology is becoming increasingly important for both academia and industry. Due to the costly stochastic gradient descent (SGD) training of CNNs for performance evaluation, most existing NAS methods are computationally expensive for real-world deployments.
Dense video captioning aims to generate multiple associated captions with their temporal locations from the video.
Ranked #1 on Dense Video Captioning on YouCook2
Recent advancements in deep neural networks have led to remarkable leaps forward in dense image prediction.
Ranked #6 on Semantic Segmentation on ADE20K (using extra training data)
Under this family, we study Mask R-CNN and discover that instead of its default strategy of training the mask-head with a combination of proposals and ground-truth boxes, training the mask-head with only ground-truth boxes dramatically improves its performance on novel classes.
To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.
In the recent past, neural architecture search (NAS) has attracted increasing attention from both academia and industry.
In recent years, many works in the video action recognition literature have shown that two-stream models (combining spatial and temporal input streams) are necessary for achieving state-of-the-art performance.
Ranked #2 on Action Recognition on UCF101
Based on this observation, we propose to use text as a method for learning video representations.
In this paper, we propose an efficient NAS algorithm for generating task-specific models that are competitive under multiple competing objectives.
Ranked #4 on Neural Architecture Search on ImageNet
At the same time, the architecture search and transfer is orders of magnitude more efficient than existing NAS methods.
Ranked #1 on Neural Architecture Search on FGVC Aircraft
In contrast, we propose a general-purpose method that works on both indoor and outdoor scenes.
To overcome this limitation, we present MUXConv, a layer that is designed to increase the flow of information by progressively multiplexing channel and spatial information in the network, while mitigating computational complexity.
Ranked #4 on Neural Architecture Search on CIFAR-100
Traditionally, multi-object tracking and object detection are performed using separate systems, with most prior works focusing exclusively on one of these aspects over the other.
Ranked #1 on Multiple Object Tracking on Waymo Open Dataset
While existing approaches have achieved competitive performance in image classification, they are not well suited to problems where the computational budget is limited, for two reasons: (1) the obtained architectures are either optimized solely for classification performance or for only one deployment scenario; (2) the search process requires vast computational resources in most approaches.
Ranked #1 on Pneumonia Detection on ChestX-ray14
This paper introduces NSGA-Net -- an evolutionary approach for neural architecture search (NAS).