We investigate applying convolutional neural network (CNN) architecture to facilitate aerial hyperspectral scene understanding and present a new hyperspectral dataset-AeroRIT-that is large enough for CNN training. To date the majority of hyperspectral airborne have been confined to various sub-categories of vegetation and roads and this scene introduces two new categories: buildings and cars. To the best of our knowledge, this is the first comprehensive large-scale hyperspectral scene with nearly seven million pixel annotations for identifying cars, roads, and buildings. We compare the performance of three popular architectures - SegNet, U-Net, and Res-U-Net, for scene understanding and object identification via the task of dense semantic segmentation to establish a benchmark for the scene. To further strengthen the network, we add squeeze and excitation blocks for better channel interactions and use self-supervised learning for better encoder initialization. Aerial hyperspectral image analysis has been restricted to small datasets with limited train/test splits capabilities and we believe that AeroRIT will help advance the research in the field with a more complex object distribution to perform well on. The full dataset, with flight lines in radiance and reflectance domain, is available for download at https://github.com/aneesh3108/AeroRIT. This dataset is the first step towards developing robust algorithms for hyperspectral airborne sensing that can robustly perform advanced tasks like vehicle tracking and occlusion handling.