Point clouds are a key modality used for perception in autonomous vehicles, providing the means for a robust geometric understanding of the surrounding environment. However despite the sensor outputs from autonomous vehicles being naturally temporal in nature, there is still limited exploration of exploiting point cloud sequences for 3D seman-tic segmentation. In this paper we propose a novel Sparse Temporal Local Attention (STELA) module which aggregates intermediate features from a local neighbourhood in previous point cloud frames to provide a rich temporal context to the decoder. Using the sparse local neighbourhood enables our approach to gather features more flexibly than those which directly match point features, and more efficiently than those which perform expensive global attention over the whole point cloud frame. We achieve a competitive mIoU of 64.3% on the SemanticKitti dataset, and demonstrate significant improvement over the single-frame baseline in our ablation studies.