After the 2017 TuSimple Lane Detection Challenge, its evaluation based on accuracy and F1 score has become the de facto standard to measure the performance of lane detection methods. In this work, we conduct the first large-scale empirical study to evaluate the robustness of state-of-the-art lane detection methods under physical-world adversarial attacks in autonomous driving. We evaluate 4 major types of lane detection approaches with the conventional evaluation and end-to-end evaluation in autonomous driving scenarios and then discuss the security proprieties of each lane detection model. We demonstrate that the conventional evaluation fails to reflect the robustness in end-to-end autonomous driving scenarios. Our results show that the most robust model on the conventional metrics is the least robust in the end-to-end evaluation. Although the competition dataset and its metrics have played a substantial role in developing performant lane detection methods along with the rapid development of deep neural networks, the conventional evaluation is becoming obsolete and the gap between the metrics and practicality is critical. We hope that our study will help the community make further progress in building a more comprehensive framework to evaluate lane detection models.