Teaching precursors to data science in introductory and second courses in statistics

14 Jan 2014  ·  Nicholas J Horton, Benjamin S Baumer, Hadley Wickham

Statistics students need to develop the capacity to make sense of the staggering amount of information collected in our increasingly data-centered world. Data science is an important part of modern statistics, but our introductory and second statistics courses often neglect this fact. This paper discusses ways to provide a practical foundation for students to learn to "compute with data" as defined by Nolan and Temple Lang (2010), as well as develop "data habits of mind" (Finzer, 2013). We describe how introductory and second courses can integrate two key precursors to data science: the use of reproducible analysis tools and access to large databases. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically in the era of big data.

Categories

Computation  ·  Computers and Society  ·  Other Statistics  ·  62-07
