Statistics for Data Scientists

Materials accompanying the book “Statistics for Data Scientists” by Maurits Kaptein & Edwin van den Heuvel.

The book appeared in early 2022. For more information, or to purchase or download the book, please see the book’s page on SpringerLink.

Data sets:

The following data sets are used in the book and are available for download:

The data files are also available in a single .zip file.

Replication [R] scripts:

We use [R] heavily throughout the book. All the [R] code used in the book can be downloaded here.

Note: The [R] scripts cover the original 9 chapters of the book (see also the video lectures below) and thus include material on modeling that did not make it into the final version of the book; we hope these are helpful.

Answers to the assignments

You can find answers to most of the assignments in the answer manual created by Florian Boeing Messing, which can be downloaded here.

Note: The answer manual is based on a preliminary version of the book, and the order of some questions has changed since; please do reach out if you are missing any of the answers.

Videos / lectures

During 2020 and 2021 we used parts of the book in a lecture series. We recorded simple videos to accompany the book’s materials:


If you find any errors or have any questions regarding the book, please send us an email at m.c.kaptein [at] uvt [dot] nl.