An early look at Gelman et al.'s new book, *Regression and Other Stories*, which is an update to their seminal 2006 book, *Data Analysis Using Regression and Multilevel/Hierarchical Models*.

Over a decade ago, [Andrew Gelman](https://statmodeling.stat.columbia.edu/) and [Jennifer Hill](https://steinhardt.nyu.edu/faculty/Jennifer_L_Hill) gave applied researchers a comprehensive book, *[Data Analysis Using Regression and Multilevel/Hierarchical Models](https://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X)*, on fitting simple and complex statistical models in `R`, both from a classical framework and a Bayesian one. Now they're back with an updated version and a new author ([Aki Vehtari](https://users.aalto.fi/~ave/)).
Much has changed in applied statistics since 2006, when the book was first released. The primary software used at the time, and in the book, to fit Bayesian models was `BUGS` (Bayesian inference Using Gibbs Sampling).

However, both `BUGS` and some of the `R` code in the first edition are now outdated. The new edition updates the `R` code and contains intuitive instructions on how to fit simple and complex models using the probabilistic programming language [`Stan`](https://mc-stan.org/) (also developed by Gelman and [colleagues](https://mc-stan.org/about/team/)), which is now used in several fields (even for [studying wine!](https://statmodeling.stat.columbia.edu/2019/04/24/postdoctoral-position-in-vancouver-using-stan-working-on-wine-for-reals/)).
Indeed, running a Bayesian regression model in R is now as simple as
```r
# I use the sample PlantGrowth dataset in R
library("rstanarm")
library("bayesplot")

pg <- PlantGrowth

model1 <- stan_glm(weight ~ group, data = pg, refresh = 0)
summary(model1)
#> 
#> Model Info:
#>  function:     stan_glm
#>  family:       gaussian [identity]
#>  formula:      weight ~ group
#>  algorithm:    sampling
#>  sample:       4000 (posterior sample size)
#>  priors:       see help('prior_summary')
#>  observations: 30
#>  predictors:   3
#> 
#> Estimates:
#>               mean   sd   10%   50%   90%
#> (Intercept)  5.0    0.2  4.8   5.0   5.3
#> grouptrt1   -0.4    0.3 -0.7  -0.4   0.0
#> grouptrt2    0.5    0.3  0.1   0.5   0.9
#> sigma        0.6    0.1  0.5   0.6   0.8
#> 
#> Fit Diagnostics:
#>           mean   sd   10%   50%   90%
#> mean_PPD  5.1    0.2  4.9   5.1   5.3
#> 
#> The mean_ppd is the sample average posterior predictive distribution
#> of the outcome variable (for details see help('summary.stanreg')).
#> 
#> MCMC diagnostics
#>               mcse Rhat n_eff
#> (Intercept)   0.0  1.0  2534
#> grouptrt1     0.0  1.0  2503
#> grouptrt2     0.0  1.0  2719
#> sigma         0.0  1.0  3035
#> mean_PPD      0.0  1.0  3484
#> log-posterior 0.0  1.0  1681
#> 
#> For each parameter, mcse is Monte Carlo standard error, n_eff is a crude
#> measure of effective sample size, and Rhat is the potential scale reduction
#> factor on split chains (at convergence Rhat=1).

plot(model1)
```
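The fitted `stanreg` object can also be queried directly, beyond `summary()`. A small sketch using helpers that ship with `rstanarm` and `bayesplot` (assuming `model1` from the chunk above has finished sampling):

```r
library("rstanarm")
library("bayesplot")

# 90% posterior intervals for the coefficients
posterior_interval(model1, prob = 0.90)

# Raw posterior draws as a matrix (one column per parameter),
# handy for custom summaries or bayesplot graphics
draws <- as.matrix(model1)
mcmc_areas(draws, pars = c("grouptrt1", "grouptrt2"))

# Graphical posterior predictive check: simulated vs. observed outcomes
pp_check(model1)
```

Because the posterior draws are just a matrix, any summary you care about (probabilities of direction, contrasts between treatment groups, and so on) is a one-line computation on its columns.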
Another key difference between the first edition and the new edition is that the 2006 book attempted to cover several topics at once. It contained instructions on how to fit simple models in a classical framework all the way up to multilevel models in a Bayesian framework. The new edition attempts to reduce this information overload by splitting the material across two volumes.
The first volume, *Regression and Other Stories*, covers fitting simple and complex models using `R` and `Stan`, and is oriented toward the applied researcher or statistician who wants a smooth introduction to fitting Bayesian models using `Stan` without diving into much theory or math.
A draft copy of the table of contents in the new edition can be [found here](/uploads/regressiontoc.pdf), though it's very likely that the published edition will have some changes.
The book does not cover much multilevel modeling, which is reserved for the second volume, *Advanced Regression and Multilevel Models* (planned for release in the next year or two).
Make no mistake: although both of these books are unlikely to touch on a serious amount of theory or math, they are not books that can be read without serious engagement and practice. Every chapter contains enough math for the reader to understand the concepts being discussed, with exercises at the end to solidify these concepts.
The chapter exercises are remarkably similar to the exam questions that Gelman created for his Applied Regression class: [1](https://statmodeling.stat.columbia.edu/2019/06/01/question-1-of-our-applied-regression-final-exam/), [2](https://statmodeling.stat.columbia.edu/2019/06/02/question-2-of-our-applied-regression-final-exam-and-solution-to-question-1/), [3](https://statmodeling.stat.columbia.edu/2019/06/03/question-3-of-our-applied-regression-final-exam-and-solution-to-question-2/), [4](https://statmodeling.stat.columbia.edu/2019/06/04/question-4-of-our-applied-regression-final-exam-and-solution-to-question-3/), [5](https://statmodeling.stat.columbia.edu/2019/06/05/question-5-of-our-applied-regression-final-exam-and-solution-to-question-4/), [6](https://statmodeling.stat.columbia.edu/2019/06/06/question-6-of-our-applied-regression-final-exam-and-solution-to-question-5/), [7](https://statmodeling.stat.columbia.edu/2019/06/07/question-7-of-our-applied-regression-final-exam-and-solution-to-question-6/), [8](https://statmodeling.stat.columbia.edu/2019/06/08/question-8-of-our-applied-regression-final-exam-and-solution-to-question-7/), [9](https://statmodeling.stat.columbia.edu/2019/06/09/question-9-of-our-applied-regression-final-exam-and-solution-to-question-8/), [10](https://statmodeling.stat.columbia.edu/2019/06/10/question-10-of-our-applied-regression-final-exam-and-solution-to-question-9/), [11](https://statmodeling.stat.columbia.edu/2019/06/11/question-11-of-our-applied-regression-final-exam-and-solution-to-question-10/), [12](https://statmodeling.stat.columbia.edu/2019/06/12/question-12-of-our-applied-regression-final-exam-and-solution-to-question-11/), [13](https://statmodeling.stat.columbia.edu/2019/06/13/question-13-of-our-applied-regression-final-exam-and-solution-to-question-12/), [14](https://statmodeling.stat.columbia.edu/2019/06/14/question-14-of-our-applied-regression-final-exam-and-solution-to-question-13/), [15](https://statmodeling.stat.columbia.edu/2019/06/15/question-15-of-our-applied-regression-final-exam-and-solution-to-question-14/), and the [solution to 15](https://statmodeling.stat.columbia.edu/2019/06/16/were-done-with-our-applied-regression-final-exam-and-solution-to-question-15/).
I suspect that many of the commenters of the blog who had some difficulty with these questions would have had an easier time had they had the opportunity to read the book.
The new edition also covers several news stories from the past few years (some of which long-time blog readers will be familiar with) and gives readers a set of tools to think critically about these stories and how proper statistical thinking could've prevented mishaps. In addition, it incorporates concepts that Gelman and colleagues have developed and solidified in the years since the first edition was published, such as [Type-M and Type-S errors](https://doi.org/10.1177%2F1745691614551642).
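The Type-M/Type-S idea is easy to see in a toy simulation (my own illustration, not an example from the book): when a small true effect is estimated in a noisy, underpowered study, the estimates that happen to reach statistical significance are biased upward in magnitude (Type-M, the "exaggeration ratio") and sometimes have the wrong sign (Type-S). The effect size and standard error below are made-up numbers chosen to make the problem visible:

```r
# Toy Type-M / Type-S simulation (illustrative numbers, not from the book)
set.seed(123)
true_effect <- 0.1
se <- 0.5                      # standard error large relative to the effect

# Sampling distribution of the estimate across many replications
estimates <- rnorm(1e5, mean = true_effect, sd = se)

# Keep only the replications that reach "statistical significance"
significant <- abs(estimates) > 1.96 * se

# Type-S error rate: significant estimates with the wrong sign
mean(estimates[significant] < 0)

# Type-M error (exaggeration ratio): average magnitude of significant
# estimates relative to the true effect
mean(abs(estimates[significant])) / true_effect
```

With numbers like these, a substantial fraction of the "significant" estimates point in the wrong direction, and the significant estimates overstate the true effect many times over — exactly the pathology the book teaches readers to anticipate.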
Overall, the book is quite comprehensive and will leave the reader with a rich set of tools to think critically about statistics and to fit models in the real world. I look forward to grabbing a hard copy once the book is out, which seems to be in the [summer to fall of 2020](https://twitter.com/avehtari/status/1251188631487774720).

------------------------------------------------------------------------

> Update: Looks like the [book is out!](https://www.amazon.com/Regression-Stories-Analytical-Methods-Research/dp/110702398X)
---
layout: post
title: "Book Review: Regression and Other Stories by Gelman, Hill, and Vehtari"
author: 'Zad Rafi'
date: 2019-06-11
lastmod: 2020-08-09
description: An early look at Gelman et al.'s new book, Regression and Other Stories, which is an update to their seminal 2006 book, Data Analysis Using Regression and Multilevel/Hierarchical Models.
archives: statistics
slug: regression-stories
url: statistics/regression-stories
image: https://res.cloudinary.com/less-likely/image/upload/f_auto,q_auto/v1587793393/Site/gelman_raos_2020.jpg
og_image: https://res.cloudinary.com/less-likely/image/upload/f_auto,q_auto/v1587793393/Site/gelman_raos_2020.jpg
zotero: true
codefolding_show: "show"
tags:
- statistics
- books
- review
keywords:
- classical statistics
- bayesian statistics
output:
  blogdown::html_page:
    toc: true
---