Coding style and output format—small tricks in R

2020-12-14
R

Some people write their code casually, which makes it hard to read. It may be inevitable to get lazy when programming and stop paying attention to code conventions, but that causes inconvenience both for others who read our code and for ourselves when we check our own work.

There are several coding style guides, such as Google’s R style guide or the style guide in Advanced R by Hadley Wickham. In simple terms, an R style guide covers naming and syntax. You can read both documents through the links here. But what I focus on is how to follow the rules even when I am tired of programming. These two packages may help.

styler

This package follows the tidyverse style guide and can be installed with:

install.packages("styler")

Or install the development version:

install.packages("remotes")
remotes::install_github("r-lib/styler")

Please note that the first time you use it, it is better to authorize R.cache to set up its cache directory. After that, you can use it through the Addins button in the top toolbar of RStudio. Just click Style active file, and the file will be formatted in tidyverse style.
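If you prefer the console to the Addins menu, styler can also be called directly. The following is a minimal sketch, assuming styler is installed. The file name "X.R" is only a placeholder, so that call is left commented out.

```r
library(styler)

# Style a snippet of code given as text; the result prints in tidyverse style
styled <- style_text("a<-1+1")
styled

# Style a whole file in place ("X.R" is a placeholder file name)
# style_file("X.R")
```

Note that style_file overwrites the file, so it is safer to try it on code that is under version control.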

formatR

The other useful package is formatR by Yihui Xie. It has official documentation, "formatR: Format R code automatically", and you can get detailed information through the link. Here we simply introduce the function for formatting R code.

library(formatR)
tidy_source("X.R")

Pass the R file you want to format to the tidy_source function and run it.
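To try tidy_source without a file, you can pass the code through its text argument instead. This is a small sketch, assuming formatR is installed. The variable names messy and res are made up for illustration.

```r
library(formatR)

# A line of code without spaces, given as text
messy <- "x<-c(1,2,3)"

# output = FALSE returns the result instead of printing it to the console
res <- tidy_source(text = messy, output = FALSE)

# The formatted code is stored in the text.tidy element
cat(res$text.tidy, sep = "\n")
```

By default tidy_source only prints the formatted code and leaves the input file unchanged. Its documentation suggests passing a file argument, which is forwarded to cat(), if you want to save the result to a new script.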

There are also other functions, such as tidy_app, which launches a Shiny app for formatting code, and tidy_dir, which formats all R scripts under a directory. But the other function I want to show is tidy_eval, which evaluates the code and inserts the output as comments after each expression.

Here is an example to explain its use. If you don’t use the function:

set.seed(123)
text = c("a<-1+1;a  # print the value", "matrix(rnorm(10),5)")
text
## [1] "a<-1+1;a  # print the value" "matrix(rnorm(10),5)"

It just prints the text as a character vector.

library(formatR)
set.seed(123)
tidy_eval(text = c("a<-1+1;a  # print the value", "matrix(rnorm(10),5)"))
## a <- 1 + 1
## a  # print the value
## ## [1] 2
## 
## matrix(rnorm(10), 5)
## ##             [,1]       [,2]
## ## [1,] -0.56047565  1.7150650
## ## [2,] -0.23017749  0.4609162
## ## [3,]  1.55870831 -1.2650612
## ## [4,]  0.07050839 -0.6868529
## ## [5,]  0.12928774 -0.4456620

With the function, you get a single output that combines the formatted code with the result of each expression.

broom

The other package for output is broom. As we all know, the summary output of regression, ANOVA and some other models is not tidy. That is, we cannot obtain a table that can be written to a CSV file or some other useful format. The broom package solves this problem. Two of its main functions, tidy and glance, are shown below.

fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

With the tidy function from broom, you can obtain a clean table:

library(broom)
tidy(fit)
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    37.3      1.88      19.9  8.24e-19
## 2 wt             -5.34     0.559     -9.56 1.29e-10

Much better, isn’t it?
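Because tidy returns an ordinary data frame (a tibble), it can be saved directly. The sketch below writes the coefficient table to a CSV file. The file name coefficients.csv and the use of a temporary directory are only assumptions for illustration.

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)
coef_table <- tidy(fit)

# Write the table to a CSV file (hypothetical file name, temporary directory)
out_file <- file.path(tempdir(), "coefficients.csv")
write.csv(coef_table, out_file, row.names = FALSE)

# Read it back to check that the table survived the round trip
saved <- read.csv(out_file)
saved
```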

As the name suggests, glance is usually used to check the overall fit of a model.

glance(fit)
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.753         0.745  3.05      91.4 1.29e-10     1  -80.0  166.  170.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
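Since glance always returns one row per model, the rows from several models can be stacked into one comparison table. This is only a sketch: the second model (mpg ~ wt + hp) and the model labels are my own additions for illustration.

```r
library(broom)

fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)  # hypothetical second model

# One row per model, with a label column added in front
comparison <- rbind(
  cbind(model = "mpg ~ wt", glance(fit1)),
  cbind(model = "mpg ~ wt + hp", glance(fit2))
)

# Keep only a few columns for a quick side-by-side look
comparison[, c("model", "r.squared", "adj.r.squared", "AIC")]
```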

In addition, there are other packages that make it easy to organize output, such as xtable. But xtable is mainly for LaTeX, which is not today’s topic. I will introduce formatting in Markdown and LaTeX in a separate post.