Generating data with wakefield and building a baseline table with tableone - 《上校的猫-生信分享》

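Overall this is fairly simple. First, pay attention to variable types: is a variable continuous or categorical? Second, check how each variable is distributed: does a continuous variable follow a normal distribution, and is the sample size too small? Choose the test method based on the answers.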
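
The two checks above (normality of continuous variables, small counts in categorical variables) can be roughly automated. Below is a minimal base-R sketch. suggest_tests() is a hypothetical helper, not part of tableone. The cut-offs (Shapiro-Wilk p < 0.05 in any group, expected cell count below 5) are common rules of thumb assumed here, not tableone defaults. Its result is meant to feed the nonnormal and exact arguments of print() used later.

```r
# Hypothetical helper, not part of tableone: suggest which variables to pass
# to the nonnormal and exact arguments of print() for a TableOne object.
# Assumed rules of thumb: Shapiro-Wilk p < alpha in any group -> nonnormal;
# any expected cell count < min_expected -> exact.
suggest_tests <- function(data, vars, strata, alpha = 0.05, min_expected = 5) {
  nonnormal <- character(0)
  exact <- character(0)
  groups <- data[[strata]]
  for (v in vars) {
    x <- data[[v]]
    if (is.numeric(x)) {
      # shapiro.test() drops NAs and needs 3 to 5000 values per group
      p <- tapply(x, groups, function(g) shapiro.test(g)$p.value)
      if (any(p < alpha)) nonnormal <- c(nonnormal, v)
    } else {
      # factors, characters and logicals are treated as categorical
      tab <- table(x, groups)
      expected <- suppressWarnings(chisq.test(tab)$expected)
      if (any(expected < min_expected)) exact <- c(exact, v)
    }
  }
  list(nonnormal = nonnormal, exact = exact)
}

# Example with simulated data (not the data used further down)
set.seed(1)
df <- data.frame(
  grp = rep(c("A", "B"), each = 50),
  height = rnorm(100, mean = 170, sd = 10),          # roughly normal
  income = rlnorm(100, meanlog = 10, sdlog = 1.5),   # strongly right-skewed
  rare = factor(c(rep("yes", 2), rep("no", 98)))     # only 2 "yes" in total
)
res <- suggest_tests(df, c("height", "income", "rare"), strata = "grp")
res
```

A formal normality test becomes very sensitive with large samples, so it is worth looking at a histogram as well before deciding.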

When there are two or more groups, p-values for the group comparison are printed along with the table (let's set aside, for now, whether hypothesis testing in Table 1 of an RCT is appropriate). Very small p-values are shown with a less-than sign, e.g. <0.001. By default, chisq.test() is used for categorical variables (with continuity correction, which R applies only to 2 x 2 tables) and oneway.test() for continuous variables (assuming equal variances, i.e., regular one-way ANOVA). With two groups, this ANOVA is equivalent to Student's t-test with equal variances.
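
A quick base-R check of these defaults, calling the stats functions directly rather than through tableone: with two groups, oneway.test(var.equal = TRUE) gives F equal to t squared, so its p-value matches Student's t-test. The simulated ages are arbitrary; the Sex counts are taken from the output table further down.

```r
# Simulated ages for two groups (arbitrary parameters, illustration only)
set.seed(42)
age_values <- c(rnorm(100, mean = 48, sd = 18), rnorm(100, mean = 64, sd = 19))
group <- factor(rep(c("group1", "group2"), each = 100))

# Default continuous test: one-way ANOVA assuming equal variances
anova_res <- oneway.test(age_values ~ group, var.equal = TRUE)
# Student's t-test, also assuming equal variances
t_res <- t.test(age_values ~ group, var.equal = TRUE)
all.equal(unname(anova_res$statistic), unname(t_res$statistic)^2)  # F = t^2
all.equal(anova_res$p.value, t_res$p.value)                        # same p-value

# Default categorical test: chi-squared test on the Sex counts from the
# output table (Yates continuity correction, since this is a 2 x 2 table)
sex_tab <- matrix(c(76, 24, 57, 43), nrow = 2,
                  dimnames = list(Sex = c("Male", "Female"),
                                  group = c("group1", "group2")))
p_chisq <- chisq.test(sex_tab, correct = TRUE)$p.value
p_chisq
```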

You may be worried about non-normal continuous variables and small cell counts in a categorical variable (such as stage in the original tableone example). In that case, use the nonnormal argument together with the exact argument of the print() method. kruskal.test() is then used for the continuous variables listed in nonnormal, and fisher.test() for the categorical variables listed in exact. With two groups, kruskal.test() is equivalent to wilcox.test() (the Wilcoxon rank-sum test) using the normal approximation without continuity correction. The test column indicates which p-values were calculated with non-default tests.
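
The two-group equivalence can be checked directly in base R. With its defaults, wilcox.test() applies a continuity correction and gives a slightly different p-value, so this sketch switches the correction off and forces the normal approximation. The incomes are simulated; the Sex counts come from the output table further down, so the Fisher p-value can be compared with the p column for Sex there.

```r
# Simulated right-skewed incomes for two groups (illustration only)
set.seed(7)
income_values <- c(rlnorm(100, meanlog = 10.4, sdlog = 0.7),
                   rlnorm(100, meanlog = 10.3, sdlog = 0.8))
group <- factor(rep(c("group1", "group2"), each = 100))

# Test used for variables listed in nonnormal
p_kw <- kruskal.test(income_values ~ group)$p.value
# Wilcoxon rank-sum test, normal approximation, no continuity correction:
# same p-value as Kruskal-Wallis when there are exactly two groups
p_w <- wilcox.test(income_values ~ group, exact = FALSE, correct = FALSE)$p.value
all.equal(p_kw, p_w)

# Test used for variables listed in exact, on the Sex counts from the table
sex_tab <- matrix(c(76, 24, 57, 43), nrow = 2,
                  dimnames = list(Sex = c("Male", "Female"),
                                  group = c("group1", "group2")))
p_fisher <- fisher.test(sex_tab)$p.value
p_fisher
```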

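library(wakefield)  # random data generation (r_data_frame() and variable functions)
library(tableone)   # CreateTableOne() and its print() method
library(knitr)      # kable()

# The data are random: call set.seed() first if you need reproducible numbers.
# Group 1: 100 subjects aged 20-80, 80% male
dat1 <- r_data_frame(100,
                     age(x = 20:80),
                     sex(prob = c(0.8, 0.2)),
                     smokes,
                     income,
                     animal,
                     likert(x = c("group1"), prob = c(1), name = "group")  # constant group label
                     )
# Group 2: 100 subjects aged 30-100, 50% male
dat2 <- r_data_frame(100,
                     age(x = 30:100),
                     sex(prob = c(0.5, 0.5)),
                     smokes,
                     income,
                     animal,
                     likert(x = c("group2"), prob = c(1), name = "group")
                     )
dat <- rbind(dat1, dat2)
summary(dat)
dput(names(dat))  # print the column names in copy-pasteable form

# Baseline table stratified by group; Sex and Smokes are treated as categorical
a <- CreateTableOne(vars = c("Age", "Sex", "Smokes", "Income"),
                    data = dat,
                    strata = "group",
                    factorVars = c("Sex", "Smokes"))
?print.TableOne  # help page for the print() options used below
summary(a)
print(a, showAllLevels = TRUE)
# Kruskal-Wallis test for Income, Fisher's exact test for Sex, plus SMDs
print(a, nonnormal = c("Income"),
      exact = c("Sex"),
      smd = TRUE)
# Same table returned as a character matrix (not printed) for export
a_csv <- print(a, nonnormal = c("Income"),
               exact = c("Sex"),
               smd = TRUE,
               showAllLevels = TRUE,
               quote = FALSE,
               noSpaces = TRUE,
               printToggle = FALSE)
kable(a_csv,
      align = 'c',
      caption = 'Table 1: Comparison of unmatched samples')
write.csv(a_csv, file = "myTable.csv")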

	level	group1	group2	p	test	SMD
n		100	100
Age (mean (SD))		47.54 (18.05)	64.28 (19.30)	<0.001		0.896
Sex (%)	Male	76 (76.0)	57 (57.0)	0.007	exact	0.411
	Female	24 (24.0)	43 (43.0)
Smokes (%)	FALSE	86 (86.0)	79 (79.0)	0.264		0.185
	TRUE	14 (14.0)	21 (21.0)
Income (median [IQR])		33853.50 [23196.55, 53600.29]	31957.53 [16484.16, 53772.67]	0.320	nonnorm	0.046
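
The SMD column (standardized mean difference, requested with smd = TRUE) can be checked by hand. For a continuous variable the usual definition divides the difference in means by the square root of the average of the two variances. For a two-level categorical variable, the proportions p1 and p2 replace the means and p(1 - p) replaces the variances. Plugging the summary statistics from the table above into these formulas reproduces the Age, Sex and Smokes values, which suggests this is what tableone computes for those rows. smd_continuous() and smd_binary() are hypothetical helper names, and this sketch does not cover variables with more than two levels.

```r
# SMD for a continuous variable: |m1 - m2| / sqrt((s1^2 + s2^2) / 2)
smd_continuous <- function(m1, s1, m2, s2) {
  abs(m1 - m2) / sqrt((s1^2 + s2^2) / 2)
}

# SMD for a two-level variable from the proportions p1 and p2
smd_binary <- function(p1, p2) {
  abs(p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
}

# Summary statistics copied from the output table
round(smd_continuous(47.54, 18.05, 64.28, 19.30), 3)  # Age: 0.896
round(smd_binary(0.76, 0.57), 3)                      # Sex (Male): 0.411
round(smd_binary(0.14, 0.21), 3)                      # Smokes (TRUE): 0.185
```

Unlike p-values, the SMD does not depend on sample size, which is why it is often preferred for judging balance between groups.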