主要学习stringr包的用法。
构建字符串
安装stringr包,构建示例字符串。
> rm(list = ls())> if(!require(stringr))install.packages('stringr')> library(stringr)> x <- "The birch canoe slid on the smooth planks."> x[1] "The birch canoe slid on the smooth planks."
检测长度
主要是str_length()函数。
> str_length(x) #空格和符号也计算在内[1] 42> length(x)[1] 1
字符串拆分与组合
主要是str_split()函数,拆分后可以取子集。
> x[1] "The birch canoe slid on the smooth planks."> str_split(x," ")[[1]][1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> x2 = str_split(x," ")[[1]]> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."
注意simplify参数的用法,可以形成一个矩阵。
> y = c("jimmy 150","nicker 140","tony 152")> str_split(y," ")[[1]][1] "jimmy" "150"[[2]][1] "nicker" "140"[[3]][1] "tony" "152"> str_split(y," ",simplify = T)[,1] [,2][1,] "jimmy" "150"[2,] "nicker" "140"[3,] "tony" "152"
str_c()函数可以实现字符串的连接。
collapse参数可以作为分隔符,sep是把向量中的每一个元素都与另一字符串相连。
> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> str_c(x2,collapse = " ")[1] "The birch canoe slid on the smooth planks."> str_c(x2,1234,sep = "+")[1] "The+1234" "birch+1234" "canoe+1234" "slid+1234" "on+1234"[6] "the+1234" "smooth+1234" "planks.+1234"
提取字符串的一部分
str_sub()函数可以从起始位置到终止位置取字符。
> x[1] "The birch canoe slid on the smooth planks."> str_sub(x,5,9)[1] "birch"
字符定位
str_locate()在向量中定位需要检测的字符。
> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> str_locate(x2,"th")start end[1,] NA NA[2,] NA NA[3,] NA NA[4,] NA NA[5,] NA NA[6,] 1 2[7,] 5 6[8,] NA NA> str_locate(x2,"h")start end[1,] 2 2[2,] 5 5[3,] NA NA[4,] NA NA[5,] NA NA[6,] 2 2[7,] 6 6[8,] NA NA
字符检测
str_detect()检测向量中每个字符串是否含有待检测的字符。
> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> str_detect(x2,"h")[1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE> ###与sum和mean连用,可以统计匹配的个数和比例> sum(str_detect(x2,"h"))[1] 4> mean(str_detect(x2,"h"))[1] 0.5
字符替换
> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> str_replace(x2,"o","A")[1] "The" "birch" "canAe" "slid" "An" "the" "smAoth" "planks."> str_replace_all(x2,"o","A")[1] "The" "birch" "canAe" "slid" "An" "the" "smAAth" "planks."
提取匹配到的字符
> x2[1] "The" "birch" "canoe" "slid" "on" "the" "smooth" "planks."> str_extract(x2,"o|e")[1] "e" NA "o" NA "o" "e" "o" NA> str_extract_all(x2,"o|e")[[1]][1] "e"[[2]]character(0)[[3]][1] "o" "e"[[4]]character(0)[[5]][1] "o"[[6]][1] "e"[[7]][1] "o" "o"[[8]]character(0)> str_extract_all(x2,"o|e",simplify = T)[,1] [,2][1,] "e" ""[2,] "" ""[3,] "o" "e"[4,] "" ""[5,] "o" ""[6,] "e" ""[7,] "o" "o"[8,] "" ""
字符删除
> x[1] "The birch canoe slid on the smooth planks."> str_remove(x," ")[1] "Thebirch canoe slid on the smooth planks."> str_remove_all(x," ")[1] "Thebirchcanoeslidonthesmoothplanks."
