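The grep {base} family
The grep family finds the elements of a vector that match a regular expression and returns their positions in the vector (grep), the elements themselves (grep with value = TRUE), or a logical vector (grepl).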
grep("a", c("a1","a2","b1","b2"))
## [1] 1 2
grep("a", c("a1","a2","b1","b2"), value = TRUE)
## [1] "a1" "a2"
grepl("a", c("a1","a2","b1","b2"))
## [1] TRUE TRUE FALSE FALSE
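As a quick check of how the three forms relate, here is a small sketch (the variable names are made up for illustration): grep() should give the same indices as which(grepl()), and value = TRUE should give the same result as subsetting with the logical vector.

```r
# Illustrative vector, same values as in the example above
x <- c("a1", "a2", "b1", "b2")

idx  <- grep("a", x)                 # integer positions of the matching elements
hit  <- grepl("a", x)                # one TRUE/FALSE per element
vals <- grep("a", x, value = TRUE)   # the matching elements themselves

identical(idx, which(hit))   # positions agree with the logical vector
identical(vals, x[hit])      # values agree with logical subsetting
```

In practice grepl() tends to be the more convenient form for filtering, because it always has the same length as the input.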
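sub replaces only the first match within each element; gsub replaces every match. (The g stands for global; it is not about greedy matching.)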
sub("a", replacement = "A", x = c("a1a","a2","b1","b2"))
## [1] "A1a" "A2" "b1" "b2"
gsub("a", replacement = "A", x = c("a1a","a2","b1","b2"))
## [1] "A1A" "A2" "b1" "b2"
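Whether all matches get replaced (sub vs. gsub) is a separate question from how much text a single match consumes (greedy vs. lazy quantifiers). A minimal sketch with a made-up input string; perl = TRUE is my choice here so that the lazy quantifier *? follows PCRE rules:

```r
x <- "<a><b>"   # hypothetical input

# Greedy quantifier: a single match runs from the first "<" to the last ">"
r1 <- sub("<.*>", "X", x, perl = TRUE)

# Lazy quantifier: each match stops at the nearest ">"
r2 <- sub("<.*?>", "X", x, perl = TRUE)    # first match only
r3 <- gsub("<.*?>", "X", x, perl = TRUE)   # every match

print(c(r1, r2, r3))
## [1] "X"    "X<b>" "XX"
```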
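regexpr and gregexpr do not return positions in the vector. Instead they return the character position of the match inside each element: regexpr gives only the first match per element, gregexpr gives all of them.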
regexpr("a", c("1aa","12aba","123abca"))
## [1] 2 3 4
## attr(,"match.length")
## [1] 1 1 1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
gregexpr("a", c("1aa","12aba","123abca"))
# [[1]]
# [1] 2 3
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
#
# [[2]]
# [1] 3 5
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
#
# [[3]]
# [1] 4 7
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
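The positions and match lengths are rarely the end goal; usually you want the matched text. One way to get it in base R is regmatches(), which takes the result of regexpr or gregexpr. A short sketch with a made-up input vector and pattern:

```r
x <- c("abc123", "de45f6", "none")   # hypothetical input

# First run of digits per element; elements without a match are dropped
first_num <- regmatches(x, regexpr("[0-9]+", x))

# Every run of digits per element, returned as a list
all_nums <- regmatches(x, gregexpr("[0-9]+", x))

print(first_num)
## [1] "123" "45"
```

Note that the regexpr form returns a shorter vector than x when some elements do not match, while the gregexpr form keeps one list entry per element (empty for "none").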
The substr {base} family
Extract or replace the part of each element between a start and a stop position.
x <- c("abc123","abc456","abc789")
substr(x, start=2, stop=4)
# [1] "bc1" "bc4" "bc7"
substring(x, first=2) # last = 1000000L by default
# [1] "bc123" "bc456" "bc789"
substr(x, start=2, stop=4) <- "***"
x
# [1] "a***23" "a***56" "a***89"
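Two details are easy to miss here. substring() is vectorised over its position arguments, and the replacement form of substr() never changes the length of the string. A small sketch with made-up values:

```r
x <- "abc123"   # hypothetical input

# Vectorised positions: one single-character substring per index
chars <- substring(x, 1:6, 1:6)

# Replacement shorter than the range: only one character is overwritten
y <- x
substr(y, 2, 4) <- "*"

# Replacement longer than the range: it is cut to the three available slots
z <- x
substr(z, 2, 4) <- "*****"

print(c(y, z))
## [1] "a*c123" "a***23"
```

If the replacement should change the string length, sub() or paste() with substr() pieces is the usual route.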
paste() and strsplit()
Join and split strings.
paste("a", "b", sep="-")
# [1] "a-b"
strsplit(c("a-b","c-d"), split="-")
# [[1]]
# [1] "a" "b"
#
# [[2]]
# [1] "c" "d"
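A few more things I find useful with these two, sketched with made-up inputs: collapse joins the elements of one vector, split is treated as a regular expression unless fixed = TRUE is set, and since strsplit() returns a list, sapply() is a handy way to pull out one piece per element.

```r
# sep joins arguments element-wise; collapse joins a vector into one string
joined <- paste(c("a", "b", "c"), collapse = "-")

# split is a regular expression by default
parts <- strsplit("a1b22c", "[0-9]+")

# fixed = TRUE treats "." as a literal dot instead of "any character"
dots <- strsplit("a.b", ".", fixed = TRUE)

# Take the first piece of every element
firsts <- sapply(strsplit(c("a-b", "c-d"), "-"), `[`, 1)

print(joined)
## [1] "a-b-c"
print(firsts)
## [1] "a" "c"
```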
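After going through all of these, I still find that extracting the strings that match a particular pattern is a hassle with the base functions, so I am going to learn the stringr package instead.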
