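grep {base} family
The grep family reports which elements of a vector match a regular expression. Depending on the function and its arguments, it returns their positions, the matching elements themselves, or a logical vector.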
grep("a",c("a1","a2","b1","b2"))
## [1] 1 2
grep("a",c("a1","a2","b1","b2"),value = TRUE)
## [1] "a1" "a2"
grepl("a",c("a1","a2","b1","b2"))
## [1] TRUE TRUE FALSE FALSE
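A few more grep()/grepl() options that come up often are shown below. This is a small sketch on a made-up vector; ignore.case and invert are standard arguments of grep() in base R.

```r
# Toy vector: the earlier examples plus one upper-case element
x <- c("a1", "a2", "b1", "b2", "A3")

# Matching is case-sensitive by default, so "A3" is not matched
grep("a", x)                       # 1 2
# ignore.case = TRUE also matches "A3"
grep("a", x, ignore.case = TRUE)   # 1 2 5
# invert = TRUE returns the indices of elements that do NOT match
grep("a", x, invert = TRUE)        # 3 4 5
# grepl() returns a logical vector, which is handy for subsetting
x[grepl("^b", x)]                  # "b1" "b2"
# No match gives integer(0), not NA
grep("z", x)                       # integer(0)
```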
sub() replaces only the first match within each element, while gsub() (the "g" stands for global, not greedy) replaces every match.
sub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
## [1] "A1a" "A2" "b1" "b2"
gsub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
## [1] "A1A" "A2" "b1" "b2"
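Two details of sub()/gsub() are worth a quick sketch: capture groups can be reused in the replacement as \\1, \\2, ..., and the pattern is a regular expression unless fixed = TRUE is given. The date strings below are made-up sample data.

```r
# Reorder ISO dates (YYYY-MM-DD) into DD/MM/YYYY using capture groups
dates <- c("2023-01-15", "2024-12-31")
sub("([0-9]{4})-([0-9]{2})-([0-9]{2})", "\\3/\\2/\\1", dates)
# "15/01/2023" "31/12/2024"

# sub() replaces only the first match per element, gsub() every match
s <- "a.b.c"
sub(".", "-", s, fixed = TRUE)   # "a-b.c"
gsub(".", "-", s, fixed = TRUE)  # "a-b-c"

# Without fixed = TRUE, "." is a regex metacharacter that matches any character
gsub(".", "-", s)                # "-----"
```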
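Unlike grep, regexpr and gregexpr do not return positions in the vector but positions within each element. regexpr returns the starting position of the first match in each element (-1 if there is none), and gregexpr returns a list with the starting positions of all matches in each element. The length of each match is stored in the match.length attribute.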
regexpr("a",c("1aa","12aba","123abca"))
## [1] 2 3 4
## attr(,"match.length")
## [1] 1 1 1
## attr(,"index.type")
## [1] "chars"
## attr(,"useBytes")
## [1] TRUE
gregexpr("a",c("1aa","12aba","123abca"))
# [[1]]
# [1] 2 3
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
#
# [[2]]
# [1] 3 5
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
#
# [[3]]
# [1] 4 7
# attr(,"match.length")
# [1] 1 1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
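The positions alone are rarely the end goal. Base R's regmatches() (not covered in the notes above) pairs with regexpr()/gregexpr() to pull out the matched text, which is the "extract a pattern" task that the closing remark finds clumsy. A small sketch on made-up input:

```r
x <- c("1aa", "12aba", "123abca", "xyz")

# An element without any match is reported as -1
as.vector(regexpr("a", x))        # 2 3 4 -1

# regmatches() turns regexpr() match data into the matched substrings;
# elements without a match are dropped from the result
m <- regexpr("[0-9]+", x)
regmatches(x, m)                  # "1" "12" "123"

# With gregexpr() it returns a list holding every match of each element
y <- c("a1b22", "c333", "none")
regmatches(y, gregexpr("[0-9]+", y))
# list(c("1", "22"), "333", character(0))
```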
substr {base} family
Extract or replace the characters between a start and a stop position within each element.
x <- c("abc123","abc456","abc789")
substr(x, start=2, stop=4)
# [1] "bc1" "bc4" "bc7"
substring(x, first=2) # last defaults to 1000000L, i.e. to the end of each string
# [1] "bc123" "bc456" "bc789"
substr(x, start=2, stop=4) <- "***"
x
# [1] "a***23" "a***56" "a***89"
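Both functions are vectorised over their position arguments, and the replacement form only overwrites as many characters as fit in the given range. A short sketch on toy strings:

```r
s <- "abcdef"
# substring() is vectorised over first/last, so one call cuts several pieces
substring(s, 1:3, 3:5)                    # "abc" "bcd" "cde"
# Split a string into single characters
substring(s, 1:nchar(s), 1:nchar(s))      # "a" "b" "c" "d" "e" "f"

# substr() recycles start/stop along x
x <- c("abc123", "abc456", "abc789")
substr(x, start = c(1, 2, 3), stop = 3)   # "abc" "bc" "c"

# The replacement is truncated to the width of [start, stop]
y <- "abcdef"
substr(y, 2, 3) <- "XYZ"
y                                         # "aXYdef"
```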
paste() and strsplit()
Concatenate and split strings.
paste("a","b",sep="-")
# [1] "a-b"
strsplit(c("a-b","c-d"),split="-")
# [[1]]
# [1] "a" "b"
#
# [[2]]
# [1] "c" "d"
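A few everyday variants are sketched below: paste0(), the collapse argument, taking one piece from each strsplit() result, and the fact that split is a regular expression unless fixed = TRUE.

```r
# paste() is vectorised; sep joins the arguments element-wise
paste("x", 1:3, sep = "_")                 # "x_1" "x_2" "x_3"
# paste0() is paste() with sep = ""
paste0("x", 1:3)                           # "x1" "x2" "x3"
# collapse joins the resulting vector into a single string
paste(c("a", "b", "c"), collapse = "+")    # "a+b+c"

# strsplit() always returns a list; take the first piece of each element
parts <- strsplit(c("a-b", "c-d"), split = "-")
sapply(parts, `[`, 1)                      # "a" "c"

# split is a regex: use fixed = TRUE to split on a literal "."
strsplit("a.b.c", split = ".", fixed = TRUE)[[1]]   # "a" "b" "c"
# As a regex, "." matches every character, leaving only empty strings
strsplit("a.b.c", split = ".")[[1]]
```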
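Having worked through these, I still find that extracting the parts of a string that match a particular pattern is cumbersome with base functions, so the stringr package is the next thing to learn.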