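The grep {base} family

The grep family returns, for each element that matches a regular expression, either its position in the vector, the element itself, or (with grepl) a logical value for every element.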

  grep("a",c("a1","a2","b1","b2"))
  ## [1] 1 2
  grep("a",c("a1","a2","b1","b2"),value = TRUE)
  ## [1] "a1" "a2"
  grepl("a",c("a1","a2","b1","b2"))
  ## [1] TRUE TRUE FALSE FALSE
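One way to see how the three calls relate: grep() gives the same positions as which() applied to grepl(), and value = TRUE gives the same result as subsetting with grepl()'s logical vector. A small sketch on a made-up vector `x`; `ignore.case` and `invert` are documented grep() arguments.

```r
x <- c("a1", "a2", "b1", "B2")

# grep() positions are the same as which() over grepl()'s logical vector
idx <- grep("a", x)
stopifnot(identical(idx, which(grepl("a", x))))
# [1] 1 2

# value = TRUE is the same as subsetting x with that logical vector
vals <- grep("a", x, value = TRUE)
stopifnot(identical(vals, x[grepl("a", x)]))
# [1] "a1" "a2"

# Case-insensitive match, and inverted match (elements that do NOT match)
grep("b", x, ignore.case = TRUE)
# [1] 3 4
grep("a", x, invert = TRUE, value = TRUE)
# [1] "b1" "B2"
```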

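sub replaces only the first match within each element, while gsub ("global") replaces every match. This is not a matter of greedy matching: greediness is a property of quantifiers such as + and *, and both functions honour it.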

  sub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
  ## [1] "A1a" "A2" "b1" "b2"
  gsub("a",replacement = "A",x=c("a1a","a2","b1","b2"))
  ## [1] "A1A" "A2" "b1" "b2"
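To separate "first vs. all matches" from "greedy vs. lazy", here is a small sketch on a made-up string. The lazy quantifier `+?` is written with `perl = TRUE` so that it is handled by PCRE.

```r
s <- "aaab"

# sub() vs gsub(): first match only vs every match
sub("a", "A", s)
# [1] "Aaab"
gsub("a", "A", s)
# [1] "AAAb"

# Greediness belongs to the quantifier: "a+" consumes the whole run of a's ...
greedy <- sub("a+", "A", s)
# [1] "Ab"

# ... while the lazy "a+?" stops after a single a
lazy <- sub("a+?", "A", s, perl = TRUE)
# [1] "Aaab"
```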

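Unlike grep, regexpr and gregexpr do not return positions in the vector but character positions inside each element: regexpr gives the start of the first match in each element (-1 if there is none), and gregexpr returns a list with the starts of all matches. The match lengths are stored in the "match.length" attribute.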

  regexpr("a",c("1aa","12aba","123abca"))
  ## [1] 2 3 4
  ## attr(,"match.length")
  ## [1] 1 1 1
  ## attr(,"index.type")
  ## [1] "chars"
  ## attr(,"useBytes")
  ## [1] TRUE
  gregexpr("a",c("1aa","12aba","123abca"))
  # [[1]]
  # [1] 2 3
  # attr(,"match.length")
  # [1] 1 1
  # attr(,"index.type")
  # [1] "chars"
  # attr(,"useBytes")
  # [1] TRUE
  #
  # [[2]]
  # [1] 3 5
  # attr(,"match.length")
  # [1] 1 1
  # attr(,"index.type")
  # [1] "chars"
  # attr(,"useBytes")
  # [1] TRUE
  #
  # [[3]]
  # [1] 4 7
  # attr(,"match.length")
  # [1] 1 1
  # attr(,"index.type")
  # [1] "chars"
  # attr(,"useBytes")
  # [1] TRUE
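The positions by themselves are rarely the end goal. Base R's regmatches() takes the output of regexpr or gregexpr and returns the matched text. Here is a sketch on made-up input that pulls out runs of digits:

```r
x <- c("a1b22", "c333", "def")

# regexpr(): first match per element; regmatches() drops elements with no match
first <- regmatches(x, regexpr("[0-9]+", x))
# [1] "1"   "333"

# gregexpr(): all matches per element, returned as a list
# (the entry for "def" is character(0))
all_hits <- regmatches(x, gregexpr("[0-9]+", x))
```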

The substr {base} family

Extract or replace the characters between a start and a stop position in each element.

  x <- c("abc123","abc456","abc789")
  substr(x, start=2, stop=4)
  # [1] "bc1" "bc4" "bc7"
  substring(x, first=2) # last defaults to 1000000L, i.e. to the end of the string
  # [1] "bc123" "bc456" "bc789"
  substr(x, start=2, stop=4) <- "***"
  x
  # [1] "a***23" "a***56" "a***89"
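Replacement with substr<- never changes the length of the string, which can be surprising when the new value is not exactly stop - start + 1 characters long. substring() is also vectorised over first and last. The following is a small sketch with made-up strings:

```r
# A shorter value replaces only as many characters as it has ...
a <- "abc123"
substr(a, 2, 4) <- "*"
# [1] "a*c123"

# ... and a longer value is cut off at stop
b <- "abc123"
substr(b, 2, 4) <- "*****"
# [1] "a***23"

# substring() recycles first/last, e.g. for sliding windows of width 3
windows <- substring("abcdef", 1:3, 3:5)
# [1] "abc" "bcd" "cde"
```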

paste() and strsplit()

Joining and splitting strings

  paste("a","b",sep="-")
  # [1] "a-b"
  strsplit(c("a-b","c-d"),split="-")
  # [[1]]
  # [1] "a" "b"
  #
  # [[2]]
  # [1] "c" "d"
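Some common variations on these two functions are shown below. paste() recycles its arguments and can collapse a vector into one string. strsplit() always returns a list, and its split argument is a regular expression unless fixed = TRUE. The following sketch uses made-up inputs:

```r
# paste() recycles its arguments; collapse joins a vector into one string
ids <- paste("x", 1:3, sep = "_")
# [1] "x_1" "x_2" "x_3"
joined <- paste(c("a", "b", "c"), collapse = "-")
# [1] "a-b-c"
paste0("a", "b")   # same as sep = ""
# [1] "ab"

# strsplit() returns a list; take the first piece of each element with sapply()
parts <- strsplit(c("a-b", "c-d"), split = "-")
firsts <- sapply(parts, `[`, 1)
# [1] "a" "c"

# split is a regex, so "." would match any character; fixed = TRUE splits on a literal dot
dotted <- strsplit("a.b", split = ".", fixed = TRUE)[[1]]
# [1] "a" "b"
```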

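After working through these, I found that extracting substrings that match a particular pattern is still cumbersome with the base functions, so the stringr package is worth learning instead.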