题目

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

  1. Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
  2. Output: ["AAAAACCCCC", "CCCCCAAAAA"]

题意

给定一个字符串s,找出其中所有出现至少2次的长度为10的子串。

思路

比较直接的方法是使用两个HashSet去处理,一个保存已经遍历过的子串,另一个保存答案子串。

在此基础上可以使用位运算进行优化。分别用二进制的00、01、10、11来表示’A’、’C’、’G’、’T’,则一个长度为10的字符串就可以用一个长度为20的二进制数字来表示,每一次获取新的子串只需要将原来的二进制串左移2位,并将最低的两位换成新加入的字符,类似于滑动窗口的操作。其他步骤与HashSet方法相同。


代码实现

Java

HashSet

  1. class Solution {
  2. public List<String> findRepeatedDnaSequences(String s) {
  3. Set<String> one = new HashSet<>();
  4. Set<String> two = new HashSet<>();
  5. for (int i = 0; i < s.length() - 9; i++) {
  6. String t = s.substring(i, i + 10);
  7. if (two.contains(t)) {
  8. continue;
  9. } else if (one.contains(t)) {
  10. two.add(t);
  11. } else {
  12. one.add(t);
  13. }
  14. }
  15. return new ArrayList<>(two);
  16. }
  17. }

位运算优化

  1. class Solution {
  2. public List<String> findRepeatedDnaSequences(String s) {
  3. if (s.length() < 10) {
  4. return new ArrayList<>();
  5. }
  6. Set<String> two = new HashSet<>();
  7. Set<Integer> one = new HashSet<>(); // key类型换成整数
  8. int[] hash = new int[26];
  9. hash['A' - 'A'] = 0;
  10. hash['C' - 'A'] = 1;
  11. hash['G' - 'A'] = 2;
  12. hash['T' - 'A'] = 3;
  13. int cur = 0;
  14. // 创建初始的长度为9的子串
  15. for (int i = 0; i < 9; i++) {
  16. cur = cur << 2 | hash[s.charAt(i) - 'A'];
  17. }
  18. for (int i = 9; i < s.length(); i++) {
  19. // 每次只需要保留低20位
  20. cur = cur << 2 & 0xfffff | hash[s.charAt(i) - 'A'];
  21. if (one.contains(cur)) {
  22. two.add(s.substring(i - 9, i + 1));
  23. } else {
  24. one.add(cur);
  25. }
  26. }
  27. return new ArrayList<>(two);
  28. }
  29. }

JavaScript

  1. /**
  2. * @param {string} s
  3. * @return {string[]}
  4. */
  5. var findRepeatedDnaSequences = function (s) {
  6. let set1 = new Set()
  7. let set2 = new Set()
  8. for (let i = 0; i + 10 <= s.length; i++) {
  9. let t = s.slice(i, i + 10)
  10. if (set1.has(t)) {
  11. continue
  12. } else if (set2.has(t)) {
  13. set1.add(t)
  14. } else {
  15. set2.add(t)
  16. }
  17. }
  18. return Array.from(set1)
  19. }