Problem

  1. Problem
  2. Given two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).
  3. The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].
  4. A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2:5] = "UGCU".
  5. The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).

确定DNA序列中给定子序列的位置信息

问题:
input: 对于给定的序列seq和字串target_seq
output: target_seqseq的位置

  1. input:
  2. seq = GATATATGCATATACTT
  3. target_seq = ATAT
  4. output: 2, 4, 10

思路:

  • 利用暴力搜索或者其他更快的算法实现,如kmp

    solution -1

    暴力搜索算法 ```python def motif_search_BF(seq, target_seq, start_base = 1): loc_all = [] for i in range(len(seq) - len(target_seq) + 1):
    1. if target_seq == seq[i:i + len(target_seq)]:
    2. loc_all.append(i)
    loc_all = list(map(lambda x: x + start_base, loc_all)) return loc_all

def main(): seq = “GATATATGCATATACTT” target_seq = “ATAT” motif_search_BF(seq, target_seq)

if name == ‘main‘: main()

  1. <a name="vYyPi"></a>
  2. ## solution-2
  3. `kmp`算法
  4. ```python
  5. import numpy as np
  6. def motif_search_KMP(t, p, start_base = 1):
  7. """
  8. t: target
  9. p: pattern
  10. """
  11. #return all matching positions of p in t
  12. next = [0]
  13. j = 0
  14. for i in range(1, len(p)):
  15. while j > 0 and p[j] != p[i]:
  16. j = next[j - 1]
  17. if p[j] == p[i]:
  18. j += 1
  19. next.append(j)
  20. # the search part and build part is almost identical.
  21. ans = []
  22. j = 0
  23. for i in range(len(t)):
  24. while j > 0 and t[i] != p[j]:
  25. j = next[j - 1]
  26. if t[i] == p[j]:
  27. j += 1
  28. if j == len(p):
  29. ans.append(i - (j - 1))
  30. j = next[j - 1]
  31. if list != []:
  32. ans = list(np.array(ans) + start_base)
  33. return ans
  34. def main():
  35. seq = "GATATATGCATATACTT"
  36. target_seq = "ATAT"
  37. motif_search_KMP(seq, target_seq)
  38. if __name__ == '__main__':
  39. main()