Problem
ProblemGiven two strings s and t, t is a substring of s if t is contained as a contiguous collection of symbols in s (as a result, t must be no longer than s).The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position i of s is denoted by s[i].A substring of s can be represented as s[j:k], where j and k represent the starting and ending positions of the substring in s; for example, if s = "AUGCUUCAGAAAGGUCUUACG", then s[2:5] = "UGCU".The location of a substring s[j:k] is its beginning position j; note that t will have multiple locations in s if it occurs more than once as a substring of s (see the Sample below).
确定DNA序列中给定子序列的位置信息
问题:input: 对于给定的序列seq和字串target_seqoutput: target_seq在seq的位置
input:seq = GATATATGCATATACTTtarget_seq = ATAToutput: 2, 4, 10
思路:
- 利用暴力搜索或者其他更快的算法实现,如
kmpsolution -1
暴力搜索算法 ```python def motif_search_BF(seq, target_seq, start_base = 1): loc_all = [] for i in range(len(seq) - len(target_seq) + 1):
loc_all = list(map(lambda x: x + start_base, loc_all)) return loc_allif target_seq == seq[i:i + len(target_seq)]:loc_all.append(i)
def main(): seq = “GATATATGCATATACTT” target_seq = “ATAT” motif_search_BF(seq, target_seq)
if name == ‘main‘: main()
<a name="vYyPi"></a>## solution-2`kmp`算法```pythonimport numpy as npdef motif_search_KMP(t, p, start_base = 1):"""t: targetp: pattern"""#return all matching positions of p in tnext = [0]j = 0for i in range(1, len(p)):while j > 0 and p[j] != p[i]:j = next[j - 1]if p[j] == p[i]:j += 1next.append(j)# the search part and build part is almost identical.ans = []j = 0for i in range(len(t)):while j > 0 and t[i] != p[j]:j = next[j - 1]if t[i] == p[j]:j += 1if j == len(p):ans.append(i - (j - 1))j = next[j - 1]if list != []:ans = list(np.array(ans) + start_base)return ansdef main():seq = "GATATATGCATATACTT"target_seq = "ATAT"motif_search_KMP(seq, target_seq)if __name__ == '__main__':main()
