- Generate the k-mer Composition of a String">Generate the k-mer Composition of a String
- Reconstruct a String from its Genome Path">Reconstruct a String from its Genome Path
- Construct the Overlap Graph of a Collection of k-mers">Construct the Overlap Graph of a Collection of k-mers
- Construct the De Bruijn Graph of a String">Construct the De Bruijn Graph of a String
- Find an Eulerian Cycle in a Graph">Find an Eulerian Cycle in a Graph
Generate the k-mer Composition of a String
1、所有k-mer子串排序
from typing import Listdef StringComposition(k: int, text: str) -> List[str]:kmers = []for i in range(len(text) - k + 1):kmers.append(text[i:i+k])kmers.sort()return kmers# return sorted(text[i:i+k] for i in range(len(text) - k + 1))s = """5CAATCCAAC"""k, text = s.splitlines()print(*StringComposition(int(k), text), sep="\n")
复杂度分析:设 text 长度为 n
Reconstruct a String from its Genome Path
1、遍历图
from typing import Listdef StringSpelled2GenomePath(patterns: List[str]) -> str:genome = patterns[0]for i in range(1, len(patterns)):genome += patterns[i][-1]return genomes = """ACCGACCGAACGAAGGAAGCAAGCT"""patterns = s.splitlines()print(StringSpelled2GenomePath(patterns), sep="\n")
Construct the Overlap Graph of a Collection of k-mers
1、O(n^2)建有向图
本题所要求的两点之间有一条有向边的条件是
。但是现在不清楚两个节点之间关系,因此我们需要枚举所有
对,最暴力方式是双层循环
最后需要注意答案是以邻接表实现,也就是最终答案是以有向图的起点升序。
from typing import List, Tupledef OverlapGraph(patterns: List[str]) -> List[Tuple[str]]:graph = []n = len(patterns)for i in range(n):for j in range(i + 1, n):u, v = patterns[i], patterns[j]# u -> vif u[1:] == v[:-1]:graph.append((u, v))# v -> uif v[1:] == u[:-1]:graph.append((v, u))# adjacency listgraph.sort()return graphs = """ATGCGGCATGCATGCAGGCAGGCAT"""patterns = s.splitlines()for u, v in OverlapGraph(patterns):print(f"{u} -> {v}")
复杂度分析:设 patterns 长度为 n,graph 长度为 n^2
- 时间复杂度:
,对 n^2 个元素排序。
- 空间复杂度:
,邻接表
Construct the De Bruijn Graph of a String
1、从Text串获取k-mer串建De Bruijn图
```python from typing import List, Tuple from collections import defaultdict
def deBruijnGraph(k: int, text: str) -> List[Tuple[str, List[str]]]: graph = defaultdict(list) # 边个数不变,因此list
# 取k-mer前k-1个字符作节点,边是相邻k-mer的overlapfor i in range(len(text) - k + 1): # 最后一个k-mer只有 k-1 个graph[text[i:i+k-1]].append(text[i+1:i+k])# adjacency listreturn [(node, sorted(graph[node])) for node in sorted(graph)]
s = “””4 AAGATTCTCTAC””” k, text = s.splitlines() for u, v in deBruijnGraph(int(k), text): print(f”{u} -> {‘,’.join(v)}”)
<a name="dUgyG"></a># [Construct the De Bruijn Graph of a Collection of k-mers](https://rosalind.info/problems/ba3e/)<a name="AWCoN"></a>## 1、从k-mers中建De Bruijn图由于输入都是某文本串`text`的所有`k-mer`子串,他们顺序是乱的,但是现在要从它们之中构建出De Bruijn图<br />节点:所有串的前缀和后缀构成的无重复集合。<br />边:长度为`k`的子串`s`的前缀`s[0,n-2]`和后缀`s[1,n-1]`之间构成一条边。```pythonfrom typing import List, Tuplefrom collections import defaultdictdef deBruijnGraphFromKmers(patterns: List[str]) -> List[Tuple[str, List[str]]]:graph = defaultdict(list)for pattern in patterns:graph[pattern[:-1]].append(pattern[1:])# adjacency listreturn [(node, sorted(graph[node])) for node in sorted(graph)]s = """GAGGCAGGGGGGGGGACAGGAGGGGGAG"""patterns = s.splitlines()for u, v in deBruijnGraphFromKmers(patterns):print(f"{u} -> {','.join(v)}")
Find an Eulerian Cycle in a Graph
6->8->7->9->6->5->4->2->1->0->3->2->6
