- Generate the k-mer Composition of a String">Generate the k-mer Composition of a String
- Reconstruct a String from its Genome Path">Reconstruct a String from its Genome Path
- Construct the Overlap Graph of a Collection of k-mers">Construct the Overlap Graph of a Collection of k-mers
- Construct the De Bruijn Graph of a String">Construct the De Bruijn Graph of a String
- Find an Eulerian Cycle in a Graph">Find an Eulerian Cycle in a Graph
Generate the k-mer Composition of a String
1、所有k-mer子串排序
from typing import List
def StringComposition(k: int, text: str) -> List[str]:
kmers = []
for i in range(len(text) - k + 1):
kmers.append(text[i:i+k])
kmers.sort()
return kmers
# return sorted(text[i:i+k] for i in range(len(text) - k + 1))
s = """5
CAATCCAAC
"""
k, text = s.splitlines()
print(*StringComposition(int(k), text), sep="\n")
复杂度分析:设 text 长度为 n
Reconstruct a String from its Genome Path
1、遍历图
from typing import List
def StringSpelled2GenomePath(patterns: List[str]) -> str:
genome = patterns[0]
for i in range(1, len(patterns)):
genome += patterns[i][-1]
return genome
s = """ACCGA
CCGAA
CGAAG
GAAGC
AAGCT
"""
patterns = s.splitlines()
print(StringSpelled2GenomePath(patterns), sep="\n")
Construct the Overlap Graph of a Collection of k-mers
1、O(n^2)建有向图
本题所要求的两点之间有一条有向边的条件是
。但是现在不清楚两个节点之间关系,因此我们需要枚举所有
对,最暴力方式是双层循环
最后需要注意答案是以邻接表实现,也就是最终答案是以有向图的起点升序。
from typing import List, Tuple
def OverlapGraph(patterns: List[str]) -> List[Tuple[str]]:
graph = []
n = len(patterns)
for i in range(n):
for j in range(i + 1, n):
u, v = patterns[i], patterns[j]
# u -> v
if u[1:] == v[:-1]:
graph.append((u, v))
# v -> u
if v[1:] == u[:-1]:
graph.append((v, u))
# adjacency list
graph.sort()
return graph
s = """ATGCG
GCATG
CATGC
AGGCA
GGCAT
"""
patterns = s.splitlines()
for u, v in OverlapGraph(patterns):
print(f"{u} -> {v}")
复杂度分析:设 patterns 长度为 n,graph 长度为 n^2
- 时间复杂度:
,对 n^2 个元素排序。
- 空间复杂度:
,邻接表
Construct the De Bruijn Graph of a String
1、从Text串获取k-mer串建De Bruijn图
```python from typing import List, Tuple from collections import defaultdict
def deBruijnGraph(k: int, text: str) -> List[Tuple[str, List[str]]]: graph = defaultdict(list) # 边个数不变,因此list
# 取k-mer前k-1个字符作节点,边是相邻k-mer的overlap
for i in range(len(text) - k + 1): # 最后一个k-mer只有 k-1 个
graph[text[i:i+k-1]].append(text[i+1:i+k])
# adjacency list
return [(node, sorted(graph[node])) for node in sorted(graph)]
s = “””4 AAGATTCTCTAC””” k, text = s.splitlines() for u, v in deBruijnGraph(int(k), text): print(f”{u} -> {‘,’.join(v)}”)
<a name="dUgyG"></a>
# [Construct the De Bruijn Graph of a Collection of k-mers](https://rosalind.info/problems/ba3e/)

<a name="AWCoN"></a>
## 1、从k-mers中建De Bruijn图
由于输入都是某文本串`text`的所有`k-mer`子串,他们顺序是乱的,但是现在要从它们之中构建出De Bruijn图<br />节点:所有串的前缀和后缀构成的无重复集合。<br />边:长度为`k`的子串`s`的前缀`s[0,n-2]`和后缀`s[1,n-1]`之间构成一条边。
```python
from typing import List, Tuple
from collections import defaultdict
def deBruijnGraphFromKmers(patterns: List[str]) -> List[Tuple[str, List[str]]]:
graph = defaultdict(list)
for pattern in patterns:
graph[pattern[:-1]].append(pattern[1:])
# adjacency list
return [(node, sorted(graph[node])) for node in sorted(graph)]
s = """GAGG
CAGG
GGGG
GGGA
CAGG
AGGG
GGAG"""
patterns = s.splitlines()
for u, v in deBruijnGraphFromKmers(patterns):
print(f"{u} -> {','.join(v)}")
Find an Eulerian Cycle in a Graph
6->8->7->9->6->5->4->2->1->0->3->2->6