Certain Point Mutations are More Common
Point mutations occurring in DNA can be divided into two types: transitions and transversions. A transition substitutes one purine for another (A↔G) or one pyrimidine for another (C↔T); that is, a transition keeps the nucleobase within the same structural class (two-ring purine or one-ring pyrimidine). Conversely, a transversion is the interchange of a purine for a pyrimidine base, or vice versa. See Figure 1. Transitions and transversions can be defined analogously for RNA mutations.
Figure 1. Illustration of transitions and transversions.
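The definition above can be sketched as a small classifier. This is a minimal illustration only; the name `classify_substitution` and the `"match"` label for identical bases are assumptions made for this example, not part of the problem statement.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(a: str, b: str) -> str:
    """Classify a pair of DNA bases as 'match', 'transition', or 'transversion'."""
    if a == b:
        return "match"
    # A transition stays inside one class: purine<->purine or pyrimidine<->pyrimidine.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in ("AG", "CT", "AC", "GT"):
    print(pair, classify_substitution(*pair))
```

Of the twelve possible ordered substitutions between distinct bases, four are transitions and eight are transversions, so a ratio near 2 in favor of transitions is far from what uniformly random substitution would give.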
Because transversions require a more drastic change to the base’s chemical structure, they are less common than transitions. Across the entire genome, the ratio of transitions to transversions is on average about 2. However, in coding regions, this ratio is typically higher (often exceeding 3) because a transition appearing in coding regions happens to be less likely to change the encoded amino acid, particularly when the substituted base is the third member of a codon (feel free to verify this fact using the DNA codon table). Such a substitution, in which the organism’s protein makeup is unaffected, is known as a silent substitution.
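The claim about third-position substitutions can be checked against the standard DNA codon table. The sketch below is one way to do that, under these assumptions: the table is the standard genetic code written as a compact string in T, C, A, G order, stop codons are treated as an ordinary symbol `*` (so TAA to TAG counts as silent), and every codon and substitution is weighted equally. The function name `third_position_silent_counts` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = set("AG")


def third_position_silent_counts():
    """Return {kind: [silent, total]} over all third-position substitutions."""
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, aa in CODON_TABLE.items():
        for new_base in BASES:
            if new_base == codon[2]:
                continue
            same_class = (codon[2] in PURINES) == (new_base in PURINES)
            kind = "transition" if same_class else "transversion"
            mutated = codon[:2] + new_base
            counts[kind][1] += 1
            if CODON_TABLE[mutated] == aa:
                counts[kind][0] += 1  # encoded symbol is unchanged
    return counts


for kind, (silent, total) in third_position_silent_counts().items():
    print(f"{kind}: {silent}/{total} silent")
```

Under these assumptions, 60 of the 64 third-position transitions are silent, compared with 68 of the 128 third-position transversions, which is consistent with the higher transition/transversion ratio described for coding regions.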
Because of its potential for identifying coding DNA, the ratio of transitions to transversions between two strands of DNA offers a quick and useful statistic for analyzing genomes.
Problem
For DNA strings s1 and s2 having the same length, their transition/transversion ratio R(s1,s2) is the ratio of the total number of transitions to the total number of transversions, where symbol substitutions are inferred from mismatched corresponding symbols as when calculating Hamming distance (see “Counting Point Mutations”).
Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).
Return: The transition/transversion ratio R(s1,s2).
Sample Dataset
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
Sample Output
1.21428571429
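Solution:
This problem rests on two biological concepts:
1. Transition: a swap between two purines or between two pyrimidines, i.e. A<->G or C<->T.
2. Transversion: a swap between a purine and a pyrimidine, i.e. {A, G}<->{C, T}.
The task is then to compute the ratio of the number of transitions to the number of transversions.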
import re


class Solution:
    def transitionTransversion(self, s: str, t: str) -> str:
        transitions = transversions = 0
        # Purines map to +1 and pyrimidines to -1, so a purine/pyrimidine pair sums to 0.
        d = {'A': 1, 'G': 1, 'C': -1, 'T': -1}
        for base1, base2 in zip(s, t):
            if base1 != base2:
                if d[base1] + d[base2] == 0:  # transversion
                    transversions += 1
                else:
                    transitions += 1
        # Assumes at least one transversion; otherwise this raises ZeroDivisionError.
        return f'{transitions / transversions:.11f}'  # 11 decimal places


seqs = """
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
"""
# re.split yields a first piece that is just '\n'; the filter drops it.
s, t = (seq.replace('\n', '') for seq in re.split(r'>.*', seqs) if seq.replace('\n', ''))
print(Solution().transitionTransversion(s, t))
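
As a cross-check on the solution above, the following sketch reports the two counts separately and keeps the ratio as an exact fraction, which makes the arithmetic behind the printed value easier to inspect. The helper names `parse_fasta` and `count_substitutions` are hypothetical, and the parser assumes well-formed FASTA input with exactly two records.

```python
from fractions import Fraction


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records, name = {}, None
    for line in text.strip().splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.strip()
    return records


def count_substitutions(s, t):
    """Return (transitions, transversions) between two equal-length DNA strings."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    purines = set("AG")
    transitions = transversions = 0
    for a, b in zip(s, t):
        if a == b:
            continue
        if (a in purines) == (b in purines):  # both purines or both pyrimidines
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


SAMPLE = """
>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
"""
s, t = parse_fasta(SAMPLE).values()
ts, tv = count_substitutions(s, t)
print(ts, tv, Fraction(ts, tv))  # 17 14 17/14
print(f"{ts / tv:.11f}")         # 1.21428571429
```

Returning the counts rather than the quotient leaves the caller free to decide what to do when there are no transversions, a case in which the ratio is undefined.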