Rearrangements Power Large-Scale Genomic Changes

Perhaps the most common type of genome rearrangement is an inversion, which flips an entire interval of DNA found on the same chromosome. As in the case of calculating Hamming distance (see “Counting Point Mutations”), we would like to determine the minimum number of inversions that have occurred on the evolutionary path between two chromosomes. To do so, we will use the model introduced in “Enumerating Gene Orders” in which a chromosome is represented by a permutation of its synteny blocks.
Problem
A reversal of a permutation creates a new permutation by inverting some interval of the permutation; (5,2,3,1,4)(5,2,3,1,4), (5,3,4,1,2)(5,3,4,1,2), and (4,1,2,3,5)(4,1,2,3,5) are all reversals of (5,3,2,1,4)(5,3,2,1,4). The reversal distance between two permutations ππ and σσ, written drev(π,σ)drev(π,σ), is the minimum number of reversals required to transform π into σ (this assumes that ππ and σσ have the same length).
Given: A collection of at most 5 pairs of permutations, all of which have length 10.
Return: The reversal distance between each permutation pair.
Sample Dataset

  1. 1 2 3 4 5 6 7 8 9 10
  2. 3 1 5 2 7 4 9 6 10 8
  3. 3 10 8 2 5 4 7 1 6 9
  4. 5 2 3 1 7 4 10 8 6 9
  5. 8 6 7 9 4 1 3 10 2 5
  6. 8 2 7 6 9 1 5 3 10 4
  7. 3 9 10 4 1 8 6 7 5 2
  8. 2 9 8 5 1 7 3 4 6 10
  9. 1 2 3 4 5 6 7 8 9 10
  10. 1 2 3 4 5 6 7 8 9 10

Sample Output
9 4 5 7 0

Hint

Don’t be afraid to try an ugly solution.

贪心错误!

题意规定 A reversal of a permutation 意思是某个全排列区间 [i, j] 的所有数字发生翻转。然后 reversal distance 定义是一个全排列 π 通过连续进行 n 次区间 [i, j] (可重叠)翻转获得另一个全排列 σ 的次数。

  1. 3 10 8 2 5 4 7 1 6 9
  2. 5 2 3 1 7 4 10 8 6 9
  3. 3 10 8 2 5 4 7 1 6 9
  4. 5 2 3 1 7 4 10 8 6 9
  5. |---| 将末尾 3 反转到头部
  6. 3 10 8 2 5 4 7 1 6 9
  7. 3 2 5 1 7 4 10 8 6 9
  8. |---------| 将末尾 10 反转到头部
  9. 3 10 8 2 5 4 7 1 6 9
  10. 3 10 4 7 1 5 2 8 6 9
  11. |---------| 将末尾 8 反转到头部
  12. 3 10 8 2 5 4 7 1 6 9
  13. 3 10 8 2 5 1 7 4 6 9
  14. |---| 将末尾 4 反转到头部
  15. 3 10 8 2 5 4 7 1 6 9
  16. 3 10 8 2 5 4 7 1 6 9

因此暴力的方法就是假设区间 Reversal Distance - 图1 内都有 Reversal Distance - 图2 成立,再依次检测 Reversal Distance - 图3 是否相等,不相等则在 Reversal Distance - 图4 中找到 Reversal Distance - 图5。由于题目给定全排列不包含重复数字且有答案,因此 Reversal Distance - 图6 是唯一存在的!因此将 Reversal Distance - 图7 进行翻转使得 Reversal Distance - 图8

  1. from typing import List
  2. class Solution:
  3. def reversalDistance(self, s: List[int], t: List[int]) -> int:
  4. distance = 0
  5. for i in range(len(s)):
  6. if s[i] != t[i]:
  7. distance += 1
  8. for j in range(i + 1, len(t)):
  9. if s[i] == t[j]:
  10. t[i:j+1] = reversed(t[i:j+1])
  11. break
  12. return distance
  13. data = """
  14. 8 6 7 9 4 1 3 10 2 5
  15. 8 2 7 6 9 1 5 3 10 4
  16. """
  17. data = [line for line in data.splitlines() if line]
  18. # print(data)
  19. ss = Solution()
  20. for i in range(0, len(data), 2):
  21. s = list(map(int, data[i].split()))
  22. t = list(map(int, data[i+1].split()))
  23. print(s, t)
  24. print(ss.reversalDistance(s, t))

但是上面输出 7 是错误的!

  1. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  2. t1: [8, 2, 7, 6, 9, 1, 5, 3, 10, 4]
  3. |-----|
  4. t2: [8, 6, 7, 2, 9, 1, 5, 3, 10, 4]
  5. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  6. t1: [8, 6, 7, 2, 9, 1, 5, 3, 10, 4]
  7. |--|
  8. t2: [8, 6, 7, 9, 2, 1, 5, 3, 10, 4]
  9. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  10. t1: [8, 6, 7, 9, 2, 1, 5, 3, 10, 4]
  11. |---------------|
  12. t2: [8, 6, 7, 9, 4, 10, 3, 5, 1, 2]
  13. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  14. t1: [8, 6, 7, 9, 4, 10, 3, 5, 1, 2]
  15. |---------|
  16. t2: [8, 6, 7, 9, 4, 1, 5, 3, 10, 2]
  17. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  18. t1: [8, 6, 7, 9, 4, 1, 5, 3, 10, 2]
  19. |--|
  20. t2: [8, 6, 7, 9, 4, 1, 3, 5, 10, 2]
  21. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  22. t1: [8, 6, 7, 9, 4, 1, 3, 5, 10, 2]
  23. |---|
  24. t2: [8, 6, 7, 9, 4, 1, 3, 10, 5, 2]
  25. s : [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]
  26. t1: [8, 6, 7, 9, 4, 1, 3, 10, 5, 2]
  27. |--|
  28. t2: [8, 6, 7, 9, 4, 1, 3, 10, 2, 5]

单源 BFS

  1. from typing import List
  2. class Solution:
  3. def reversalDistance(self, s: List[int], t: List[int]) -> int:
  4. if s == t: return 0
  5. s, t = tuple(s), tuple(t)
  6. # 1. 单源 BFS
  7. from collections import deque
  8. distance = 0
  9. visited = {s, } # 图节点从源节点翻转次数
  10. q = deque([s, ]) # 两个队列一起变换
  11. while q:
  12. distance += 1 # 进队列就多义词转换
  13. for _ in range(len(q)):
  14. s = q.popleft() # 弹出图节点
  15. # 2. 枚举当前序列所有翻转可能
  16. for start in range(len(s)):
  17. for end in range(start + 1, len(s)):
  18. ss = s[:start] + s[start:end+1][::-1] + s[end+1:]
  19. if ss == t:
  20. return distance
  21. if ss not in visited:
  22. visited.add(ss)
  23. q.append(ss)
  24. return -1
  25. data = """
  26. 1 2 3 4 5 6 7 8 9 10
  27. 3 1 5 2 7 4 9 6 10 8
  28. 3 10 8 2 5 4 7 1 6 9
  29. 5 2 3 1 7 4 10 8 6 9
  30. 8 6 7 9 4 1 3 10 2 5
  31. 8 2 7 6 9 1 5 3 10 4
  32. 3 9 10 4 1 8 6 7 5 2
  33. 2 9 8 5 1 7 3 4 6 10
  34. 1 2 3 4 5 6 7 8 9 10
  35. 1 2 3 4 5 6 7 8 9 10
  36. """
  37. data = [line for line in data.splitlines() if line]
  38. ss = Solution()
  39. for i in range(0, len(data), 2):
  40. s = list(map(int, data[i].split()))
  41. t = list(map(int, data[i+1].split()))
  42. print(s, t)
  43. print(ss.reversalDistance(s, t))

双源 BFS

  1. from typing import List
  2. class Solution:
  3. def reversalDistance(self, s: List[int], t: List[int]) -> int:
  4. if s == t: return 0
  5. s, t = tuple(s), tuple(t)
  6. # 1. 双源 BFS
  7. from collections import deque
  8. distance = 0
  9. d1, d2 = {s: 0}, {t: 0} # 图节点从源节点翻转次数
  10. q1, q2 = deque([s, ]), deque([t, ]) # 两个队列一起变换
  11. while q1 or q2:
  12. distance += 1 # 进队列就多义词转换
  13. for _ in range(len(q1)):
  14. s = q1.popleft() # 弹出图节点
  15. # 2. 枚举当前序列所有翻转可能
  16. for start in range(len(s)):
  17. for end in range(start + 1, len(s)):
  18. ss = s[:start] + s[start:end+1][::-1] + s[end+1:]
  19. if ss in d2:
  20. return distance + d2[ss]
  21. if ss not in d1:
  22. d1[ss] = distance
  23. q1.append(ss)
  24. for _ in range(len(q2)):
  25. t = q2.popleft() # 弹出图节点
  26. # 2. 枚举当前序列所有翻转可能
  27. for start in range(len(t)):
  28. for end in range(start + 1, len(t)):
  29. tt = t[:start] + t[start:end+1][::-1] + t[end+1:]
  30. if tt in d1:
  31. return distance + d1[tt]
  32. if tt not in d2:
  33. d2[tt] = distance
  34. q2.append(tt)
  35. return -1
  36. data = """
  37. 1 2 3 4 5 6 7 8 9 10
  38. 3 1 5 2 7 4 9 6 10 8
  39. 3 10 8 2 5 4 7 1 6 9
  40. 5 2 3 1 7 4 10 8 6 9
  41. 8 6 7 9 4 1 3 10 2 5
  42. 8 2 7 6 9 1 5 3 10 4
  43. 3 9 10 4 1 8 6 7 5 2
  44. 2 9 8 5 1 7 3 4 6 10
  45. 1 2 3 4 5 6 7 8 9 10
  46. 1 2 3 4 5 6 7 8 9 10
  47. """
  48. data = [line for line in data.splitlines() if line]
  49. ss = Solution()
  50. for i in range(0, len(data), 2):
  51. s = list(map(int, data[i].split()))
  52. t = list(map(int, data[i+1].split()))
  53. print(s, t)
  54. print(ss.reversalDistance(s, t))

速度对比:

  1. b07@SB:~$ time python ./test.py
  2. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [3, 1, 5, 2, 7, 4, 9, 6, 10, 8]
  3. 9
  4. [3, 10, 8, 2, 5, 4, 7, 1, 6, 9] [5, 2, 3, 1, 7, 4, 10, 8, 6, 9]
  5. 4
  6. [8, 6, 7, 9, 4, 1, 3, 10, 2, 5] [8, 2, 7, 6, 9, 1, 5, 3, 10, 4]
  7. 5
  8. [3, 9, 10, 4, 1, 8, 6, 7, 5, 2] [2, 9, 8, 5, 1, 7, 3, 4, 6, 10]
  9. 7
  10. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  11. 0
  12. real 3m7.231s
  13. user 3m6.420s
  14. sys 0m0.810s
  15. b07@SB:~$ time python ./test.py
  16. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [3, 1, 5, 2, 7, 4, 9, 6, 10, 8]
  17. 9
  18. [3, 10, 8, 2, 5, 4, 7, 1, 6, 9] [5, 2, 3, 1, 7, 4, 10, 8, 6, 9]
  19. 4
  20. [8, 6, 7, 9, 4, 1, 3, 10, 2, 5] [8, 2, 7, 6, 9, 1, 5, 3, 10, 4]
  21. 5
  22. [3, 9, 10, 4, 1, 8, 6, 7, 5, 2] [2, 9, 8, 5, 1, 7, 3, 4, 6, 10]
  23. 7
  24. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  25. 0
  26. real 0m2.132s
  27. user 0m2.072s
  28. sys 0m0.060s

路径求解

我们知道上面翻转次数,但是如何求解最优路径呢?

  1. 9
  2. 01 02 03 04 05 06 07 08 09 10
  3. ^^ ^^ 0 1
  4. 02 01 03 04 05 06 07 08 09 10
  5. ^^ ^^ ^^ 0 2
  6. 03 01 02 04 05 06 07 08 09 10
  7. ^^ ^^ 2 3
  8. 03 01 04 02 05 06 07 08 09 10
  9. ^^ ^^ ^^ 2 4
  10. 03 01 05 02 04 06 07 08 09 10
  11. ^^ ^^ 4 5
  12. 03 01 05 02 06 04 07 08 09 10
  13. ^^ ^^ ^^ 7 9
  14. 03 01 05 02 06 04 07 10 09 08
  15. ^^ ^^ ^^ 5 7
  16. 03 01 05 02 06 10 07 04 09 08
  17. ^^ ^^ ^^ ^^ 5 8
  18. 03 01 05 02 06 09 04 07 10 08
  19. ^^ ^^ ^^ ^^ 4 7
  20. 03 01 05 02 07 04 09 06 10 08
  21. 4
  22. 03 10 08 02 05 04 07 01 06 09
  23. ^^ ^^ ^^ ^^ 1 4
  24. 03 05 02 08 10 04 07 01 06 09
  25. ^^ ^^ ^^ ^^ ^^ 3 7
  26. 03 05 02 01 07 04 10 08 06 09
  27. ^^ ^^ ^^ 0 2
  28. 02 05 03 01 07 04 10 08 06 09
  29. ^^ ^^ 0 1
  30. 05 02 03 01 07 04 10 08 06 09
  31. 5
  32. 08 06 07 09 04 01 03 10 02 05
  33. ^^ ^^ ^^ ^^ ^^ ^^ 3 8
  34. 08 06 07 02 10 03 01 04 09 05
  35. ^^ ^^ ^^ 1 3
  36. 08 02 07 06 10 03 01 04 09 05
  37. ^^ ^^ ^^ 7 9
  38. 08 02 07 06 10 03 01 05 09 04
  39. ^^ ^^ 6 7
  40. 08 02 07 06 10 03 05 01 09 04
  41. ^^ ^^ ^^ ^^ ^^ 4 8
  42. 08 02 07 06 09 01 05 03 10 04
  43. 7
  44. 03 09 10 04 01 08 06 07 05 02
  45. ^^ ^^ ^^ 0 2
  46. 10 09 03 04 01 08 06 07 05 02
  47. ^^ ^^ ^^ ^^ ^^ ^^ 0 5
  48. 08 01 04 03 09 10 06 07 05 02
  49. ^^ ^^ ^^ ^^ ^^ ^^ ^^ 1 7
  50. 08 07 06 10 09 03 04 01 05 02
  51. ^^ ^^ ^^ ^^ 1 4
  52. 08 09 10 06 07 03 04 01 05 02
  53. ^^ ^^ ^^ 4 6
  54. 08 09 10 06 04 03 07 01 05 02
  55. ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ 2 9
  56. 08 09 02 05 01 07 03 04 06 10
  57. ^^ ^^ ^^ 0 2
  58. 02 09 08 05 01 07 03 04 06 10
  59. 0

请见下一个题目