缘起

最近阅读<<我的第一本算法书>>(【日】石田保辉;宫崎修一)
本系列笔记拟采用golang练习之

归并排序

  1. 归并排序算法会把序列分成长度相同的两个子序列,
  2. 当无法继续往下分时(也就是每个子序列中只有一个数据时),
  3. 就对子序列进行归并。
  4. 归并指的是把两个排好序的子序列合并成一个有序序列。
  5. 该操作会一直重复执行,直到所有子序列都归并为一个整体为止。
  6. 总的运行时间为O(nlogn),这与前面讲到的堆排序相同。
  7. 摘自 <<我的第一本算法书>> 【日】石田保辉;宫崎修一

流程

  1. 给定待排序数组data[N]
  2. 创建缓冲区buffer[N]
  3. 设定初始的归并步长(已排序的子序列长度), span=1, 也就是将每个元素视作已排序的一个子序列
  4. 根据当前步长, 将所有已排序的子序列两两合并到缓冲区
    1. 设定q1为子序列1的头部下标, q2为子序列2的头部下标, p指向缓冲区的对应区域
    2. 比较q1与q2位置的值大小
    3. 如果q1的值小, 则取q1的值, 放入p, 然后q1和p都加1
    4. 如果q2的值小, 则取q2的值, 放入p, 然后q2和p都加1
    5. 如果子序列已经取完, 则复制尚未取完的子序列到p
  5. 交换data与buffer的指针, 将data视为缓冲区, buffer视为待归并数据
  6. span = span*2
  7. 重复步骤3-6, 直到span>N

设计

  • ISorter: 定义排序器接口. 定义值比较函数以兼容任意数值类型, 通过调整比较函数实现倒序排序
  • tMergeSort: 归并排序器, 实现ISorter接口.

    单元测试

    merge_sort_test.go. 归并排序是比较快的, 因此设定测试规模为10万元素. ```go package sorting

import ( “fmt” “learning/gooop/sorting” “learning/gooop/sorting/merge_sort” “math/rand” “testing” “time” )

func Test_MergeSort(t *testing.T) { fnAssertTrue := func(b bool, msg string) { if !b { t.Fatal(msg) } }

  1. reversed := false
  2. fnCompare := func(a interface{}, b interface{}) sorting.CompareResult {
  3. i1 := a.(int)
  4. i2 := b.(int)
  5. if i1 < i2 {
  6. if reversed {
  7. return sorting.GREATER
  8. } else {
  9. return sorting.LESS
  10. }
  11. } else if i1 == i2 {
  12. return sorting.EQUAL
  13. } else {
  14. if reversed {
  15. return sorting.LESS
  16. } else {
  17. return sorting.GREATER
  18. }
  19. }
  20. }
  21. fnTestSorter := func(sorter sorting.ISorter) {
  22. reversed = false
  23. // test simple array
  24. samples := []interface{} { 2,3,1,5,4,7,6 }
  25. samples = sorter.Sort(samples, fnCompare)
  26. fnAssertTrue(fmt.Sprintf("%v", samples) == "[1 2 3 4 5 6 7]", "expecting 1,2,3,4,5,6,7")
  27. t.Log("pass sorting [2 3 1 5 4 7 6] >> [1 2 3 4 5 6 7]")
  28. // test 10000 items sorting
  29. rnd := rand.New(rand.NewSource(time.Now().UnixNano()))
  30. for plus := 0;plus < 3;plus++ {
  31. sampleCount := 100 * 1000 + plus
  32. t.Logf("prepare large array with %v items", sampleCount)
  33. samples = make([]interface{}, sampleCount)
  34. for i := 0; i < sampleCount; i++ {
  35. samples[i] = rnd.Intn(sampleCount * 10)
  36. }
  37. t.Logf("sorting large array with %v items", sampleCount)
  38. t0 := time.Now().UnixNano()
  39. samples = sorter.Sort(samples, fnCompare)
  40. cost := time.Now().UnixNano() - t0
  41. for i := 1; i < sampleCount; i++ {
  42. fnAssertTrue(fnCompare(samples[i-1], samples[i]) != sorting.GREATER, "expecting <=")
  43. }
  44. t.Logf("end sorting large array, cost = %v ms", cost/1000000)
  45. }
  46. // test 0-20
  47. sampleCount := 20
  48. t.Log("sorting 0-20")
  49. samples = make([]interface{}, sampleCount)
  50. for i := 0;i < sampleCount;i++ {
  51. for {
  52. p := rnd.Intn(sampleCount)
  53. if samples[p] == nil {
  54. samples[p] = i
  55. break
  56. }
  57. }
  58. }
  59. t.Logf("unsort = %v", samples)
  60. samples = sorter.Sort(samples, fnCompare)
  61. t.Logf("sorted = %v", samples)
  62. fnAssertTrue(fmt.Sprintf("%v", samples) == "[0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]", "expecting 0-20")
  63. t.Log("pass sorting 0-20")
  64. // test special
  65. samples = []interface{} {}
  66. samples = sorter.Sort(samples, fnCompare)
  67. t.Log("pass sorting []")
  68. samples = []interface{} { 1 }
  69. samples = sorter.Sort(samples, fnCompare)
  70. t.Log("pass sorting [1]")
  71. samples = []interface{} { 3,1 }
  72. samples = sorter.Sort(samples, fnCompare)
  73. fnAssertTrue(fmt.Sprintf("%v", samples) == "[1 3]", "expecting 1,3")
  74. t.Log("pass sorting [1 3]")
  75. reversed = true
  76. samples = []interface{} { 2, 3,1 }
  77. samples = sorter.Sort(samples, fnCompare)
  78. fnAssertTrue(fmt.Sprintf("%v", samples) == "[3 2 1]", "expecting 3,2,1")
  79. t.Log("pass sorting [3 2 1]")
  80. }
  81. t.Log("\ntesting MergeSort")
  82. fnTestSorter(merge_sort.MergeSort)

}

  1. <a name="rnGi1"></a>
  2. # 测试输出
  3. - 归并排序相当的快, 比堆排序还快1倍左右
  4. - 比冒泡,选择,插入等有指数级的提升, 符合理论分析
  5. - 代价与堆排序一致, 需要额外分配大小为N的空间, 用做归并缓冲区
  6. - 比堆排序快的主要原因, 推测是因为堆排序初始化时, 批量push操作, 可能引发多次数组扩容.

$ go test -v merge_sort_test.go === RUN Test_MergeSort merge_sort_test.go:111: testing MergeSort merge_sort_test.go:48: pass sorting [2 3 1 5 4 7 6] >> [1 2 3 4 5 6 7] merge_sort_test.go:54: prepare large array with 100000 items merge_sort_test.go:60: sorting large array with 100000 items merge_sort_test.go:67: end sorting large array, cost = 35 ms merge_sort_test.go:54: prepare large array with 100001 items merge_sort_test.go:60: sorting large array with 100001 items merge_sort_test.go:67: end sorting large array, cost = 36 ms merge_sort_test.go:54: prepare large array with 100002 items merge_sort_test.go:60: sorting large array with 100002 items merge_sort_test.go:67: end sorting large array, cost = 33 ms merge_sort_test.go:72: sorting 0-20 merge_sort_test.go:83: unsort = [6 10 8 9 14 1 12 4 19 7 11 16 15 17 0 2 18 3 5 13] merge_sort_test.go:86: sorted = [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] merge_sort_test.go:88: pass sorting 0-20 merge_sort_test.go:93: pass sorting [] merge_sort_test.go:97: pass sorting [1] merge_sort_test.go:102: pass sorting [1 3] merge_sort_test.go:108: pass sorting [3 2 1] —- PASS: Test_MergeSort (0.12s) PASS ok command-line-arguments 0.127s

  1. <a name="o3aWp"></a>
  2. # ISorter.go
  3. 定义排序器接口. 定义值比较函数以兼容任意数值类型, 通过调整比较函数实现倒序排序
  4. ```go
  5. package sorting
  6. type ISorter interface {
  7. Sort(data []interface{}, comparator CompareFunction) []interface{}
  8. }
  9. type CompareFunction func(a interface{}, b interface{}) CompareResult
  10. type CompareResult int
  11. const LESS CompareResult = -1
  12. const EQUAL CompareResult = 0
  13. const GREATER CompareResult = 1

tMergeSort.go

归并排序器, 实现ISorter接口.

  1. package merge_sort
  2. import (
  3. "learning/gooop/sorting"
  4. )
  5. type tMergeSort struct {}
  6. func newMergeSort() sorting.ISorter {
  7. return &tMergeSort{}
  8. }
  9. func (me *tMergeSort) Sort(data []interface{}, comparator sorting.CompareFunction) []interface{} {
  10. if data == nil {
  11. return nil
  12. }
  13. size := len(data)
  14. if size <= 1 {
  15. return data
  16. }
  17. var result []interface{} = nil
  18. buffer := make([]interface{}, size)
  19. for span := 1; span <= size;span *= 2 {
  20. for i := 0;i < size;i += span * 2 {
  21. merge(size, data, i, i + span, span, buffer, i, comparator)
  22. }
  23. result = buffer
  24. data, buffer = buffer, data
  25. }
  26. if result == nil {
  27. result = data
  28. }
  29. return result
  30. }
  31. // 合并data数组中的两个子序列: [q1:q1+span), [q2:q2+span), 到目标数组result的offset位置
  32. func merge(size int, data []interface{}, q1 int, q2 int, span int, result []interface{}, offset int, comparator sorting.CompareFunction) {
  33. e1 := min(q1 + span, size)
  34. e2 := min(q2 + span, size)
  35. j := -1
  36. k := -1
  37. for i := 0;i < span*2;i++ {
  38. if q1 >= e1 {
  39. j = q2
  40. k = e2
  41. } else if q2 >= e2 {
  42. j = q1
  43. k = e1
  44. }
  45. if j >= 0 {
  46. for p := j;p < k;p++ {
  47. result[offset] = data[p]
  48. offset++
  49. }
  50. break
  51. }
  52. v1 := data[q1]
  53. v2 := data[q2]
  54. if lessEqual(v1, v2, comparator) {
  55. result[offset] = v1
  56. q1++
  57. } else {
  58. result[offset] = v2
  59. q2++
  60. }
  61. offset++
  62. }
  63. }
  64. func lessEqual(v1 interface{}, v2 interface{}, comparator sorting.CompareFunction) bool {
  65. return comparator(v1, v2) != sorting.GREATER
  66. }
  67. func min(a,b int) int {
  68. if a <= b {
  69. return a
  70. } else {
  71. return b
  72. }
  73. }
  74. var MergeSort = newMergeSort()

(end)